Guglielmo Cassini
Data Scientist and Machine Learning Engineer
Email: [email protected]
Phone: 3453240798
Location: Milano (MI)
Date of birth: 26 August 1998
Nationality: Italian
Driving licence: B
LinkedIn: https://www.linkedin.com/in/guglielmo-cassini-05592a189/
GitHub: https://github.com/gCass
Professional Experience
Trustfull, ML and AI Engineer

As a Machine Learning and Artificial Intelligence Engineer, my responsibilities include managing and maintaining the codebase and the company's core models. I am also responsible for integrating new functionality by implementing cutting-edge AI models.

May 2025 | Milan, Italy
Lutech, Data scientist

Consultant working with multiple clients on data science and artificial intelligence projects.

May 2024 – present | Cinisello Balsamo (MI), Italy
Target Reply, Consultant

Junior consultant working as a Data Scientist and Data Engineer on various projects for different clients.

October 2022 – May 2024 | Milan, Italy
Ticinum Aerospace, Data Scientist - Part time

Development of web crawlers using Python and related frameworks.

Development and analysis of models in the wineinformatics and geospatial data fields.

July 2021 – July 2022 | Pavia, Italy
Sata Consulting, IT Technician Internship

Training and development in PowerBuilder. Training and use of SAP HANA. Alpha testing of internal company software.

Monitoring of the company network with Spiceworks, an online network monitoring tool.

June 2017 – July 2017 | Pavia, Italy
Education
AWS Machine Learning Specialist Certification
November 2023
Master of Science in Computer Engineering, specialization in Data Science, Università degli Studi di Pavia

Graduated with 110/110 cum laude

September 2020 – September 2022 | Pavia, Italy
Bachelor's Degree in Electronic and Computer Engineering, Computer Engineering curriculum, Università degli Studi di Pavia

Graduated with 104/110

September 2017 – November 2020 | Pavia, Italy
Computer science diploma, ITIS G. Cardano

Graduated with 100/100

2012 – 2017 | Pavia, Italy
Languages
Italian

Native speaker

Spanish

Near-native speaker

English

B2

Skills
Object-oriented programming and design patterns
Statistics and machine learning

Data analysis

Data Mining

Clustering algorithms

Predictive models

Linear Regression

Logistic Regression

Ridge Regression

Classification algorithms

Decision trees

Ensemble methods

Natural Language Processing & Text Mining

Reinforcement learning and Deep reinforcement learning
Hadoop and Apache Spark

Apache Hive

Apache Spark

Hadoop MapReduce

PySpark

Soft skills

Communication

Active listening

Negotiation

Infrastructure as Code

Terraform

CDK

Cloud Computing

ML Specialist

Development tools

Apache Hive; Apache Spark; Git/GitHub; Hadoop MapReduce; Jenkins; Jetty Web Server; Jupyter Notebook; PyCharm; Eclipse IDE; Android Studio

MATLAB & Simulink

Amazon Web Services

SageMaker (Studio, Endpoints), ECS, ECR, Lambda, Step Functions, Data Pipeline, EMR, AWS Batch, DMS, RDS, Redshift, DynamoDB, S3, EC2, Kinesis (Streams, Analytics, Firehose), Glue (Data Catalog, ETL), Athena, QuickSight, Bedrock

Google Cloud Platform

BigQuery

Google Cloud Storage

Vertex AI

SQL and NoSQL

MySQL, Presto SQL, Spark SQL, MongoDB

Deep learning

Keras

TensorFlow

CNNs, RNNs, and LSTM networks

Generative Artificial Intelligence
  • LangChain framework
  • Few-shot learning
  • Chain-of-thought prompting
  • Agentic AI
Programming languages and Frameworks

Android

Bash scripting

C

Dart, Flutter

HTML, CSS, Bootstrap

Java EE 7

JavaScript, Ajax

LabVIEW

MATLAB

OpenMP

PHP

PySpark

Python (scikit-learn, pandas, NumPy, matplotlib)

SQL

R

Software Engineering and Object-Oriented Programming

Design patterns

Clean code principles

CI/CD
  • GitHub Actions (intermediate)
  • Jenkins (basic)
  • AWS CodeDeploy & CodePipeline
Containerization
  • Docker and Docker Compose
Git and GitHub
Dataiku
Projects
Trustfull Main Product

As a backend, Machine Learning, and Artificial Intelligence Engineer, I work on the company's main product: a web platform for enriching personal data (such as emails, phone numbers, first names, and last names).

My goal is to process and interpret this data to create digital scoring systems, which help clients distinguish fraudulent users from legitimate ones.

May 2025
Target Stock, Amplifon

To improve the service level of the products that shops offer their customers and to provide an automated solution, the aim of the project is to build a model able to estimate the target stock of each product for each of the client's shops.

To do so, the project has been divided into two steps (a minimal sketch of the first step follows the technology list below):

September 2024 – present
  • a first classification task, in which a model identifies whether a product will sell in a given shop during the following week
  • a deterministic algorithm to estimate the quantity of that product to stock in that shop
  • The same approach has been applied to the countries in which the client operates (IT, DE, ES, BE) and is continuously being extended to new ones.

As a Data Scientist, my principal tasks were:

  • to build a data preparation flow to transform raw data into features
  • to identify useful features, and to train different models and evaluate their performance
  • to evaluate the performance of different deterministic algorithms through ad hoc analyses
  • to enhance the existing Power BI dashboards to make the model results available to final stakeholders
  • to present results and advancements to stakeholders.

The main technologies used were:

  • Dataiku
  • Python (with data science libraries: pandas, numpy, ...)
  • SQL
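
As a rough illustration of the first step, the sketch below trains a weekly sell/no-sell classifier; the features, synthetic data, and gradient-boosting model are assumptions for the example, not the client's actual pipeline.

# Minimal sketch of the weekly sell / no-sell classification step.
# Features, data, and model choice are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features for one (product, shop, week) row.
X = np.column_stack([
    rng.poisson(3, n),       # units sold in the previous week
    rng.poisson(12, n),      # units sold in the previous 4 weeks
    rng.integers(0, 52, n),  # week of year, as a seasonality proxy
])
y = (X[:, 0] + rng.normal(0, 1, n) > 2).astype(int)  # toy target: sells next week?

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print(f"F1 on held-out rows: {f1_score(y_te, clf.predict(X_te)):.2f}")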
Member Get Member, Amplifon

In the context of a promotional campaign, the client needed an automated flow that detects whether a customer qualifies for the campaign and determines the number of discounts the customer is entitled to.

July 2024 – present

As a Data Scientist, my principal tasks were:

  • to develop a data pipeline to obtain the principal KPIs of the customers involved in the process
  • to integrate the new flow with external software developed by a third-party company
  • to connect the final output to a Power BI dashboard, developed ad hoc to present results

The main technologies used were:

  • Dataiku
  • SQL
Demand Supply Forecasting, Amplifon

To reduce the costs of purchasing the industrial components used to manufacture its products, the client required a demand and supply forecasting system.

As a Data Scientist, my principal tasks were:

May 2024 – present

  • to build a data preparation flow to transform raw data into features
  • to identify useful features, and to train and evaluate different models
  • to evaluate model performance through ad hoc analyses
  • to build automated flows so that the model is retrained each month
  • to enhance the existing Power BI dashboards to make the model results available to final stakeholders
  • to present results and advancements to stakeholders.

The same project has been applied to the countries in which the client operates (IT, DE, ES, BE) and is continuously being extended to new ones.

The main technologies used were:

  • Dataiku
  • Python (with data science libraries: pandas, numpy, ...)
  • SQL
Delta Lake migration, Cortilia

The client owns a data platform, hosted as a data lake on AWS, used for analysis and for extracting data consumed by an external advertising platform that serves targeted advertising directly to prospective new customers. To do so, the platform relies on a large number of Glue jobs, each applying different logic.

The objective of the project was to design and provide a framework that generalizes these Glue jobs into a single code base, and to migrate the data lake to a Delta Lake structure.

March 2024 – May 2024

The main technologies used were:

  • Python, PySpark
  • AWS S3, Glue, Lambda, SQS
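
As a loose illustration of the single-code-base idea, a config-driven job driver might look like the sketch below; the transform registry, paths, and names are hypothetical, and this is not the framework actually delivered.

# Minimal sketch of a config-driven, generalized Spark job writing Delta.
# Paths and transform names are hypothetical; assumes the delta-spark
# package is available on the cluster.
from pyspark.sql import DataFrame, SparkSession

def select_recent(df: DataFrame) -> DataFrame:
    # Example transform: one of many interchangeable per-job logics.
    return df.where(df["event_date"] >= "2024-01-01")

TRANSFORMS = {"select_recent": select_recent}  # registry of job logics

def run_job(spark: SparkSession, config: dict) -> None:
    df = spark.read.parquet(config["source_path"])
    df = TRANSFORMS[config["transform"]](df)
    df.write.format("delta").mode("overwrite").save(config["target_path"])

if __name__ == "__main__":
    spark = SparkSession.builder.appName("generalized-job").getOrCreate()
    run_job(spark, {
        "source_path": "s3://bucket/raw/events/",    # hypothetical
        "target_path": "s3://bucket/delta/events/",  # hypothetical
        "transform": "select_recent",
    })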
Scorecard project, Mediaset

The client owns a scorecard project, consisting of a data lake hosted on the AWS cloud platform and composed of several tables designed to measure characteristics of the customer base through a series of KPIs.

The objective of the project was to fix the existing code and extend it, adding new classes and new tables to enrich the KPIs used by the business unit to analyze the customer base.

January 2024 – March 2024

The main technologies used in the project were:

  • AWS data lake suite: EMR, S3, Lambda
  • Python, PySpark
Document Ranking, Reply Holding

In order to facilitate the selection of a certain number of proposals for an event, an artificial intelligence model was developed with the aim of predicting whether a specific proposal for the event is worthy of being chosen. The model uses various machine learning techniques capable of working with both numerical and textual data, leveraging modern natural language processing techniques, and was trained on the AWS SageMaker cloud platform.

October 2023 – January 2024

Additionally, using both open-source Large Language Models (LLMs) and the OpenAI APIs, software was developed to outline the key points of each proposal, allowing quicker evaluation by the organizing committee.

Applied technologies:

  • AWS SageMaker (Notebook, Pipelines, Experiments, Hyperparameter tuning)
  • Python, NLTK
  • XGBoost framework, skopt package
  • Hugging Face Transformers, OpenAI, Llama 2, llama.cpp, LlamaIndex
Backend reporting, AXA (Data Analyst & Data Engineer)

Development of Key Performance Indicators (KPIs) in Presto SQL on AWS Athena, feeding a Qlik frontend. Analysis of business requirements.

Managed client interaction and the definition of business requirements.

October 2023 – December 2023

Technologies used:

  • SQL (Presto SQL and Spark SQL)
  • AWS Athena
  • AWS Glue
  • AWS Data Catalog
  • Terraform
Customer loyalty automation, YNAP

The customer loyalty automation project consists of automating the following two business processes:

- automating the assignment of customer loyalty levels in an e-commerce platform, querying the data platform and applying ETL jobs that check whether customers satisfy the business logic

July 2023 – September 2023

- automating the detection of high-value new customers.

Both parts of the project were developed in Python with PySpark, running on an EMR cluster; the jobs were scheduled with Airflow. A minimal sketch of such a job follows below.
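
The sketch below shows the shape of a loyalty-level assignment job in PySpark, under assumed column names, paths, and spend thresholds; the client's actual business logic is not reproduced here.

# Minimal PySpark sketch of a loyalty-level assignment job.
# Column names, paths, and thresholds are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("loyalty-levels").getOrCreate()

orders = spark.read.parquet("s3://bucket/orders/")  # hypothetical path

spend = orders.groupBy("customer_id").agg(F.sum("amount").alias("total_spend"))

levels = spend.withColumn(
    "loyalty_level",
    F.when(F.col("total_spend") >= 5000, "gold")
     .when(F.col("total_spend") >= 1000, "silver")
     .otherwise("bronze"),
)

levels.write.mode("overwrite").parquet("s3://bucket/loyalty_levels/")  # hypothetical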

Geographic analysis, YNAP

Using the geo purchasing power dataset created in a previous project, the aim of the project was to analyze how the customers of the client's e-commerce platform are distributed across the United Kingdom, with a focus on England, and how different city-level KPIs, such as mean household income and the price per square metre of housing, relate to the amount spent by customers in each city.

The project lasted one week and was presented at an internal client conference of the growth strategy unit, chaired by the CGO.

July 2023

Developed in Python in Jupyter notebooks to produce the report.

Geo purchasing power, YNAP (dataset creation)

The aim of the Geo purchasing power project was to create a table with a KPI representative of the purchasing power of a given postal code in three different countries: the United Kingdom, the United States, and Italy. The KPIs are numerous and differ across the three countries. Each KPI is accompanied by a description of its granularity and by a decile value from 1 to 10 of the quantity (a minimal sketch of the decile computation follows the task list below).

For the project, the following technologies were used:

April 2023 – July 2023

- PySpark

- Hive

- Hue

- Jupyter notebooks

- Airflow

- Jira for Agile organization

Tasks:

- Data sourcing

- Data cleaning and wrangling

- Data analysis

- Product code creation

- Table creation
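
One plausible PySpark formulation of the decile step uses ntile over a per-country window, as sketched below; the column names and sample rows are assumptions for illustration.

# Minimal PySpark sketch: per-country decile (1-10) of a KPI by postal code.
# Column names and sample rows are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("geo-deciles").getOrCreate()

kpis = spark.createDataFrame(
    [("UK", "SW1A", 52000.0), ("UK", "M1", 31000.0), ("IT", "27100", 24000.0)],
    ["country", "postal_code", "mean_income"],
)

w = Window.partitionBy("country").orderBy(F.col("mean_income"))
deciles = kpis.withColumn("income_decile", F.ntile(10).over(w))
deciles.show()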

Machine Learning Engineer, AXA

The client required the creation of a single template repository, usable for every data science project, through the development of a Docker image.

This image runs on an AWS SageMaker endpoint and handles calls to different templates based on the project, abstracting away which underlying template and libraries are required.

March 2023 – April 2023

To do this, the Docker image runs a Python script that initializes a first web server thread via FastAPI.

Through an appropriate HTTP request to that web server, a data scientist can specify which libraries the project requires by uploading wheel files, so that the model relies on them for execution.

At that point, the web server instantiates a second web server thread within the endpoint. When an HTTP request is made to the inference endpoint to obtain a prediction from the desired model, the first web server thread redirects the request to the second one, which handles the interaction with the model as if it were a normal endpoint.
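
A heavily simplified sketch of this two-server proxy pattern is shown below; the routes, the inner port, and the installation flow are assumptions for illustration, not the client's implementation.

# Minimal sketch of the two-server proxy pattern described above.
# Routes, the inner port, and the install flow are illustrative assumptions.
import subprocess
import sys

import httpx
import uvicorn
from fastapi import FastAPI, Request, Response

outer = FastAPI()
INNER_URL = "http://127.0.0.1:9000"  # assumed address of the inner model server

@outer.post("/install")
def install_wheel(wheel_path: str):
    # Install a project-specific wheel; the inner server (not shown) would
    # then be started in a separate thread using the new libraries.
    subprocess.check_call([sys.executable, "-m", "pip", "install", wheel_path])
    return {"installed": wheel_path}

@outer.post("/invocations")
async def invocations(request: Request) -> Response:
    # Forward the inference request to the inner server and relay its answer.
    payload = await request.body()
    async with httpx.AsyncClient() as client:
        inner = await client.post(f"{INNER_URL}/invocations", content=payload)
    return Response(content=inner.content, media_type="application/json")

if __name__ == "__main__":
    uvicorn.run(outer, host="0.0.0.0", port=8080)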

Activities performed and technologies used:

- introduction to the client's core services

- introduction to example infrastructures in the client's AWS cloud and how they are connected

- introduction to Artifactory

- Python script development and related unit testing

- development and modification of the existing Jenkins pipeline

CI/CD implementation and overview, Stellantis

The aim of the project was to create continuous integration and continuous delivery pipelines for several existing projects, and to migrate those projects so they could benefit from the pipelines.

My tasks included:

February 2023 – 2023

  • developing Python utilities to implement checks in the continuous integration and continuous delivery pipeline
  • designing the pipeline
  • testing the CI/CD pipelines.

Main technologies used:

  • Azure DevOps
  • Azure Data Factory (basic level)
  • Python
  • GitHub
Backend reporting, AXA

The aim of the project was to develop and maintain several Qlik reports used by business analysts to monitor key performance indicators on insurance product sales and usage, enriching the client's data platform.

The main activities were:

October 2022 – March 2023

  • analysis of business requirements
  • development of new KPI tables in Presto SQL
  • porting of queries from Presto SQL to Spark SQL

The main technologies used were:

  • SQL, Presto SQL, Spark SQL
  • AWS Athena, AWS Glue
Privacy protection in IoT: a deep reinforcement learning approach, Master's degree thesis

Electronic communications are always exposed to privacy risks: in any message-based interaction, an endpoint can exploit both the data exchanged and the metadata in the message to disclose information about the sender. In the Internet of Things, a dynamic context where a huge number of devices exchange messages, the privacy risk is a crucial and timely issue. In this context, service discovery is the process of finding the services offered by IoT devices according to clients' requests. Many solutions have been proposed for it, but privacy protection is still an important aspect to investigate. The considered environment consists of a mobile or wearable device which aims to find and obtain a certain number of services, offered by different providers, within a target deadline. The device moves along a path where it can encounter the service providers. Interacting with service providers often requires the exchange of data that can be sensitive for the device owner. In addition, service providers can collude, combining the data gained from different providers, leveraging their value and raising the privacy risk for the user. The objective of this thesis is to develop a solution that improves privacy protection by applying deep reinforcement learning techniques: deep Q-learning and the actor-critic method. The performance of the agent is evaluated through metrics defined ad hoc for the problem. In addition, the objectives of the thesis include the development of the simulator used for the experiments and the creation of a dataset of IoT mobile service providers, available for future research.
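
For reference, the temporal-difference update at the core of deep Q-learning can be sketched in tabular form as below; the thesis itself uses neural approximators, and the sizes and hyperparameters here are toy values.

# Tabular sketch of the Q-learning update that deep Q-learning approximates
# with a neural network. Sizes and hyperparameters are toy values.
import numpy as np

n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.99  # learning rate, discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    # One temporal-difference update of Q(s, a).
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])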

AWS Super, University project in Master's degree

Development of a cloud platform for DNA substring computation through an on-demand HPC cluster based on Docker containers. Developed on AWS.

ARXIV, University project in Master's degree

Analysis of paper category popularity in the arXiv dataset using graph analysis techniques, and development of a recommender system based on NLP and clustering techniques.

Developed using Python (pandas, numpy, scikit-learn), Apache Spark, PySpark, MapReduce, MongoDB.

QUICK, University project in Master's degree

Performance analysis of the HTTP/3 protocol, based on QUIC, compared with HTTP/1 and HTTP/2 using different metrics and under different network instability conditions.

Substring parallel, University project in Master's degree

Development of a parallelized version of a substring-matching algorithm: the longest common subsequence (LCS) algorithm.

The algorithm was developed in C, parallelized with OpenMP, and tested on Google Cloud Platform using instances with different settings, for example varying the number of processors and the amount of available memory. A sequential sketch of the underlying dynamic program is shown below.
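
For illustration, here is the sequential LCS dynamic program in Python; in the C/OpenMP version, one common way to expose parallelism is over anti-diagonals, whose cells are mutually independent, though the project's exact strategy is not detailed here.

# Sequential sketch of the LCS dynamic program (the project's version is in
# C with OpenMP). Cells on the same anti-diagonal are independent, which is
# one common way to expose parallelism.
def lcs_length(a: str, b: str) -> int:
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

print(lcs_length("AGGTAB", "GXTXAYB"))  # prints 4 ("GTAB")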

CDN LOGMININ, Bachelor's degree thesis

Starting from anonymous log entries coming from a content delivery network server of an entertainment service, a learning algorithm was developed and applied to cluster the entries belonging to the same user.

Developed in Python, with pandas, NumPy, Matplotlib, and scikit-learn.

NQUEEN, University project in Bachelor's degree

Development of multithreaded software written in Java. Given an N×N chessboard, the software computes all possible solutions of the N-queens problem. A sketch of the counting approach follows below.
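
As a rough illustration (in Python rather than the project's Java), a backtracking solver can be parallelized over the column of the queen in the first row, mirroring a natural multithreaded decomposition; this is a sketch, not the project's code.

# Backtracking N-queens counter, parallelized over the first row's column
# (illustrative sketch; the original project is multithreaded Java).
from concurrent.futures import ProcessPoolExecutor

def count_from(n, cols, diag1, diag2, row):
    # Count completions given queens already placed in rows 0..row-1.
    if row == n:
        return 1
    total = 0
    for col in range(n):
        if col in cols or (row - col) in diag1 or (row + col) in diag2:
            continue  # attacked square
        total += count_from(n, cols | {col}, diag1 | {row - col},
                            diag2 | {row + col}, row + 1)
    return total

def solve_first(args):
    n, col = args  # place the first queen at (row 0, col), then recurse
    return count_from(n, {col}, {-col}, {col}, 1)

if __name__ == "__main__":
    n = 8
    with ProcessPoolExecutor() as pool:
        print(sum(pool.map(solve_first, [(n, c) for c in range(n)])))  # 92 for n=8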

GASSIX, University project in Bachelor's degree

Development of an optimal one-step predictor of household gas consumption, given the consumption of the past six days. It was developed by evaluating different models: a polynomial model, a multilayer neural network, and a radial basis function neural network.

The model was developed in MATLAB. A minimal sketch of the lagged-feature setup is shown below.
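
To illustrate the one-step-ahead setup, the sketch below fits a linear model on six lagged values; the linear model and synthetic data are stand-ins for the MATLAB models actually compared.

# Minimal sketch of a one-step-ahead predictor on six lagged values.
# The linear model and synthetic series are illustrative stand-ins.
import numpy as np

def make_lagged(series, lags=6):
    # Build (X, y): each row of X holds the previous `lags` values of y.
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    y = series[lags:]
    return X, y

rng = np.random.default_rng(0)
consumption = rng.random(100).cumsum()  # toy stand-in for daily gas data
X, y = make_lagged(consumption)
coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
next_day = np.r_[consumption[-6:], 1.0] @ coef  # one-step-ahead prediction
print(next_day)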

RIS8, University project in Bachelor's degree

Development of a web platform to configure computers, manually or in an automated way. Developed in Java using the Jetty framework.

ArduFit & Heartz, Diploma project

Development of ArduFit, a wearable hardware device based on the Arduino Nano which measures the user's heartbeat and body temperature.

Development of Heartz, an Android smartphone application which communicates with ArduFit, reading the measurements and storing and preprocessing them for future features.