
As a Machine Learning and Artificial Intelligence Engineer, my responsibilities include the management and maintenance of the codebase and the company's core models. Furthermore, I am responsible for the integration of new functionalities through the implementation of cutting-edge AI models.
Consultant working with different clients on various data science and artificial intelligence projects.
Junior consultant working as a Data Scientist and Data Engineer on various projects for different clients.
Development of a web crawler using Python and related frameworks.
Development and analysis of models in the wineinformatics and geospatial data fields.
Learning and development in PowerBuilder. Learning and use of SAP HANA. Alpha testing of internal company software.
Monitoring of the company network using Spiceworks, an online network monitoring tool.
Graduated with 110/110 cum laude
Graduated with 104/110
With grade 100/100
Native speaker
Near-native speaker
B2
Data analysis
Data Mining
Clustering algorithms
Predictive models
Linear Regression
Logistic Regression
Ridge Regression
Classification algorithms
Decision trees
Ensemble methods
Natural Language Processing & Text Mining
Apache Hive
Apache Spark
Hadoop MapReduce
PySpark
Communication
Active listening
Negotiation
Terraform
CDK
ML Specialist
Apache Hive, Apache Spark, Git, GitHub, Hadoop MapReduce
Jenkins, Jetty Web Server, Jupyter Notebook, PyCharm, Eclipse IDE, Android Studio
MATLAB & Simulink
SageMaker, SageMaker Studio, SageMaker Endpoints, ECS, ECR, Lambda, Step Functions, Data Pipeline, EMR, AWS Batch, DMS, RDS, Redshift, DynamoDB, S3, EC2, Kinesis (Streams, Analytics, Firehose), Glue (Data Catalog, ETL), Athena, QuickSight, Bedrock
BigQuery
Google Cloud Storage
Vertex AI
MySQL, PrestoSQL, SparkSQL, MongoDB
Keras
TensorFlow
CNNs, RNNs, and LSTM networks
- LangChain framework
- Few-shot learning
- Chain-of-thought prompting
- Agentic AI
Android
Bash scripting
C
Dart, Flutter
HTML, CSS, Bootstrap
Java EE 7
Javascript, Javascript/Ajax
LabVIEW
MATLAB
OpenMP
PHP
PySpark
Python (scikit-learn, pandas, numpy, matplotlib)
SQL
R
Design patterns
Adherence to clean code principles
- GitHub Actions (intermediate level)
- Jenkins (basic level)
- AWS CodeDeploy & CodePipeline
- Docker and Docker Compose
As a backend, Machine Learning, and Artificial Intelligence Engineer, I'm working on the company's main product: a web platform for enriching personal data (such as emails, phone numbers, first names, and last names).
My goal is to process and interpret this data to create digital scoring systems, which help clients distinguish fraudulent users from legitimate ones.
To enhance the service level of the products offered by shops to their customers and to provide an automated solution, the aim of the project is to build a model able to estimate the target stock of each product for each of the client's shops.
To do so, the project has been divided into two steps:
The same project has been applied to the different countries in which the client operates (IT, DE, ES, BE) and is continuously being extended to new countries.
As a Data Scientist, my principal tasks were:
The main technologies used were:
In the context of a promotional campaign, the client needed an automated flow to detect whether a customer qualifies for the campaign and to determine the number of discounts the customer is entitled to.
As a Data Scientist, my principal tasks were:
The main technologies used were:
To reduce the costs of purchasing the industrial components used to manufacture its products, the client required a demand and supply forecasting system.
As a Data Scientist, my principal tasks were:
The same project has been applied to the different countries in which the client operates (IT, DE, ES, BE) and is continuously being extended to new countries.
The main technologies used were:
The client owns a data platform, hosted on an AWS data lake, used both for analysis and to extract data consumed by an external advertising platform to deliver targeted advertising to prospective new customers. To do so it relies on a large number of Glue jobs, each applying different logic.
The objective of the project was to design and provide a framework that generalizes these Glue jobs through a single code base, and to migrate the data lake to a Delta Lake structure.
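As an illustration of the generalization idea only, the following is a minimal sketch of a config-driven Glue job; the job arguments, config format, and table paths are assumptions and do not reproduce the client's actual framework.

```python
# Minimal sketch of a generic, config-driven Glue job writing to a Delta Lake target.
import sys
import json
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Hypothetical job arguments: each job instance only differs by the config it receives.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "config_s3_path"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# The JSON config describes the source, the job-specific logic, and the Delta target.
config = json.loads(
    spark.sparkContext.wholeTextFiles(args["config_s3_path"]).collect()[0][1]
)

df = spark.read.format(config["source_format"]).load(config["source_path"])
df = df.filter(config["filter_expression"])  # job-specific logic taken from the config

# Write to the Delta Lake structure the data lake is being migrated to.
df.write.format("delta").mode("overwrite").save(config["target_path"])
```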
The main technologies used are:
The client owns a scorecard project, consisting of a data lake hosted on the AWS cloud platform. The data lake is composed of several tables designed to measure, through a series of KPIs, characteristics of the client's customer base.
The objective of the project is to fix the existing code and extend it by adding new classes and new tables, enriching the KPIs used by the business unit to analyze the customer base.
The main technologies used in the project are:
In order to facilitate the selection of a certain number of proposals for an event, an artificial intelligence model has been developed with the aim of predicting whether a specific proposal for an event is worth choosing. The model has been developed using various Machine Learning techniques capable of working with both numerical and textual data, leveraging modern natural language processing techniques. The model was trained using the AWS SageMaker cloud platform.
Additionally, using both open-source Large Language Models (LLMs) and the OpenAI APIs, software was developed to summarize the proposals around certain key points to allow for quicker evaluation by the organizing committee.
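As an illustration only, the following is a minimal scikit-learn sketch of a classifier combining numeric and textual proposal features; the input file, column names, and model choice are assumptions and do not reproduce the actual SageMaker training setup.

```python
# Minimal sketch: combine text and numeric proposal features in a single pipeline.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

proposals = pd.read_csv("event_proposals.csv")  # hypothetical dataset
X = proposals[["proposal_text", "budget", "expected_attendees"]]
y = proposals["selected"]  # hypothetical binary label: proposal chosen or not

preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=5000), "proposal_text"),   # textual features
    ("numeric", "passthrough", ["budget", "expected_attendees"]),    # numeric features
])

model = Pipeline([("features", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```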
Applied technologies
Development of Key Performance Indicators (KPIs) in Presto SQL on AWS Athena for a Qlik front end. Analysis of business requirements.
Managed client interaction and definition of business requirements.
Technologies used:
The customer loyalty automation project consists of automating the following two business processes:
- automating the assignment of customer loyalty levels in an e-commerce platform, querying the data platform and applying ETL jobs to check whether customers satisfy the business logic
- automating the detection of high-value new customers.
Both parts of the project have been developed in Python with PySpark, running on an EMR cluster. The jobs are scheduled with Airflow.
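For illustration, a minimal sketch of how such a job could be scheduled with Airflow on an existing EMR cluster using the Amazon provider; the DAG name, script path, and cluster id are placeholders, not the actual project code.

```python
# Minimal Airflow DAG sketch: submit the PySpark loyalty job as a step on an EMR cluster.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

LOYALTY_STEP = [{
    "Name": "customer_loyalty_level_assignment",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        # Hypothetical S3 path to the PySpark ETL script.
        "Args": ["spark-submit", "s3://my-bucket/jobs/loyalty_levels.py"],
    },
}]

with DAG(
    dag_id="customer_loyalty_automation",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_loyalty_job = EmrAddStepsOperator(
        task_id="run_loyalty_job",
        job_flow_id="j-XXXXXXXXXXXX",  # placeholder EMR cluster id
        steps=LOYALTY_STEP,
    )
```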
Using the geo purchasing power dataset created in a previous project, the aim of the project is to analyze how the client's e-commerce customers are distributed across the United Kingdom, with a focus on England, and how different city-level KPIs, such as mean household income and housing price per square meter, relate to the amount spent by customers in each city.
The project lasted one week, and it was presented at a client internal conference of the growth strategy unit, hosted by the CGO.
Developed in Python on Jupyter notebooks to produce the report.
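As a simplified illustration of the analysis, a minimal pandas sketch correlating city-level KPIs with customer spend; the file and column names are assumptions, not those used in the report.

```python
# Minimal sketch: join city KPIs with customer spend and look at their correlations.
import pandas as pd

kpis = pd.read_csv("uk_city_kpis.csv")        # city, mean_household_income, sqm_house_price
spend = pd.read_csv("uk_customer_spend.csv")  # city, total_spend, n_customers

cities = kpis.merge(spend, on="city", how="inner")
cities["spend_per_customer"] = cities["total_spend"] / cities["n_customers"]

# Correlation between the city KPIs and what customers spend there.
print(cities[["mean_household_income", "sqm_house_price", "spend_per_customer"]].corr())
```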
The aim of the Geo purchasing power project is to create a table with KPIs representative of the purchasing power of a given postal code in three countries: the United Kingdom, the United States, and Italy. The KPIs used are numerous and differ across the three countries. Each KPI is accompanied by a description of its granularity and a decile value from 1 to 10 of the quantity.
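For illustration, a minimal PySpark sketch of the per-country decile computation over a KPI; the table and column names are assumptions, not the actual project code.

```python
# Minimal sketch: assign each postal-code KPI a 1-10 decile within its country.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("geo_purchasing_power").getOrCreate()

# Hypothetical input table with columns: country, postal_code, kpi_name, kpi_value.
kpi_df = spark.table("geo.postal_code_kpis")

# Deciles are computed independently per country and per KPI.
decile_window = Window.partitionBy("country", "kpi_name").orderBy(F.col("kpi_value"))
kpi_with_decile = kpi_df.withColumn("decile", F.ntile(10).over(decile_window))

kpi_with_decile.write.mode("overwrite").saveAsTable("geo.postal_code_kpi_deciles")
```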
For the project the following technologies have been used:
- Pyspark
- Hive
- Hue
- Jupyter notebooks
- Airflow
- Jira for Agile organization
Tasks:
- Data sourcing
- Data cleaning and wrangling
- Data analysis
- Product code creation
- Table creation
The client required the creation of a single template repository that can be used for every data science project through the development of a Docker image.
This image runs on an AWS SageMaker Endpoint and handles calls to different templates based on the project, abstracting which underlying template and libraries are required.
To do this, the Docker image runs a Python script that initializes a first web server thread via FastAPI.
Through an appropriate HTTP request to that web server, it is possible to specify which libraries a data scientist requires for a project by uploading wheel files, so that the model relies on these at execution time.
At this point the web server instantiates a second web server thread within the endpoint. When an HTTP request is made to the inference endpoint to obtain a prediction from the desired model, the first web server thread forwards the request to the second one, which handles the interaction with the model as if it were a normal endpoint.
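As a simplified illustration of this design, a minimal FastAPI sketch of the front web server that installs the requested wheels and forwards inference calls to the project-specific server; the routes, ports, and file names are assumptions, not the client's actual implementation.

```python
# Minimal sketch of the front web server: it installs the requested wheels, starts the
# project-specific model server, and forwards inference requests to it.
import subprocess

import httpx
from fastapi import FastAPI, Request, Response

app = FastAPI()
MODEL_SERVER_URL = "http://127.0.0.1:9000"  # assumed port of the second server thread

@app.post("/install")
async def install_wheels(request: Request):
    # Receive wheel file paths and install them so the model can rely on them.
    payload = await request.json()
    for wheel in payload.get("wheels", []):
        subprocess.run(["pip", "install", wheel], check=True)
    # Start the project-specific model server (hypothetical entry point).
    subprocess.Popen(["python", "model_server.py"])
    return {"status": "model server starting"}

@app.post("/invocations")
async def invocations(request: Request):
    # SageMaker sends inference requests to /invocations; forward them to the model server.
    body = await request.body()
    async with httpx.AsyncClient() as client:
        upstream = await client.post(f"{MODEL_SERVER_URL}/invocations", content=body)
    return Response(content=upstream.content, status_code=upstream.status_code)

@app.get("/ping")
def ping():
    # SageMaker health-check endpoint.
    return Response(status_code=200)
```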
Activities performed and technologies used:
- introduction to the client's core services
- introduction to example infrastructures in the customer's AWS cloud and how they are connected
- introduction to Artifactory
- Python script development and related unit testing
- development and modification of existing Jenkins pipeline
The aim of the project was to create continuous integration and continuous delivery pipelines for different existing projects and to migrate them to benefit from these pipelines.
My tasks included:
Main technologies used:
The aim of the project was to develop and maintain different Qlik reports used by business analysts to monitor key performance indicators of insurance product sales and usage, enriching the client's data platform.
The main activities were:
The main technologies used are:
Privacy protection in IoT: A deep reinforcement learning approach.
Electronic communications are always exposed to privacy risks: in any interaction based on messages, an endpoint can exploit both the data exchanged and the metadata in the message to disclose information about the sender. In the Internet of Things context, a dynamic environment where a huge number of devices exchange messages, the privacy risk is a crucial and timely issue. In this context, service discovery is the process of finding the services offered by IoT devices according to clients' requests. Many solutions have been proposed for it, but privacy protection is still an important aspect to investigate. The considered environment consists of a mobile or wearable device, which aims to find and obtain a certain number of services, offered by different providers, within a target deadline. The device moves along a path where it can encounter the service providers. The interaction with service providers often requires the exchange of data that can be sensitive for the device owner. In addition, service providers can collude, combining the data gained from different providers, leveraging their value and raising the privacy risk for the user. The objective of this thesis is to develop a solution to improve privacy protection by applying deep reinforcement learning techniques: deep Q-learning and the actor-critic method. The performance of the agent is evaluated through metrics defined ad hoc for the problem. In addition, the objectives of the thesis include the development of the simulator used for the experiments and the creation of a dataset of mobile IoT service providers, available for future research.
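For illustration only, a minimal PyTorch sketch of the deep Q-learning update used as one of the thesis techniques; the state and action dimensions, network, and hyperparameters are assumptions, and the simulator and reward function are not reproduced here.

```python
# Minimal deep Q-learning update sketch on a batch of replay-buffer transitions.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 8, 4, 0.99  # assumed sizes and discount factor

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    """One gradient step: regress Q(s, a) toward the Bellman target."""
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Target: r + gamma * max_a' Q_target(s', a') for non-terminal states.
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + GAMMA * next_q * (1 - dones)
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```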
Development of a cloud platform for DNA substring computation through an on-demand HPC cluster based on Docker containers. Developed on AWS.
Analysis of paper category popularity in the arXiv dataset using graph analysis techniques, and development of a recommender system based on NLP and clustering techniques.
Developed using Python (pandas, numpy, scikit-learn), Apache Spark, PySpark, MapReduce, and MongoDB.
Performance analysis of the HTTP/3 protocol, based on the QUIC protocol, compared with HTTP/1 and HTTP/2 using different metrics and under different conditions of network instability.
Development of a parallelized version of a string matching algorithm: the longest common subsequence algorithm.
The algorithm has been developed in C, parallelized with OpenMP, and tested on Google Cloud Platform using instances with different settings, for example varying the number of processors and the amount of available memory.
Starting from anonymous log entries coming from a content delivery network server of an entertainment service, a learning algorithm has been developed and applied to cluster the entries belonging to the same user.
Developed in Python with pandas, NumPy, Matplotlib, and scikit-learn.
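As a simplified illustration, a minimal scikit-learn sketch of density-based clustering over engineered log features; the input file, feature names, and algorithm parameters are assumptions, not the actual project choices.

```python
# Minimal sketch: cluster CDN log entries so that nearby entries in feature space
# are treated as belonging to the same user.
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Hypothetical CSV of per-request numeric features extracted from the anonymous logs.
logs = pd.read_csv("cdn_log_features.csv")
features = logs[["hour_of_day", "user_agent_hash", "content_id", "ip_prefix_hash"]]

# Standardize so that no single feature dominates the distance metric.
X = StandardScaler().fit_transform(features)

# Density-based clustering; label -1 marks entries that could not be assigned to a user.
logs["user_cluster"] = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(logs.groupby("user_cluster").size().sort_values(ascending=False).head())
```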
Development of multithreaded software written in Java. Given the size N of an NxN chessboard, the software computes all the possible solutions of the N-queens problem.
Development of an optimal one-step predictor able to predict household gas consumption given the consumption of the past six days. It was developed by evaluating different models:
a polynomial model, a multilayer neural network, and a radial basis function neural network.
The model has been developed in MATLAB.
Development of a web platform to configure a computer, either manually or in an automated way. Developed in Java using the Jetty framework.
Development of ArduFit, a wearable hardware device based on the Arduino Nano that measures the user's heartbeat and body temperature.
Development of Heartz, an Android smartphone application that communicates with ArduFit, reading the measurements and storing and preprocessing them for future features.