Jon Roosevelt – Online Resume

Jon Roosevelt
Email: rooseveltadvisors@gmail.com
Phone: (917) 891-9082
Location: Short Hills, NJ
jonroosevelt.com

Professional Summary

Solution Architect and AI/Data Engineering leader with 18 years of experience designing and deploying enterprise-scale data, ML, and GenAI solutions. Deep expertise in large language models (LLMs), distributed data engineering, and cloud-native architectures for healthcare, media, and AdTech. Proven record of delivering measurable business impact and regulatory-compliant solutions for Fortune 500 clients and startups. Specialized in Python, Apache Spark, Databricks, AWS, Azure, and healthcare data interoperability.

Professional Experience

Cofounder & AI Engineer, Arcs Health
01/2023 – Present | New York, NY

• Led the creation of Scriber, a real-time AI system for transcribing clinical conversations using OpenAI Whisper and Llama 3.2 70B, reducing clinician note-taking time by 60% and ensuring HIPAA compliance.
• Built scalable, multi-format healthcare data pipelines (CCDA, HL7 v2/v3, EDI) with Azure Data Factory, Databricks, and Fabric, standardizing to FHIR for analytics and interoperability.
• Designed and implemented a RAG-enabled patient engagement chatbot (Twilio, Llama 3.2 70B, LangChain agents), automating appointment scheduling and triage for 5,000+ patients/month.
• Developed digitized intake and document upload workflows, streamlining patient registration and insurance verification.
• Architected secure, regulatory-compliant data workflows, integrating Azure AD authentication and API Management for privacy and access control.

Sr. Data Engineer, NBC News
09/2024 – 01/2025 | Englewood Cliffs, NJ

• Automated CCPA/GDPR Data Subject Request (DSR) processing using Airflow and Databricks Delta Lake, reducing manual workload by 85% and ensuring regulatory compliance.
• Optimized Spark streaming jobs for MSNBC/CNBC, resolving data skew and DAG lineage issues, improving real-time processing efficiency by 35%.
• Developed real-time BI dashboards (Looker, AWS Kinesis, Kafka, EMR) for subscription and viewership analytics, reducing latency to 5 minutes.
• Led Terraform-based CI/CD deployments and orchestrated complex data workflows using Airflow.

Staff Software Engineer, Preveta
03/2023 – 02/2024 | Los Angeles, CA

• Architected HIPAA-compliant data pipelines for major EHR/EMR integrations using Python, Databricks, and Azure Synapse.
• Designed secure ingestion platforms for bi-directional data flows, supporting Power BI analytics and regulatory reporting.
• Developed robust Python libraries for HL7, C-CDA, and EDI standardization, enabling rapid onboarding of healthcare partners.

Senior Solution Architect, The Trade Desk
11/2019 – 10/2022 | New York, NY

• Designed and implemented custom ML models and CI/CD pipelines (Python, Spark) for ad campaign optimization, delivering up to 35% ROI improvement for clients such as McDonald's and Bayer.
• Led multi-million-dollar data science projects, including custom bidding algorithms and supply path optimization, reducing cost per viewable impression by 12%.
• Built real-time analytics solutions integrating DSP, ad server, and audience data for programmatic marketing automation.

Founding Director, Intellinum Analytics Inc
02/2017 – 01/2023 | New York, NY

• Led AI/data engineering projects for AdTech, healthcare, and retail clients, from ideation to delivery.
• Optimized Spark jobs on Kubernetes, reducing operational costs by 70–80% and accelerating development cycles.
• Designed and deployed ensemble ML models for campaign optimization, achieving up to 112% test campaign performance uplift.

Research Staff, IBM Research
2015 – 2018 | Yorktown Heights, NY

• Developed city analytics and customer segmentation solutions using Python ETL, Spark, and IBM BigInsights.
• Built predictive models for healthcare and media, delivering actionable insights from large-scale, multi-source data.
• Led real-time analytics platform development for telecommunications, supporting campaigns for 35M+ users.

Solutions Architect, EMC
2014 – 2015 | New York, NY

• Provided backend engineering and infrastructure consulting for clients including Monsanto, specializing in VCE converged infrastructure and Python-based automation.

Education

M.S. in Computer Science, New York University
2012 – 2014 | NY

B.S. in Computer Science, Donghua University
2005 – 2009 | China

Technical Skills

Languages: Python, Scala, Java, SQL, JavaScript, Go, Rust, R, C#
Cloud: AWS (EMR, Lambda, S3, SageMaker, EC2), Azure (Data Factory, Synapse, OpenAI, Kubernetes, APIM), GCP (BigQuery, Dataflow, Vertex AI)
Healthcare Data: FHIR, HL7, C-CDA, EDI
Security: Azure AD, Azure API Management, OAuth2
Frameworks: PyTorch, Spark, Scikit-learn, Transformers, TensorFlow, Semantic Kernel, FastAPI
Data: Databricks, Delta Lake, Hive, Redshift, Snowflake, Kafka, Airflow, dbt
DevOps: Docker, Kubernetes, Terraform, Linux, CI/CD

Recent Projects

Enterprise Agentic Automations Platform, 2025

• Architected an enterprise-ready AI research agent by porting an open-source solution to Microsoft's Semantic Kernel, enabling modular LLM agent orchestration and a plugin-based skill ecosystem.
• Integrated Azure AD authentication and custom API Management (APIM) for secure, role-based access and rate limiting, supporting 10,000+ research queries per day.
• Implemented an MCP server and client with the Python MCP SDK to support tool calling.
• Implemented memory/context management for LLMs, optimizing research workflows and context window usage.
• Containerized the platform for flexible deployment across Azure Kubernetes Service (AKS) and hybrid environments.
• Developed specialized reporting modules for insurance, safety, and financial research, supporting custom compliance workflows.
• Lessons learned: MCP via SSE, LLM context management, authentication flow, and streaming architecture patterns.

AI Evaluation & Governance Framework, 2025

• Built a custom evaluation system for LLM agents using Azure AI Evaluation SDK, tailored for enterprise deployments behind Azure API Management (APIM) with custom authentication and header requirements.
• Developed APIM-aware evaluators for metrics including groundedness, factual accuracy, relevance, contextual precision, faithfulness, and fluency, supporting JSON-based scoring and reasoning.
• Implemented a manual evaluation loop for robust aggregation and reporting, overcoming SDK limitations with APIM endpoints.
• Integrated caching and error handling for stability and cost efficiency; enabled detailed pass/fail analytics and compliance reporting.
• Provided actionable insights to improve prompt engineering, retrieval strategies, and system reliability.

GPU-Accelerated RAG Pipelines, 2025

• Designed and delivered a GPU-accelerated Retrieval-Augmented Generation (RAG) system for enterprise document intelligence, achieving 5x speedup over CPU-based processing.
• Implemented multi-GPU orchestration for parallel document parsing, embedding generation, and chunking, scaling to 50+ pages/sec.
• Developed memory-aware batching and dynamic chunking strategies to optimize GPU utilization and prevent out-of-memory errors.
• Built robust checkpointing and verification systems to ensure data integrity and enable seamless failover to CPU as needed.
• Containerized the pipeline for reproducible deployment; integrated monitoring for GPU utilization and performance metrics.
• Realized dramatic reductions in processing time and cost per document, enabling real-time analytics on petabyte-scale document collections.

Publications

ICCCS 2018

Ding X., Yan C., Zhao Y., Yang Z. (2018). Efficient Processing of TopK Dominating Queries on Incomplete Data Using MapReduce. ICCCS 2018, Cloud Computing and Security, pp. 78–89.

12/01/2017

Certifications

Databricks Certified Machine Learning Professional: https://credentials.databricks.com/aa12012c-d1ae-4195-a99c-2b495d99ffa2#acc.sZoq8zwp
AWS Certified AI Practitioner: https://www.credly.com/badges/1351a19d-0020-43f3-8fa0-16d8583bceb0/public_url
Databricks Certified Data Engineer Professional: https://credentials.databricks.com/6ab4e7e2-163a-43ad-ab2f-bee48999a90f#acc.IY9Zo2Ci
Microsoft Certified: Azure AI Fundamentals: https://learn.microsoft.com/en-us/users/jonroosevelt/transcript/dlozriqzx8g9w4m
AWS Certified Machine Learning – Specialty: https://www.credly.com/badges/33bd47b0-5301-47b4-b91d-68b45275e627/public_url
Microsoft Certified: Azure AI Engineer Associate: https://learn.microsoft.com/en-us/users/jonroosevelt/transcript/dlozriqzx8g9w4m

Core Competencies

GenAI & LLMs

RAG, LlamaIndex, LangChain/LangGraph/LangSmith, LLM fine-tuning, evaluation/governance, agentic frameworks, MCP, prompt engineering, vector databases, agentic memory

Data Engineering

Spark, Databricks, Delta Lake, Airflow, Kafka, Hive, Redshift, Snowflake, AWS EMR, Azure Data Factory, dbt, medallion architecture, structured streaming

Cloud Platforms

AWS (EMR, Lambda, S3, SageMaker, EC2)
Azure (Data Factory, Synapse, OpenAI, Kubernetes, APIM)
GCP (BigQuery, Dataflow, Vertex AI)

Healthcare Data

FHIR, HL7 (v2/v3), C-CDA, EDI, HIPAA/GDPR/CCPA compliance, EHR/EMR integration, data normalization, clinical NLP

Machine Learning

PyTorch, Scikit-learn, Transformers, TensorFlow, ML pipelines, CI/CD, model evaluation, experiment tracking, MLflow

DevOps & Security

Docker, Kubernetes, Terraform, Linux, CI/CD automation, Azure AD, API Management, secure API design, identity integration