HELLO, WORLD. I AM

Sreevarshini
Srinivasan.

I build reliable data systems.

I'm fascinated by the mechanics of scale: how do you ingest a terabyte without losing data? What breaks first when your schema changes? How do you keep pipelines reliable when everything upstream is chaotic? I spend a lot of time thinking about these problems, and more time building systems that catch them before they become incidents.

I recently finished a master's in data science, and I'm equally interested in the algorithmic side: designing models and learning systems that actually work at scale, not just in notebooks. But I've learned that a brilliant model is worthless if the data feeding it is broken. So I design with both layers in mind: infrastructure that's observable and reliable, and models that know how to fail gracefully.

Projects

Systems that connect real data to reliable ML outcomes.

SchemaDrift — Resilient Data Pipeline

Python · PostgreSQL · Watchdog · Pandas · JSON Schema · EDA · SQL

Built a self-documenting data pipeline that detects schema drift in real time, maps the blast radius of mutations across Bronze/Silver/Gold layers, and auto-rescues malformed data into a dynamic PostgreSQL architecture.

Impact: Auto-detects schema mutations in real time and visualizes downstream impact across 3 data layers
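
As a rough illustration of what detecting schema drift can look like, here is a minimal sketch that compares one incoming record against an expected column-to-type mapping. It is not the project's actual code: the names `DriftReport` and `detect_drift`, and the example columns, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DriftReport:
    added: list = field(default_factory=list)    # columns in the record but not expected
    removed: list = field(default_factory=list)  # expected columns missing from the record
    retyped: dict = field(default_factory=dict)  # column -> (expected type, observed type)

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def detect_drift(expected: dict, record: dict) -> DriftReport:
    """Compare one incoming record against an expected {column: type} mapping."""
    report = DriftReport()
    report.added = sorted(set(record) - set(expected))
    report.removed = sorted(set(expected) - set(record))
    for column in set(expected) & set(record):
        observed = type(record[column])
        if observed is not expected[column]:
            report.retyped[column] = (expected[column].__name__, observed.__name__)
    return report


# Hypothetical example: "country" disappeared, "currency" appeared,
# and "amount" arrived as a string instead of a float.
expected = {"order_id": int, "amount": float, "country": str}
record = {"order_id": 17, "amount": "19.99", "currency": "USD"}
report = detect_drift(expected, record)
print(report.added, report.removed, report.retyped)
```

A report like this is the kind of signal a pipeline could use to decide whether a record goes to the normal path or to a rescue table.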

TaxFilingChatBot — AI Assistant

Python · LLM · Docker · RAG · PDF Processing

Containerized AI assistant that grounds responses strictly in official IRS tax documents, helping international students navigate 2025 regulations with verified accuracy (no hallucinations).

Impact: Zero-hallucination tax guidance, with 100% of responses grounded in official IRS documents

ICU Sepsis Prediction — DyT Transformer

Python · PyTorch · Transformers · Time-Series ML

Built a transformer-based model with Dynamic Tanh (DyT) layers in place of standard normalization. It predicts sepsis risk 24+ hours in advance from irregular ICU time-series data, handling missing values and temporal gaps in real-time monitoring.

Impact: Achieves early sepsis prediction with adaptive handling of missing ICU data

BrisT1D Blood Glucose Prediction

Python · Time-Series Forecasting · XGBoost · LightGBM

Data science project forecasting blood glucose 60 minutes ahead for Type 1 diabetes patients from CGM sensor readings, insulin doses, and activity data, with the goal of supporting early intervention and personalized care.

Impact: 60-minute blood glucose forecasting for T1D patients with real-time personalization

TandC Summarisation — Fine-tuned LLM

Python · Mistral-7B · QLoRA · NLP · Transformers · ROUGE

Fine-tuned Mistral-7B with QLoRA to summarize complex Terms & Conditions into plain English, making legal documents accessible to non-experts.

Impact: ROUGE-based evaluation for summarization quality on legal documents
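
For readers unfamiliar with ROUGE, the sketch below computes a simplified ROUGE-1 F1 score (unigram overlap between a reference summary and a generated one). It assumes lowercase whitespace tokenization and no stemming, so its numbers will not match standard ROUGE packages exactly. The function name `rouge1_f1` is hypothetical and this is not the project's evaluation code.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap with whitespace tokenization."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and generated summaries.
print(rouge1_f1("you may cancel at any time", "you can cancel any time"))
```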

Where I've Worked

Data Engineer

@ Loyalytics AI

April 2023 – July 2024 · Remote

Built lakehouse-style data infrastructure for retail analytics at scale, with reliability and governance as defaults.

  • Built Databricks pipelines on Delta Live Tables, managing 10 TB of historical data and hourly batch updates from 10+ sources into 100+ tables through a medallion architecture.
  • Built an internal tax calculation platform on Azure for LuLu, a retail chain operating across 7 countries, processing 6 years of historical data plus ongoing batch ingestion with 100% accuracy.
  • Ran Apache NiFi on a VM rather than a managed cloud service to cut compute costs and keep sensitive financial data secure.
  • Implemented complex sales and inventory transformation logic to power downstream analytics and reporting.
  • Migrated ~6 TB of data from legacy databases (Oracle, MySQL) into the Azure ecosystem and wrote complex SQL queries and Python code for robust data processing.
  • Led the migration to Unity Catalog, improving data management and governance for a large retail chain in the Middle East.
Azure Data Factory · Databricks · Delta Live Tables · Apache NiFi · PySpark · Python · SQL · Oracle · Azure Storage

Data Engineer Intern

@ Loyalytics AI

January 2023 – March 2023 · Hybrid

Foundation work in orchestration, transformations, and governance for retail analytics pipelines.

  • Built ADF pipelines orchestrating ingestion from diverse sources (Oracle, MySQL, XMLs, CSVs) in Azure Storage.
  • Wrote complex SQL queries and Python code for efficient data processing and transformation.
  • Automated validation processes across large datasets, catching issues early and protecting data integrity.
  • Led foundational migration efforts to Unity Catalog to improve governance and data discoverability.
  • Improved data flow efficiency by 40% through incremental load strategies and tuning.
Azure Data Factory · SQL · Python · Oracle · MySQL · Unity Catalog
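
The incremental load work mentioned above is not described in detail here. One common approach is a high-watermark load, where each run picks up only rows modified since the last successful run. The sketch below shows that idea in plain Python, under the assumption that every row carries a modification timestamp. The function name `incremental_batch` and the `updated_at` field are hypothetical.

```python
from datetime import datetime


def incremental_batch(rows, watermark):
    """Return rows modified after the watermark, plus the advanced watermark."""
    fresh = [row for row in rows if row["updated_at"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((row["updated_at"] for row in fresh), default=watermark)
    return fresh, new_watermark


# Hypothetical source rows; only the row changed after the watermark is loaded.
rows = [
    {"id": 1, "updated_at": datetime(2023, 1, 1)},
    {"id": 2, "updated_at": datetime(2023, 1, 3)},
]
fresh, watermark = incremental_batch(rows, datetime(2023, 1, 2))
print([row["id"] for row in fresh], watermark)
```

Compared with a full reload, each run touches only the changed rows, which is typically where the efficiency gain comes from.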

Cloud Infrastructure Engineer (Founding Team)

@ DriverAI, LLC

June 2025 – August 2025 · Peoria, AZ

Built automation-first cloud infrastructure with monitoring, security best practices, and reproducible deployments.

  • Founding engineer responsible for setting up end-to-end ingestion for structured and unstructured data.
  • Deployed AWS (RDS, EC2) and Azure infrastructure designed for high availability and scalability.
  • Automated resource provisioning with Terraform, cutting manual overhead by 60%.
  • Integrated monitoring via CloudWatch and Grafana to track system performance and resolve incidents early.
  • Collaborated on CI/CD pipelines and enforced cloud security best practices through IAM and secrets management.
AWS · Azure · Terraform · Grafana · CloudWatch · CI/CD

Lab Ambassador

@ Tinkerspace UMD

Present · College Park, MD

Supporting a collaborative robotics lab by helping users move from ideas → safe, repeatable workflows.

  • Support students, faculty, and staff using lab tools and equipment in a hands-on environment.
  • Translate technical steps into clear workflows and documentation for different skill levels.
  • Guide users through safe, repeatable processes that reduce friction and errors.
  • Help maintain shared standards for organization, tooling, and usage procedures.
Robotics · Technical Support · Documentation

Technologies & Tools

Data Engineering

Expert: Python • SQL • PySpark • Databricks

Familiar: Snowflake • dbt • Airflow • NiFi • PostgreSQL

Cloud & Infra

Expert: AWS (S3, EC2, Lambda) • Azure (ADF, Storage) • Terraform

Familiar: Glue • Athena • CloudWatch • Azure Databricks

Governance & Reliability

Expert: Data Quality • Lineage • Unity Catalog

Familiar: RBAC • Audit Logs • Freshness SLAs • Monitoring

ML / Deep Learning

Expert: PyTorch • Transformers • Time-Series ML • Scikit-learn

Familiar: NLP • LangChain • Gemini • XGBoost • LightGBM

Dev & Ops

Expert: Git • Linux • Bash • CI/CD

Familiar: Docker • IaC • Grafana • GitHub Actions

Dashboards

Expert: Streamlit • Analytics Reporting

Familiar: Grafana • Power BI

The plumbing that makes big data work.

Hi, I'm Sreevarshini. I build infrastructure for data and AI systems.


Education & Learning

M.S. in Data Science

University of Maryland, College Park

2024 – 2026 · College Park, MD

Coursework across machine learning, deep learning, NLP, and large-scale data systems, with a focus on building reliable ML-ready data pipelines.

Let's build something.

Currently open for new opportunities. Whether you have a question or just want to say hi, my inbox is always open.