About Me

I am an ML Engineer at GrantX, where I build production-scale AI infrastructure for grant discovery. With a recently completed M.S. in Data Science from Northeastern University (3.867 GPA) and a Mechanical Engineering background, I bring a unique blend of technical depth and interdisciplinary problem-solving to complex AI systems.

My experience spans data engineering, machine learning, and distributed systems. At GrantX, I'm architecting GRASP, a multi-layered AI platform that combines hybrid search (vector embeddings, semantic search, LLM-based query generation) with chain-of-thought reasoning and evidence-based verification, serving 25,000+ grants at 91 ms p50 latency. I built a complete data ingestion pipeline for 756,000+ IRS 990-PF filings and sped up enrichment workflows by 4-6x through database batching. Previously, I developed ML models for semiconductor manufacturing at Veeco and conducted research on gun violence at NYU's Dynamical Systems Laboratory, where I helped secure $2.1 million in funding and co-authored a paper in Nature Human Behaviour.

Proficient in Python, FastAPI, Elasticsearch, GCP/Kubernetes, PyTorch, and distributed systems (Scala, Hadoop, Spark), I specialize in production ML systems, RAG pipelines, vector search, anomaly detection, and GenAI integration. My work combines engineering rigor with cutting-edge AI to build scalable, reliable systems that solve real-world problems in both the public and private sectors.

I thrive at the intersection of machine learning, software engineering, and infrastructure, turning complex technical challenges into production-ready solutions that deliver measurable business impact.

Datum ad vitam (Data for life)

Skills

Technical Expertise

As an interdisciplinary data scientist, I leverage expertise in machine learning, statistical analysis, and programming to extract meaningful insights from complex datasets and develop innovative solutions.

Data Science & Machine Learning

Machine Learning Algorithms

Deep Learning & Neural Networks

Statistical Modeling & Analysis

Time Series Analysis

Pattern Recognition

Causal Analysis

Distributed Machine Learning

Graph Algorithms for Big Data

Programming Languages & Tools

Python (NumPy, Pandas, Scikit-learn, OpenCV)

R (tidyverse, ggplot2, dplyr, caret)

HTML

CSS

MATLAB

C++

SQL (MySQL) & NoSQL (MongoDB, HBase, Hive)

Scala

Git

Mathematica

Docker

FastAPI & Async Python

Machine Learning Frameworks

PyTorch

Scikit-learn

TensorFlow & Keras

FastAI

Vertex AI & Gemini API

Big Data & Cloud Technologies

GCP (Kubernetes, Vertex AI, Cloud Storage)

AWS (EC2, S3, SageMaker)

Elasticsearch & Vector Search

Hadoop Ecosystem

Apache Spark

Distributed Systems & CAP Theorem

Data Visualization & Analysis Tools

MS Excel (Advanced)

Tableau

Professional Skills

Problem Solving & Critical Thinking

Communication & Presentation

Teamwork & Collaboration

Project Management

Adaptability & Continuous Learning

Research & Analysis

Grant Writing & Research Funding

Scientific Writing & Academic Publishing

What I Do

Data Science, Machine Learning, and Analytics

Machine Learning

Predictive Modeling for Business Insights

Deep Learning for Complex Pattern Recognition

Custom Algorithm Development

Data Analysis

Statistical Modeling for Decision Making

Advanced Data Mining Techniques

Causal Analysis for Impact Assessment

Data Engineering

Interactive Data Visualization

Big Data Processing and Analytics

Efficient Database Design and Management

Timeline

My journey so far

From mechanical engineering to data science, explore the key milestones and experiences that have shaped my career path.


GrantX
ML Engineer, GrantX

June 2025 - Present

Architecting production-scale AI systems for federal and private grant discovery, leveraging hybrid search and GenAI.

At GrantX, I'm building the core AI infrastructure that powers grant matching for thousands of organizations. I'm currently architecting GRASP (Grant Retrieval & Assessment System), a multi-layered intelligence platform that combines hybrid retrieval with chain-of-thought reasoning, multi-dimensional scoring, and evidence-based verification to match organizations to grant opportunities precisely. I architected and deployed a hybrid search engine that combines semantic search, vector embeddings (Elasticsearch ELSER and dense vectors), and BM25 retrieval with LLM-based query generation using Gemini models. The system covers 25,000+ federal and private grant opportunities at 91 ms p50 latency with 99.9% uptime on GCP/Kubernetes.

Beyond federal grants, I built a complete data ingestion pipeline for IRS Form 990-PF filings from 756,000+ private foundations. It parses XML tax returns to extract organizational metadata, grant histories, and financials, identifies funding patterns and anomalies across the philanthropic landscape, and bulk-upserts the results into our funders database, using cursor-based pagination and retry logic to handle large-scale operations reliably.

I've also built conversational AI interfaces, including the "Talk to NOFO" chatbot, which lets users query complex funding documents in natural language. My work spans the full stack, from FastAPI async endpoints to Kubernetes deployment configurations with health probes and resource management, and covers both real-time search (sub-100 ms) and background enrichment workflows that process 50+ grants concurrently. This role combines my expertise in distributed systems, machine learning, and production engineering to solve real-world problems in the public sector funding space.
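
A hybrid engine like the one described above has to merge several ranked lists (BM25, dense vectors, ELSER) into a single result list. The sketch below shows one common way to do that, reciprocal rank fusion (RRF). It is an illustration only, not the production GRASP code: the fusion method actually used is not stated here, and the function name `reciprocal_rank_fusion`, the constant `k=60`, and the grant IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from three retrievers for one query.
bm25_hits = ["g1", "g2", "g3"]
dense_hits = ["g2", "g3", "g1"]
elser_hits = ["g2", "g1", "g4"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits, elser_hits])
print(fused)
```

Fusing on ranks rather than raw scores avoids having to reconcile BM25 scores and vector similarities, which live on different scales.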

Northeastern University Khoury College
Graduate Teaching Assistant - CS5800 Algorithms, Khoury College of Computer Sciences

June 2024 - May 2025; September 2023 - December 2023

Facilitated graduate-level algorithms education through teaching, grading, and mentoring.

As a Graduate Teaching Assistant for CS5800 Algorithms at Northeastern University's Khoury College of Computer Sciences, I played a vital role in enhancing the learning experience of graduate students. Working alongside Prof. Virgil Pavlu and a team of graduate and PhD TAs, I contributed to various aspects of this advanced course. My responsibilities included grading assignments and exams with meticulous attention to detail, providing constructive feedback to foster student growth, and conducting weekly office hours for personalized support. I excelled in clarifying complex algorithmic concepts, from fundamental data structures to advanced topics like Dynamic Programming, Graph Algorithms, and NP-complete problems. This role deepened my expertise in algorithms, C++, and Python while honing my skills in problem-solving and technical communication. This experience as a TA was mutually beneficial, allowing me to reinforce my knowledge while helping others grasp complex computational concepts.

Veeco
Engineering Data Scientist (Co-op), Veeco

January 2024 - June 2024

Applied advanced machine learning techniques to semiconductor manufacturing processes.

During my 6-month co-op at Veeco in San Jose, California, I immersed myself in the semiconductor industry, applying data science to real-world manufacturing challenges. I developed advanced machine learning models using TensorFlow and PyTorch to predict boron wafer resistance, achieving a remarkable 6% average error rate. To overcome limited data challenges, I implemented innovative techniques like data augmentation and group normalization, while experimenting with CNN architectures. I also enhanced data quality by improving Python scripts for systematic data extraction and structuring. A key project involved developing visualization code for channel data from manufacturing tools, enabling better pattern recognition and decision-making. Through interactions with industry experts, I gained deep domain knowledge in semiconductor manufacturing. My contributions extended to resolving data saving issues by revising Work Instructions, significantly improving data reliability. This experience honed my skills in process optimization, data mining, machine learning, and technical communication, while providing valuable insights into the semiconductor industry.

Northeastern University
M.S. in Data Science, Northeastern University

January 2023 - May 2025

Graduated with a 3.867 out of 4.00 GPA, specializing in machine learning and distributed systems.

At Northeastern University's Khoury College of Computer Sciences, I built upon my engineering background to specialize in Data Science, graduating with a 3.867 GPA. My advanced coursework covered crucial areas including Programming for Data Science, Data Management and Processing, Algorithms, Supervised Machine Learning and Learning Theory, and Database Management Systems. I also completed specialized courses in Distributed Systems, where I gained hands-on experience with Scala, Hadoop, and Apache Spark for large-scale data processing, and a dedicated Large Language Models (LLMs) course exploring cutting-edge generative AI techniques. My capstone project, FraudFusion, showcased advanced machine learning capabilities by developing diffusion models to generate synthetic credit card fraud data, addressing the challenge of extreme data imbalance (0.5% fraud rate). Through specialized feature engineering and custom loss functions, I improved fraud detection performance of XGBoost classifiers from 82% to approximately 90%, demonstrating practical impact in anomaly detection. This rigorous program significantly enhanced my proficiency in Python, R, SQL, and machine learning frameworks like PyTorch and Scikit-learn, equipping me with advanced analytical skills and theoretical knowledge for tackling complex data challenges. The experience bridged my mechanical engineering foundation with cutting-edge data science methodologies, preparing me for impactful roles in AI/ML engineering and data science.

NYU Dynamical Systems Laboratory
Researcher, NYU Dynamical Systems Laboratory

September 2018 - September 2022

Transitioned to data-driven research, securing $2.1M in funding and publishing in prestigious journals.

At NYU's Dynamical Systems Lab, I made significant contributions across various projects:

  • Developed two successful grant proposals, securing $2.1 million in research funding.
  • Spearheaded the development of machine learning models to predict ICU patient mortality rates, achieving 90% accuracy and contributing to improved patient care strategies in healthcare analytics.
  • Leveraged the MIMIC-IV dataset, containing over 300 million clinical observations, to develop and validate predictive models, demonstrating proficiency in handling and analyzing large-scale, complex healthcare data.
  • Applied data analysis and information theory to study causal relationships between gun prevalence, mass shootings, and media, leading to a publication in Nature Human Behaviour.
  • Created information theory-based models of zebrafish behavior, published in Flow: Applications of Fluid Mechanics.

This experience solidified my expertise in machine learning, causal inference, and data analysis, while honing my skills in research writing, mentorship, and interdisciplinary collaboration.

Lost-Bytes
Mechanical Design Engineer, Lost-Bytes

November 2017 - January 2018

Engineered AI-enabled Digester prototype and developed food waste tracking software.

At Lost-Bytes, I bridged mechanical engineering with data science, developing an AI-enabled Digester prototype that converts food waste into organic fertilizer. This project introduced me to practical applications of machine learning in environmental solutions. I also created a data-driven food waste tracking software, analyzing nutritional components, temperature, and pH levels. This experience was pivotal in my transition to data science, showing me how data analysis can drive sustainability and efficiency in real-world systems.

NYU Tandon School of Engineering
M.S. in Mechanical Engineering, NYU Tandon School of Engineering

September 2016 - May 2018

Master's Degree in Mechanical Engineering

My master's at NYU was a turning point, introducing me to the power of data in engineering. Courses in robot perception and simulation tools exposed me to machine learning and data analysis techniques. This experience laid the groundwork for my transition from traditional mechanical engineering to data-driven approaches, sparking my interest in pursuing data science as a career.

WindAid Institute
Rural Electrification Project Engineer/Volunteer, WindAid Institute

July 2015 - September 2015

Applied engineering skills to sustainable energy solutions in rural Peru.

My time at WindAid Institute was transformative. By manufacturing and installing wind turbines for off-grid households, I witnessed firsthand the impact of sustainable technology. This experience not only honed my engineering skills but also opened my eyes to the potential of data-driven solutions in addressing global challenges. It inspired me to explore how data science could be leveraged to optimize renewable energy systems and improve lives on a larger scale.

Cardiff Racing

December 2013 - December 2014

Contributed to the development of CR10 'Louisa', Cardiff Racing's innovative Formula Student car.

Key innovations included a 3D printed nose cone, improved plenum design, and the introduction of flax composites. Our efforts led to an impressive 8th place finish at Formula Student UK, including 6th in the grueling 22km endurance event.

Cardiff University
BEng in Mechanical Engineering, Cardiff University

September 2013 - May 2016

Bachelor's Degree in Mechanical Engineering

My journey at Cardiff University laid the foundation for my technical career. Courses in object-oriented programming, engineering computing, and robotics introduced me to programming and data processing. This early exposure to computational methods in engineering sparked my curiosity about the intersection of mechanical engineering and computer science, ultimately leading to my current pursuit of a career in data science.

Research

Publications and Contributions

Explore my research work, including peer-reviewed publications and significant contributions to the field of data science and engineering.

Media coverage and firearm acquisition

Media coverage and firearm acquisition in the aftermath of a mass shooting


As a co-author, I contributed to this groundbreaking study published in Nature Human Behaviour, investigating the complex relationship between mass shootings, media coverage, and firearm acquisition.

This research adopts an information-theoretic framework to analyze the interplay between mass shooting occurrences, media coverage on firearm control policies, and firearm acquisition at both national and state levels. Using time series data from 1999 to 2017, we identified a correlation between mass shootings and increased firearm acquisition rates.

Key findings:

  • A transfer entropy analysis revealed media coverage on firearm control policies as a potential causal link between mass shootings and increased firearm acquisition.
  • Our results suggest that media coverage may increase public concern about stricter firearm control, potentially driving increases in firearm prevalence.
  • The study provides insights into the complex dynamics of public response to mass shootings and the role of media in shaping firearm acquisition behaviors.

This work contributes to the understanding of firearm-related behaviors and policies, offering valuable insights for policymakers and researchers in the field of public safety and gun violence prevention.

Code and Data Availability: The code and data used for this study are available in our GitHub repository. This includes Mathematica notebooks for conditional transfer entropy analysis, mutual information calculations, and the mathematical model used in the study.
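
For readers unfamiliar with transfer entropy, here is a minimal Python sketch of the basic (unconditional) quantity for discrete series with a history length of one. It is not the code from the repository, which uses Mathematica and conditional transfer entropy; the function name `transfer_entropy` and the toy driver/follower series are assumptions made purely for illustration.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Transfer entropy in bits from `source` to `target` (history length 1)."""
    n = len(target) - 1
    triples = Counter()   # counts of (target[t+1], target[t], source[t])
    pairs_ts = Counter()  # counts of (target[t], source[t])
    pairs_tt = Counter()  # counts of (target[t+1], target[t])
    singles = Counter()   # counts of target[t]
    for t in range(n):
        triples[(target[t + 1], target[t], source[t])] += 1
        pairs_ts[(target[t], source[t])] += 1
        pairs_tt[(target[t + 1], target[t])] += 1
        singles[target[t]] += 1

    te = 0.0
    for (x_next, x_now, y_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_ts[(x_now, y_now)]
        p_given_self = pairs_tt[(x_next, x_now)] / singles[x_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Toy data: the follower copies the driver with a one-step delay.
rng = random.Random(0)
driver = [rng.randint(0, 1) for _ in range(5000)]
follower = [0] + driver[:-1]

te_forward = transfer_entropy(driver, follower)   # driver -> follower
te_backward = transfer_entropy(follower, driver)  # follower -> driver
print(te_forward, te_backward)
```

In this toy setup the driver fully determines the follower's next value, so the forward estimate is close to one bit while the backward estimate is close to zero. That asymmetry is what makes transfer entropy useful for asking which series carries information about the other's future.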

Firearm Ecosystem

Understanding and Engineering the Ecosystem of Firearms

Understanding and Engineering the Ecosystem of Firearms: Prevalence, Safety, and Firearm-Related Harms

As a researcher at NYU's Dynamical Systems Lab, I co-authored this successful NSF grant proposal, securing $2.1 million in funding for this groundbreaking project.

This LEAP-HI (Leading Engineering for America's Prosperity, Health, and Infrastructure) award supports fundamental research to extend engineering methods for understanding the complex "firearm ecosystem" in the US.

Firearm-related harms are a serious public health issue in the US, where the number of firearm-related deaths has surpassed that of motor vehicle-related deaths. Our project seeks to gain a fundamental understanding of the American "firearm ecosystem", a complex system where firearm prevalence, legislation, media coverage, socioeconomic factors, the political state of affairs, and social phenomena are intertwined.

The project investigates the firearm ecosystem on three scales:

  1. Macroscale: Studying cause-and-effect relationships between firearm prevalence and firearm-related harms on a national level.
  2. Mesoscale: Exploring the ideological, economic, and political landscape underlying policy on a state level.
  3. Microscale: Elucidating individual opinions about firearm safety.

Our research leverages advancements in information and network theories, data science methodologies, and hypothesis-driven experiments to provide insights into the causal roles of potentially contributing factors and inform policy-makers regarding effective interventions.

Zebrafish Swimming Patterns

Emergence of in-line swimming patterns in zebrafish pairs


As a researcher, I contributed to this study published in Flow, examining collective behavior in zebrafish pairs.

This research establishes a mathematical model to investigate the collective behavior of zebrafish pairs, accounting for both social and hydrodynamic interactions between individuals. Our model successfully predicts the preference of zebrafish to swim in-line, with one fish leading and the other trailing.

Key findings:

  • We demonstrated the local stability of in-line swimming through analytical methods.
  • Hydrodynamic interactions were found to play a role in creating a repulsion zone between animals swimming in-line.
  • The study provides insights into the complex interplay between social and hydrodynamic factors in shaping fish swimming patterns.

This work contributes to our understanding of fish collective behavior and the underlying mechanisms that drive formation swimming in aquatic animals.

ICU Mortality Prediction

Machine Learning for ICU Mortality Prediction


As a researcher at NYU's Dynamical Systems Lab, I led a pilot study using machine learning to predict ICU mortality rates, achieving 90% accuracy.

This project leveraged the MIMIC-IV and MIMIC-III databases to develop predictive models for ICU mortality, aiming to enhance risk assessment and support clinical decision-making in critical care.

Key aspects of the study:

  • Utilized MIMIC-IV dataset, including vital signs, lab results, and patient demographics
  • Employed FIDDLE (Flexible Data-Driven Pipeline) for automated feature extraction and preprocessing of electronic health records
  • Developed and compared multiple machine learning models, including random forests, gradient boosting, and neural networks
  • Implemented feature selection to identify key predictors of ICU mortality
  • Achieved 90% accuracy, demonstrating the significant potential of ML in critical care

Significant outcomes:

  • Identified crucial predictors of ICU mortality from vital signs and lab values
  • Developed a prototype for real-time risk assessment in ICU settings
  • Established groundwork for integrating ML into clinical decision support systems

This pilot study laid the foundation for further research in predictive analytics for critical care, showcasing the potential of data-driven approaches to improve patient outcomes in ICUs.

MOXXI 2019 Conference

MOXXI 2019: Innovation in Hydrometry Conference

2019 MOXXI, CandHy, WMO HydroHub, and CUAHSI Joint Conference

As a member of the organizing committee, I contributed to the planning and execution of this international conference on innovation in hydrometry.

The MOXXI 2019 International Conference was hosted at New York University from March 11th to 13th, 2019. It brought together researchers, users, and instrumentation developers to discuss overcoming barriers to the advancement of hydrological observations and the operationalization of innovative hydrometric technologies.

Key aspects of the conference:

  • Focused on measurement techniques, sensor development, and new data sources in hydrology
  • Explored operational use of new approaches in hydrological observation networks
  • Discussed improvements in data management and quality assessment
  • Examined the potential of citizen science and crowdsourced data in hydrology

This conference exemplified the interdisciplinary nature of modern hydrology, combining elements of data science, environmental monitoring, and citizen engagement.

Projects

Personal and Academic Projects

Explore a selection of my data science projects, showcasing practical applications of machine learning and data analysis techniques.


Video source: NASA Scientific Visualization Studio

Exploring the Environmental Impact: Can staying at home enhance Air Quality?


A data-driven analysis of the relationship between human mobility and air quality during the COVID-19 pandemic.

This project investigates the impact of COVID-19 lockdown measures on air quality in Massachusetts, focusing on the relationship between the Air Quality Index (AQI), Social Distancing Index (SDI), COVID-19 cases, and energy demand.

Key aspects of the project:

  • Data collection and preprocessing of air quality and mobility datasets
  • Exploratory data analysis to identify trends and patterns
  • Statistical analysis to quantify the relationship between mobility and air quality
  • Visualization of findings using various plotting techniques
  • Interpretation of results in the context of environmental policy and urban planning

Technologies used:

  • Python for data analysis and visualization
  • Pandas and NumPy for data manipulation
  • Matplotlib and Seaborn for creating informative visualizations
  • Jupyter Notebooks for interactive development and presentation

This project demonstrates my ability to work with real-world datasets, perform comprehensive data analysis, and derive meaningful insights from complex environmental data.

Optiver - Trading at the Close Project

Optiver - Trading at the Close: Predict US Stock Movements


A machine learning project to predict stock price movements during the closing auction, developed as part of the DS 5220 Supervised Machine Learning course.

This project focuses on predicting the closing prices of NASDAQ-listed stocks during the Closing Cross auction, a critical event that determines official closing prices for securities. Our team participated in the Kaggle competition "Optiver - Trading at the Close," developing a model to forecast the movement of the Weighted Average Price (WAP) one minute into the future.

Key aspects of the project:

  • Analysis of high-frequency trading data from NASDAQ
  • Implementation of various machine learning models, including LightGBM, XGBoost, and Neural Networks
  • Extensive feature engineering to capture market dynamics
  • Focus on computational efficiency for real-time prediction scenarios
  • Evaluation using Mean Absolute Error (MAE) metric
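
The two quantities at the heart of this task are simple to state in code. The sketch below computes WAP from the best bid and ask using the usual top-of-book definition (an assumption here; check it against the competition's data description) and the MAE metric. The function names and the sample quote are hypothetical.

```python
def weighted_average_price(bid_price, bid_size, ask_price, ask_size):
    """Top-of-book WAP: each price is weighted by the size on the opposite side."""
    return (bid_price * ask_size + ask_price * bid_size) / (bid_size + ask_size)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up quote: more size on the ask side pulls WAP toward the bid.
wap = weighted_average_price(bid_price=10.0, bid_size=100, ask_price=10.2, ask_size=300)

# Made-up targets and predictions.
mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
print(wap, mae)
```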

Technologies and techniques used:

  • Python for data processing and model development
  • LightGBM and XGBoost for gradient boosting
  • Neural Networks for deep learning approaches
  • Statistical tools like ARIMA for time series analysis
  • Advanced feature engineering techniques

Achievements:

  • Ranked in the top 20% of the Kaggle competition at the time of project completion
  • Developed a model with a total execution time of approximately 5.7 hours for the final submission
  • Gained significant insights into financial market dynamics and high-frequency trading data analysis

This project showcases my skills in financial data analysis, machine learning model development, and handling large-scale datasets in a high-frequency trading context.

Soundit Project

Soundit: Database-Driven Music Streaming Platform


A dynamic music platform with advanced database integration and personalized user experiences.

Co-developed at Northeastern University, Soundit integrates complex database schemas with interactive music streaming features.

Key aspects of the project:

  • Integrated complex MySQL database schema for user interactions and music management
  • Engineered a recommendation system for personalized user experiences
  • Implemented user authentication and subscription services
  • Developed interactive features to enhance user engagement and platform functionality

Technologies used:

  • Backend: Python
  • Frontend: JavaScript
  • Database: MySQL
  • Additional: User authentication systems, Recommendation algorithms

This project demonstrates my skills in database design, full-stack development, and the implementation of advanced features in a music streaming context.

BDMA Project

Binomial Distribution, Modeling and Analysis (BDMA)


A comprehensive Python package for working with binomial distributions.

Developed as part of a team project, BDMA offers a specialized toolkit for analyzing, simulating, and visualizing binomial experiments.

Key features:

  • Probability calculations (PMF, CDF)
  • Descriptive statistics (mean, variance, skewness, etc.)
  • Hypothesis testing for binomial proportions
  • Binomial experiment simulation
  • Visualization tools for distribution analysis
  • Random sampling from binomial distributions
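
To give a flavor of the calculations listed above, here is a small standard-library sketch of the binomial PMF, CDF, mean, and variance. The function names are hypothetical and do not reflect BDMA's actual API.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k), summed directly from the PMF."""
    return sum(binomial_pmf(i, n, p) for i in range(0, min(k, n) + 1))


def binomial_mean_variance(n, p):
    """Mean n*p and variance n*p*(1-p)."""
    return n * p, n * p * (1 - p)


# Four fair coin flips: probability of exactly two heads, and of at most two.
pmf_2_of_4 = binomial_pmf(2, 4, 0.5)
cdf_2_of_4 = binomial_cdf(2, 4, 0.5)
mean, variance = binomial_mean_variance(4, 0.5)
print(pmf_2_of_4, cdf_2_of_4, mean, variance)
```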

Technologies used:

  • Python for core package development
  • NumPy and SciPy for numerical computations
  • Matplotlib for data visualization
  • Unittest for comprehensive testing

This project demonstrates my ability to work in a team environment, develop statistical software, and create user-friendly tools for complex mathematical concepts.

Ensemble Methods Analysis

Image source: Towards AI

Performance Analysis of Ensemble Methods


A comprehensive comparison of ensemble learning techniques on UCI datasets.

This project, conducted at New York University, focused on analyzing the effectiveness of various ensemble methods on the UCI Poker Hand and Connect-4 datasets.

Key achievements:

  • Led analysis of UCI's Poker Hand and Connect-4 datasets, emphasizing effectiveness of ensemble methods
  • Utilized XGBoost to benchmark performance, achieving an outstanding testing accuracy of 89.11% for the Connect-4 dataset
  • Improved Poker Hand dataset predictions to 82.77% with XGBoost, resolving data imbalances and data skew
  • Compared performance of Bagging, Random Forest, AdaBoost, Gradient Boosting, and XGBoost

Technologies used:

  • Python for data processing and model implementation
  • Scikit-learn for traditional ensemble methods
  • XGBoost library for advanced gradient boosting
  • Pandas and NumPy for data manipulation
  • Matplotlib and Seaborn for visualization

This project showcases my ability to work with complex datasets, implement and compare various machine learning algorithms, and derive meaningful insights from model performance.

Contact

Let's Connect 👋