Matt Ebrahim, Ph.D.

Senior Data Scientist · Generative AI · Drug Discovery · Biotech

Senior Data Scientist at Formation Bio, leading high-impact drug repurposing initiatives and owning the full ML lifecycle from data curation through stakeholder communication.

I build scalable models (transformers, GNNs, diffusion models, and reinforcement learning) for de novo molecule generation, ADMET prediction, and clinical portfolio prioritization. I also teach graduate-level AI courses at Northeastern University.

Ph.D. in Biomedical Engineering from Stony Brook University. 8+ first-author papers, 30+ co-authored publications, and a U.S. patent.

Formation Bio

Senior Data Scientist · Jun 2025 – Present

Lead drug repurposing initiatives across the full ML lifecycle. Architected GNN pipelines on biomedical knowledge graphs for portfolio investment decisions. Built MRI surrogate endpoint models, scalable data curation pipelines on GCP, and fine-tuned domain-specific LLMs for ontology mapping.

1910 Genetics

AI Scientist II · Nov 2024 – May 2025

Led design and deployment of generative AI systems for de novo molecular generation. Developed novel multimodal architecture for BBB permeability prediction (submitted to NeurIPS 2025). Delivered molecules that progressed through in vitro/in vivo validation with nanomolar inhibition.

1910 Genetics

AI Scientist I · 2023 – Nov 2024

Built and deployed GNNs, chemical language models, and diffusion models for small molecule generation. Integrated quantum mechanics features into graph models for ADMET prediction. Developed SMILES-based RNNs with RL for CNS-targeted drug discovery.

Northwestern University

Clinical Research Associate · 2022 – 2023

Led research at the intersection of medical imaging and deep learning. Developed a CycleGAN for MRI flow synthesis from CT data. Built CNN-MLP systems for non-invasive aortic blood flow estimation from wearable SCG signals, resulting in a first-author publication and a U.S. patent.

Stony Brook University

Ph.D. Research · 2017 – 2022

Developed signal- and image-processing ML models for terahertz imaging and burn injury diagnostics. Published 8+ first-author papers and co-authored 30+ peer-reviewed publications.

Northeastern University

Adjunct Teaching Professor · Spring 2025 – Present

Designed and delivered graduate courses with hands-on experience in foundation models, diffusion models for medical image synthesis, and chemical language models for drug generation.

Languages: Python, C++, MATLAB, Bash
AI/ML Frameworks: PyTorch, TensorFlow, Scikit-learn, Optuna, Hugging Face
Generative Modeling: Diffusion (DDPM), Transformers (GPT, T5), VAE, GAN, RNN/LSTM
Molecular Modeling: RDKit, DeepChem, AutoDock Vina, ESMFold, AlphaFold
Graph ML: Graphormer, GCN, GAT, MPNN, DGL, PyG
Foundation Models: BioMistral, BioMegatron, SapBERT, MedGemma, BioMedCLIP
Cloud & MLOps: Azure ML, AWS (Bedrock, SageMaker), GCP, Snowflake, Docker, Git
Domains: Drug discovery, ADMET, protein-ligand design, medical imaging, knowledge graphs

Ph.D., Biomedical Engineering

Stony Brook University · 2022

B.Sc., Electrical Engineering

Amirkabir University of Technology · 2016

Cardiovascular Diagnostics Using Deep Learning · Patent

Personalized Chest Acceleration Using Deep Learning

U.S. Patent · Issued 2025

ML system for noninvasive cardiovascular flow mapping using wearable SCG sensors.

ADMET Prediction

Multimodal Graph-Attention Networks with QM-Guided Cross-Attention for ADMET Prediction

Submitted to NeurIPS 2025 · First Author

Fuses GNNs with quantum mechanics features for blood-brain barrier permeability prediction.

Aortic Diagnosis

Deep Learning for Aortic Flow Estimation from SCG

Annals of Biomedical Engineering, 2023 · First Author

Deep learning pipeline to estimate aortic flow dynamics from wearable SCG sensors.

Burn Injury Triage

Deep Learning for Triage of In Vivo Burn Injuries

Biomedical Optics Express, 2022 · First Author

Automated burn injury triage using deep learning on optical imaging data.

THz Burn Imaging

THz Spectroscopic Imaging and LSTM for Non-invasive Dermal Burn Depth Measurements

Scientific Reports, 2022 · First Author

Automated burn classification using THz time-domain imaging.

Plus 30+ additional co-authored publications in medical imaging, spectroscopy, and biomedical signal processing.

Open to collaborations in AI-driven drug discovery, biomedical data modeling, and generative learning research.