Senior Data Scientist at Formation Bio, leading high-impact drug repurposing initiatives and owning the full ML lifecycle from data curation through stakeholder communication.
I build scalable models (transformers, GNNs, diffusion models, and reinforcement learning agents) for de novo molecule generation, ADMET prediction, and clinical portfolio prioritization. I also teach graduate-level AI courses at Northeastern University.
Ph.D. in Biomedical Engineering from Stony Brook University. 8+ first-author papers, 30+ co-authored publications, and a U.S. patent.
Lead drug repurposing initiatives across the full ML lifecycle. Architected GNN pipelines on biomedical knowledge graphs for portfolio investment decisions. Built MRI surrogate endpoint models, scalable data curation pipelines on GCP, and fine-tuned domain-specific LLMs for ontology mapping.
Led design and deployment of generative AI systems for de novo molecular generation. Developed a novel multimodal architecture for BBB permeability prediction (submitted to NeurIPS 2025). Delivered molecules that progressed through in vitro/in vivo validation with nanomolar inhibition.
Built and deployed GNNs, chemical language models, and diffusion models for small molecule generation. Integrated quantum mechanics features into graph models for ADMET prediction. Developed SMILES-based RNNs with RL for CNS-targeted drug discovery.
Led research at the intersection of medical imaging and deep learning. Developed CycleGAN for MRI flow synthesis from CT data. Built CNN-MLP systems for non-invasive aortic blood flow estimation from wearable SCG signals, resulting in a first-author publication and U.S. patent.
Developed signal- and image-processing ML models for terahertz imaging and burn injury diagnostics. Published 8+ first-author papers and co-authored 30+ peer-reviewed publications.
Adjunct Teaching Professor, Northeastern University · Spring 2025 – Present
Designed and delivered graduate courses offering hands-on experience with foundation models, diffusion models for medical image synthesis, and chemical language models for drug generation.
Patent
ML system for noninvasive cardiovascular flow mapping using wearable SCG sensors.
View patent
Fusion of GNNs and quantum mechanics for blood-brain barrier prediction.
Deep learning pipeline to estimate flow dynamics from wearable sensors.
Read paper
Automated burn injury triage using deep learning on optical imaging data.
Read paper
Automated burn classification using THz time-domain imaging.
Read paper
+ 30 additional co-authored publications in medical imaging, spectroscopy, and biomedical signal processing.
Open to collaborations in AI-driven drug discovery, biomedical data modeling, and generative learning research.
m.ebrahimkhani1993@gmail.com