Nahid Zeinali, PhD  ·  Personal Site
Senior AI Research Scientist

Multimodal AI for biomedical discovery.

I build end-to-end machine learning pipelines that integrate text and structured biomedical data — from transformer architectures and retrieval-augmented generation systems to production-ready foundation-model evaluation.

Currently: AI Research Scientist at the California Medical Innovation Institute
Nahid Zeinali, PhD · San Diego, California

About

I am an AI Research Scientist with deep expertise in multimodal machine learning, large language models, and biomedical AI. My work integrates text and structured biomedical data using transformer architectures and embedding-based representation learning, with a track record of building scalable, production-ready systems.

Across academia, industry, and national laboratories, I have designed end-to-end AI pipelines, retrieval-augmented generation systems, and domain-specific language models for healthcare and scientific discovery. I currently lead AI/ML and Generative AI development at the California Medical Innovation Institute in San Diego.

I hold a Ph.D. in Informatics from the University of Iowa, with 15+ peer-reviewed publications at the intersection of NLP, EHR analytics, and clinical machine learning.

Impact at a glance

9+
Years of experience building production AI systems and software engineering solutions
15+
Peer-reviewed publications
5M+
Clinical notes analyzed with BERT-family models
10+
Production AI systems built and deployed

Professional experience

Roles spanning industry, national laboratories, and academic research — with a consistent focus on healthcare AI and biomedical informatics.

Period Position
Feb 2025 – Present
AI Research Scientist
California Medical Innovation Institute · San Diego, CA
  • Lead development of end-to-end AI/ML and Generative AI systems for healthcare and scientific discovery, using Python, PyTorch, deep learning, and NLP.
  • Led development of DMP Chef — a GenAI system using GPT-4, Gemini, and Llama 3.3 to generate NIH-compliant Data Management Plans with RAG and vector databases.
  • Led the evaluation of GPT-4.1 and Llama 3.3 for NIH-compliant DMP generation using automated metrics, statistical analysis, and human-centered evaluation; GPT-4.1 achieved the highest overall user satisfaction and stronger SBERT and ROUGE scores.
  • Designed reproducible prompt-engineering and structured-generation workflows for automated NIH Data Management Plan creation with large language models.
  • Led development of DMP Bridge — an open-source Python-based GenAI pipeline that converts funder-specific DMP PDFs into interoperable DMPTool JSON and RDA Common Standard maDMP JSON formats.
  • Designed a hybrid multimodal document AI architecture using pdfplumber, Qwen2-VL, Llama, prompt engineering, and schema-constrained generation to extract text, detect section headers, classify narrative content, and generate structured metadata from complex scientific PDFs.
  • Collaborated on FAIR-aligned data standardization pipelines for ophthalmology imaging datasets (Eye ACT), improving machine learning readiness, interoperability, and future analytics and model development.
  • Collaborated on schema evaluation and JSON-based FAIR workflows that transform scientific posters (Poster Science) into machine-readable, interoperable research objects for AI-driven discovery and reuse.
  • Prepare technical reports, manuscripts, and grant applications in collaboration with investigators.
Aug 2021 – Jan 2025
Graduate Research & Teaching Assistant
University of Iowa, Department of Computer Science · Iowa City, IA
  • Designed transformer-based NLP systems — Symptom-BERT, Care-BERT, Spiritual-BERT — for classification, NER, and extraction from 5M+ clinical notes.
  • Built end-to-end NLP/LLM pipelines covering preprocessing, feature engineering, training, validation, and statistical evaluation.
  • Symptom-GPT: Led development of transformer-based clinical NLP and LLM models for named entity recognition and cancer symptom detection in 1M oncology EHR clinical notes, achieving F1 scores up to 0.989 for nausea/vomiting and 0.912 for anxiety.
  • Symptom-BERT: Led further pretraining and fine-tuning of transformer models on 1M oncology EHR notes to detect 13 cancer symptom groups, achieving micro-F1 = 0.933 in internal validation and micro-F1 = 0.831 in external validation.
  • Care-BERT: Led development of transformer-based clinical NLP models that identify care priorities and life-sustaining treatment preferences in EHR narratives, achieving F1 = 0.941 and AUC = 0.978 in internal validation.
  • Spiritual-BERT: Led development of a Bio-ClinicalBERT-based NLP system to detect under-documented spiritual care information in 3.6M+ EHR notes, achieving F1 = 0.938 in internal validation and F1 = 0.832 in external validation.
  • Implemented GPT-based workflows for synthetic clinical note generation, supporting data augmentation and external validation.
  • OASIS – Oncology Associated Symptoms & Individualized Strategies: Worked with clinicians, researchers, and technical collaborators on AI- and NLP-driven oncology symptom monitoring research that uses EHR data to support personalized cancer care and clinical decision support.
  • Patient-Reported Outcomes & EHR Symptom Concordance: Co-analyzed concordance between patient-reported and provider-documented cancer symptoms, identifying significant documentation gaps across multiple symptom groups.
  • Embeddings-Augmented NLP for Symptom Detection: Co-developed embeddings-augmented NLP pipelines for extracting cancer symptom information from 900K+ clinical notes, achieving F1 scores up to 0.937 on symptom classification tasks.
  • Cancer Symptom Prediction Using Machine Learning: Co-developed predictive ML models using structured and unstructured EHR data to forecast cancer symptom development; Random Forest achieved macro-AUC = 0.755 and AUC = 0.954 for pain prediction.
  • Systematic Review of ML for Cancer Symptom Prediction: Led a PRISMA-guided systematic review of 42 studies evaluating machine learning approaches for cancer symptom prediction and personalized oncology care.
  • Web Analysis: Developed an end-to-end NLP and machine learning pipeline for large-scale web scraping, sentiment classification, clustering, and topic modeling across 3,000+ online articles.
  • Demand Forecasting for E-Commerce Sales: Built Python-based forecasting and analytics models to predict product demand and support inventory optimization for e-commerce sales data.
  • Taught Python and machine learning courses, guiding students to develop strong skills in programming and data science.
  • Published over 10 papers across machine learning and data science topics.
Jun 2024 – Aug 2024
NLP Data Scientist
National Cancer Institute & Frederick National Laboratory, NCATS / NIH · Maryland
  • RARe-SOURCE™: Co-developed AI-powered biomedical literature intelligence workflows using LLMs, semantic search, and NLP to support rare disease knowledge discovery and scientific insight extraction.
  • Worked closely with biomedical researchers to convert research questions into usable AI workflows and decision-support tools.
Feb 2019 – Jul 2021
Software Engineer
Khorshid Hospital · Isfahan, Iran
  • Built and enhanced EMR, PIS, and LIS systems with clinical stakeholders, improving workflow efficiency by 35%.
  • Developed a heart-failure symptom-tracking mobile app used by 3,000+ patients in rural Isfahan.
Dec 2016 – Jan 2019
Software Engineer
Parisian Institute · Tehran, Iran
  • Designed an EHR management dashboard that reduced report turnaround time by 68%.
  • Trained 1,500+ clinicians on EHR adoption and workflow best practices.

Selected projects

Selected projects, ordered from newest to oldest, spanning generative AI, clinical NLP, EHR analytics, biomedical informatics, and data science.

Multimodal Document AI · Metadata Extraction

DMP Bridge

Open-source Python-based GenAI pipeline that converts funder-specific Data Management Plan PDFs into interoperable DMPTool JSON and RDA Common Standard maDMP JSON. Uses pdfplumber, Qwen2-VL, Llama, prompt engineering, and schema-constrained generation for metadata extraction and FAIR-aligned interoperability.

Python · Qwen2-VL · pdfplumber · LLMs · RDA maDMP
Generative AI · RAG

DMP Chef

Free and open-source AI-driven platform that helps researchers draft funder-compliant Data Management Plans. Uses GPT-4, Gemini, and Llama 3.3 with RAG workflows to generate structured, FAIR-aligned, and machine-actionable DMPs.

GPT-4 · Gemini · Llama 3.3 · RAG · Vector DB
Automatic Evaluation · Human Evaluation

NIH DMP LLM Evaluation

Evaluated Llama 3.3 and GPT-4.1 for drafting NIH-compliant Data Management Plans using automated reference-based metrics and human expert evaluation, assessing compliance, clarity, completeness, and usefulness.

GPT-4.1 · Llama 3.3 · ROUGE · SBERT · Human Evaluation
LLM Generation · Prompt Engineering

NIH DMP Generation

Built a reproducible workflow for automatically generating NIH-compliant DMP drafts using Llama 3.3 and GPT-4.1. The repository includes prompt engineering pipelines, generated outputs, and evaluation artifacts.

LLMs · Prompt Engineering · Python · FAIR
Clinical NER · Classification

Symptom-GPT / Symptom-BERT

Published clinical NLP research using BERT- and GPT-based NER models to detect anxiety and nausea/vomiting symptoms in oncology EHR notes, achieving F1 scores of 0.989 for nausea/vomiting and 0.912 for anxiety.

Bio-ClinicalBERT · Bio-GPT · NER · PyTorch
Clinical NLP · Classification

Care-BERT for Heart Failure

Transformer-based NLP model for detecting care priorities in EHR notes of older adults with heart failure, including comfort measures only and life-sustaining treatments. Achieved internal F1 = 0.941 and external F1 = 0.876.

BERT · EHR · Care Priorities · AUC
Palliative Care Informatics

Care-BERT for Advance Directives

Used Care-BERT to extract life-sustaining treatment preferences from EHR narratives and study disparities in advance directive completion among 14,303 older adults with chronic conditions.

Care-BERT · Logistic Regression · EHR Analytics · Palliative Care
Under-Documented Care

Spiritual-BERT

Clinical NLP model for detecting rarely documented spiritual care information in EHRs. Applied to nearly 3.6M notes from 14,729 older adults, achieving F1 = 0.938 internally and F1 = 0.832 externally.

Bio-ClinicalBERT · Synthetic Notes · EHR · Clinical NLP
Literature Intelligence · NIH

RARe-SOURCE™

AI-powered literature analysis engine built at the National Cancer Institute to surface hidden insights in rare-disease research. Combines large language models with encoder architectures for biological domain-specific Q&A.

LLMs · Semantic Search · Rare Disease · Biomedical
Oncology NLP

Symptom-BERT

Published model for detecting 13 cancer symptom groups from clinical notes. Pretrained Bio-ClinicalBERT on 1M unlabeled clinical documents, fine-tuned on annotated notes, and validated using GPT-4-generated synthetic notes.

BERT · PyTorch · Hugging Face · Clinical NLP
EHR · NimbleMiner · Statistical Modeling

Patient-Reported Outcomes vs EHR Documentation

Analyzed concordance between patient-reported symptom occurrence and provider-documented symptoms in EHRs for patients receiving cancer treatment with multimorbidity, using NimbleMiner and statistical modeling.

PROs · NimbleMiner · Logistic Regression · EHR
Embeddings NLP · Deep Learning

Embeddings-Augmented Cancer Symptom NLP

Developed an embeddings-augmented NLP system to detect 14 cancer symptom groups and distinguish observed symptoms from negated symptoms and medication-related side effects across 902,508 clinical notes.

Embeddings · NLP · Symptom Detection · EHR
Collaborative Research · Oncology Symptom Monitoring

OASIS

Collaborative research project on Oncology Associated Symptoms & Individualized Strategies. Contributed to AI/NLP-supported workflows for cancer symptom monitoring, EHR-based symptom analysis, and patient-centered oncology care research in collaboration with interdisciplinary nursing informatics teams.

Clinical NLP · EHR Data · Oncology · Symptom Monitoring
Predictive Modeling · Machine Learning

ML for Cancer Symptom Prediction

Used structured and unstructured EHR data from 8,156 adults with cancer to predict 12 common symptoms. Random Forest achieved the strongest overall performance with macro AUC = 0.755 and F1 = 0.729.

Random Forest · XGBoost · SHAP · EHR
Systematic Review · Python

ML for Cancer Symptom Prediction Review

PRISMA-guided systematic review of 42 studies using machine learning to predict cancer symptoms and identify predictors. Synthesized algorithms, cancer sites, symptoms, sample sizes, and research gaps.

PRISMA · Machine Learning · Cancer Symptoms · Review
Web Mining

Web Analysis

Built a complete NLP pipeline for scraping 3,000+ eHow home and living articles, manually labeling sentiment data, training eight classifiers, and applying K-Means clustering and LDA topic modeling.

Web Scraping · Sentiment Analysis · K-Means · LDA
Forecasting · Machine Learning

Python Demand Forecasting

Applied Python-based data analysis and predictive modeling to forecast three-month product demand for an online electronics retailer using 100 weeks of weekly sales data across 44 SKUs.

Python · Time Series Forecasting · E-Commerce
Big Data Healthcare

Big Data Analytics in Healthcare

Published survey categorizing Big Data analytics applications in healthcare using the WHO “6 building blocks of health systems” framework, reviewing 130 articles and books over a 10-year period.

Big Data · Healthcare · WHO Framework · Survey
Health Interoperability

Health Information Systems Interoperability

Proposed a conceptual model using a Health Service Bus and service-oriented architecture to improve interoperability across hospital information systems, EHRs, CDSS, telemedicine systems, and other healthcare platforms.

HSB · SOA · HL7 · OpenEHR

Publications

15+ peer-reviewed papers in JAMIA Open, Applied Clinical Informatics, JMIR Cancer, JCO Clinical Cancer Informatics, and other leading informatics journals.

Under Review

# Title & Authors Status
Evaluating the Performance of LLMs in Creating NIH Data Management Plans
Zeinali N., Patel B., et al.
Review
Artificial Intelligence Reveals Language Disparities in Person-Centered Spiritual Care: Access and Timing Among Older Adults
AlBashayreh A., Zeinali N., et al.
Review
Advance Directives and Dementia: How Illness Trajectories Influence Goals-of-Care
AlBashayreh A., Zeinali N., et al.
Review

2025

# Title & Authors Venue
01
Using Large Language Models to Detect Anxiety and Nausea/Vomiting Documentation in Clinical Notes of Patients with Cancer
Zeinali N., White S., et al.
CIN Journal
02
Goals-of-Care in Older Adults with Heart Failure, Cancer, and Dementia: Classifying Comfort and Life-Sustaining Preferences Using Priorities-BERT
AlBashayreh A., Zeinali N., et al.
Innovation in Aging
03
An Informatics Approach to Characterizing Spiritual Care Documentation in Electronic Health Records of Older Adults
AlBashayreh A., Zeinali N., Gilbertson-White S.
ACI Journal

2024

# Title & Authors Venue
04
Symptom-BERT: Enhancing Cancer Symptom Detection in EHR Clinical Notes ★ AMIA
Zeinali N., White S., et al.
J. Pain & Symptom Mgmt.
05
Machine Learning Approaches to Predict Symptoms in People with Cancer: A Systematic Review
Zeinali N., Gilbertson-White S., et al.
JMIR Cancer
06
Natural Language Processing Accurately Differentiates Cancer Symptom Information in EHR Narratives
AlBashayreh A., Bandyopadhyay A., Zeinali N., et al.
JCO CCI
07
Predictors of Concordance Between Patient-Reported and Provider-Documented Symptoms in the Context of Cancer and Multimorbidity
Gilbertson-White S., AlBashayreh A., Bandyopadhyay A., Zeinali N., et al.
ACI Journal
08
Using Real-World EHR Data to Predict the Development of 12 Cancer-Related Symptoms in Multimorbidity
Bandyopadhyay A., AlBashayreh A., Zeinali N., et al.
JAMIA Open
09
Innovating the Detection of Care Priorities in Heart Failure Using Large Language Models
AlBashayreh A., Zeinali N., Gilbertson-White S.
Innovation in Aging

Earlier Work

# Title & Authors Year
10
Application of Big Data Analysis in Healthcare Based on the Six Building Blocks of Health Systems' Framework: A Survey
Nazari E., Zeinali N., et al. — Dokkyo Journal of Medical Sciences
2021
11
Provide Interoperability Model to Interact in Hospital Information Systems
Zeinali N., Asosheh A., et al. — J. Health and Biomedical Informatics
2017
12
The Conceptual Model to Solve the Problem of Interoperability in Health Information Systems
Zeinali N., Asosheh A., et al. — IEEE IST
2016
13
The Common Applications of Social Networks in Healthcare
Delaram Z., Zeinali N., et al. — Health Information Management
2016

Presentations & posters

Title Venue Year
Evaluating the Performance of Large Language Models in Drafting Data Management Plans
Zeinali N. (Presenter), Patel B., et al.
BOSC Conference 2026
Toward Envision Portal: A FAIR and AI-Ready Framework for Ophthalmic Imaging
Patel B., Zeinali N., et al.
ARVO Annual Meeting 2026
Evaluating the Effectiveness of an Open-Source LLM in Drafting NIH Data Management Plans
Zeinali N. (Presenter), Patel B., et al.
SciDataCon 2025
Leveraging LLMs for Named Entity Recognition of Anxiety and Nausea/Vomiting in Patients with Cancer
Zeinali N. (Presenter), Gilbertson-White S., et al.
AMIA Informatics Summit 2025
Advanced Detection of Nausea/Vomiting and Anxiety in Patients with Cancer
Zeinali N. (Presenter), Gilbertson-White S., et al.
AMIA Annual Symposium 2024
Comparison of BERT Implementations for Enhanced Cancer Symptom Extraction (Oral)
Zeinali N. (Presenter), AlBashayreh A., Gilbertson-White S., et al.
IEEE AIMHC 2024
Leveraging Spiritual-BERT for Characterizing Spiritual Care Documentation in EHRs of Older Adults with Heart Failure
AlBashayreh A., Zeinali N., et al.
AMIA Annual Symposium 2024
Innovating the Detection of Care Priorities in Heart Failure Using Large Language Models
AlBashayreh A., Zeinali N., et al.
GSA Annual Meeting (Poster) 2024
Disparities in Advance Directive Completion and Life-Sustaining Treatment Preferences in Older Adults
AlBashayreh A., Zeinali N., et al.
Annual Assembly Hospice & Palliative Care (Poster) 2024

Technical skills

Languages, frameworks, and tools I use across the AI/ML stack — from research prototyping to production deployment.

LLMs & Generative AI

  • GPT, Llama, BERT, RoBERTa
  • Retrieval-Augmented Generation (RAG)
  • Prompt engineering, fine-tuning
  • OpenAI API, agentic AI
  • Transformer encoders / decoders

Deep Learning & ML

  • PyTorch, TensorFlow, Keras
  • Scikit-learn, XGBoost
  • SHAP, model explainability
  • Predictive modeling, NER
  • Recommender systems

NLP Frameworks

  • Hugging Face Transformers
  • LangChain, LangGraph
  • LangSmith, Ollama
  • Inference & retrieval optimization

MLOps & CI/CD

  • MLflow, DVC, DagsHub
  • Apache Airflow (Astro)
  • Grafana, workflow automation
  • Docker, GitHub Actions

Cloud & Compute

  • AWS, AWS SageMaker
  • Google Cloud Platform (GCP)
  • HPC, Linux

Databases & Vector Stores

  • PostgreSQL, MS SQL Server
  • MongoDB, Cassandra
  • FAISS, ChromaDB, Pinecone

Analytics & Statistics

  • Pandas, NumPy
  • Matplotlib, Seaborn
  • SPSS, SAS, STATA
  • Power BI

Programming Languages

  • Python (primary)
  • C, C++, C#
  • JavaScript, MATLAB
  • ASP.NET, Android

Healthcare Standards

  • FAIR, HL7
  • SNOMED CT
  • ICD-10
  • HIPAA

Education & recognition

Education
Ph.D. in Informatics
University of Iowa · Iowa City, IA
M.Sc. in Informatics
University of Iowa · Iowa City, IA
M.Sc. in Medical Informatics
Tarbiat Modares University · Tehran, Iran
B.S. in Computer Software Engineering
Najafabad Azad University · Isfahan, Iran
Honors & Awards
Excellent Award in Research
University of Iowa · Spring 2025
Ballard & Seashore Dissertation Fellowship
University of Iowa · Fall 2024
Student Impact Grant
University of Iowa · Spring 2024
Research and Travel GPSG/GSS Award
University of Iowa · 2021 — 2024
Recruitment Fellowship, IGPI
University of Iowa · 2021 — 2024

Community & service

Volunteer Reviewer
AMIA 2025 Informatics Summit · Fall 2024
P2P Mentor & Mentee
University of Iowa · Fall 2024
Student Volunteer
AMIA 2024 Annual Symposium · Fall 2024
Student Volunteer, ISO
University of Iowa · 2021 — 2023

Get in touch

I'm open to research collaborations on clinical AI, large language models, and biomedical informatics. The best way to reach me is by email.

Location: San Diego, California
