Multimodal Document AI · Metadata Extraction
DMP Bridge
Open-source Python-based GenAI pipeline that converts funder-specific Data Management Plan PDFs into interoperable DMPTool JSON and RDA Common Standard maDMP JSON. Uses pdfplumber, Qwen2-VL, Llama, prompt engineering, and schema-constrained generation for metadata extraction and FAIR-aligned interoperability.
Python · Qwen2-VL · pdfplumber · LLMs · RDA maDMP
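As a rough illustration of the target format, the sketch below assembles a minimal record shaped like an RDA Common Standard maDMP from fields that have already been extracted. The `build_madmp` helper and all sample values are hypothetical and are not taken from the DMP Bridge code. Only a handful of the standard's fields are shown (a complete record also needs identifiers such as `dmp_id`), and PDF parsing and model calls are omitted.

```python
import json
from datetime import datetime, timezone


def build_madmp(title, contact_name, contact_email, dataset_titles, language="eng"):
    """Assemble a minimal maDMP-shaped dict (hypothetical helper, illustration only)."""
    # Timestamps are written in ISO 8601 with an explicit UTC offset.
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "dmp": {
            "title": title,
            "created": now,
            "modified": now,
            "language": language,
            "contact": {"name": contact_name, "mbox": contact_email},
            # One dataset entry per extracted dataset title.
            "dataset": [{"title": t} for t in dataset_titles],
        }
    }


# Made-up input standing in for fields extracted from a funder PDF.
record = build_madmp(
    title="Example DMP",
    contact_name="Jane Doe",
    contact_email="jane.doe@example.org",
    dataset_titles=["Interview transcripts", "Survey responses"],
)
print(json.dumps(record, indent=2))
```

Keeping the record as a plain dict until the final `json.dumps` call makes it easy to validate against a JSON Schema before writing it out.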
Generative AI · RAG
DMP Chef
Free and open-source AI-driven platform that helps researchers draft funder-compliant Data Management Plans. Uses GPT-4, Gemini, and Llama 3.3 with RAG workflows to generate structured, FAIR-aligned, and machine-actionable DMPs.
GPT-4 · Gemini · Llama 3.3 · RAG · Vector DB
Automatic Evaluation · Human Evaluation
NIH DMP LLM Evaluation
Evaluated Llama 3.3 and GPT-4.1 for drafting NIH-compliant Data Management Plans using automated reference-based metrics and human expert evaluation, assessing compliance, clarity, completeness, and usefulness.
GPT-4.1 · Llama 3.3 · ROUGE · SBERT · Human Evaluation
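To make the reference-based scoring concrete, here is a minimal sketch of a ROUGE-1 style F1 score: unigram overlap between a reference text and a generated draft. It is a simplified stand-in (whitespace tokenization, no stemming) and not necessarily the ROUGE variant or implementation used in the study. The function name `rouge1_f1` and the example sentences are assumptions for illustration.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate text."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Made-up sentences, not taken from any evaluated DMP.
reference = "data will be shared through a public repository"
candidate = "data will be deposited in a public repository"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.3f}")
```

Lexical overlap of this kind rewards shared wording only, which is one reason to pair it with an embedding-based measure such as SBERT similarity and with human review.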
LLM Generation · Prompt Engineering
NIH DMP Generation
Built a reproducible workflow for automatically generating NIH-compliant DMP drafts using Llama 3.3 and GPT-4.1. The repository includes prompt engineering pipelines, generated outputs, and evaluation artifacts.
LLMs · Prompt Engineering · Python · FAIR
Clinical NER · Classification
Symptom-GPT / Symptom-BERT
Published clinical NLP research using BERT- and GPT-based NER models to detect anxiety and nausea/vomiting symptoms from oncology EHR notes. Symptom-BERT achieved F1 scores of 0.989 for nausea/vomiting and 0.912 for anxiety.
Bio-ClinicalBERT · Bio-GPT · NER · PyTorch
Clinical NLP · Classification
Care-BERT for Heart Failure
Transformer-based NLP model for detecting care priorities in EHR notes of older adults with heart failure, including comfort measures only and life-sustaining treatments. Achieved internal F1 = 0.941 and external F1 = 0.876.
BERT · EHR · Care Priorities · AUC
Palliative Care Informatics
Care-BERT for Advance Directives
Used Care-BERT to extract life-sustaining treatment preferences from EHR narratives and study disparities in advance directive completion among 14,303 older adults with chronic conditions.
Care-BERT · Logistic Regression · EHR Analytics · Palliative Care
Under-Documented Care
Spiritual-BERT
Clinical NLP model for detecting rarely documented spiritual care information in EHRs. Applied to nearly 3.6M notes from 14,729 older adults, achieving F1 = 0.938 internally and F1 = 0.832 externally.
Bio-ClinicalBERT · Synthetic Notes · EHR · Clinical NLP
Literature Intelligence · NIH
RARe-SOURCE™
AI-powered literature analysis engine built at the National Cancer Institute to surface hidden insights in rare-disease research. Combines large language models with encoder architectures for biological domain-specific Q&A.
LLMs · Semantic Search · Rare Disease · Biomedical
Oncology NLP
Symptom-BERT
Published model for detecting 13 cancer symptom groups from clinical notes. Pretrained Bio-ClinicalBERT on 1M unlabeled clinical documents, fine-tuned on annotated notes, and validated using GPT-4-generated synthetic notes.
BERT · PyTorch · Hugging Face · Clinical NLP
EHR · NimbleMiner · Statistical Modeling
Patient-Reported Outcomes vs EHR Documentation
Analyzed concordance between patient-reported symptom occurrence and provider-documented symptoms in EHRs for patients with multimorbidity receiving cancer treatment, using NimbleMiner and statistical modeling.
PROs · NimbleMiner · Logistic Regression · EHR
Embeddings NLP · Deep Learning
Embeddings-Augmented Cancer Symptom NLP
Developed an embeddings-augmented NLP system to detect 14 cancer symptom groups and distinguish observed symptoms from negated symptoms and medication-related side effects across 902,508 clinical notes.
Embeddings · NLP · Symptom Detection · EHR
Collaborative Research · Oncology Symptom Monitoring
OASIS
Collaborative research project on Oncology Associated Symptoms & Individualized Strategies (OASIS). Contributed to AI/NLP-supported workflows for cancer symptom monitoring, EHR-based symptom analysis, and patient-centered oncology care research in collaboration with interdisciplinary nursing informatics teams.
Clinical NLP · EHR Data · Oncology · Symptom Monitoring
Predictive Modeling · Machine Learning
ML for Cancer Symptom Prediction
Used structured and unstructured EHR data from 8,156 adults with cancer to predict 12 common symptoms. Random Forest achieved the strongest overall performance with macro AUC = 0.755 and F1 = 0.729.
Random Forest · XGBoost · SHAP · EHR
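For readers unfamiliar with macro-averaging across several symptom labels, the sketch below computes a per-label F1 score and then takes the unweighted mean. It uses toy data with two made-up labels (`pain`, `fatigue`) and hypothetical helper names. The source does not state how its reported F1 was averaged, so this shows the general technique and does not reproduce the study's evaluation code.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for one label, from parallel lists of 0/1 values."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(labels_true, labels_pred):
    """Unweighted mean of per-label F1 scores (each label counts equally)."""
    scores = [f1_score(labels_true[name], labels_pred[name]) for name in labels_true]
    return sum(scores) / len(scores)


# Toy ground truth and predictions for four patients and two labels.
y_true = {"pain": [1, 0, 1, 1], "fatigue": [0, 1, 1, 0]}
y_pred = {"pain": [1, 0, 0, 1], "fatigue": [0, 1, 1, 1]}
result = macro_f1(y_true, y_pred)
print(f"macro F1: {result:.3f}")
```

Because every label contributes equally, a rare symptom that is predicted poorly lowers the macro score as much as a common one would.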
Systematic Review · Python
ML for Cancer Symptom Prediction Review
PRISMA-guided systematic review of 42 studies using machine learning to predict cancer symptoms and identify predictors. Synthesized algorithms, cancer sites, symptoms, sample sizes, and research gaps.
PRISMA · Machine Learning · Cancer Symptoms · Review
Web Mining
Web Analysis
Built a complete NLP pipeline for scraping 3,000+ eHow home and living articles, manually labeling sentiment data, training eight classifiers, and applying K-Means clustering and LDA topic modeling.
Web Scraping · Sentiment Analysis · K-Means · LDA
Forecasting · Machine Learning
Python Demand Forecasting
Applied Python-based data analysis and predictive modeling to forecast three-month product demand for an online electronics retailer using 100 weeks of weekly sales data across 44 SKUs.
Python · Time Series · Forecasting · E-Commerce
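The description does not say which forecasting model was used, so the following is only a naive baseline for one SKU: a recursive moving average that projects 13 weeks (roughly three months) ahead from weekly sales. The function name, the window size, and the sales figures are all assumptions made for illustration.

```python
def moving_average_forecast(history, window=4, horizon=13):
    """Forecast `horizon` future points, each the mean of the last `window` values."""
    if len(history) < window:
        raise ValueError("history must contain at least `window` observations")
    values = list(history)
    forecast = []
    for _ in range(horizon):
        next_value = sum(values[-window:]) / window
        forecast.append(next_value)
        # Feed the prediction back in so later steps can build on it.
        values.append(next_value)
    return forecast


# Made-up weekly unit sales for a single SKU.
weekly_sales = [10, 12, 11, 13, 12, 14, 13, 15]
forecast = moving_average_forecast(weekly_sales, window=4, horizon=13)
print([round(v, 2) for v in forecast])
```

A baseline like this gives a reference point: a more elaborate model is only worth its complexity if it beats the moving average on held-out weeks.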
Big Data Healthcare
Big Data Analytics in Healthcare
Published survey categorizing Big Data analytics applications in healthcare using the WHO “6 building blocks of health systems” framework, reviewing 130 articles and books over a 10-year period.
Big Data · Healthcare · WHO Framework · Survey
Health Interoperability
Health Information Systems Interoperability
Proposed a conceptual model using a Health Service Bus and service-oriented architecture to improve interoperability across hospital information systems, EHRs, CDSS, telemedicine systems, and other healthcare platforms.
HSB · SOA · HL7 · OpenEHR