Taimoor Khan
I build AI systems
that go to production
RAG pipelines, LLM infrastructure, and MLOps platforms engineered for real-world scale. From clinical AI with sub-200ms retrieval to enterprise ML lifecycle automation.
Taimoor Khan
AI Engineer · Abbottabad, Pakistan
AI systems engineer,
not just an ML practitioner
My work sits at the intersection of AI research and production engineering. I design systems where the hard problems are retrieval quality, latency under load, hallucination control, and infrastructure that can actually be maintained at scale. Not proof-of-concepts. Shipped systems.
My flagship project is CliniSynapse AI, a clinical decision support system built over 167K real patient case records. It combines FAISS semantic search, domain-aware clinical scoring, and cross-encoder reranking into a sub-200ms pipeline where every LLM response is grounded in retrieved evidence with hallucinations explicitly blocked at the output layer.
Across 3 years of production AI work, I have shipped predictive analytics platforms for hospital networks, solar-energy forecasting pipelines with automated MLflow retraining, and edge-deployed computer vision systems. I am also lead author of a research publication in visual object tracking that achieved a 12.4% relative EAO improvement on the VOT2022 benchmark.
I am looking for a remote AI engineering role where the work is genuinely difficult and the systems actually matter.
Background
Location: Abbottabad, Pakistan · Open to Remote
BS Artificial Intelligence
PAF-IAST · Haripur, Pakistan
Research Publication · Lead Author
Visual Object Tracking · VOT2022 Benchmark
Core Focus Areas
What I've built
Production-grade AI systems, not tutorials or demos. Each one is engineered to handle real-world constraints at scale.
CliniSynapse AI
Clinical Decision Support System
A production RAG system built over 167K clinical case records that delivers sub-200ms diagnostic query responses. The retrieval pipeline combines FAISS semantic search, domain-aware clinical feature scoring, and cross-encoder reranking. Every LLM output is grounded in retrieved evidence with hallucinations explicitly blocked at the generation layer.
- Sub-200ms end-to-end query latency
- 97% retrieval recall via FAISS IVFFlat
- 5-layer caching system
- Hallucination-blocked LLM outputs
- Live PubMed evidence retrieval
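As a rough illustration of the retrieve-then-rerank pattern described above (not the CliniSynapse code itself), the sketch below narrows a corpus with cheap vector similarity and then reorders the shortlist with a costlier pairwise scorer. Everything in it is an assumption made for the example: brute-force cosine similarity stands in for the FAISS index, the toy `token_overlap` function stands in for a cross-encoder, the domain-aware clinical scoring stage is left out, and the sample texts and vectors are invented.

```python
import math
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: Vector,
    query_text: str,
    corpus: list[tuple[str, Vector]],
    rerank_score: Callable[[str, str], float],
    k_candidates: int = 3,
    k_final: int = 2,
) -> list[str]:
    # Stage 1: cheap vector similarity narrows the corpus to a shortlist.
    shortlist = sorted(
        corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )[:k_candidates]
    # Stage 2: a costlier pairwise scorer reorders only the shortlist.
    reranked = sorted(
        shortlist, key=lambda item: rerank_score(query_text, item[0]), reverse=True
    )
    return [text for text, _ in reranked[:k_final]]


def token_overlap(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query tokens found in doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


# Invented sample data, for illustration only.
corpus = [
    ("chest pain radiating to left arm", [0.9, 0.1, 0.0]),
    ("persistent dry cough and fever", [0.1, 0.9, 0.0]),
    ("acute chest pain with shortness of breath", [0.8, 0.2, 0.1]),
    ("ankle sprain after fall", [0.0, 0.1, 0.9]),
]
results = retrieve_then_rerank(
    [1.0, 0.0, 0.0], "chest pain shortness of breath", corpus, token_overlap
)
```

The point of the two stages is cost: the expensive scorer only ever sees `k_candidates` items, so its latency stays bounded regardless of corpus size.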
A production-ready RAG system built for enterprise scale, with pluggable retrieval backends, configurable reranking stages, comprehensive evaluation metrics, and multi-tenant context isolation. Designed so teams can swap components without touching the rest of the pipeline.
- Pluggable retrieval backends
- Configurable reranking pipeline
- Multi-tenant context isolation
- Advanced hallucination metrics
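One way to get swappable components of this kind is to have the pipeline depend on a small interface rather than a concrete backend. The sketch below is a minimal illustration under that assumption; the `Retriever` protocol, both backends, and the `Pipeline` class are hypothetical names for this example and are not taken from the project.

```python
from typing import Protocol


class Retriever(Protocol):
    """Hypothetical interface: anything with a matching search() can plug in."""

    def search(self, query: str, k: int) -> list[str]: ...


class KeywordRetriever:
    """Toy backend that ranks documents by shared query terms."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def search(self, query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(doc.lower().split())), doc) for doc in self.docs]
        # Drop non-matching documents; sorted() is stable, so ties keep input order.
        ranked = sorted(
            (pair for pair in scored if pair[0] > 0),
            key=lambda pair: pair[0],
            reverse=True,
        )
        return [doc for _, doc in ranked[:k]]


class StaticRetriever:
    """Toy backend that always returns the same documents (handy in tests)."""

    def __init__(self, fixed: list[str]) -> None:
        self.fixed = fixed

    def search(self, query: str, k: int) -> list[str]:
        return self.fixed[:k]


class Pipeline:
    """Depends only on the Retriever interface, never on a concrete backend."""

    def __init__(self, retriever: Retriever) -> None:
        self.retriever = retriever

    def context_for(self, query: str, k: int = 2) -> list[str]:
        return self.retriever.search(query, k)


docs = ["faiss index tuning", "kubernetes deployment", "faiss recall tuning guide"]
keyword_hits = Pipeline(KeywordRetriever(docs)).context_for("faiss tuning")
static_hits = Pipeline(StaticRetriever(["a", "b", "c"])).context_for("anything")
```

Swapping `KeywordRetriever` for `StaticRetriever` changes nothing in `Pipeline`, which is the property the description above is after.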
A complete ML infrastructure framework that covers experiment tracking, model versioning, automated evaluation gates, and deployment workflows on AWS. Built to solve the real problem of getting models from experiment to production reliably with full observability.
- MLflow experiment tracking
- Automated evaluation gates
- Kubernetes-ready deployment
- CI/CD with GitHub Actions
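An automated evaluation gate can be as simple as a function that a CI job calls before promoting a model. The sketch below is one possible shape, not the framework's actual code: the function name, the metric names, and the `max_regression` tolerance are hypothetical, and it assumes every metric is "higher is better". In a real setup the two metric dictionaries would presumably be read from the experiment tracker rather than written inline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    passed: bool
    failures: list[str]


def evaluation_gate(
    candidate: dict[str, float],
    baseline: dict[str, float],
    min_thresholds: dict[str, float],
    max_regression: float = 0.01,
) -> GateResult:
    """Check a candidate model against absolute floors and a baseline model.

    Assumes higher values are better for every metric listed in min_thresholds.
    """
    failures: list[str] = []
    for metric, floor in min_thresholds.items():
        value = candidate.get(metric)
        if value is None:
            failures.append(f"{metric}: missing from candidate metrics")
            continue
        if value < floor:
            failures.append(f"{metric}: {value:.3f} is below the floor {floor:.3f}")
        base = baseline.get(metric)
        if base is not None and base - value > max_regression:
            failures.append(
                f"{metric}: regressed by {base - value:.3f} against the baseline"
            )
    return GateResult(passed=not failures, failures=failures)


# Invented numbers, for illustration only.
blocked = evaluation_gate(
    candidate={"recall": 0.95, "f1": 0.80},
    baseline={"recall": 0.94, "f1": 0.85},
    min_thresholds={"recall": 0.90, "f1": 0.75},
)
promoted = evaluation_gate(
    candidate={"recall": 0.95, "f1": 0.86},
    baseline={"recall": 0.94, "f1": 0.85},
    min_thresholds={"recall": 0.90, "f1": 0.75},
)
```

In the first call the candidate clears both floors but its F1 drops 0.05 against the baseline, so the gate blocks it; a CI step could fail the build whenever `passed` is false.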
An open-source framework for auditing production ML models for bias and explainability. Supports SHAP value analysis, attention map visualizations, and a comprehensive fairness metric suite across multiple ML frameworks and sensitive deployment domains.
- SHAP value explanations
- Attention map visualizations
- Multi-framework support
- Fairness metric suite
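To give a sense of what a fairness metric computes, the sketch below implements demographic parity difference: the gap between the highest and lowest positive-prediction rates across groups. Whether this particular metric is part of the framework's suite is an assumption; the function name and sample data are made up for the example.

```python
def demographic_parity_difference(y_pred: list[int], groups: list[str]) -> float:
    """Largest gap in positive-prediction rate between any two groups.

    y_pred holds binary predictions (0 or 1); groups holds the sensitive
    attribute value for the same rows. A result of 0.0 means every group
    receives positive predictions at the same rate.
    """
    if len(y_pred) != len(groups):
        raise ValueError("y_pred and groups must have the same length")
    if not y_pred:
        raise ValueError("at least one prediction is required")

    by_group: dict[str, list[int]] = {}
    for pred, group in zip(y_pred, groups):
        by_group.setdefault(group, []).append(pred)

    rates = [sum(preds) / len(preds) for preds in by_group.values()]
    return max(rates) - min(rates)


# Invented data: group "a" is selected 3 times out of 4, group "b" once out of 4.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(y_pred, groups)
```

Here the selection rates are 0.75 and 0.25, so the gap is 0.5. An audit would typically compare a value like this against a tolerance chosen for the deployment domain.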
Technical arsenal
Tools and frameworks I use to design, build, and ship AI systems to production.
LLMs & RAG
AI / ML
MLOps
Backend & Systems
Primary Languages
Experience & Research
AI Engineer
Ninth Dev Solutions
Nov 2022 – Dec 2024
- Designed and shipped an LSTM-based predictive analytics platform over EHR data from 15 hospitals, forecasting patient health trajectories and adverse events at scale.
- Built a stacked ensemble model (XGBoost + LSTM) for multivariate solar-energy forecasting with automated MLflow retraining pipelines deployed to production.
- Developed GRU-based sequence models over 5TB of retail transaction data to extract purchase-propensity and churn signals feeding downstream business systems.
- Applied K-Means and PCA to hospital operational data, surfacing efficiency bottlenecks that reduced operational costs by 12% and improved resource allocation by 18%.
- Led end-to-end delivery of a YOLOv4 edge deployment for real-time mask detection, along with BERT-based NLP tools, owning model quantization, pruning, and CI/CD integration.
Distractor-Aware Memory (DAM) for Robust Visual Object Tracking
PAF-IAST, Haripur, Pakistan
Designed a dual-branch memory architecture on SAM 2 with sigmoid-gated cross-attention fusion, achieving a 12.4% relative EAO improvement on the VOT2022 benchmark at approximately 45ms per frame with only 3% additional computational overhead.
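For readers unfamiliar with sigmoid gating, the sketch below shows the general idea in its simplest form: a gate value between 0 and 1 decides, per feature, how much of each branch flows into the fused output. This is a generic illustration and not the paper's formulation. The branch names are placeholders, the cross-attention step is omitted, and the gate logits are passed in directly, whereas in a trained model they would come from a learned projection.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    branch_a: list[float], branch_b: list[float], gate_logits: list[float]
) -> list[float]:
    """Blend two feature vectors element-wise: g * a + (1 - g) * b.

    g = sigmoid(logit), so a large positive logit favours branch_a, a large
    negative logit favours branch_b, and a logit of 0 averages the two.
    """
    if not (len(branch_a) == len(branch_b) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for a, b, logit in zip(branch_a, branch_b, gate_logits):
        g = sigmoid(logit)
        fused.append(g * a + (1.0 - g) * b)
    return fused


# Invented values: the three logits average, pick branch_a, and pick branch_b.
fused = gated_fusion([1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [0.0, 100.0, -100.0])
```

Because the gate is continuous, the blend stays differentiable, which is the usual reason to prefer it over a hard switch between branches.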
BS Artificial Intelligence
Pak-Austria Fachhochschule Institute of Applied Sciences and Technology
Haripur, Pakistan
Let's build something
worth shipping
I'm open to remote AI engineering roles, consulting engagements, and interesting AI systems problems. If you're building something ambitious, let's talk.
taimoorkhaniajaznabi2@gmail.com