Open to Remote Roles

Taimoor Khan

I build AI systems
that go to production

RAG pipelines, LLM infrastructure, and MLOps platforms engineered for real-world scale. From clinical AI with sub-200ms retrieval to enterprise ML lifecycle automation.

3+ Years in Production AI
10+ AI Systems Shipped
Sub-200ms RAG Query Latency
Lead Author · Research Publication
About
Taimoor Khan

AI Engineer · Abbottabad, Pakistan

AI systems engineer,
not just an ML practitioner

My work sits at the intersection of AI research and production engineering. I design systems where the hard problems are retrieval quality, latency under load, hallucination control, and infrastructure that can actually be maintained at scale. Not proof-of-concepts. Shipped systems.

My flagship project is CliniSynapse AI, a clinical decision support system built over 167K real patient case records. It combines FAISS semantic search, domain-aware clinical scoring, and cross-encoder reranking into a sub-200ms pipeline. Every LLM response is grounded in retrieved evidence, with hallucinations explicitly blocked at the output layer.

Across 3 years of production AI work, I have shipped predictive analytics platforms for hospital networks, solar-energy forecasting pipelines with automated MLflow retraining, and edge-deployed computer vision systems. I am also lead author of a research publication in visual object tracking, which reports a 12.4% relative EAO improvement on the VOT2022 benchmark.

I am looking for a remote AI engineering role where the work is genuinely difficult and the systems actually matter.

Background

  • Abbottabad, Pakistan · Open to Remote

    Location

  • BS Artificial Intelligence

    PAF-IAST · Haripur, Pakistan

  • Research Publication · Lead Author

    Visual Object Tracking · VOT2022 Benchmark

Core Focus Areas

RAG Pipeline Design
LLM Orchestration
MLOps Infrastructure
Vector Search (FAISS)
Hallucination Mitigation
Model Fine-tuning
FastAPI Systems
Edge Deployment
Projects

What I've built

Production-grade AI systems, not tutorials or demos. Each one is engineered to handle real-world constraints at scale.

Flagship

CliniSynapse AI

Clinical Decision Support System

Private

A production RAG system built over 167K clinical case records that delivers sub-200ms diagnostic query responses. The retrieval pipeline combines FAISS semantic search, domain-aware clinical feature scoring, and cross-encoder reranking. Every LLM output is grounded in retrieved evidence with hallucinations explicitly blocked at the generation layer.

  • Sub-200ms end-to-end query latency
  • 97% retrieval recall via FAISS IVFFlat
  • 5-layer caching system
  • Hallucination-blocked LLM outputs
  • Live PubMed evidence retrieval
RAG · BioBERT · FAISS · FastAPI · Gemini / GPT-4 · Python · PubMed API
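
The retrieve-then-rerank pattern described above can be sketched in a few lines. This is an illustrative outline only, not the CliniSynapse code: brute-force cosine similarity stands in for the FAISS index, plain lookup tables stand in for the domain-aware scorer and the cross-encoder, and the document IDs, scores, and blend weights are all made-up assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k):
    """Stage 1: top-k candidates by cosine similarity (stand-in for FAISS)."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


def rerank(candidates, domain_score, rerank_score, weights=(0.4, 0.2, 0.4)):
    """Stage 2: blend semantic, domain, and reranker scores (weights are assumed)."""
    w_sem, w_dom, w_rr = weights
    blended = [
        (w_sem * sem + w_dom * domain_score(doc_id) + w_rr * rerank_score(doc_id), doc_id)
        for doc_id, sem in candidates
    ]
    blended.sort(reverse=True)
    return [doc_id for _, doc_id in blended]


# Hypothetical toy data: three cases embedded in two dimensions.
index = {"case_a": [1.0, 0.0], "case_b": [0.8, 0.6], "case_c": [0.0, 1.0]}
domain_scores = {"case_a": 0.1, "case_b": 0.9}    # stand-in for clinical feature scoring
reranker_scores = {"case_a": 0.2, "case_b": 0.9}  # stand-in for a cross-encoder

candidates = retrieve([1.0, 0.0], index, k=2)
ranked = rerank(candidates, domain_scores.get, reranker_scores.get)
print(ranked)
```

In this toy run the second stage promotes case_b over case_a even though case_a is the closer embedding match, which is the behaviour a reranking stage exists to provide.
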
Open Source

Enterprise-RAG-Framework

Modular RAG Architecture

A production-ready RAG system built for enterprise scale, with pluggable retrieval backends, configurable reranking stages, comprehensive evaluation metrics, and multi-tenant context isolation. Designed so teams can swap components without touching the rest of the pipeline.

  • Pluggable retrieval backends
  • Configurable reranking pipeline
  • Multi-tenant context isolation
  • Advanced hallucination metrics
Python · FAISS · Transformers · OpenAI · FastAPI · Docker
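
One common way to make retrieval backends swappable is to have the pipeline depend on a small interface rather than on a concrete class. The sketch below assumes that approach; `Retriever`, `KeywordRetriever`, `StaticRetriever`, and `Pipeline` are hypothetical names chosen for illustration and are not taken from the framework itself.

```python
from typing import Protocol


class Retriever(Protocol):
    """Interface every backend must satisfy (hypothetical)."""

    def search(self, query: str, k: int) -> list[str]: ...


class KeywordRetriever:
    """Toy backend: ranks documents by the number of shared query terms."""

    def __init__(self, docs: dict[str, str]) -> None:
        self.docs = docs

    def search(self, query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(text.lower().split())), doc_id)
            for doc_id, text in self.docs.items()
        ]
        # Keep only documents with at least one matching term.
        hits = [(score, doc_id) for score, doc_id in scored if score > 0]
        hits.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in hits[:k]]


class StaticRetriever:
    """Toy backend: returns a fixed result list, handy in tests."""

    def __init__(self, results: list[str]) -> None:
        self.results = results

    def search(self, query: str, k: int) -> list[str]:
        return self.results[:k]


class Pipeline:
    """Depends only on the Retriever interface, never on a concrete backend."""

    def __init__(self, retriever: Retriever, k: int = 3) -> None:
        self.retriever = retriever
        self.k = k

    def run(self, query: str) -> list[str]:
        return self.retriever.search(query, self.k)


docs = {
    "d1": "faiss vector search",
    "d2": "docker deployment guide",
    "d3": "vector index tuning",
}
keyword_hits = Pipeline(KeywordRetriever(docs), k=2).run("vector search")
static_hits = Pipeline(StaticRetriever(["x", "y", "z"]), k=2).run("anything")
print(keyword_hits, static_hits)
```

Swapping the backend changes one constructor argument and nothing else in the pipeline.
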
Infrastructure

MLOps-Forge

ML Lifecycle Automation

A complete ML infrastructure framework that covers experiment tracking, model versioning, automated evaluation gates, and deployment workflows on AWS. Built to solve the real problem of getting models from experiment to production reliably with full observability.

  • MLflow experiment tracking
  • Automated evaluation gates
  • Kubernetes-ready deployment
  • CI/CD with GitHub Actions
Python · MLflow · Docker · Kubernetes · AWS · PyTorch · GitHub Actions
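
An automated evaluation gate can be as simple as a list of metric thresholds checked before a model is promoted. The following is a minimal sketch under that assumption; the `Gate` class, the metric names, and the threshold values are hypothetical and do not come from MLOps-Forge or from MLflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One promotion rule: a metric must clear a threshold (hypothetical)."""

    metric: str
    threshold: float
    higher_is_better: bool = True


def evaluate_gates(metrics, gates):
    """Return (passed, failures) for a candidate model's metrics."""
    failures = []
    for gate in gates:
        value = metrics.get(gate.metric)
        if value is None:
            # A metric that was never logged counts as a failure.
            failures.append(f"{gate.metric}: missing")
            continue
        if gate.higher_is_better:
            ok = value >= gate.threshold
        else:
            ok = value <= gate.threshold
        if not ok:
            failures.append(f"{gate.metric}: {value} vs threshold {gate.threshold}")
    return (not failures, failures)


# Assumed example thresholds, for illustration only.
gates = [Gate("accuracy", 0.90), Gate("latency_ms", 200.0, higher_is_better=False)]

good = evaluate_gates({"accuracy": 0.93, "latency_ms": 150.0}, gates)
bad = evaluate_gates({"accuracy": 0.85, "latency_ms": 150.0}, gates)
print(good, bad)
```

In a CI job, the deployment step would run only when the first element of the result is true, and the failure list would go into the build log.
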
Open Source

AI Fairness & Explainability Toolkit

Model Interpretability Framework

An open-source framework for auditing production ML models for bias and explainability. Supports SHAP value analysis, attention map visualizations, and a comprehensive fairness metric suite across multiple ML frameworks and sensitive deployment domains.

  • SHAP value explanations
  • Attention map visualizations
  • Multi-framework support
  • Fairness metric suite
Python · SHAP · Scikit-learn · PyTorch · TensorFlow
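
As an example of what a fairness metric measures, the sketch below computes the demographic parity difference: the gap between the highest and lowest positive-prediction rates across groups. It is a generic textbook metric written in plain Python, not code from the toolkit, and the predictions and group labels are made-up data.

```python
def selection_rates(predictions, groups):
    """Fraction of positive (== 1) predictions within each group."""
    totals = {}
    positives = {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rates (0 means parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical binary predictions for two groups of four people each.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(predictions, groups)
gap = demographic_parity_difference(predictions, groups)
print(rates, gap)
```

Here group a is selected three times out of four and group b once out of four, so the gap is 0.5. An audit would typically flag any gap above a threshold agreed for the deployment domain.
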
Skills

Technical arsenal

Tools and frameworks I use to design, build, and ship AI systems to production.

LLMs & RAG

RAG System Design · 95%
LLM Integration & Prompting · 92%
FAISS / Vector Search · 90%
BioBERT / Sentence Transformers · 88%
Cross-Encoder Reranking · 85%
LangChain · 80%

AI / ML

PyTorch / TensorFlow · 92%
Transformers (HuggingFace) · 90%
LSTM / GRU / BERT · 88%
Computer Vision (YOLO, ViT) · 82%
Reinforcement Learning · 75%
GANs / Federated Learning · 72%

MLOps

Docker / Kubernetes · 88%
MLflow · 90%
AWS (EC2, S3, Lambda) · 80%
CI/CD Pipelines · 85%
Model Quantization & Pruning · 82%
GitHub Actions · 85%

Backend & Systems

Python (Advanced) · 97%
FastAPI / Uvicorn · 90%
REST API Design · 88%
MongoDB / SQLite · 80%
TypeScript / JavaScript · 75%
Linux Systems · 82%

Primary Languages

Python · Advanced
TypeScript · Proficient
JavaScript · Proficient
C++ · Intermediate
SQL · Intermediate
Background

Experience & Research

Work Experience

AI Engineer

Ninth Dev Solutions

Full-time

Nov 2022 – Dec 2024

  • Designed and shipped an LSTM-based predictive analytics platform over EHR data from 15 hospitals, forecasting patient health trajectories and adverse events at scale.
  • Built a stacked ensemble model (XGBoost + LSTM) for multivariate solar-energy forecasting with automated MLflow retraining pipelines deployed to production.
  • Developed GRU-based sequence models over 5TB of retail transaction data to extract purchase-propensity and churn signals feeding downstream business systems.
  • Applied K-Means and PCA to hospital operational data, surfacing efficiency bottlenecks whose resolution reduced operational costs by 12% and improved resource allocation by 18%.
  • Led end-to-end delivery of a YOLOv4 edge deployment for real-time mask detection and BERT-based NLP tools, owning model quantization, pruning, and CI/CD integration.
Research Publication
Lead Author · 2026

Distractor-Aware Memory (DAM) for Robust Visual Object Tracking

PAF-IAST, Haripur, Pakistan

Designed a dual-branch memory architecture on SAM 2 with sigmoid-gated cross-attention fusion, achieving a 12.4% relative EAO improvement on the VOT2022 benchmark at approximately 45ms per frame with only 3% additional computational overhead.

Computer Vision · Object Tracking · SAM 2 · Attention Mechanisms · VOT2022
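
For readers unfamiliar with gated fusion, the general idea is that a sigmoid gate decides, per feature, how much of each branch to keep. The sketch below shows only that generic pattern on plain Python lists. It is not the DAM architecture from the paper: the function names are hypothetical, and the gate logits are passed in directly, whereas a real model would compute them with learned layers.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(branch_a, branch_b, gate_logits):
    """Per-feature convex blend: g * a + (1 - g) * b with g = sigmoid(logit)."""
    fused = []
    for a, b, logit in zip(branch_a, branch_b, gate_logits):
        g = sigmoid(logit)
        fused.append(g * a + (1.0 - g) * b)
    return fused


# Zero logits give g = 0.5, so the output is the plain average of both branches.
balanced = gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])

# Large positive or negative logits select one branch almost entirely.
selective = gated_fusion([1.0, 2.0], [3.0, 4.0], [50.0, -50.0])
print(balanced, selective)
```

Because the gate output stays between 0 and 1, each fused feature always lies between the two branch values, which keeps the blend stable.
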

BS Artificial Intelligence

Pak-Austria Fachhochschule Institute of Applied Sciences and Technology

Haripur, Pakistan

Contact

Let's build something
worth shipping

I'm open to remote AI engineering roles, consulting engagements, and interesting AI systems problems. If you're building something ambitious, let's talk.

Email

taimoorkhaniajaznabi2@gmail.com