Open to Remote Roles

Taimoor Khan

I build AI systems
that go to production

RAG pipelines, LLM infrastructure, and MLOps platforms engineered for real-world scale. From clinical AI with sub-200ms retrieval to enterprise ML lifecycle automation.

View Projects · GitHub · LinkedIn · Resume

3+ Years in Production AI

10+ AI Systems Shipped

Sub-200ms RAG Query Latency

Lead Author · Research Publication

About

Taimoor Khan

AI Engineer · Abbottabad, Pakistan

AI systems engineer,
not just an ML practitioner

My work sits at the intersection of AI research and production engineering. I design systems where the hard problems are retrieval quality, latency under load, hallucination control, and infrastructure that can actually be maintained at scale. Not proof-of-concepts. Shipped systems.

My flagship project is CliniSynapse AI, a clinical decision support system built over 167K real patient case records. It combines FAISS semantic search, domain-aware clinical scoring, and cross-encoder reranking into a sub-200ms pipeline where every LLM response is grounded in retrieved evidence with hallucinations explicitly blocked at the output layer.

Across 3 years of production AI work, I have shipped predictive analytics platforms for hospital networks, solar-energy forecasting pipelines with automated MLflow retraining, and edge-deployed computer vision systems. I am also lead author of a research publication in visual object tracking that achieves a 12.4% relative EAO improvement on the VOT2022 benchmark.

I am looking for a remote AI engineering role where the work is genuinely difficult and the systems actually matter.

Background

Abbottabad, Pakistan · Open to Remote
Location
BS Artificial Intelligence
PAF-IAST · Haripur, Pakistan
Research Publication · Lead Author
Visual Object Tracking · VOT2022 Benchmark

Core Focus Areas

RAG Pipeline Design

LLM Orchestration

MLOps Infrastructure

Vector Search (FAISS)

Hallucination Mitigation

Model Fine-tuning

FastAPI Systems

Edge Deployment

Projects

What I've built

Production-grade AI systems, not tutorials or demos. Each one is engineered to handle real-world constraints at scale.

Flagship

CliniSynapse AI

Clinical Decision Support System

Private

A production RAG system built over 167K clinical case records that delivers sub-200ms diagnostic query responses. The retrieval pipeline combines FAISS semantic search, domain-aware clinical feature scoring, and cross-encoder reranking. Every LLM output is grounded in retrieved evidence, with hallucinations explicitly blocked at the output layer.

Sub-200ms end-to-end query latency
97% retrieval recall via FAISS IVFFlat
5-layer multi-level caching system
Hallucination-blocked LLM outputs
Live PubMed evidence retrieval

RAG · BioBERT · FAISS · FastAPI · Gemini / GPT-4 · Python · PubMed API
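A minimal sketch of the retrieve-then-rerank pattern described above, not the CliniSynapse implementation. It assumes toy two-dimensional embeddings, a brute-force cosine search standing in for a FAISS index, and a hypothetical term-overlap scorer standing in for a cross-encoder. The function names (`retrieve`, `rerank`, `is_grounded`) and the sample cases are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, index, k):
    """Stage 1: cheap, broad recall. Brute-force search here;
    a vector index such as FAISS would take this role at scale."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

def rerank(query_terms, candidates, docs, top_n):
    """Stage 2: slower, more precise scoring over the short candidate list.
    Term overlap is an assumed stand-in for a cross-encoder score."""
    def overlap(doc_id):
        words = set(docs[doc_id].lower().split())
        return len(words & query_terms) / len(query_terms)
    return sorted(candidates, key=overlap, reverse=True)[:top_n]

def is_grounded(cited_ids, evidence_ids):
    """Output check: accept an answer only if it cites something and
    every citation points at evidence that was actually retrieved."""
    return bool(cited_ids) and set(cited_ids) <= set(evidence_ids)

# Hypothetical sample data.
index = {"case_a": [1.0, 0.0], "case_b": [0.9, 0.1], "case_c": [0.0, 1.0]}
docs = {
    "case_a": "fever and cough",
    "case_b": "fever cough chest pain",
    "case_c": "fractured wrist",
}

candidates = retrieve([1.0, 0.05], index, k=2)
best = rerank({"chest", "pain"}, candidates, docs, top_n=1)
```

The split matters for latency: the first stage touches every record but does little work per record, while the second stage does more work per record but only over the handful of candidates that survive.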

Open Source

Enterprise-RAG-Framework

Modular RAG Architecture

A production-ready RAG system built for enterprise scale, with pluggable retrieval backends, configurable reranking stages, comprehensive evaluation metrics, and multi-tenant context isolation. Designed so teams can swap components without touching the rest of the pipeline.

Pluggable retrieval backends
Configurable reranking pipeline
Multi-tenant context isolation
Advanced hallucination metrics

Python · FAISS · Transformers · OpenAI · FastAPI · Docker
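One common way to make retrieval backends swappable is a small structural interface that the pipeline depends on. The sketch below assumes that approach; `Retriever`, `KeywordRetriever`, `StaticRetriever`, and `Pipeline` are hypothetical names for illustration and are not taken from the Enterprise-RAG-Framework codebase.

```python
from __future__ import annotations

from typing import Protocol

class Retriever(Protocol):
    """Anything with a matching search() method can act as a backend."""
    def search(self, query: str, k: int) -> list[str]: ...

class KeywordRetriever:
    """Toy backend: ranks documents by shared query terms."""
    def __init__(self, docs: dict[str, str]) -> None:
        self.docs = docs

    def search(self, query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda doc_id: len(terms & set(self.docs[doc_id].lower().split())),
            reverse=True,
        )
        return ranked[:k]

class StaticRetriever:
    """Toy backend: returns a fixed list, useful in tests."""
    def __init__(self, ids: list[str]) -> None:
        self.ids = ids

    def search(self, query: str, k: int) -> list[str]:
        return self.ids[:k]

class Pipeline:
    """Depends only on the Retriever interface, never on a concrete backend."""
    def __init__(self, retriever: Retriever) -> None:
        self.retriever = retriever

    def run(self, query: str, k: int = 2) -> list[str]:
        return self.retriever.search(query, k)

# Hypothetical sample data.
docs = {
    "d1": "vector search with faiss",
    "d2": "kubernetes deployment guide",
    "d3": "faiss index tuning",
}
keyword_hits = Pipeline(KeywordRetriever(docs)).run("faiss index")
static_hits = Pipeline(StaticRetriever(["x", "y", "z"])).run("anything")
```

Because `Pipeline` only calls `search`, replacing one backend with another is a one-line change at construction time.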

Infrastructure

MLOps-Forge

ML Lifecycle Automation

A complete ML infrastructure framework that covers experiment tracking, model versioning, automated evaluation gates, and deployment workflows on AWS. Built to solve the real problem of getting models from experiment to production reliably with full observability.

MLflow experiment tracking
Automated evaluation gates
Kubernetes-ready deployment
CI/CD with GitHub Actions

Python · MLflow · Docker · Kubernetes · AWS · PyTorch · GitHub Actions
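An automated evaluation gate can be as simple as a function that compares a candidate model's metrics against the current baseline and blocks promotion on any regression. The sketch below is a hedged illustration of that idea, assuming higher-is-better metrics. The function name `evaluate_gate`, its parameters, and the sample numbers are hypothetical and not drawn from MLOps-Forge.

```python
def evaluate_gate(candidate, baseline, floors=None, min_gain=0.0):
    """Return (passed, failures) for a candidate model.

    candidate, baseline: dicts of metric name -> value (higher is better).
    floors: optional absolute minimums per metric.
    min_gain: required improvement over the baseline for every metric.
    """
    floors = floors or {}
    failures = []

    # Relative check: no metric may regress against the baseline.
    for name, base_value in baseline.items():
        value = candidate.get(name)
        if value is None:
            failures.append(f"{name}: missing from candidate")
        elif value < base_value + min_gain:
            failures.append(f"{name}: {value} is below baseline {base_value}")

    # Absolute check: hard floors apply regardless of the baseline.
    for name, floor in floors.items():
        value = candidate.get(name)
        if value is None or value < floor:
            failures.append(f"{name}: below floor {floor}")

    return (not failures, failures)

# Hypothetical metrics for illustration.
baseline = {"accuracy": 0.90, "recall": 0.86}
regressed = {"accuracy": 0.91, "recall": 0.84}
improved = {"accuracy": 0.91, "recall": 0.87}

blocked, reasons = evaluate_gate(regressed, baseline, floors={"recall": 0.80})
promoted, _ = evaluate_gate(improved, baseline, floors={"recall": 0.80})
```

In a CI/CD workflow, a step like this would typically exit non-zero when the gate fails, so the deployment job never runs.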

Open Source

AI Fairness & Explainability Toolkit

Model Interpretability Framework

An open-source framework for auditing production ML models for bias and explainability. Supports SHAP value analysis, attention map visualizations, and a comprehensive fairness metric suite across multiple ML frameworks and sensitive deployment domains.

SHAP value explanations
Attention map visualizations
Multi-framework support
Fairness metric suite

Python · SHAP · Scikit-learn · PyTorch · TensorFlow
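As an example of what a fairness metric measures, the sketch below computes the demographic parity difference: the gap between the highest and lowest positive-prediction rates across groups. Which metrics the toolkit actually ships is not stated here, so this is an assumed, commonly used example, and the function name and sample data are hypothetical.

```python
def demographic_parity_difference(y_pred, groups):
    """Return (gap, rates) for binary predictions.

    y_pred: iterable of 0/1 predictions.
    groups: iterable of group labels, aligned with y_pred.
    gap is max(selection rate) - min(selection rate) across groups;
    0.0 means every group receives positive predictions at the same rate.
    """
    by_group = {}
    for pred, group in zip(y_pred, groups):
        by_group.setdefault(group, []).append(pred)

    # Selection rate: share of positive predictions within each group.
    rates = {group: sum(preds) / len(preds) for group, preds in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical predictions for two groups of four people each.
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_difference(y_pred, groups)
```

Here group "a" is selected 75% of the time and group "b" 25% of the time, so the gap is 0.5. An audit would flag a gap this large for review.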

View all repositories on GitHub

Skills

Technical arsenal

Tools and frameworks I use to design, build, and ship AI systems to production.

LLMs & RAG

RAG System Design: 95%

LLM Integration & Prompting: 92%

FAISS / Vector Search: 90%

BioBERT / Sentence Transformers: 88%

Cross-Encoder Reranking: 85%

LangChain: 80%

AI / ML

PyTorch / TensorFlow: 92%

Transformers (HuggingFace): 90%

LSTM / GRU / BERT: 88%

Computer Vision (YOLO, ViT): 82%

Reinforcement Learning: 75%

GANs / Federated Learning: 72%

MLOps

Docker / Kubernetes: 88%

MLflow: 90%

AWS (EC2, S3, Lambda): 80%

CI/CD Pipelines: 85%

Model Quantization & Pruning: 82%

GitHub Actions: 85%

Backend & Systems

Python (Advanced): 97%

FastAPI / Uvicorn: 90%

REST API Design: 88%

MongoDB / SQLite: 80%

TypeScript / JavaScript: 75%

Linux Systems: 82%

Primary Languages

Python · Advanced

TypeScript · Proficient

JavaScript · Proficient

C++ · Intermediate

SQL · Intermediate

Background

Experience & Research

Work Experience

AI Engineer

Ninth Dev Solutions

Full-time

Nov 2022 – Dec 2024

Designed and shipped an LSTM-based predictive analytics platform over EHR data from 15 hospitals, forecasting patient health trajectories and adverse events at scale.
Built a stacked ensemble model (XGBoost + LSTM) for multivariate solar-energy forecasting with automated MLflow retraining pipelines deployed to production.
Developed GRU-based sequence models over 5TB of retail transaction data to extract purchase-propensity and churn signals feeding downstream business systems.
Applied K-Means and PCA on hospital operational data, surfacing efficiency bottlenecks that reduced operational costs by 12% and improved resource allocation by 18%.
Led end-to-end delivery of a YOLOv4 edge deployment for real-time mask detection and BERT-based NLP tools, owning model quantization, pruning, and CI/CD integration.

Research Publication

Lead Author · 2026

Distractor-Aware Memory (DAM) for Robust Visual Object Tracking

PAF-IAST, Haripur, Pakistan

Designed a dual-branch memory architecture on SAM 2 with sigmoid-gated cross-attention fusion, achieving a 12.4% relative EAO improvement on the VOT2022 benchmark at approximately 45ms per frame with only 3% additional computational overhead.

Computer Vision · Object Tracking · SAM 2 · Attention Mechanisms · VOT2022

BS Artificial Intelligence

Pak-Austria Fachhochschule Institute of Applied Sciences and Technology

Haripur, Pakistan

Contact

Let's build something
worth shipping

I'm open to remote AI engineering roles, consulting engagements, and interesting AI systems problems. If you're building something ambitious, let's talk.

taimoorkhaniajaznabi2@gmail.com

Send email

GitHub

Explore my open-source work

Connect professionally
