Han Xiao, Ph.D. (肖涵) · Han Xiao Blog

I’m the VP of AI at Elastic. I founded Jina AI in 2020 and served as CEO until its acquisition by Elastic (NYSE: ESTC) in October 2025. Before that, I led search R&D at Tencent and worked on search and recommendations at Zalando. I created Fashion-MNIST, a widely used computer vision benchmark with 12,000+ citations, and got my Ph.D. from TU Munich in 2014 on adversarial and robust non-parametric Bayesian learning.

I’ve lived and worked across the San Francisco Bay Area, Berlin, Munich, Taipei, Beijing, and Shenzhen, and I’m currently based in Mountain View.

Go west, young man!

Publications

mlx-vis: GPU-Accelerated Dimensionality Reduction and Visualization on Apple Silicon, 2026.3 arXiv
jina-embeddings-v5-text: Task-Targeted Embedding Distillation, 2026.2 SIGIR 2026
Embedding Inversion via Conditional Masked Diffusion Language Models, 2026.2 arXiv Demo
Embedding Compression via Spherical Coordinates, 2026.1 ICLR 2026 GRaM Workshop Poster
Vision Encoders in Vision-Language Models: A Survey, 2025.12 PDF
Jina-VLM: Small Multilingual Vision Language Model, 2025.12 ICLR 2026 DataFM Workshop
jina-reranker-v3: Last but Not Late Interaction for Listwise Document Reranking, 2025.10 AAAI 2026 Frontier IR Workshop
Efficient Code Embeddings from Code Generation Models, 2025.8 NeurIPS 2025 DL4C Workshop
jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval, 2025.6 EMNLP 2025 MRL Workshop
ReaderLM-v2: Small Language Model for HTML to Markdown and JSON, 2025.3 ICLR 2025 SCI-FM Workshop
AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark, 2024.12 ACL 2025
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images, 2024.12 ICLR 2025 SCI-FM Workshop
jina-embeddings-v3: Multilingual Embeddings With Task LoRA, 2024.9 ECIR 2025 Industry Track
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models, 2024.9 SIGIR 2025 RobustIR Workshop
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever, 2024.8 EMNLP 2024 MRL Workshop
Jina CLIP: Your CLIP Model Is Also Your Text Retriever, 2024.5 ICML 2024 MFM-EAI Workshop
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings, 2024.2 arXiv
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents, 2023.10 arXiv
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models, 2023.7 EMNLP 2023 NLP-OSS Workshop
Dual ask-answer network for machine reading comprehension, 2018.9 arXiv
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017 arXiv
Support vector machines under adversarial label contamination, 2015 Neurocomputing
Efficient Online Sequence Prediction with Side Information, 2013 IEEE ICDM 2013
Lazy Gaussian Process Committee for Real-Time Online Regression, 2013 AAAI 2013
Learning from Multiple Observers with Unknown Expertise, 2013 PAKDD 2013
Adversarial Label Flips Attack on Support Vector Machines, 2012 ECAI 2012
Evasion Attack on Multi-Class Linear Classifier, 2012 PAKDD 2012
Supervised Topic Transition Model for Detecting Malicious System Call Sequences, 2011 SIGKDD Workshop 2011
Toward Artificial Synesthesia: Linking Pictures and Sounds via Words, 2010 NIPS Workshop 2010
Efficient Collapsed Gibbs Sampling For Latent Dirichlet Allocation, 2010 ACML 2010

Media Coverage

Jina AI Startup Retrospective: What Is the Scaling Law for AI Teams?, 2025.12 Jina AI official Chinese WeChat account
The Tech Industry Is Huge—and Europe’s Share of It Is Very Small, 2025.5 The Wall Street Journal
Interview on DeepSeek-R1 and Chinese OSS AI, 2025.2 Deutsche Welle
Making it in Germany: AI expert Han Xiao, 2024.10 Deutsche Welle
The Hottest Startups in Berlin in 2024, 2024.10 WIRED
Multimodal: AI’s new frontier, 2024.5 MIT Technology Review
Documentary: Smart New World - The AI Race, 2024.2 arte
Innovation Dialogue, 2023.11 German Embassy in China
Series A fundraising, 2021.11 TechCrunch
AI DACH 30, 2021.4 Forbes
The Hype Around AI Start-ups Is Over: Now Quality Is What Counts, 2021.3 Handelsblatt
Spotlight, 2019.5 Nature
Interview, 2019.4 The New Stack