About
I am the VP of AI at Elastic. I founded Jina AI in 2020 and served as its CEO until its acquisition by Elastic (NYSE: ESTC) on October 9, 2025. Previously, I led search initiatives at Tencent and worked on search and recommendations at Zalando. I am the creator of Fashion-MNIST, a widely used computer vision dataset with 11,000+ citations. I received my Ph.D. from TU Munich in 2014 for work on adversarial and robust non-parametric Bayesian learning.
I’ve lived and worked in many places, including the San Francisco Bay Area, Berlin, Munich, Taipei, Beijing, and Shenzhen. I’m currently based in Mountain View.

Selected Publications
- jina-reranker-v3: Last but Not Late Interaction for Listwise Document Reranking, 2025.10 - arXiv
- Efficient Code Embeddings from Code Generation Models, 2025.8 - NeurIPS 2025 DL4C Workshop
- jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval, 2025.6 - EMNLP 2025 MRL Workshop
- ReaderLM-v2: Small Language Model for HTML to Markdown and JSON, 2025.3 - ICLR 2025 SCI-FM Workshop
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark, 2024.12 - ACL 2025
- jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images, 2024.12 - ICLR 2025 SCI-FM Workshop
- jina-embeddings-v3: Multilingual Embeddings With Task LoRA, 2024.9 - ECIR 2025 Industry Track
- Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models, 2024.9 - SIGIR 2025 RobustIR Workshop
- Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever, 2024.8 - EMNLP 2024 Multilingual Representation Learning Workshop
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever, 2024.5 - ICML 2024 MFM-EAI Workshop
- Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings, 2024.2 - arXiv
- Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents, 2023.10 - arXiv
- Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models, 2023.7 - EMNLP 2023 NLP-OSS Workshop
- Dual ask-answer network for machine reading comprehension, 2018.9 - arXiv
- Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017 (> 11K Citations) - arXiv
- Support vector machines under adversarial label contamination - Neurocomputing
- Efficient Online Sequence Prediction with Side Information, 2013 - IEEE ICDM 2013
- Lazy Gaussian Process Committee for Real-Time Online Regression, 2013 - AAAI 2013
- Learning from Multiple Observers with Unknown Expertise, 2013 - PAKDD 2013
- Adversarial Label Flips Attack on Support Vector Machines, 2012 - ECAI 2012
- Evasion Attack on Multi-Class Linear Classifier, 2012 - PAKDD 2012
- Supervised Topic Transition Model for Detecting Malicious System Call Sequences, 2011 - SIGKDD Workshop 2011 (Best paper award)
- Toward Artificial Synesthesia: Linking Pictures and Sounds via Words, 2010 - NIPS Workshop 2010
- Efficient Collapsed Gibbs Sampling For Latent Dirichlet Allocation, 2010 - ACML 2010
 
Selected Media Coverage
- The Wall Street Journal, 2025.5 - The Tech Industry Is Huge—and Europe’s Share of It Is Very Small
- Deutsche Welle, 2025.2 - Interview on DeepSeek-R1 and Chinese OSS AI
- Deutsche Welle, 2024.10 - Making it in Germany: AI expert Han Xiao
- WIRED, 2024.10 - The Hottest Startups in Berlin in 2024
- MIT Technology Review, 2024.5 - Multimodal: AI’s new frontier
- arte, 2024.2 - Documentary: Smart New World - The AI Race
- German Embassy in China, 2023.11 - Innovation Dialogue
- TechCrunch, 2021.11 - Series A fundraising
- Forbes, 2021.4 - AI DACH 30
- Handelsblatt, 2021.3 - Der Hype um KI-Start-ups ist vorbei – jetzt kommt es auf Qualität an
- Nature, 2019.5 - Spotlight
- The New Stack, 2019.4 - Interview