I founded Jina AI in 2020 and have led it as CEO ever since. Previously, I led neural search at Tencent from 2018 to 2020 and worked on search and recommendation systems at Zalando Research from 2014 to 2018, where I created Fashion-MNIST (11,000+ citations). I received my Ph.D. from TU Munich in 2014, focusing on adversarial and robust non-parametric Bayesian learning.

I’ve lived and worked in many places, including the San Francisco Bay Area, Berlin, Munich, Taipei, Beijing, and Shenzhen. I’m currently based in Mountain View.

Go west, young man!

Selected Publications

  • Efficient Code Embeddings from Code Generation Models, 2025.8 - arXiv
  • jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval, 2025.6 - arXiv
  • ReaderLM-v2: Small Language Model for HTML to Markdown and JSON, 2025.3 - ICLR 2025 SCI-FM Workshop
  • AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark, 2024.12 - ACL 2025
  • jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images, 2024.12 - ICLR 2025 SCI-FM Workshop
  • jina-embeddings-v3: Multilingual Embeddings With Task LoRA, 2024.9 - ECIR 2025 Industry Track
  • Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models, 2024.9 - SIGIR 2025 RobustIR Workshop
  • Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever, 2024.8 - EMNLP 2024 Multilingual Representation Learning Workshop
  • Jina CLIP: Your CLIP Model Is Also Your Text Retriever, 2024.5 - ICML 2024 MFM-EAI Workshop
  • Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings, 2024.2 - arXiv
  • Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents, 2023.10 - arXiv
  • Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models, 2023.7 - EMNLP 2023 NLP-OSS Workshop
  • Dual Ask-Answer Network for Machine Reading Comprehension, 2018.9 - arXiv
  • Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017 - arXiv (11,000+ citations)
  • Support Vector Machines Under Adversarial Label Contamination - Neurocomputing
  • Efficient Online Sequence Prediction with Side Information, 2013 - IEEE ICDM 2013
  • Lazy Gaussian Process Committee for Real-Time Online Regression, 2013 - AAAI 2013
  • Learning from Multiple Observers with Unknown Expertise, 2013 - PAKDD 2013
  • Adversarial Label Flips Attack on Support Vector Machines, 2012 - ECAI 2012
  • Evasion Attack on Multi-Class Linear Classifier, 2012 - PAKDD 2012
  • Supervised Topic Transition Model for Detecting Malicious System Call Sequences, 2011 - SIGKDD Workshop 2011 (Best Paper Award)
  • Toward Artificial Synesthesia: Linking Pictures and Sounds via Words, 2010 - NIPS Workshop 2010
  • Efficient Collapsed Gibbs Sampling For Latent Dirichlet Allocation, 2010 - ACML 2010

Selected Media Coverage