Hugging Face Transformers: The Open-Source Backbone of Modern AI
Only a handful of open-source projects have achieved the transformative impact of Hugging Face Transformers. With over 154,000 GitHub stars, more than 3,500 contributors, and a staggering 398,000+ dep...
Tags: AI, Deep Learning, Hugging Face, Machine Learning, NLP, Open Source, Python, Transformers

Sentence Transformers Joins Hugging Face: What This Means for NLP Innovation
The Sentence Transformers library, a staple in natural language processing, is officially joining the Hugging Face ecosystem. This move marks a significant leap for both the tool and the community, as...
Tags: community, embeddings, Hugging Face, machine learning, NLP, open source, semantic search, Sentence Transformers

Dimension-Insensitive Metrics: DIEM vs. Cosine Similarity in High Dimensions
The paper "Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric" interrogates a default choice in machine learning and information retrieval: cos...
Tags: cosine similarity, distance metrics, embeddings, euclidean distance, NLP, similarity

mmBERT: How Johns Hopkins Built a 1,833-Language AI That Outperforms XLM-R
Imagine trying to build an AI system that truly understands human language, not just in English but in over 7,000 languages s...
Tags: AI research, annealed language learning, cross-lingual, digital inclusion, encoder-only, FlashAttention, Gemma tokenizer, GLUE, inverse masking, language model, mmBERT, ModernBERT, MTEB, multilingual, NLP, XLM-R, XTREME

LangExtract: Grounded, Structured Extraction for Long Text
LangExtract is a focused open-source library from Google that turns unstructured text into structured data you can trust. It combines schema-guided prompts, precise span alignment to the source text, ...
Tags: Gemini, information extraction, langextract, LLM, NLP, Ollama, OpenAI, plugins, Python