Community Blog & Articles

Community Articles

Introducing Falcon H1R 7B

about 19 hours ago

The Optimal Architecture for Small Language Models

M2.1: Multilingual and Multi-Task Coding with Strong Generalization

about 23 hours ago

Continuity as a First-Class System Property in Artificial Intelligence

Red Teaming with RL: Exploiting Tinker API for Harmful RL on 235B Model

Scaling Real-Time Voice Agents with Cache-Aware Streaming ASR

about 7 hours ago

KV Caching Explained: Optimizing Transformer Inference Efficiency

Uncensor any LLM with abliteration

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Deriving the DPO Loss from First Principles

Generalist Robot Policy Evaluation in Simulation with NVIDIA Isaac Lab-Arena and LeRobot

about 6 hours ago

The Engineering Handbook for GRPO + LoRA with Verl: Training Qwen2.5 on Multi-GPU

TFLOPS Gap: Why FP4 MoE Kernel Engineering Matters on Blackwell

about 11 hours ago

Small Language Models (SLM): A Comprehensive Overview

Deriving the PPO Loss from First Principles

Building Autonomous Vehicles That Reason with the NVIDIA Alpamayo Open Ecosystem

about 6 hours ago

Code a simple RAG from scratch

Mastering Tensor Dimensions in Transformers

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Why Did MiniMax M2 End Up as a Full Attention Model?

Introducing Optimum: The Optimization Toolkit for Transformers at Scale

September 14, 2021

Deep Learning over the Internet: Training Language Models Collaboratively

open-source-collab, nlp

Welcome spaCy to the Hugging Face Hub

guide, partnerships, aws

Deploy Hugging Face models easily with Amazon SageMaker

open-source-collab, nlp

Sentence Transformers in the Hugging Face Hub

Few-shot learning in practice: GPT-Neo and the 🤗 Accelerated Inference API

open-source-collab, guide

Using & Mixing Hugging Face Models with Gradio 2.0

guide, nlp, partnerships

Scaling-up BERT Inference on CPU (Part 1)

Introducing 🤗 Accelerate

guide, partnerships, aws

Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker

community, research, nlp

Understanding BigBird's Block Sparse Attention

partnerships, aws

The Partnership: Amazon SageMaker and Hugging Face

My Journey to a serverless transformers pipeline on Google Cloud

Fine-Tune Wav2Vec2 for English ASR in Hugging Face with 🤗 Transformers
