kernels-community

Team

community

Activity Feed

Recent Activity

kernels-bot updated a model 3 days ago

kernels-community/flash-attn3

kernels-bot updated a model 6 days ago

kernels-community/flash-attn2

kernels-bot updated a model 7 days ago

kernels-community/aiter-rope

Team members 4
private

kernels-bot

updated a model 3 days ago

kernels-community/flash-attn3

Updated 3 days ago • 282k • 48

kernels-bot

updated a model 6 days ago

kernels-community/flash-attn2

Updated 6 days ago • 23k • 33

kernels-bot

updated a model 7 days ago

kernels-community/aiter-rope

Updated 7 days ago • 185 • 1

kernels-bot

updated a model 9 days ago

kernels-community/aiter-kernels

Updated 9 days ago • 50 • 1

sayakpaul

in kernels-community/README 11 days ago

How to get access to publish kernels on kernel hub

#7 opened about 1 month ago by

chauhang

kernels-bot

published a model 11 days ago

kernels-community/aiter-kernels

Updated 9 days ago • 50 • 1

danieldk

posted an update about 1 month ago

Post

252

Two large changes in kernel-builder this week:

* kernel-builder now links libstdc++ dynamically. To support a wide range of systems, we build against the libstdc++ from manylinux_2_28 (EL 8 and later).
* In line with our Torch support policy of supporting the current and previous Torch versions, Torch 2.10 support was removed. We will soon also support the Torch stable ABI, which makes it possible to write kernels that work across a large number of Torch versions.

sayakpaul

authored 2 papers 4 months ago

Fine-Grained Perturbation Guidance via Attention Head Selection

Paper • 2506.10978 • Published Jun 12, 2025 • 25

From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

Paper • 2602.21778 • Published Feb 25 • 15

sayakpaul

submitted a paper to Daily Papers 4 months ago

From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

Paper • 2602.21778 • Published Feb 25 • 15

sayakpaul

authored a paper 4 months ago

TAROT: Test-driven and Capability-adaptive Curriculum Reinforcement Fine-tuning for Code Generation with Large Language Models

Paper • 2602.15449 • Published Feb 17 • 7

danieldk

posted an update 5 months ago

Post

2842

kernels 0.12 is out! 🎉

Changes:

* Support for kernel version branches to gracefully roll out kernel API changes.
* Support for PyTorch 2.10.
* kernel-builder is now merged into the kernels repo.
* Initial support for standardized kernel benchmarks.

https://github.com/huggingface/kernels/releases/tag/v0.12.0

danieldk

posted an update 8 months ago

Post

557

We have released kernel-builder 0.7.0: https://github.com/huggingface/kernel-builder/releases/tag/v0.7.0

Headline features:

* 🔮 Supports building kernels for the brand-new PyTorch 2.9.0.
* 🪟 Experimental support for building Windows kernels.

sayakpaul

authored a paper 9 months ago

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

Paper • 2510.05091 • Published Oct 6, 2025 • 20

sayakpaul

posted an update 11 months ago

Post

3117

Fast LoRA inference for Flux with Diffusers and PEFT 🚨

There are great materials that demonstrate how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, despite LoRAs being an inseparable part of their adoption.

In our latest post, @BenjaminB and I show different techniques to optimize LoRA inference for the Flux family of models for image generation. Our recipe includes the use of:

1. torch.compile
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping to avoid recompilation when swapping in new LoRAs 🤯

We have tested our recipe with Flux.1-Dev on both H100 and RTX 4090. We achieve at least a *2x speedup* on both GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served. So, we hope this will be beneficial to the community 🤗

Even though our recipe was tested primarily with NVIDIA GPUs, it should also work with AMD GPUs.

Learn the details and the full code here:
https://huggingface.co/blog/lora-fast

3 replies

danieldk

posted an update 12 months ago

Post

2074

kernels 0.8.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.8.0

This release refines kernel selection in the kernelize function:

• You can now register kernels for certain CUDA capability ranges.
• Rather than requiring an exact match of modes, kernelize now falls back to other compatible modes. For example, if you kernelize for inference but have only registered a training + torch.compile kernel, that kernel is used, since it is compatible with inference as well.

1 reply

danieldk

posted an update 12 months ago

Post

493

You can get flash-attention 3 ⚡️ directly from the hub now using kernels!

kernels-community/flash-attn3
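
As a minimal sketch of what loading this kernel might look like, the snippet below wraps the `get_kernel` function from the kernels package. The wrapper name `load_flash_attn3` is hypothetical, and the import is done lazily so the file can be imported without kernels installed. Newer releases of kernels may expect extra arguments (such as a kernel version), so treat the call as an assumption to check against the release you use.

```python
def load_flash_attn3(repo_id: str = "kernels-community/flash-attn3"):
    """Fetch a kernel from the Hub and return it as a Python module.

    Hypothetical helper: it only forwards to `kernels.get_kernel`.
    """
    # Lazy import: requires `pip install kernels` only when actually called.
    from kernels import get_kernel

    return get_kernel(repo_id)
```

Calling `load_flash_attn3()` needs network access and a GPU and Torch build that the kernel supports. The functions exposed by the returned module are not listed in the post, so inspect it (for example with `dir()`) rather than assuming specific names.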

danieldk

posted an update 12 months ago

Post

390

Kernels 0.7.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.7.0 🚀

This release makes it possible to register multiple kernels for a layer. Do you have a super-fast kernel for inference and another kernel for training? Register them both and kernelize will pick the kernel depending on whether you are going to do training or inference.

danieldk

posted an update about 1 year ago

Post

2023

We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub! 🚀

We plan to give kernels a proper introduction soon. But for those who have been following along, we are happy to announce a new release:

- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤘 Kernels.
- Generate wheels from Hub kernels for legacy deployments.

Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0

2 replies

sayakpaul

posted an update about 1 year ago

Post

3038

Diffusers supports a good variety of quantization backends. It can be challenging to navigate through them, given the complex nature of diffusion pipelines in general.

So, @derekl35 set out to write a comprehensive guide that puts users in the front seat. Explore the different backends we support, learn the trade-offs they offer, and finally, check out the cool space we built that lets you compare quantization results.

Give it a go here:
https://lnkd.in/gf8Pi4-2

2 replies
