Building on HF

Daniel van Strien PRO

davanstrien

huggingface

https://danielvanstrien.xyz/

AI & ML interests

Machine Learning Librarian

Recent Activity

updated a dataset about 1 hour ago

data-is-better-together/fineweb-c-progress

updated a dataset about 2 hours ago

uv-scripts/training

liked a model about 3 hours ago

Kwai-Keye/Keye-VL-1_5-8B

upvoted a paper about 4 hours ago

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Paper • 2512.14698 • Published Dec 16, 2025 • 20

upvoted a collection 4 days ago

TranslateGemma

3 items • Updated 4 days ago • 152

upvoted a paper 10 days ago

Perceptual Taxonomy: Evaluating and Guiding Hierarchical Scene Reasoning in Vision-Language Models

Paper • 2511.19526 • Published Nov 24, 2025 • 2

upvoted a collection 11 days ago

Qwen3-VL-Embedding

2 items • Updated 11 days ago • 55

upvoted 3 articles 13 days ago

Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval

Article • Mar 22, 2024 • 123

Why We Built VIBE Bench: Rethinking Evaluation for Real Workloads

Article • 13 days ago • 6

Diversity Vs Density: A data strategy comparison for fine-tuning VLMs

Article • 14 days ago • 5

upvoted an article about 1 month ago

Shadow AI - Where are the CIOs?

Article • Dec 19, 2025 • 31

upvoted 2 collections about 1 month ago

SauerkrautLM-Vision-Document-Retrieval

7 items • Updated Dec 15, 2025 • 9

GLM-V

4 items • Updated Dec 17, 2025 • 11

upvoted 3 papers about 1 month ago

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

Paper • 2509.19768 • Published Sep 24, 2025 • 5

Metadata Extraction Leveraging Large Language Models

Paper • 2510.19334 • Published Oct 22, 2025 • 1

FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition

Paper • 2512.13884 • Published Dec 15, 2025 • 14

upvoted 7 collections about 1 month ago

fiNERweb

A multilingual dataset for NER covering 91 languages and 25 scripts • 3 items • Updated Dec 16, 2025 • 1

Molmo2 Data

Artifacts for the Molmo2 data release • 16 items • Updated 27 days ago • 33

Molmo2

Artifacts for the Molmo2 release • 6 items • Updated 27 days ago • 30

Datasets Wrapped 2025: Reasoning

The reasoning datasets that defined 2025. Part 1 of Datasets Wrapped 2025. #DatasetsWrapped2025 • 20 items • Updated Dec 16, 2025 • 1

NeMo Gym

Collection of RL verifiable data for NeMo Gym • 13 items • Updated 3 days ago • 37

Nemotron-Post-Training-v3

Collection of datasets used in the post-training phase of Nemotron Nano v3. • 8 items • Updated 3 days ago • 56

NVIDIA Nemotron v3

Open, Production-ready Enterprise Models • 7 items • Updated 3 days ago • 121