Mwangi PRO

Benson

AI & ML interests

None yet

Recent Activity

liked a model 6 days ago

GAIR/LiveTalk-1.3B-V0.1

liked a dataset 9 days ago

kahrendt/microwakeword

liked a dataset 13 days ago

SparkAudio/voxbox

Organizations

None yet

upvoted 2 articles 13 days ago

Article

How to make NeuTTS-air generate over 200 seconds of audio in a single second.

Nov 21, 2025

Article

LLM based Audio models

18 days ago

upvoted a paper 19 days ago

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Paper • 2512.13281 • Published 21 days ago • 63

upvoted a paper 21 days ago

Graph of Verification: Structured Verification of LLM Reasoning with Directed Acyclic Graphs

Paper • 2506.12509 • Published Jun 14, 2025 • 2

upvoted a paper 27 days ago

Scaling Zero-Shot Reference-to-Video Generation

Paper • 2512.06905 • Published 29 days ago • 28

upvoted a paper 28 days ago

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published Dec 4, 2025 • 167

upvoted a paper 29 days ago

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Paper • 2512.02589 • Published Dec 2, 2025 • 67

upvoted 3 papers about 1 month ago

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Paper • 2512.02014 • Published Dec 1, 2025 • 70

Phi-4-reasoning Technical Report

Paper • 2504.21318 • Published Apr 30, 2025 • 53

VIDEOP2R: Video Understanding from Perception to Reasoning

Paper • 2511.11113 • Published Nov 14, 2025 • 112

upvoted 2 papers about 2 months ago

ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries

Paper • 2511.14349 • Published Nov 18, 2025 • 17

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

upvoted 3 papers 2 months ago

ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation

Paper • 2511.01163 • Published Nov 3, 2025 • 31

World Simulation with Video Foundation Models for Physical AI

Paper • 2511.00062 • Published Oct 28, 2025 • 40

Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents

Paper • 2510.23691 • Published Oct 27, 2025 • 53

upvoted 2 collections 2 months ago

Gauss Gym Datasets

Collection

Datasets used for the gauss gym photorealistic simulator • 4 items • Updated Oct 17, 2025 • 8

Qwen3-Omni

Collection

6 items • Updated 5 days ago • 177

upvoted an article 2 months ago

Article

VR Forklift Simulation Data for RLHF - Skills Model and Indicators

Oct 2, 2025

upvoted a paper 3 months ago

FlashWorld: High-quality 3D Scene Generation within Seconds

Paper • 2510.13678 • Published Oct 15, 2025 • 72

upvoted an article 3 months ago

Article

Introduction to MedVideoCap-55K: A New, Large-Scale, High-Quality Medical Video-Caption Pair Dataset

Jun 25, 2025