Multi-Turn Evaluation Benchmarks
A collection of benchmarks for evaluating LMs or VLMs under multi-turn interaction.
passing2961/MultiVerse • Updated Nov 1 • 647 • 117 • 1
passing2961/photochat_plus • Updated Dec 3, 2024 • 968 • 72 • 4
RefineBench/RefineBench • Updated 18 days ago • 1k • 1.1k • 4
Thanos Skill-of-Mind-Infused LLM
passing2961/Thanos-1B • 1B • Updated Nov 8, 2024 • 17
passing2961/Thanos-3B • 3B • Updated Nov 8, 2024 • 14 • 4
passing2961/Thanos-8B • 8B • Updated Nov 8, 2024 • 13 • 3
passing2961/multifaceted-skill-of-mind • Updated Nov 8, 2024 • 100k • 104 • 5