AI & ML interests

Democratizing access to useful AI tools and resources for journalists

Recent Activity

hanzla 
posted an update about 1 month ago
Reinforcement learning can sometimes produce emergent behavior with much simpler training setups than large-scale pre-training requires.

I explored this idea by running a small GRPO experiment on Qwen3.5 4B, and the results were pretty exciting.

Hypothesis: improving visual mathematical reasoning may also improve the model’s ability to transcribe LaTeX from images.

I wrote a short breakdown of the experiment here:
https://hanzlajavaid.github.io/blog/grpo-experiment-exploring-emergent-properties/
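
For readers unfamiliar with GRPO (Group Relative Policy Optimization), the step that makes the setup simple can be sketched as below: several completions are sampled for the same prompt, each is scored, and every reward is normalized against its own group instead of against a learned value model. This is a minimal illustration, not the code from the experiment above. The function name, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for the sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Hypothetical helper: completions that score above the group mean get a
    positive advantage, those below get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to one prompt (1.0 = correct, 0.0 = wrong)
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

When every completion in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.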
DmitryRyumin 
posted an update 7 months ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Poster)! 🌟👁️🚀
📄 Title: Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation 🔝

📝 Description: Token Condensation as Adaptation (TCA) improves the performance and efficiency of Vision Language Models in zero-shot inference by introducing domain anchor tokens.

👥 Authors: Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation (2410.14729)

📁 Repository: https://github.com/Jo-wang/TCA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to Session 1: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/session-1.md

📚 More Papers: more cutting-edge research presented at other conferences is collected in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🔍 Keywords: #TestTimeAdaptation #TokenCondensation #VisionLanguageModels #TrainingFreeAdaptation #ZeroShotLearning #EfficientAI #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 7 months ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching 🔝

📝 Description: The proposed method enhances stereo matching by efficiently combining unbiased monocular priors from vision foundation models. This method addresses misalignment and local optima issues using a binary local ordering map and pixel-wise linear regression.

👥 Authors: Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, and Yunde Jia

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching (2505.14414)

📁 Repository: https://github.com/YaoChengTang/Diving-into-the-Fusion-of-Monocular-Priors-for-Generalized-Stereo-Matching

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the 3D Pose Understanding Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/3d-pose-understanding.md

📚 More Papers: more cutting-edge research presented at other conferences is collected in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🔍 Keywords: #StereoMatching #MonocularDepth #VisionFoundationModels #3DReconstruction #Generalization #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 7 months ago
🚀👌🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤌🚀
📄 Title: Understanding Co-speech Gestures in-the-wild 🔝

📝 Description: JEGAL is a tri-modal model that learns from gestures, speech and text simultaneously, enabling devices to interpret co-speech gestures in the wild.

👥 Authors: @sindhuhegde, K R Prajwal, Taein Kwon, and Andrew Zisserman

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Understanding Co-speech Gestures in-the-wild (2503.22668)

🌐 Web Page: https://www.robots.ox.ac.uk/~vgg/research/jegal
📁 Repository: https://github.com/Sindhu-Hegde/jegal
📺 Video: https://www.youtube.com/watch?v=TYFOLKfM-rM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Human Modeling Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/human-modeling.md

📚 More Papers: more cutting-edge research presented at other conferences is collected in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🔍 Keywords: #CoSpeechGestures #GestureUnderstanding #TriModalRepresentation #MultimodalLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 7 months ago
🚀💡🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🪄🚀
📄 Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models 🔝

📝 Description: LoftUp is a coordinate-based transformer that upscales the low-resolution features of VFMs (e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.

👥 Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2504.14032)

🌐 GitHub Page: https://andrehuang.github.io/loftup-site
📁 Repository: https://github.com/andrehuang/loftup

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Foundation Models and Representation Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/foundation-models-and-representation-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is collected in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🔍 Keywords: #LoftUp #VisionFoundationModels #FeatureUpsampling #CrossAttentionTransformer #CoordinateBasedLearning #SelfDistillation #PseudoGroundTruth #RepresentationLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 8 months ago
🚀🏷️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🧩🚀
📄 Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening 🔝

📝 Description: The HeLlO framework is a new dataset distillation method that removes the need for large soft labels. It uses a lightweight, online image-to-label projector based on CLIP, which is initialized with text embeddings and adapted using LoRA-style, parameter-efficient tuning.

👥 Authors: @roseannelexie, @Huage001, Zigeng Chen, Jingwen Ye, and Xinchao Wang

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)

📺 Video: https://www.youtube.com/watch?v=kAyK_3wskgA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is collected in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🔍 Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 8 months ago
🚀🤖🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤖🚀
📄 Title: Variance-based Pruning for Accelerating and Compressing Trained Networks 🔝

📝 Description: The one-shot pruning method efficiently compresses networks, reducing computation and memory usage while retaining almost full performance and requiring minimal fine-tuning.

👥 Authors: Uranik Berisha, Jens Mehnert, and Alexandru Paul Condurache

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Variance-Based Pruning for Accelerating and Compressing Trained Networks (2507.12988)

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is collected in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🔍 Keywords: #VarianceBasedPruning #NetworkCompression #ModelAcceleration #EfficientDeepLearning #VisionTransformers #AI #ICCV2025 #ResearchHighlight
gokaygokay 
posted an update 8 months ago
FlashPack: Lightning-Fast Model Loading for PyTorch

https://github.com/fal-ai/flashpack

FlashPack is a new, high-throughput file format and loading mechanism for PyTorch that makes model checkpoint I/O blazingly fast, even on systems without access to GPUDirect Storage (GDS).

With FlashPack, loading any model can be 3–6× faster than with current state-of-the-art methods such as accelerate or the standard load_state_dict() and to() flow. It all comes wrapped in a lightweight, pure-Python package that works anywhere.

DmitryRyumin 
posted an update 8 months ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Token Activation Map to Visually Explain Multimodal LLMs 🔝

📝 Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations. The result is clearer, high-quality visualizations that enhance understanding of object localization, reasoning and multimodal alignment across models.

👥 Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)

📁 Repository: https://github.com/xmed-lab/TAM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Multi-Modal Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is collected in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🔍 Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight
giadap 
posted an update 8 months ago
🌎 AI ethics and sustainability are two sides of the same coin.

In our new blog post with Dr. Sasha Luccioni, we argue that separating them (as is too often the case) means missing the bigger picture of how AI systems impact both people and the planet.

Ethical and sustainable AI development can’t be pursued in isolation. The same choices that affect who benefits or is harmed by AI systems also determine how much energy and resources they consume.

We explore how two key concepts, evaluation and transparency, can serve as bridges between these domains:

📊 Evaluation, by moving beyond accuracy or performance metrics to include environmental and social costs, as we’ve done with tools like the AI Energy Score.

🔍 Transparency, by enabling reproducibility, accountability, and environmental reporting through open tools like the Environmental Transparency Space.

AI systems mirror our priorities. If we separate ethics from sustainability, we risk building technologies that are efficient but unjust, or fair but unsustainable.

Read our blog post here: https://huggingface.co/blog/sasha/ethics-sustainability

AIEnergyScore/Leaderboard
sasha/environmental-transparency
giadap 
posted an update 8 months ago
One of the hardest challenges in AI safety is finding the right balance: how do we protect people from harm without undermining their agency? This tension is especially visible in conversational systems, where safeguards can sometimes feel more paternalistic than supportive.

In my latest piece for Hugging Face, I argue that open source and community-driven approaches offer a promising (though not exclusive) way forward.

✨ Transparency can turn safety mechanisms into learning opportunities.
✨ Collaboration with diverse communities makes safeguards more relevant across contexts.
✨ Iteration in the open lets protections evolve rather than freeze into rigid, one-size-fits-all rules.

Of course, this isn’t a silver bullet. Top-down safety measures will still be necessary in some cases. But if we rely only on corporate control, we risk building systems that are safe at the expense of trust and autonomy.

Read the blog post here: https://huggingface.co/blog/giadap/preserving-agency