@orasul on Hugging Face: "Open-source AI agents are achieving State-of-the-Art results on two different…"

Open-source AI agents are achieving State-of-the-Art results on two different Android AI agent benchmarks.

Yesterday, I finished evaluating my Android agent model, deki, on two separate benchmarks: Android Control and Android World. For both, I evaluated on a subset of the dataset, without fine-tuning. The results show that image description models like deki enable LLMs (such as GPT-4o, GPT-4.1, and Gemini 2.5) to reach State-of-the-Art results on Android AI agent benchmarks using only vision capabilities, without relying on Accessibility Trees, on both single-step and multi-step tasks.
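
To make the idea concrete, here is a minimal sketch of how an image description could be turned into text an LLM can act on. It is not deki's actual output format or prompt: the `UIElement` class, its fields, and the `build_prompt` helper are hypothetical names chosen for illustration, and the bounding boxes are assumed to be pixel coordinates in (left, top, right, bottom) order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    # Hypothetical structure for one detected element on a screenshot.
    label: str                       # element type, e.g. "Button" or "Text"
    text: str                        # visible text, if any
    box: Tuple[int, int, int, int]   # assumed (left, top, right, bottom) in pixels


def center(box: Tuple[int, int, int, int]) -> Tuple[int, int]:
    """Return the integer center point of a bounding box (a natural tap target)."""
    left, top, right, bottom = box
    return ((left + right) // 2, (top + bottom) // 2)


def describe_screen(elements: List[UIElement]) -> str:
    """Render detected elements as one indexed line each."""
    lines = []
    for i, el in enumerate(elements):
        x, y = center(el.box)
        lines.append(f"[{i}] {el.label} '{el.text}' center=({x}, {y})")
    return "\n".join(lines)


def build_prompt(goal: str, elements: List[UIElement]) -> str:
    """Combine the task goal and the screen description into a single prompt."""
    return (
        f"Goal: {goal}\n"
        f"Visible UI elements:\n{describe_screen(elements)}\n"
        "Reply with the index of the element to tap."
    )


if __name__ == "__main__":
    screen = [
        UIElement("Button", "Settings", (100, 200, 300, 260)),
        UIElement("Text", "Wi-Fi", (100, 300, 400, 340)),
    ]
    print(build_prompt("Open the settings screen", screen))
```

The point of the sketch is that the LLM only sees a description derived from the screenshot, so no Accessibility Tree is needed; the chosen index maps back to a tap at the element's center.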

All the information is available on GitHub: https://github.com/RasulOs/deki

I have also uploaded the model to Hugging Face:

Space: orasul/deki
(Check the analyze-and-get-yolo endpoint)

Model: orasul/deki-yolo
