Food Recognition Model
A Vision Transformer (ViT) fine-tuned for food recognition and classification. This model can identify 10 different types of food from images.
Model Description
This model is based on Google's Vision Transformer (ViT-Base) and has been fine-tuned on a custom food dataset. It classifies images into 10 food categories, reaching about 68% accuracy on its evaluation set (see Model Performance below).
Food Classes
The model can recognize the following food types (the labels can also be read directly from the model config, as sketched after this list):
- apple_pie
- caesar_salad
- chocolate_cake
- cup_cakes
- donuts
- hamburger
- ice_cream
- pancakes
- pizza
- waffles
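
These class names map one-to-one to the `id2label` entries in the hosted model config. A minimal sketch (assuming the repo exposes the standard classification config) of listing them programmatically:

```python
from transformers import AutoConfig

# List the label names straight from the hosted config instead of
# hard-coding them (assumes a standard id2label mapping in the repo).
config = AutoConfig.from_pretrained("BinhQuocNguyen/food-recognition-vit")
for class_id, label in sorted(config.id2label.items()):
    print(class_id, label)
```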
Model Performance
- Accuracy: 68.0%
- F1 Score: 66.5%
- Precision: 68.2%
- Recall: 68.0%
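
For context, metrics like these are typically computed from predicted and ground-truth class ids. The sketch below uses scikit-learn with hypothetical values and weighted averaging; both are assumptions, since the actual evaluation script is not published with this card.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustration only: hypothetical class ids, not the real evaluation data.
y_true = [0, 3, 3, 7, 9, 1]  # ground-truth class ids
y_pred = [0, 3, 5, 7, 9, 1]  # model predictions

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```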
Usage
Using the Pipeline
```python
from transformers import pipeline

# Load the model
classifier = pipeline("image-classification", model="BinhQuocNguyen/food-recognition-vit")

# Classify an image
result = classifier("path/to/your/food_image.jpg")
print(result)
```
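
The pipeline returns a list of `{"label": ..., "score": ...}` dicts sorted by score. As a short sketch, the `top_k` argument of the image-classification pipeline limits how many candidates come back:

```python
from transformers import pipeline

# Same pipeline as above, but keep the three highest-scoring candidates.
classifier = pipeline("image-classification", model="BinhQuocNguyen/food-recognition-vit")
top3 = classifier("path/to/your/food_image.jpg", top_k=3)
for candidate in top3:
    print(f'{candidate["label"]}: {candidate["score"]:.3f}')
```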
Using the Model Directly
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
processor = AutoImageProcessor.from_pretrained("BinhQuocNguyen/food-recognition-vit")
model = AutoModelForImageClassification.from_pretrained("BinhQuocNguyen/food-recognition-vit")

# Load and preprocess the image
image = Image.open("path/to/your/food_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get top prediction (id2label keys are ints in the loaded config)
predicted_class_id = predictions.argmax(-1).item()
predicted_class = model.config.id2label[predicted_class_id]
confidence = predictions[0][predicted_class_id].item()
print(f"Predicted: {predicted_class} ({confidence:.3f})")
```
Training Details
- Base Model: google/vit-base-patch16-224
- Training Framework: PyTorch with Transformers
- Dataset: Custom food recognition dataset
- Classes: 10 food categories
- Image Size: 224x224 pixels
- Training Time: ~84 minutes
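
The training script itself is not published with this card, so the following is only a hypothetical sketch of how such a fine-tune is commonly set up with the Transformers `Trainer`. The dataset handle, preprocessing, and hyperparameters below are assumptions, not the values actually used.

```python
import torch
from transformers import (AutoImageProcessor, AutoModelForImageClassification,
                          Trainer, TrainingArguments)

# Hypothetical setup: assumes a dataset `food_ds` with "image" (PIL) and
# "label" (int, 10 classes) columns and train/validation splits.
checkpoint = "google/vit-base-patch16-224"
labels = ["apple_pie", "caesar_salad", "chocolate_cake", "cup_cakes", "donuts",
          "hamburger", "ice_cream", "pancakes", "pizza", "waffles"]

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
    ignore_mismatched_sizes=True,  # swap the original 1000-class head for 10 classes
)

def preprocess(batch):
    # Resize/normalize to the 224x224 input the ViT expects.
    batch["pixel_values"] = processor(
        images=[img.convert("RGB") for img in batch["image"]], return_tensors="pt"
    )["pixel_values"]
    return batch

def collate(examples):
    return {"pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
            "labels": torch.tensor([ex["label"] for ex in examples])}

# food_ds = ...                                # custom dataset, not published
# food_ds = food_ds.with_transform(preprocess)
args = TrainingArguments(output_dir="food-recognition-vit", num_train_epochs=3)  # epochs: assumption
# trainer = Trainer(model=model, args=args, data_collator=collate,
#                   train_dataset=food_ds["train"], eval_dataset=food_ds["validation"])
# trainer.train()
```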
Limitations
- The model is trained on a specific set of food categories and may not generalize well to other food types
- Performance may vary depending on image quality, lighting, and angle
- The model works best with clear, well-lit images of food
Citation
If you use this model in your research, please cite:
```bibtex
@misc{food-recognition-model,
  title        = {Food Recognition Model},
  author       = {BinhQuocNguyen},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BinhQuocNguyen/food-recognition-vit}}
}
```
License
This model is released under the MIT License.