Food Recognition Model

A Vision Transformer (ViT) fine-tuned to classify food images into 10 categories.

Model Description

This model is based on Google's Vision Transformer (ViT-Base) and has been fine-tuned on a custom food dataset. It classifies an image into one of 10 food categories, with a reported accuracy of 68.0% (see Model Performance).

Food Classes

The model can recognize the following food types:

  • apple_pie
  • caesar_salad
  • chocolate_cake
  • cup_cakes
  • donuts
  • hamburger
  • ice_cream
  • pancakes
  • pizza
  • waffles

Model Performance

  • Accuracy: 68.0%
  • F1 Score: 66.5%
  • Precision: 68.2%
  • Recall: 68.0%
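
The card does not say how precision, recall, and F1 were averaged across the 10 classes. Recall matching accuracy is what support-weighted averaging produces, since weighted recall always equals accuracy, so the sketch below assumes that scheme. The function name and the toy labels are illustrative only and are not taken from the original evaluation.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / total  # assumption: weight each class by its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration, not the model's evaluation data
y_true = ["pizza", "pizza", "donuts", "waffles"]
y_pred = ["pizza", "donuts", "donuts", "waffles"]
print(weighted_metrics(y_true, y_pred))
```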

Usage

Using the Pipeline

from transformers import pipeline

# Load the model
classifier = pipeline("image-classification", model="BinhQuocNguyen/food-recognition-vit")

# Classify an image
result = classifier("path/to/your/food_image.jpg")
print(result)
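
The pipeline returns a list of dictionaries, each with a "label" and a "score". Given the reported accuracy, it may be worth treating low-scoring predictions as "unsure" rather than trusting the top label blindly. The helper below is a minimal sketch of that idea; `pick_label` and the 0.5 threshold are hypothetical and not part of the model or the library.

```python
def pick_label(results, min_score=0.5):
    """Return the best label from pipeline-style output, or None if unsure.

    `results` is a list of {"label": str, "score": float} dicts.
    `min_score` is an illustrative threshold, not a tuned value.
    """
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None

# Hand-written example in the pipeline's output format (not real model output)
sample = [
    {"label": "pizza", "score": 0.81},
    {"label": "pancakes", "score": 0.07},
]
print(pick_label(sample))
```

With the classifier above, this could be called as pick_label(classifier("path/to/your/food_image.jpg")).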

Using the Model Directly

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
processor = AutoImageProcessor.from_pretrained("BinhQuocNguyen/food-recognition-vit")
model = AutoModelForImageClassification.from_pretrained("BinhQuocNguyen/food-recognition-vit")

# Load and process image (convert to RGB so grayscale or RGBA files also work)
image = Image.open("path/to/your/food_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

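# Get top prediction (batch of one image)
predicted_class_id = predictions.argmax(dim=-1).item()
# id2label keys are integers once the config is loaded, so index with the int id
predicted_class = model.config.id2label[predicted_class_id]
confidence = predictions[0, predicted_class_id].item()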

print(f"Predicted: {predicted_class} ({confidence:.3f})")
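
When the top prediction is uncertain, the next few candidates are often informative. The following framework-free sketch applies a softmax to raw logits and returns the k most likely labels; `top_k_labels` is a hypothetical helper, and the logits and label map in the example are made up for illustration.

```python
import math

def top_k_labels(logits, id2label, k=3):
    """Softmax over raw logits, then return the k most likely (label, prob) pairs."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Made-up logits and a three-class label map, for illustration only
example_id2label = {0: "apple_pie", 1: "pizza", 2: "waffles"}
for label, prob in top_k_labels([0.0, 2.0, 1.0], example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as above, the inputs would be outputs.logits[0].tolist() and model.config.id2label.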

Training Details

  • Base Model: google/vit-base-patch16-224
  • Training Framework: PyTorch with Transformers
  • Dataset: Custom food recognition dataset
  • Classes: 10 food categories
  • Image Size: 224x224 pixels
  • Training Time: ~84 minutes
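
The base model name encodes a 16-pixel patch size and a 224-pixel input, which fixes how many tokens the encoder processes per image. The arithmetic can be sketched as follows; `vit_sequence_length` is a hypothetical helper written for this illustration.

```python
def vit_sequence_length(image_size=224, patch_size=16):
    """Number of tokens a ViT encoder sees: one per patch plus the [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1

print(vit_sequence_length())  # 14 * 14 patches + 1 [CLS] token = 197
```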

Limitations

  • The model is trained on 10 specific food categories; it always returns one of these labels, so images of other foods or non-food objects will be misclassified
  • Performance may vary depending on image quality, lighting, and angle
  • The model works best with clear, well-lit images of food

Citation

If you use this model in your research, please cite:

@misc{food-recognition-model,
  title={Food Recognition Model},
  author={BinhQuocNguyen},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/BinhQuocNguyen/food-recognition-vit}}
}

License

This model is released under the MIT License.
