Food Recognition Model

A Vision Transformer (ViT) fine-tuned for food image classification. The model assigns an input image to one of 10 food categories.

Model Description

This model is based on Google's Vision Transformer (ViT-Base, google/vit-base-patch16-224) and has been fine-tuned on a custom food dataset. It classifies images into 10 food categories with a reported accuracy of 68.0%.

Food Classes

The model can recognize the following food types:

apple_pie
caesar_salad
chocolate_cake
cup_cakes
donuts
hamburger
ice_cream
pancakes
pizza
waffles

Model Performance

Accuracy: 68.0%
F1 Score: 66.5%
Precision: 68.2%
Recall: 68.0%
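
The card does not say how the per-class scores were averaged. The sketch below assumes support-weighted averaging, which is an assumption rather than the author's documented procedure. Under that averaging, recall always equals accuracy, which is consistent with the two 68.0% figures above but does not confirm the method. The function name `weighted_metrics` and the sample labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (pure Python)."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / n  # classes with more true samples count more
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, for illustration only (not taken from the model's dataset)
y_true = ["pizza", "pizza", "donuts", "waffles"]
y_pred = ["pizza", "donuts", "donuts", "waffles"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```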

Usage

Using the Pipeline

from transformers import pipeline

# Load the model
classifier = pipeline("image-classification", model="BinhQuocNguyen/food-recognition-vit")

# Classify an image
result = classifier("path/to/your/food_image.jpg")
print(result)

Using the Model Directly

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
processor = AutoImageProcessor.from_pretrained("BinhQuocNguyen/food-recognition-vit")
model = AutoModelForImageClassification.from_pretrained("BinhQuocNguyen/food-recognition-vit")

# Load and process image (convert to RGB so grayscale or RGBA files also work)
image = Image.open("path/to/your/food_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get top prediction (id2label keys are integers once the config is loaded)
predicted_class_id = predictions.argmax(dim=-1).item()
predicted_class = model.config.id2label[predicted_class_id]
confidence = predictions[0, predicted_class_id].item()

print(f"Predicted: {predicted_class} ({confidence:.3f})")
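
With 10 classes and the accuracy reported above, it can be useful to look at more than the single best guess. The following is a minimal sketch of a top-k helper built on `torch.topk`. The helper name `top_k_predictions` and the default `k=3` are assumptions made for illustration, not part of the model's API.

```python
import torch


def top_k_predictions(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs for one image.

    logits: tensor of shape (1, num_classes), e.g. outputs.logits
    id2label: mapping from integer class id to label name
    """
    probs = torch.softmax(logits, dim=-1)[0]
    k = min(k, probs.numel())  # never ask for more classes than exist
    scores, indices = torch.topk(probs, k)
    return [(id2label[i], s) for s, i in zip(scores.tolist(), indices.tolist())]


# Stand-in logits and labels so the helper can be tried without downloading the model
demo_logits = torch.tensor([[0.0, 2.0, 1.0]])
demo_labels = {0: "donuts", 1: "pizza", 2: "waffles"}
demo_top2 = top_k_predictions(demo_logits, demo_labels, k=2)
print(demo_top2)
```

With the model loaded as shown earlier, the call would be `top_k_predictions(outputs.logits, model.config.id2label)`.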

Training Details

Base Model: google/vit-base-patch16-224
Training Framework: PyTorch with Transformers
Dataset: Custom food recognition dataset
Classes: 10 food categories
Image Size: 224x224 pixels
Training Time: ~84 minutes
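
The base model name and image size above fix the input sequence length, and the architecture fixes the parameter count. The arithmetic can be sketched as follows. It assumes the standard ViT-Base configuration (hidden size 768, 12 layers, MLP size 3072, 16x16 patches) and a single linear classification head with no pooling layer. These are assumptions about the architecture, not figures taken from the training run.

```python
def vit_base_shape(num_classes, image_size=224, patch_size=16,
                   hidden=768, layers=12, mlp_dim=3072, channels=3):
    """Return (num_patches, sequence_length, parameter_count) for a ViT classifier."""
    num_patches = (image_size // patch_size) ** 2
    seq_len = num_patches + 1  # one extra [CLS] token

    # Embeddings: patch projection (weights + bias), [CLS] token, position embeddings
    patch_embed = channels * patch_size * patch_size * hidden + hidden
    cls_token = hidden
    pos_embed = seq_len * hidden

    # One encoder layer: query/key/value/output projections, two-layer MLP, two LayerNorms
    attention = 4 * (hidden * hidden + hidden)
    mlp = (hidden * mlp_dim + mlp_dim) + (mlp_dim * hidden + hidden)
    layer_norms = 2 * (2 * hidden)
    per_layer = attention + mlp + layer_norms

    final_norm = 2 * hidden
    head = hidden * num_classes + num_classes  # linear classifier

    total = (patch_embed + cls_token + pos_embed
             + layers * per_layer + final_norm + head)
    return num_patches, seq_len, total


num_patches, seq_len, total_params = vit_base_shape(num_classes=10)
print(num_patches, seq_len, total_params)  # 196 197 85806346
```

Under these assumptions a 224x224 image becomes 196 patches plus one [CLS] token, and the total is about 85.8M parameters.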

Limitations

The model is trained on a specific set of food categories and may not generalize well to other food types
Performance may vary depending on image quality, lighting, and angle
The model works best with clear, well-lit images of food
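
Because the model always picks one of its 10 classes, an image of an unsupported food still gets a label. One possible mitigation is to treat low-confidence predictions as unknown. This is a minimal sketch: the helper name `label_or_unknown` and the 0.5 threshold are arbitrary assumptions, and softmax confidence is only a rough signal, not a reliable out-of-distribution detector. The input format (a list of dicts with "label" and "score" keys) matches what the image-classification pipeline returns.

```python
def label_or_unknown(results, threshold=0.5):
    """Pick the best pipeline prediction, or "unknown" if its score is too low.

    results: list of {"label": str, "score": float} dicts
    threshold: hypothetical cut-off; tune it on held-out images
    """
    if not results:
        return "unknown", 0.0
    best = max(results, key=lambda r: r["score"])
    if best["score"] < threshold:
        return "unknown", best["score"]
    return best["label"], best["score"]


# Made-up scores for illustration only
confident = [{"label": "pizza", "score": 0.91}, {"label": "waffles", "score": 0.04}]
uncertain = [{"label": "donuts", "score": 0.31}, {"label": "pancakes", "score": 0.28}]
print(label_or_unknown(confident))
print(label_or_unknown(uncertain))
```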

Citation

If you use this model in your research, please cite:

@misc{food-recognition-model,
  title={Food Recognition Model},
  author={BinhQuocNguyen},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/BinhQuocNguyen/food-recognition-vit}}
}

License

This model is released under the MIT License.

Safetensors

Model size: 85.8M params
Tensor type: F32
