Upload trained food recognition model

Files changed (6)

README.md +117 -0
class_names.json +12 -0
config.json +49 -0
model.safetensors +3 -0
preprocessor_config.json +23 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,117 @@

+---
+license: mit
+tags:
+- food-recognition
+- computer-vision
+- image-classification
+- vit
+- pytorch
+pipeline_tag: image-classification
+library_name: transformers
+---
+# Food Recognition Model
+A Vision Transformer (ViT) fine-tuned for food recognition and classification. This model can identify 10 different types of food from images.
+## Model Description
+This model is based on Google's Vision Transformer (ViT-Base, `google/vit-base-patch16-224`) and has been fine-tuned on a custom food dataset. It classifies images into 10 food categories; see Model Performance for the measured accuracy.
+## Food Classes
+The model can recognize the following food types:
+- apple_pie
+- caesar_salad
+- chocolate_cake
+- cup_cakes
+- donuts
+- hamburger
+- ice_cream
+- pancakes
+- pizza
+- waffles
+## Model Performance
+- **Accuracy**: 68.0%
+- **F1 Score**: 66.5%
+- **Precision**: 68.2%
+- **Recall**: 68.0%
+## Usage
+### Using the Pipeline
+```python
+from transformers import pipeline
+# Load the model
+classifier = pipeline("image-classification", model="BinhQuocNguyen/food-recognition-vit")
+# Classify an image
+result = classifier("path/to/your/food_image.jpg")
+print(result)
+```
+### Using the Model Directly
+```python
+from transformers import AutoImageProcessor, AutoModelForImageClassification
+from PIL import Image
+import torch
+# Load model and processor
+processor = AutoImageProcessor.from_pretrained("BinhQuocNguyen/food-recognition-vit")
+model = AutoModelForImageClassification.from_pretrained("BinhQuocNguyen/food-recognition-vit")
+# Load and process image
+image = Image.open("path/to/your/food_image.jpg").convert("RGB")  # ensure 3 channels
+inputs = processor(image, return_tensors="pt")
+# Get predictions
+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
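+# Get top prediction (config.json in this repo stores generic LABEL_n names, not food names)
+predicted_class_id = predictions.argmax(dim=-1).item()
+predicted_class = model.config.id2label[predicted_class_id]  # keys are ints after loading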
+confidence = predictions[0][predicted_class_id].item()
+print(f"Predicted: {predicted_class} ({confidence:.3f})")
+```
+## Training Details
+- **Base Model**: google/vit-base-patch16-224
+- **Training Framework**: PyTorch with Transformers
+- **Dataset**: Custom food recognition dataset
+- **Classes**: 10 food categories
+- **Image Size**: 224x224 pixels
+- **Training Time**: ~84 minutes
+## Limitations
+- The model is trained on a specific set of food categories and may not generalize well to other food types
+- Performance may vary depending on image quality, lighting, and angle
+- The model works best with clear, well-lit images of food
+## Citation
+If you use this model in your research, please cite:
+```bibtex
+@misc{food-recognition-model,
+  title={Food Recognition Model},
+  author={BinhQuocNguyen},
+  year={2025},
+  publisher={Hugging Face},
+  howpublished={\url{https://huggingface.co/BinhQuocNguyen/food-recognition-vit}}
+}
+```
+## License
+This model is released under the MIT License.
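
Note on label names: the `config.json` added in this commit maps class ids to generic `LABEL_0` ... `LABEL_9` names, so the README usage examples would likely print those rather than food names. A minimal sketch of translating pipeline-style output back to food names follows. It assumes the order of `class_names.json` matches the label ids used during training, which the commit does not state; `relabel` and the example scores are hypothetical.

```python
import re

# Class names copied from class_names.json in this commit.
CLASS_NAMES = [
    "apple_pie", "caesar_salad", "chocolate_cake", "cup_cakes", "donuts",
    "hamburger", "ice_cream", "pancakes", "pizza", "waffles",
]


def relabel(results, class_names=CLASS_NAMES):
    """Replace generic LABEL_<n> names in pipeline-style output with class names."""
    renamed = []
    for item in results:
        match = re.fullmatch(r"LABEL_(\d+)", item["label"])
        # Labels that are not of the LABEL_<n> form are left untouched.
        label = class_names[int(match.group(1))] if match else item["label"]
        renamed.append({"label": label, "score": item["score"]})
    return renamed


# Hypothetical pipeline output, for illustration only (not a real prediction).
example = [{"label": "LABEL_8", "score": 0.9}, {"label": "LABEL_5", "score": 0.1}]
print(relabel(example))
```

In practice `results` would be the list returned by `classifier(...)` in the README's pipeline example.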

class_names.json ADDED

	@@ -0,0 +1,12 @@

+[
+  "apple_pie",
+  "caesar_salad",
+  "chocolate_cake",
+  "cup_cakes",
+  "donuts",
+  "hamburger",
+  "ice_cream",
+  "pancakes",
+  "pizza",
+  "waffles"
+]
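
The list above could be turned into the `id2label` / `label2id` mappings that a Transformers config expects. The sketch below is one way to do that, assuming the list order equals the training label order (not confirmed by this commit); `build_label_maps` is a hypothetical helper.

```python
import json

# Contents of class_names.json from this commit, inlined so the sketch is self-contained.
CLASS_NAMES_JSON = """
["apple_pie", "caesar_salad", "chocolate_cake", "cup_cakes", "donuts",
 "hamburger", "ice_cream", "pancakes", "pizza", "waffles"]
"""


def build_label_maps(class_names):
    """Return (id2label, label2id) dictionaries for an ordered list of class names."""
    id2label = {index: name for index, name in enumerate(class_names)}
    label2id = {name: index for index, name in id2label.items()}
    return id2label, label2id


class_names = json.loads(CLASS_NAMES_JSON)
id2label, label2id = build_label_maps(class_names)
print(id2label[3], label2id["waffles"])
```

Dictionaries of this shape can be passed as the `id2label=` and `label2id=` keyword arguments when loading the model with `from_pretrained`, so that predictions carry food names instead of `LABEL_n`.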

config.json ADDED

	@@ -0,0 +1,49 @@

+{
+  "architectures": [
+    "ViTForImageClassification"
+  ],
+  "attention_probs_dropout_prob": 0.0,
+  "dtype": "float32",
+  "encoder_stride": 16,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.0,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2",
+    "3": "LABEL_3",
+    "4": "LABEL_4",
+    "5": "LABEL_5",
+    "6": "LABEL_6",
+    "7": "LABEL_7",
+    "8": "LABEL_8",
+    "9": "LABEL_9"
+  },
+  "image_size": 224,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_2": 2,
+    "LABEL_3": 3,
+    "LABEL_4": 4,
+    "LABEL_5": 5,
+    "LABEL_6": 6,
+    "LABEL_7": 7,
+    "LABEL_8": 8,
+    "LABEL_9": 9
+  },
+  "layer_norm_eps": 1e-12,
+  "model_type": "vit",
+  "num_attention_heads": 12,
+  "num_channels": 3,
+  "num_hidden_layers": 12,
+  "patch_size": 16,
+  "pooler_act": "tanh",
+  "pooler_output_size": 768,
+  "problem_type": "single_label_classification",
+  "qkv_bias": true,
+  "transformers_version": "4.56.1"
+}
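
For orientation, the sizes implied by this config can be worked out directly. The sketch below only does arithmetic on values copied from `config.json`; the variable names are illustrative, and the extra sequence position assumes the usual ViT layout with a single classification token.

```python
# Values copied from config.json above.
config = {
    "image_size": 224,
    "patch_size": 16,
    "hidden_size": 768,
    "num_attention_heads": 12,
    "num_channels": 3,
}

# Each 224x224 image is cut into non-overlapping 16x16 patches.
patches_per_side = config["image_size"] // config["patch_size"]
num_patches = patches_per_side ** 2

# Assumption: one classification token is prepended to the patch sequence.
sequence_length = num_patches + 1

# The hidden size is split evenly across the attention heads.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Number of raw pixel values flattened into each patch before projection.
values_per_patch = config["num_channels"] * config["patch_size"] ** 2

print(patches_per_side, num_patches, sequence_length, head_dim, values_per_patch)
```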

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:54ef0f06c976cf494c8f75202425065d090b5a414c631c8f83679150cc4c3b0b
+size 343248584
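
As a sanity check, the file size in this LFS pointer can be compared with a parameter count derived from `config.json`. The sketch assumes the standard ViT-Base layout used by `ViTForImageClassification` (biased linear layers, two layer norms per block, a final layer norm, no pooler, and a linear head over 10 classes) and 4 bytes per `float32` weight; `vit_classifier_param_count` is a hypothetical helper, not a library function.

```python
def vit_classifier_param_count(hidden, intermediate, layers, patch, channels,
                               image_size, num_labels):
    """Count parameters of a ViT classifier under the layout assumed above."""
    seq_len = (image_size // patch) ** 2 + 1           # patches plus one class token
    embeddings = channels * patch * patch * hidden + hidden  # patch projection weight + bias
    embeddings += hidden                               # class token
    embeddings += seq_len * hidden                     # position embeddings
    attention = 4 * (hidden * hidden + hidden)         # query, key, value, output projections
    mlp = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    layer_norms = 2 * (2 * hidden)                     # two layer norms, weight + bias each
    per_layer = attention + mlp + layer_norms
    final_norm = 2 * hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + layers * per_layer + final_norm + classifier


params = vit_classifier_param_count(
    hidden=768, intermediate=3072, layers=12, patch=16,
    channels=3, image_size=224, num_labels=10,
)
weight_bytes = params * 4          # float32, per the "dtype" field in config.json
file_size = 343248584              # "size" from the LFS pointer above
print(params, weight_bytes, file_size - weight_bytes)
```

Under these assumptions the weights account for all but a few kilobytes of the file, which would be consistent with the remainder being the safetensors metadata header.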

preprocessor_config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "do_convert_rgb": null,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "image_processor_type": "ViTImageProcessor",
+  "image_std": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 224,
+    "width": 224
+  }
+}
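
Taken together, `do_rescale` and `do_normalize` with these values map 8-bit pixel intensities from [0, 255] to roughly [-1, 1]. The sketch below reproduces that arithmetic for a single channel value; it illustrates the configured constants and is not the image processor's actual implementation. `preprocess_value` is a hypothetical name.

```python
# Constants copied from preprocessor_config.json above.
RESCALE_FACTOR = 0.00392156862745098  # approximately 1 / 255
IMAGE_MEAN = 0.5
IMAGE_STD = 0.5


def preprocess_value(pixel):
    """Rescale an 8-bit pixel value to [0, 1], then normalize with mean and std."""
    rescaled = pixel * RESCALE_FACTOR
    return (rescaled - IMAGE_MEAN) / IMAGE_STD


for pixel in (0, 128, 255):
    print(pixel, round(preprocess_value(pixel), 4))
```

The `"resample": 2` entry corresponds to bilinear interpolation in PIL's resampling enumeration, which is applied when images are resized to 224x224.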

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a3bd658f2293292778a1c32cabbe0c350b857da52345f35972ebcee99564a46
+size 5777