Commit 641c6e7 · verified
BinhQuocNguyen committed · 1 Parent(s): 8492b81

Upload trained food recognition model

Files changed (6)
  1. README.md +117 -0
  2. class_names.json +12 -0
  3. config.json +49 -0
  4. model.safetensors +3 -0
  5. preprocessor_config.json +23 -0
  6. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,117 @@
+ ---
+ license: mit
+ tags:
+ - food-recognition
+ - computer-vision
+ - image-classification
+ - vit
+ - pytorch
+ pipeline_tag: image-classification
+ library_name: transformers
+ ---
+
+ # Food Recognition Model
+
+ A Vision Transformer (ViT) fine-tuned for food classification. The model identifies 10 types of food from images.
+
+ ## Model Description
+
+ This model is based on Google's Vision Transformer (ViT-Base) and has been fine-tuned on a custom food dataset. It classifies images into 10 food categories and reaches 68.0% accuracy (see Model Performance).
+
+ ## Food Classes
+
+ The model can recognize the following food types:
+
+ - apple_pie
+ - caesar_salad
+ - chocolate_cake
+ - cup_cakes
+ - donuts
+ - hamburger
+ - ice_cream
+ - pancakes
+ - pizza
+ - waffles
+
+ ## Model Performance
+
+ - **Accuracy**: 68.0%
+ - **F1 Score**: 66.5%
+ - **Precision**: 68.2%
+ - **Recall**: 68.0%
+
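The README does not say how precision, recall, and F1 were averaged across the 10 classes. Recall equal to accuracy (both 68.0%) is what support-weighted averaging produces, so the sketch below assumes weighted averaging. The function name `weighted_metrics` and the sample labels are illustrative and are not taken from the training code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)        # true examples per class
    predicted = Counter(y_pred)      # predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(true_pos.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = true_pos[label] / predicted[label] if predicted[label] else 0.0
        r = true_pos[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n           # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, only to exercise the function.
y_true = ["pizza", "pizza", "donuts", "waffles"]
y_pred = ["pizza", "donuts", "donuts", "pizza"]
print(weighted_metrics(y_true, y_pred))
```

Under weighted averaging, each class's recall is multiplied by its share of the data, so the sum collapses to correct predictions divided by total examples, which is accuracy.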
+ ## Usage
+
+ ### Using the Pipeline
+
+ ```python
+ from transformers import pipeline
+
+ # Load the model
+ classifier = pipeline("image-classification", model="BinhQuocNguyen/food-recognition-vit")
+
+ # Classify an image
+ result = classifier("path/to/your/food_image.jpg")
+ print(result)
+ ```
+
+ ### Using the Model Directly
+
+ ```python
+ from transformers import AutoImageProcessor, AutoModelForImageClassification
+ from PIL import Image
+ import torch
+
+ # Load model and processor
+ processor = AutoImageProcessor.from_pretrained("BinhQuocNguyen/food-recognition-vit")
+ model = AutoModelForImageClassification.from_pretrained("BinhQuocNguyen/food-recognition-vit")
+
+ # Load and process image (convert to RGB so grayscale or RGBA files also work)
+ image = Image.open("path/to/your/food_image.jpg").convert("RGB")
+ inputs = processor(image, return_tensors="pt")
+
+ # Get predictions
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+
+ # Get top prediction
+ # id2label keys are integers once the config is loaded, so index with the int directly
+ predicted_class_id = predictions.argmax(dim=-1).item()
+ predicted_class = model.config.id2label[predicted_class_id]
+ confidence = predictions[0][predicted_class_id].item()
+
+ print(f"Predicted: {predicted_class} ({confidence:.3f})")
+ ```
+
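As a rough illustration of what the softmax and argmax steps above compute, here is a dependency-free sketch that also ranks the top three classes. The logits are made-up numbers, and `softmax` and `top_k` are hypothetical helper names, not transformers APIs.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=3):
    """Return (class_id, probability) pairs, highest probability first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# One made-up logit per class (10 classes).
example_logits = [0.2, -1.0, 3.1, 0.5, 0.0, 1.7, -0.3, 0.9, 2.4, -2.0]
probs = softmax(example_logits)
for class_id, prob in top_k(probs):
    print(class_id, round(prob, 3))
```

With a reported accuracy of 68.0%, showing the top few candidates rather than a single label may be more useful in practice.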
+ ## Training Details
+
+ - **Base Model**: google/vit-base-patch16-224
+ - **Training Framework**: PyTorch with Transformers
+ - **Dataset**: Custom food recognition dataset
+ - **Classes**: 10 food categories
+ - **Image Size**: 224x224 pixels
+ - **Training Time**: ~84 minutes
+
+ ## Limitations
+
+ - The model is trained on a specific set of food categories and may not generalize well to other food types
+ - Performance may vary depending on image quality, lighting, and angle
+ - The model works best with clear, well-lit images of food
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{food-recognition-model,
+   title={Food Recognition Model},
+   author={BinhQuocNguyen},
+   year={2025},
+   publisher={Hugging Face},
+   howpublished={\url{https://huggingface.co/BinhQuocNguyen/food-recognition-vit}}
+ }
+ ```
+
+ ## License
+
+ This model is released under the MIT License.
class_names.json ADDED
@@ -0,0 +1,12 @@
+ [
+   "apple_pie",
+   "caesar_salad",
+   "chocolate_cake",
+   "cup_cakes",
+   "donuts",
+   "hamburger",
+   "ice_cream",
+   "pancakes",
+   "pizza",
+   "waffles"
+ ]
config.json ADDED
@@ -0,0 +1,49 @@
+ {
+   "architectures": [
+     "ViTForImageClassification"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "dtype": "float32",
+   "encoder_stride": 16,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2",
+     "3": "LABEL_3",
+     "4": "LABEL_4",
+     "5": "LABEL_5",
+     "6": "LABEL_6",
+     "7": "LABEL_7",
+     "8": "LABEL_8",
+     "9": "LABEL_9"
+   },
+   "image_size": 224,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2,
+     "LABEL_3": 3,
+     "LABEL_4": 4,
+     "LABEL_5": 5,
+     "LABEL_6": 6,
+     "LABEL_7": 7,
+     "LABEL_8": 8,
+     "LABEL_9": 9
+   },
+   "layer_norm_eps": 1e-12,
+   "model_type": "vit",
+   "num_attention_heads": 12,
+   "num_channels": 3,
+   "num_hidden_layers": 12,
+   "patch_size": 16,
+   "pooler_act": "tanh",
+   "pooler_output_size": 768,
+   "problem_type": "single_label_classification",
+   "qkv_bias": true,
+   "transformers_version": "4.56.1"
+ }
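The `id2label` map in this config holds the default placeholders `LABEL_0` to `LABEL_9`, so predictions loaded from it will report those names rather than food names. Below is a minimal sketch of one way to translate them, assuming the order of class_names.json matches the label ids used during training (the commit does not confirm this). `build_label_maps` and `readable_label` are hypothetical helpers, not part of transformers.

```python
# Copied from class_names.json in this commit.
CLASS_NAMES = [
    "apple_pie", "caesar_salad", "chocolate_cake", "cup_cakes", "donuts",
    "hamburger", "ice_cream", "pancakes", "pizza", "waffles",
]


def build_label_maps(class_names):
    """Build id2label / label2id dicts, assuming list order equals label id."""
    id2label = {index: name for index, name in enumerate(class_names)}
    label2id = {name: index for index, name in id2label.items()}
    return id2label, label2id


def readable_label(raw_label, class_names):
    """Translate a placeholder such as 'LABEL_3' into a class name."""
    prefix = "LABEL_"
    if not raw_label.startswith(prefix):
        return raw_label  # already a readable name
    return class_names[int(raw_label[len(prefix):])]


id2label, label2id = build_label_maps(CLASS_NAMES)
print(readable_label("LABEL_8", CLASS_NAMES))
```

If the ordering assumption holds, the two dicts could also be passed as `id2label=` and `label2id=` keyword arguments to `AutoModelForImageClassification.from_pretrained`, or written into config.json, so that the pipeline reports food names directly.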
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:54ef0f06c976cf494c8f75202425065d090b5a414c631c8f83679150cc4c3b0b
+ size 343248584
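As a sanity check on the pointer size above, the parameter count implied by config.json can be estimated by hand. The sketch below assumes the standard ViT-Base layout (patch projection, [CLS] token, position embeddings, 12 encoder layers, final layer norm, linear classifier, no pooler); `vit_parameter_count` is a hypothetical helper, and the small remainder is presumably the safetensors header.

```python
def vit_parameter_count(hidden=768, layers=12, mlp=3072, patch=16,
                        image=224, channels=3, num_labels=10):
    """Estimate trainable parameters from the values in config.json."""
    tokens = (image // patch) ** 2 + 1             # 196 patches plus one [CLS] token
    embeddings = (channels * patch * patch * hidden + hidden  # patch projection
                  + hidden                          # [CLS] token
                  + tokens * hidden)                # position embeddings
    attention = 4 * (hidden * hidden + hidden)      # query, key, value, output
    feed_forward = (hidden * mlp + mlp) + (mlp * hidden + hidden)
    layer_norms = 2 * (2 * hidden)                  # two norms per layer
    encoder = layers * (attention + feed_forward + layer_norms)
    final_norm = 2 * hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + encoder + final_norm + classifier


params = vit_parameter_count()
weight_bytes = params * 4          # float32, per "dtype" in config.json
file_bytes = 343248584             # size from the LFS pointer
print(params, weight_bytes, file_bytes - weight_bytes)
```

Under these assumptions the estimate is about 85.8 million parameters, which accounts for nearly all of the file size.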
preprocessor_config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "do_convert_rgb": null,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "image_processor_type": "ViTImageProcessor",
+   "image_std": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "resample": 2,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "height": 224,
+     "width": 224
+   }
+ }
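To make the preprocessing values concrete: `rescale_factor` is 1/255, and a mean and standard deviation of 0.5 then map each channel to roughly the range -1 to 1 (`resample: 2` is Pillow's bilinear filter). The sketch below reproduces only that per-pixel arithmetic, not the resize step; `normalize_pixel` is a hypothetical helper name.

```python
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255, from preprocessor_config.json
IMAGE_MEAN = 0.5
IMAGE_STD = 0.5


def normalize_pixel(value):
    """Map an 8-bit channel value (0..255) to the range the model sees."""
    rescaled = value * RESCALE_FACTOR             # do_rescale: 0..255 -> 0..1
    return (rescaled - IMAGE_MEAN) / IMAGE_STD    # do_normalize: 0..1 -> -1..1


for value in (0, 128, 255):
    print(value, round(normalize_pixel(value), 4))
```

Images prepared outside `AutoImageProcessor` would need the same scaling to match what the model saw during fine-tuning.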
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a3bd658f2293292778a1c32cabbe0c350b857da52345f35972ebcee99564a46
+ size 5777