Thuong Nguyen committed on
Commit 1416389 · 1 Parent(s): f3cb94f

Fix README.md with proper HF Space YAML frontmatter

Files changed (1)
  1. README.md +50 -320
README.md CHANGED
@@ -1,347 +1,77 @@
- # 🌿 Plant Recognition with Q&A System - Backend
-
- FastAPI backend for Vietnamese plant recognition and Q&A using RAG (Retrieval-Augmented Generation) with OG-RAG hypergraph architecture.
-
- ## 🎯 Features
-
- - **Flow 1:** Image-only plant classification (Top-5 predictions)
- - **Flow 2:** Image + Question (Plant identification → Contextual Q&A)
- - **Flow 3:** Text-only Q&A (Pure RAG with Vietnamese embeddings)
-
- ## 🏗️ Tech Stack
-
- - **API:** FastAPI + Uvicorn
- - **Database:** Supabase (PostgreSQL + pgvector)
- - **Embeddings:** Vietnamese-Embedding (1024-dim)
- - **LLM:** MegaLLM API (qwen/qwen3-next-80b-a3b-instruct)
- - **CV Model:** Plant Classification API
- - **Architecture:** OG-RAG Hypergraph (9,954 nodes, 1,305 plants)
-
  ---
-
- ## 🚀 Quick Start
-
- ### 1. Prerequisites
-
- - Python 3.9+
- - Supabase account
- - MegaLLM API key
- - Plant Classification API endpoint
-
- ### 2. Installation
-
- ```bash
- # Clone repository
- git clone https://github.com/thuonguyenvan/Plant-Recognition-with-Q-A-System-Backend.git
- cd Plant-Recognition-with-Q-A-System-Backend
-
- # Create virtual environment
- python -m venv venv
- source venv/bin/activate # On Windows: venv\Scripts\activate
-
- # Install dependencies
- pip install -r requirements.txt
- ```
-
- ### 3. Environment Setup
-
- ```bash
- # Copy environment template
- cp .env.example .env
-
- # Edit .env with your credentials
- nano .env
- ```
-
- **Required environment variables:**
-
- ```bash
- # Supabase
- SUPABASE_URL=https://your-project.supabase.co
- SUPABASE_ANON_KEY=your_anon_key
-
- # MegaLLM
- MEGLLM_API_KEY=your_megallm_api_key
-
- # Computer Vision API (optional - has default)
- CV_API_URL=https://your-cv-api-endpoint
-
- # Optional: Direct DB connection for data import scripts
- SUPABASE_DB_URI=postgresql://postgres.[REF]:[PASSWORD]@aws-0-[REGION].pooler.supabase.com:6543/postgres
- ```
-
- > **Note:** `EMBEDDING_MODEL_NAME` has a default value (`AITeamVN/Vietnamese_Embedding`) and doesn't need to be set unless you want to use a different model.
-
- ### 4. Database Setup
-
- Run the SQL setup script in your Supabase SQL Editor:
-
- ```bash
- # Copy content from set_up_supabasedb.sql
- # Paste and run in: https://app.supabase.com/project/_/sql
- ```
-
- ### 5. Import Data (Optional)
-
- If you have the data files:
-
- ```bash
- # Import hypernodes with embeddings
- python scripts/fast_import.py --embeddings plant_hypernodes_with_embeddings.json
- ```
-
- > **Note:** Large data files are not included in this repository. Contact maintainer for access.
-
- ### 6. Run Server
-
- ```bash
- # Development mode
- uvicorn main:app --reload --host 0.0.0.0 --port 8000
-
- # Or using Python
- python main.py
- ```
-
- Server will start at: **http://localhost:8000**
-
  ---

- ## 📡 API Endpoints
-
- ### Health Check
-
- ```bash
- GET /health
- ```
-
- ### Flow 1: Image Classification
-
- ```bash
- # Upload image file
- POST /api/flow1/classify
- Content-Type: multipart/form-data
- Body: file=<image>
-
- # Or use image URL
- POST /api/flow1/classify-url
- Content-Type: application/json
- Body: {"image_url": "https://..."}
-
- # Get plant details
- GET /api/flow1/detail/{plant_name}
- ```
-
- ### Flow 2: Image + Question

- ```bash
- # Upload image + question
- POST /api/flow2/identify
- Content-Type: multipart/form-data
- Body: file=<image>

- # Then ask question about identified plant
- POST /api/flow2/ask
- Content-Type: application/json
- Body: {
-   "question": "Cây này có tác dụng gì?",
-   "plant_name": "Sâm cau"
- }
- ```

- ### Flow 3: Text Q&A (RAG)

- ```bash
- POST /api/flow3/ask
- Content-Type: application/json
- Body: {
-   "question": "Cây nào chữa ho?",
-   "top_k": 10
- }
- ```

- ---
-
- ## 🧪 Testing
-
- ```bash
- # Test health endpoint
- curl http://localhost:8000/health

- # Test Flow 3 (RAG)
- curl -X POST http://localhost:8000/api/flow3/ask \
-   -H "Content-Type: application/json" \
-   -d '{"question": "Sâm cau có tác dụng gì?"}'
-
- # Test Flow 1 (Classification)
- curl -X POST http://localhost:8000/api/flow1/classify \
-   -F "file=@path/to/plant_image.jpg"
- ```
-
- ---

- ## 📁 Project Structure

- ```
- Plant-Recognition-with-Q-A-System-Backend/
- ├── main.py                    # FastAPI application entry point
- ├── config.py                  # Configuration settings
- ├── requirements.txt           # Python dependencies
- ├── .env.example               # Environment template
- ├── set_up_supabasedb.sql      # Database setup script
- │
- ├── services/                  # Core business logic
- │   ├── cv_api_client.py       # Plant classification API client
- │   ├── embedding_service.py   # Vietnamese embedding service
- │   ├── llm_client.py          # Groq LLM client
- │   ├── vector_db_service.py   # Supabase vector operations
- │   ├── ograg_engine.py        # OG-RAG hypergraph engine
- │   ├── query_reformulator.py  # Query enhancement
- │   ├── flow1_service.py       # Image classification flow
- │   ├── flow2_service.py       # Image + Q&A flow
- │   └── flow3_service.py       # Text Q&A flow
- │
- ├── utils/                     # Utility modules
- │   ├── data_loader.py         # JSON-LD ontology loader
- │   ├── key_normalizer.py      # Attribute name mapping
- │   └── chunker.py             # Text chunking utilities
- │
- ├── scripts/                   # Data processing scripts
- │   ├── flatten_ontology.py    # Convert JSON-LD to facts
- │   ├── build_hypergraph.py    # Build hypergraph structure
- │   ├── import_embeddings.py   # Generate embeddings
- │   ├── fast_import.py         # Import to Supabase
- │   └── clean_duplicates.py    # Remove duplicate nodes
- │
- └── tests/                     # Test files
-     ├── test_connection.py     # Database connection tests
-     └── test_hypergraph.py     # Hypergraph tests
- ```
-
- ---

  ## 🔧 Configuration

- ### Vector Search Settings

- Default settings in `config.py`:

- ```python
- VECTOR_SEARCH_TOP_K = 10
- VECTOR_SEARCH_THRESHOLD = 0.4
- VECTOR_SEARCH_TIMEOUT = 120
- ```

- ### LLM Settings

- ```python
- LLM_MODEL = "qwen/qwen3-next-80b-a3b-instruct"
- LLM_BASE_URL = "https://ai.megallm.io/v1"
- LLM_TEMPERATURE = 0.7
- LLM_MAX_TOKENS = 2000
- ```

- ---

- ## 📊 Database Schema

- ### Hypernodes Table

- ```sql
- CREATE TABLE hypernodes (
-     id BIGSERIAL PRIMARY KEY,
-     key TEXT NOT NULL,
-     value TEXT NOT NULL,
-     key_embedding vector(1024),
-     value_embedding vector(1024),
-     plant_name TEXT NOT NULL,
-     section TEXT,
-     chunk_id INTEGER DEFAULT 0,
-     is_chunked BOOLEAN DEFAULT FALSE,
-     created_at TIMESTAMP DEFAULT NOW(),
-     updated_at TIMESTAMP DEFAULT NOW()
- );
- ```
-
- ---
-
- ## 🐛 Troubleshooting
-
- ### Database Connection Issues
-
- ```bash
- # Check Supabase project is not paused
- # Verify SUPABASE_URL and SUPABASE_KEY in .env
- # Test connection:
- python tests/test_connection.py
- ```
-
- ### Vector Search Timeout
-
- ```bash
- # Reduce top_k in request
- # Increase threshold (0.5 instead of 0.4)
- # Check Supabase free tier limits
- ```
-
- ### Import Errors
-
- ```bash
- # Ensure python-dotenv is installed
- pip install python-dotenv
-
- # Check .env file exists and has correct format
- ```
-
- ---
-
- ## 📚 Documentation
-
- - **API Docs:** http://localhost:8000/docs (Swagger UI)
- - **ReDoc:** http://localhost:8000/redoc
- - **CV API Docs:** See `CV_API_DOCS.md`
- - **Flow 2 API:** See `FLOW2_API.md`
- - **Kaggle Embedding Guide:** See `KAGGLE_EMBEDDING_GUIDE.md`
-
- ---
-
- ## 🤝 Contributing
-
- 1. Fork the repository
- 2. Create feature branch (`git checkout -b feature/amazing-feature`)
- 3. Commit changes (`git commit -m 'Add amazing feature'`)
- 4. Push to branch (`git push origin feature/amazing-feature`)
- 5. Open Pull Request
-
- ---
-
- ## 📄 License
-
- This project is licensed under the MIT License.
-
- ---
-
- ## 👥 Authors
-
- - **Thuong Nguyen Van** - [@thuonguyenvan](https://github.com/thuonguyenvan)
-
- ---
-
- ## 🙏 Acknowledgments
-
- - **OG-RAG Paper:** [Ontology-Grounded RAG](https://arxiv.org/html/2412.15235v1)
- - **Vietnamese Embedding:** AITeamVN/Vietnamese_Embedding
- - **Supabase:** Vector database with pgvector
- - **MegaLLM:** OpenAI-compatible LLM API
-
- ---

- ## 📞 Support

- For issues and questions:
- - Open an issue on GitHub
- - Email: [email protected]

  ---

- **Status:** ✅ Production Ready
- **Last Updated:** November 2025
  ---
+ title: Plant Recognition with Q&A System Backend
+ emoji: 🌿
+ colorFrom: green
+ colorTo: blue
+ sdk: docker
+ pinned: false
+ license: mit
  ---
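
The frontmatter block added above is what Hugging Face reads to configure the Space. As a rough sketch (not part of this commit), a flat `key: value` block like this one can be sanity-checked with the Python standard library alone. The names `parse_frontmatter` and `EXPECTED_KEYS` are hypothetical, the key list is an assumption copied from the keys present in this diff rather than an official list of mandatory fields, and the parser handles only flat scalar pairs, not general YAML.

```python
README_HEAD = """---
title: Plant Recognition with Q&A System Backend
emoji: 🌿
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# Plant Recognition with Q&A System Backend
"""

# Assumption: the keys this commit adds, not an official list of required fields.
EXPECTED_KEYS = ("title", "emoji", "colorFrom", "colorTo", "sdk", "pinned", "license")


def parse_frontmatter(text):
    """Return flat key/value pairs from a leading '---' block, or {} if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    pairs = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return pairs
        key, sep, value = line.partition(":")
        if sep:
            pairs[key.strip()] = value.strip()
    return {}  # no closing delimiter found: treat the block as malformed


meta = parse_frontmatter(README_HEAD)
missing = [key for key in EXPECTED_KEYS if key not in meta]
print(meta["sdk"], missing)
```

A README that starts directly with a heading, as the previous version did, yields an empty mapping, so every expected key would be reported as missing.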

+ # 🌿 Plant Recognition with Q&A System Backend

+ Vietnamese medicinal plant recognition and Q&A system powered by RAG (Retrieval-Augmented Generation).

+ ## 🚀 Features

+ This API provides 3 intelligent flows:

+ ### Flow 1: Image-Only Classification
+ - Upload plant image → Get top-5 predictions with detailed information
+ - Endpoint: `POST /api/flow1/classify`

+ ### Flow 2: Image + Text Q&A (Two-Step)
+ - **Step 1**: Upload image → Get plant predictions
+   - Endpoint: `POST /api/flow2/identify`
+ - **Step 2**: Select plant and ask questions
+   - Endpoint: `POST /api/flow2/ask`

+ ### Flow 3: Pure Text Q&A
+ - Ask questions without images using RAG
+ - Endpoint: `POST /api/flow3/ask`
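
The two JSON endpoints above can be called from any HTTP client. A minimal Python sketch follows, assuming the request bodies documented in the README that this commit replaces still apply (`question` plus `plant_name` for Flow 2, `question` plus `top_k` for Flow 3). `BASE_URL` and the helper names are hypothetical, and the sketch only builds the requests without sending them.

```python
import json
import urllib.request

# Hypothetical base URL; point it at your running Space or local server.
BASE_URL = "http://localhost:8000"


def build_json_post(path, payload):
    """Build (but do not send) a JSON POST request for the given API path."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def flow2_ask(question, plant_name):
    """Flow 2, step 2: ask about a plant chosen from the step-1 predictions."""
    return build_json_post("/api/flow2/ask", {"question": question, "plant_name": plant_name})


def flow3_ask(question, top_k=10):
    """Flow 3: text-only question; top_k=10 mirrors the earlier README example."""
    return build_json_post("/api/flow3/ask", {"question": question, "top_k": top_k})


request = flow3_ask("Cây nào chữa ho?")
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request, timeout=120)
```

Keeping request construction separate from sending makes the payload shape easy to inspect before any network call is made.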
 
 
 
 
 
 
 
 

+ ## 📖 API Documentation

+ Visit the interactive API documentation:
+ - Swagger UI: `/docs`
+ - ReDoc: `/redoc`

  ## 🔧 Configuration

+ This Space requires the following **Secrets** to be configured in Settings:

+ ### Required Environment Variables

+ 1. **SUPABASE_URL** - Your Supabase project URL
+ 2. **SUPABASE_ANON_KEY** - Supabase anonymous key
+ 3. **MEGLLM_API_KEY** - MegaLLM API key for LLM
+ 4. **CV_API_URL** - Plant classification model API URL

+ ### How to Configure

+ 1. Go to **Settings** → **Variables and Secrets**
+ 2. Add each secret with the corresponding value
+ 3. Click **Apply** to restart the Space
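
Secrets configured this way reach the running container as environment variables, so a missing one tends to surface only when the first request fails. A small startup check, sketched below, would make that failure explicit. The variable names come from the list above; the function `missing_secrets` is hypothetical and is not claimed to exist in the repository.

```python
import os

# Names taken from the "Required Environment Variables" list in this README.
REQUIRED_SECRETS = ("SUPABASE_URL", "SUPABASE_ANON_KEY", "MEGLLM_API_KEY", "CV_API_URL")


def missing_secrets(env=None):
    """Return the required names that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


# Illustrative mapping only: one secret set, one blank, two absent.
example_env = {"SUPABASE_URL": "https://your-project.supabase.co", "CV_API_URL": " "}
print(missing_secrets(example_env))
```

Calling `missing_secrets()` with no argument checks the real process environment; an application could refuse to start when the returned list is not empty.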
 
 
 

+ ## 🗂️ Data

+ This deployment includes:
+ - **1,311 plant ontology files** (JSON-LD format)
+ - **Plant reference photos**
+ - **~156MB total data** bundled in Docker image

+ ## 🔗 Links

+ - **GitHub**: [Plant-Recognition-Backend](https://github.com/thuonguyenvan/Plant-Recognition-with-Q-A-System-Backend)
+ - **CV Model**: [Plants Classify](https://huggingface.co/spaces/thuonguyenvan/plantsclassify)

+ ## 🛠️ Tech Stack

+ - **Framework**: FastAPI + Uvicorn
+ - **LLM**: MegaLLM (Vietnamese-optimized)
+ - **Embeddings**: Vietnamese_Embedding
+ - **Vector DB**: Supabase pgvector

  ---

+ Built with ❤️ for Vietnamese medicinal plant enthusiasts