---
title: LegalRagBackend
emoji: ⚖️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_file: main.py
---
# Legal RAG Backend

A complete FastAPI-based backend for a Legal RAG (Retrieval-Augmented Generation) AI system that provides intelligent legal verdict predictions with comprehensive explanations backed by constitutional provisions, IPC sections, case law, and statutes.
## Overview
This system combines:
- LegalBERT fine-tuned model for verdict prediction
- FAISS vector search for legal document retrieval
- Sentence Transformers (BGE-Large) for semantic embeddings
- Google Gemini (optional) for generating detailed explanations
- HuggingFace Hub for model and dataset management
## Project Structure
```
legal-rag-backend/
│
├── main.py            # FastAPI application with REST endpoints
├── model_loader.py    # LegalBERT model loading and inference
├── rag_loader.py      # FAISS indices and chunk loading from HuggingFace
├── rag_service.py     # Core service orchestrating prediction and RAG
├── prompt_builder.py  # Constructs prompts for LLM with legal context
├── utils.py           # Helper utilities for chunk processing
├── requirements.txt   # Python dependencies
├── Dockerfile         # Container configuration
├── .gitignore         # Git ignore patterns
├── README.md          # This file
└── start.sh           # Launch script
```
## Features

### API Endpoints
#### GET /health

Health check endpoint.

**Response:**

```json
{
  "status": "ok"
}
```
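For reference, a health endpoint of this shape takes only a few lines in FastAPI. The snippet below is an illustrative sketch; the actual `main.py` may be organized differently.

```python
# Illustrative sketch only; the real main.py may differ.
from fastapi import FastAPI

app = FastAPI(title="Legal RAG Backend")

@app.get("/health")
async def health():
    # Simple liveness probe used by clients and the hosting platform.
    return {"status": "ok"}
```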
#### POST /predict

Get a quick verdict prediction with confidence score.

**Request:**

```json
{
  "text": "Case description and facts..."
}
```

**Response:**

```json
{
  "verdict": "guilty",
  "confidence": 0.8734
}
```
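For example, the endpoint can be exercised from Python with `requests` (assuming the backend is running locally on port 7860; the case text is a placeholder):

```python
# Example client call; assumes the server is reachable at localhost:7860.
import requests

payload = {"text": "Case description and facts..."}
resp = requests.post("http://localhost:7860/predict", json=payload, timeout=30)
resp.raise_for_status()

result = resp.json()
print(result["verdict"], result["confidence"])  # e.g. "guilty" 0.8734
```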
#### POST /explain

Get comprehensive legal analysis with retrieved supporting documents.

**Request:**

```json
{
  "text": "Case description and facts..."
}
```

**Response:**

```json
{
  "verdict": "guilty",
  "confidence": 0.8734,
  "explanation": "Detailed legal analysis...",
  "retrievedChunks": {
    "constitution": [...],
    "ipc": [...],
    "ipcCase": [...],
    "statute": [...],
    "qa": [...],
    "case": [...]
  }
}
```
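A similar client call works for `/explain`. The snippet below (again assuming a local server on port 7860) prints the verdict and summarizes the retrieved chunks per category:

```python
# Example client call; assumes the server is reachable at localhost:7860.
import requests

payload = {"text": "Case description and facts..."}
resp = requests.post("http://localhost:7860/explain", json=payload, timeout=120)
resp.raise_for_status()

data = resp.json()
print(f"Verdict: {data['verdict']} (confidence {data['confidence']:.2%})")
print(data["explanation"][:300])

# Retrieved supporting documents, grouped by legal category.
for category, chunks in data["retrievedChunks"].items():
    print(f"{category}: {len(chunks)} chunks")
```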
## Installation & Setup

### Local Development
1. **Clone or create the project:**

   ```bash
   cd /home/neginegi/Desktop/rag/legal-rag-backend
   ```

2. **Create a virtual environment:**

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Configure environment (optional):** create a `.env` file for Gemini API integration:

   ```
   GEMINI_API_KEY=your_api_key_here
   ```

5. **Run the server:**

   ```bash
   chmod +x start.sh
   ./start.sh
   ```

   Or directly:

   ```bash
   uvicorn main:app --host 0.0.0.0 --port 7860
   ```

6. **Access the API:**
   - API Documentation: http://localhost:7860/docs
   - Health Check: http://localhost:7860/health
### Docker Deployment

1. **Build the image:**

   ```bash
   docker build -t legal-rag-backend .
   ```

2. **Run the container:**

   ```bash
   docker run -p 7860:7860 -e GEMINI_API_KEY=your_key legal-rag-backend
   ```

3. **Access at:** http://localhost:7860
## How It Works
**Model Loading**: On startup, the system loads (see the sketch below):
- LegalBERT model (`negi2725/LegalBertNew`)
- 6 FAISS indices (Constitution, IPC, Cases, Statutes, QA)
- Corresponding text chunks
- BGE-Large sentence transformer for embeddings
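A minimal sketch of this loading step is shown below. It uses the standard `transformers`, `sentence-transformers`, `faiss`, and `huggingface_hub` APIs; the file names inside the `negi2725/dataRag` dataset are placeholders, not the actual names used by `rag_loader.py`.

```python
# Sketch of the startup loading path. File names inside negi2725/dataRag
# are placeholders; rag_loader.py defines the real ones.
import faiss
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# LegalBERT verdict classifier
tokenizer = AutoTokenizer.from_pretrained("negi2725/LegalBertNew")
model = AutoModelForSequenceClassification.from_pretrained("negi2725/LegalBertNew")

# BGE-Large embedder used for retrieval
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")

# One FAISS index per legal category (placeholder file name)
index_path = hf_hub_download(
    repo_id="negi2725/dataRag",
    filename="constitution.index",  # assumed name; one of the 6 indices
    repo_type="dataset",
)
constitution_index = faiss.read_index(index_path)
```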
**Prediction Flow** (sketched below):
- Input text is tokenized and passed through LegalBERT
- Softmax applied to get "guilty" or "not guilty" verdict
- Confidence score extracted from probabilities
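A condensed sketch of that flow, reusing the `tokenizer` and `model` objects from the loading sketch above (the label ordering is an assumption; the fine-tuned model's config defines the real mapping):

```python
# Sketch of the prediction step: tokenize, classify, softmax over two labels.
import torch

def predict_verdict(text: str) -> tuple[str, float]:
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    labels = ["not guilty", "guilty"]  # assumed order; check the model config
    idx = int(torch.argmax(probs))
    return labels[idx], round(float(probs[idx]), 4)
```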
**RAG Retrieval** (sketched below):
- Query text embedded using BGE-Large
- Top-K similar chunks retrieved from each FAISS index
- Results organized by legal category
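Sketched below for a single index, reusing the `embedder` from the loading sketch; the `chunks` list is assumed to be parallel to the index entries:

```python
# Sketch of top-K retrieval against one FAISS index.
import numpy as np

def retrieve(query: str, index, chunks: list[str], top_k: int = 5) -> list[str]:
    # Normalized embeddings so similarity search behaves like cosine similarity.
    vec = embedder.encode([query], normalize_embeddings=True)
    vec = np.asarray(vec, dtype="float32")
    scores, ids = index.search(vec, top_k)
    return [chunks[i] for i in ids[0] if i != -1]
```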
**Explanation Generation** (sketched below):
- Structured prompt built with case facts, verdict, and retrieved context
- Optional Gemini API call for natural language explanation
- Fallback to prompt template if API not configured
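A sketch of this step, assuming the `google-generativeai` client; the prompt wording and the Gemini model name are illustrative, not the exact ones used by `prompt_builder.py`:

```python
# Sketch of explanation generation with an optional Gemini call.
import os
import google.generativeai as genai

def explain(case_text: str, verdict: str, confidence: float, context: str) -> str:
    prompt = (
        "You are a legal assistant. Analyse the case below.\n\n"
        f"Case facts:\n{case_text}\n\n"
        f"Predicted verdict: {verdict} (confidence {confidence:.2%})\n\n"
        f"Relevant legal context:\n{context}\n\n"
        "Explain the verdict, citing the constitutional provisions, IPC sections, "
        "case law, and statutes above."
    )
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        return prompt  # fallback: return the structured prompt itself
    genai.configure(api_key=api_key)
    llm = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
    return llm.generate_content(prompt).text
```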
## Models & Datasets
- **LegalBERT Model**: `negi2725/LegalBertNew` (HuggingFace)
- **RAG Dataset**: `negi2725/dataRag` (HuggingFace)
- **Embedding Model**: `BAAI/bge-large-en-v1.5` (Sentence Transformers)
## Performance Notes
- All models and indices are preloaded at import time for fast inference
- Async endpoints ensure non-blocking I/O operations
- FAISS uses normalized L2 search for efficient similarity matching
- Typical response time: 1-3 seconds for the `/explain` endpoint
## Requirements
- Python 3.10+
- 4GB+ RAM (8GB+ recommended for smooth operation)
- Internet connection for first-time model/dataset downloads
## Troubleshooting
**Models not downloading:**
- Ensure internet connectivity
- Check HuggingFace Hub access
- Models are cached in `~/.cache/huggingface/`
**Out of memory:**
- Reduce batch size or top-K retrieval count
- Use CPU-only torch installation
- Consider using smaller embedding models
**Gemini API errors:**
- Verify the API key in the `.env` file
- The system works without Gemini (returns the structured prompt)
- Check API quota and rate limits
## Development
The codebase follows these conventions:
- CamelCase for variable names
- Minimal inline comments (self-documenting code)
- Async/await for all FastAPI endpoints
- Type hints for function signatures
## License
This project is for educational and research purposes.
## Support
For issues or questions, please refer to the HuggingFace pages for the model (`negi2725/LegalBertNew`) and the dataset (`negi2725/dataRag`).