---
title: LegalRagBackend
emoji: ⚖️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_file: main.py
---

Legal RAG Backend

A FastAPI-based backend for a Legal RAG (Retrieval-Augmented Generation) system that returns legal verdict predictions together with explanations grounded in constitutional provisions, IPC sections, case law, and statutes.

Overview

This system combines:

  • LegalBERT fine-tuned model for verdict prediction
  • FAISS vector search for legal document retrieval
  • Sentence Transformers (BGE-Large) for semantic embeddings
  • Google Gemini (optional) for generating detailed explanations
  • HuggingFace Hub for model and dataset management

Project Structure

legal-rag-backend/
│
├── main.py              # FastAPI application with REST endpoints
├── model_loader.py      # LegalBERT model loading and inference
├── rag_loader.py        # FAISS indices and chunk loading from HuggingFace
├── rag_service.py       # Core service orchestrating prediction and RAG
├── prompt_builder.py    # Constructs prompts for LLM with legal context
├── utils.py             # Helper utilities for chunk processing
├── requirements.txt     # Python dependencies
├── Dockerfile           # Container configuration
├── .gitignore          # Git ignore patterns
├── README.md           # This file
└── start.sh            # Launch script

Features

API Endpoints

GET /health

Health check endpoint.

Response:

{
  "status": "ok"
}

POST /predict

Get a quick verdict prediction with confidence score.

Request:

{
  "text": "Case description and facts..."
}

Response:

{
  "verdict": "guilty",
  "confidence": 0.8734
}
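
For example, a minimal client call (assuming the server is running locally on port 7860, as in the setup instructions below) could look like this:

```python
# Minimal sketch of a /predict client; assumes the server runs on localhost:7860.
import requests

resp = requests.post(
    "http://localhost:7860/predict",
    json={"text": "Case description and facts..."},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()
print(result["verdict"], result["confidence"])  # e.g. guilty 0.8734
```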

POST /explain

Get comprehensive legal analysis with retrieved supporting documents.

Request:

{
  "text": "Case description and facts..."
}

Response:

{
  "verdict": "guilty",
  "confidence": 0.8734,
  "explanation": "Detailed legal analysis...",
  "retrievedChunks": {
    "constitution": [...],
    "ipc": [...],
    "ipcCase": [...],
    "statute": [...],
    "qa": [...],
    "case": [...]
  }
}
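
A sketch of a client that consumes this response, assuming the server runs on localhost:7860 and the category keys shown above:

```python
# Sketch of an /explain client that summarizes the retrieved chunks by category.
import requests

resp = requests.post(
    "http://localhost:7860/explain",
    json={"text": "Case description and facts..."},
    timeout=60,
)
resp.raise_for_status()
data = resp.json()

print(f"Verdict: {data['verdict']} (confidence {data['confidence']:.2f})")
print(data["explanation"][:500])
for category, chunks in data["retrievedChunks"].items():
    print(f"{category}: {len(chunks)} supporting chunk(s)")
```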

Installation & Setup

Local Development

  1. Clone or create the project:

    cd legal-rag-backend
    
  2. Create a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Configure environment (optional): Create a .env file for Gemini API integration (a loading sketch follows these steps):

    GEMINI_API_KEY=your_api_key_here
    
  5. Run the server:

    chmod +x start.sh
    ./start.sh
    

    Or directly:

    uvicorn main:app --host 0.0.0.0 --port 7860
    
  6. Access the API at http://localhost:7860 (interactive docs at http://localhost:7860/docs).
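
How the Gemini key from step 4 is read is an implementation detail of main.py; a minimal sketch of one way to load it, assuming python-dotenv is installed, looks like this (variable names are illustrative):

```python
# Hypothetical sketch of how GEMINI_API_KEY could be read from .env;
# the actual loading code in main.py may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory, if present
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")  # None -> fall back to the structured prompt
```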

Docker Deployment

  1. Build the image:

    docker build -t legal-rag-backend .
    
  2. Run the container:

    docker run -p 7860:7860 -e GEMINI_API_KEY=your_key legal-rag-backend
    
  3. Access at: http://localhost:7860

How It Works

  1. Model Loading: On startup, the system loads:

    • LegalBERT model (negi2725/LegalBertNew)
    • 6 FAISS indices (Constitution, IPC, IPC Cases, Statutes, QA, Cases)
    • Corresponding text chunks
    • BGE-Large sentence transformer for embeddings
  2. Prediction Flow:

    • Input text is tokenized and passed through LegalBERT
    • Softmax applied to get "guilty" or "not guilty" verdict
    • Confidence score extracted from probabilities
  3. RAG Retrieval:

    • Query text embedded using BGE-Large
    • Top-K similar chunks retrieved from each FAISS index
    • Results organized by legal category (see the retrieval sketch after this list)
  4. Explanation Generation:

    • Structured prompt built with case facts, verdict, and retrieved context
    • Optional Gemini API call for natural language explanation
    • Fallback to prompt template if API not configured
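
A minimal sketch of the retrieval step (step 3), assuming L2-normalized BGE-Large embeddings and one FAISS index plus chunk list per category; the function and variable names are illustrative, not the actual ones in rag_loader.py or rag_service.py:

```python
# Illustrative retrieval sketch: embed the query with BGE-Large and search one
# FAISS index per legal category. All names here are hypothetical.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")

def retrieve(query: str, index: faiss.Index, chunks: list[str], top_k: int = 5) -> list[str]:
    vec = embedder.encode([query], normalize_embeddings=True)  # shape (1, 1024)
    scores, ids = index.search(np.asarray(vec, dtype="float32"), top_k)
    return [chunks[i] for i in ids[0] if i != -1]

# e.g. results = {name: retrieve(case_text, idx, chunks)
#                 for name, (idx, chunks) in category_indices.items()}
```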

Models & Datasets

  • LegalBERT Model: negi2725/LegalBertNew (HuggingFace)
  • RAG Dataset: negi2725/dataRag (HuggingFace)
  • Embedding Model: BAAI/bge-large-en-v1.5 (Sentence Transformers)
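
As an illustration, the verdict model can be loaded and queried roughly as follows; this is a sketch only, and the label order and inference details in model_loader.py may differ:

```python
# Sketch of verdict prediction with the fine-tuned LegalBERT checkpoint.
# The label order ("not guilty" vs "guilty") is an assumption, not confirmed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("negi2725/LegalBertNew")
model = AutoModelForSequenceClassification.from_pretrained("negi2725/LegalBertNew")
model.eval()

def predict_verdict(text: str) -> tuple[str, float]:
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    labels = ["not guilty", "guilty"]  # assumed index-to-label mapping
    idx = int(probs.argmax())
    return labels[idx], float(probs[idx])
```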

Performance Notes

  • All models and indices are preloaded at import time for fast inference
  • Async endpoints ensure non-blocking I/O operations
  • FAISS similarity search runs over L2-normalized embeddings for efficient matching
  • Typical response time: 1-3 seconds for /explain endpoint

Requirements

  • Python 3.10+
  • 4GB+ RAM (8GB+ recommended for smooth operation)
  • Internet connection for first-time model/dataset downloads

Troubleshooting

Models not downloading:

  • Ensure internet connectivity
  • Check HuggingFace Hub access
  • Models cache in ~/.cache/huggingface/

Out of memory:

  • Reduce batch size or top-K retrieval count
  • Use CPU-only torch installation
  • Consider using smaller embedding models

Gemini API errors:

  • Verify API key in .env file
  • System works without Gemini (returns structured prompt)
  • Check API quota and rate limits

Development

The codebase follows these conventions:

  • camelCase for variable names
  • Minimal inline comments (self-documenting code)
  • Async/await for all FastAPI endpoints
  • Type hints for function signatures
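
For illustration, an endpoint following these conventions might look like the sketch below; the actual signatures and helper names in main.py may differ:

```python
# Illustrative endpoint sketch; actual signatures and helpers in main.py may differ.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CaseInput(BaseModel):
    text: str

def run_model(text: str) -> tuple[str, float]:
    # Placeholder for the real LegalBERT inference in model_loader.py.
    return "guilty", 0.87

@app.post("/predict")
async def predict(payload: CaseInput) -> dict:
    verdict, confidence = run_model(payload.text)
    return {"verdict": verdict, "confidence": confidence}
```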

License

This project is for educational and research purposes.

Support

For issues or questions, please refer to the HuggingFace model (negi2725/LegalBertNew) and dataset (negi2725/dataRag) pages.