Quick Start Guide

This page provides a quick start guide for the DocuSnap-Backend system, helping developers and operations personnel quickly deploy, configure, and use the system.

System Requirements

Before you begin, make sure your environment meets the following requirements:

Minimum Requirements

Operating System: Linux (Ubuntu 18.04+/CentOS 7+) or macOS 10.15+
Python: Python 3.8 or higher
Storage: At least 1 GB of free disk space
Memory: At least 4 GB RAM
Network: Internet access (for installing dependencies and calling the LLM API)

Recommended Configuration

Operating System: Ubuntu 20.04 LTS or higher
Python: Python 3.9 or higher
Storage: 10 GB+ SSD storage
Memory: 8 GB+ RAM
CPU: 4+ cores
Network: High-speed internet connection
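To check a machine against the minimum requirements before installing, you could run a small script such as the sketch below. It uses only the Python standard library and checks the operating system family, the Python version, free disk space in the current directory, and total RAM. It does not check the specific Ubuntu, CentOS, or macOS release. The thresholds are copied from the list above, and the RAM check reports "unknown" on platforms without os.sysconf, such as Windows.

```python
import os
import platform
import shutil
import sys

# Thresholds copied from the "Minimum Requirements" list above
MIN_PYTHON = (3, 8)
MIN_FREE_DISK_GB = 1
MIN_RAM_GB = 4


def total_ram_gb():
    """Return total physical RAM in GB, or None if it cannot be determined."""
    try:
        # os.sysconf exists on Linux and macOS but not on Windows
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def check_requirements(path="."):
    """Compare this machine with the minimum requirements; map each check to True/False/None."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    ram_gb = total_ram_gb()
    return {
        "os": platform.system() in ("Linux", "Darwin"),  # Darwin = macOS; versions not checked
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "disk": free_gb >= MIN_FREE_DISK_GB,
        "memory": None if ram_gb is None else ram_gb >= MIN_RAM_GB,  # None = unknown
    }


if __name__ == "__main__":
    labels = {True: "OK", False: "FAIL", None: "unknown"}
    for name, passed in check_requirements().items():
        print(f"{name:<8}{labels[passed]}")
```

A "FAIL" on the memory or disk line does not necessarily stop the system from starting, but it suggests you may run into the performance issues described under Common Issues.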

Quick Installation

1. Clone the Repository

First, clone the DocuSnap-Backend repository to your local machine:

git clone https://github.com/JI-DeepSleep/DocuSnap-Backend.git
cd DocuSnap-Backend

2. Create a Virtual Environment

Create and activate a Python virtual environment:

# Create virtual environment (use python3 if python is not on your PATH, e.g. on Ubuntu 20.04)
python -m venv venv

# Activate virtual environment
# Linux/macOS
source venv/bin/activate
# Windows
venv\Scripts\activate

3. Install Dependencies

Install the required Python dependencies:

pip install -r requirements.txt
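After installing, a quick check can confirm that you are inside the virtual environment and that key packages are installed. This sketch uses only the standard library (importlib.metadata, available from Python 3.8). The package list is a hypothetical selection based on the tools this guide uses (Flask, Gunicorn, Requests, and PyCryptodome for the client example); requirements.txt remains the authoritative list.

```python
import sys
from importlib import metadata  # standard library since Python 3.8

# Hypothetical selection of distributions used in this guide;
# compare against requirements.txt for the real list.
EXPECTED_PACKAGES = ["flask", "gunicorn", "requests", "pycryptodome"]


def in_virtualenv():
    """True when running inside a venv, i.e. sys.prefix differs from the base interpreter's prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("missing packages:", missing_packages(EXPECTED_PACKAGES) or "none")
```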

4. Generate Key Pairs

Use the provided script to generate an RSA key pair:

# Make the script executable
chmod +x genKeyPairs.sh

# Run the script to generate the key pair
./genKeyPairs.sh

The script writes private_key.pem and public_key.pem to the current directory.
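To confirm that the script produced usable files, the following standard-library sketch checks that both files exist and begin with a PEM header line. It assumes the default file names from the step above. It does not validate the key material or check that the two keys belong to the same pair.

```python
from pathlib import Path

# Default file names written by genKeyPairs.sh, mapped to the expected end of the PEM header.
# "PRIVATE KEY-----" matches both PKCS#1 ("RSA PRIVATE KEY") and PKCS#8 ("PRIVATE KEY") headers.
KEY_FILES = {
    "private_key.pem": "PRIVATE KEY-----",
    "public_key.pem": "PUBLIC KEY-----",
}


def check_pem_files(directory="."):
    """Return a list of problems with the generated key files (empty if both look like PEM keys)."""
    problems = []
    for name, marker in KEY_FILES.items():
        path = Path(directory) / name
        if not path.is_file():
            problems.append(f"{name}: not found")
            continue
        lines = path.read_text(errors="replace").splitlines()
        header = lines[0].strip() if lines else ""
        # A PEM key starts with a line such as "-----BEGIN PUBLIC KEY-----"
        if not (header.startswith("-----BEGIN") and header.endswith(marker)):
            problems.append(f"{name}: missing PEM header")
    return problems
```

Running print(check_pem_files()) in the repository directory should print an empty list after a successful run of genKeyPairs.sh.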

5. Configure the System

Create a configuration file:

# Copy the sample configuration file
cp priv_sets.py.sample priv_sets.py

# Edit the configuration file
nano priv_sets.py  # or use your preferred editor

At a minimum, set the following parameters in priv_sets.py:

Zhipu AI API Key: used to call the LLM service
OCR Service URL: the address of the OCR service started in step 6
Key File Paths: the paths to the RSA private and public key files generated in step 4
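The exact variable names are defined in priv_sets.py.sample, which this guide does not reproduce. The fragment below is only a hypothetical illustration of the three settings listed above: every name in it (ZHIPU_API_KEY, OCR_SERVICE_URL, PRIVATE_KEY_PATH, PUBLIC_KEY_PATH) is a placeholder, so copy the real names from the sample file. The OCR URL assumes the example CnOCR service from step 6, and reading the API key from an environment variable is an optional way to keep the secret out of the file.

```python
# priv_sets.py: hypothetical sketch; use the variable names from priv_sets.py.sample
import os

# Zhipu AI API key for LLM calls (placeholder names; falls back to a dummy value)
ZHIPU_API_KEY = os.environ.get("ZHIPU_API_KEY", "your-zhipu-api-key")

# OCR service endpoint; matches the example CnOCR service (port 5001, POST /process)
OCR_SERVICE_URL = "http://localhost:5001/process"

# RSA key pair generated by genKeyPairs.sh
PRIVATE_KEY_PATH = "private_key.pem"
PUBLIC_KEY_PATH = "public_key.pem"
```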

6. Start the OCR Service

DocuSnap-Backend requires an external OCR service. You can use CnOCR or another compatible OCR service.

If you're using CnOCR, you can start the service with the following steps:

# Install CnOCR (and Flask, if it is not already installed)
pip install cnocr flask

# Create a minimal OCR service (example only, not hardened for production)
cat > ocr_service.py << 'EOF'
import io

from cnocr import CnOcr
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
ocr = CnOcr()  # load the default detection and recognition models once at startup

@app.route('/process', methods=['POST'])
def process():
    if 'image' not in request.files:
        return jsonify({"error": "No image provided"}), 400

    image_file = request.files['image']
    # Convert to RGB so that RGBA, palette, and grayscale uploads are handled uniformly
    img = Image.open(io.BytesIO(image_file.read())).convert('RGB')
    # CnOCR 2.x returns one dict per detected text line, with the text under 'text'
    result = ocr.ocr(img)

    # One recognized line per output line
    text = '\n'.join(line['text'] for line in result)
    return jsonify({"text": text})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5001)
EOF

# Start the OCR service in the background (it listens on port 5001)
python ocr_service.py &
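Before wiring the OCR service into DocuSnap-Backend, you can test it on its own. The sketch below uses only the standard library (urllib) to send an image as multipart/form-data under the image field, which is the field the example service reads. The URL and port assume the example service above.

```python
import os
import urllib.request
import uuid

# URL of the example CnOCR service above; adjust if yours differs
OCR_URL = "http://localhost:5001/process"


def build_multipart(field, filename, data):
    """Encode one file as a multipart/form-data body; return (body_bytes, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def ocr_image(path, url=OCR_URL, timeout=60):
    """POST the image at `path` under the 'image' field and return the service's JSON reply as text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("image", os.path.basename(path), f.read())
    request = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, ocr_image("sample.jpg") (the file name is illustrative) should return a JSON string with a text field. The equivalent command-line request is curl -F "image=@sample.jpg" http://localhost:5001/process.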

7. Start the Application

Start the DocuSnap-Backend application:

# Development environment
flask run --host=0.0.0.0 --port=5000

# Production environment
gunicorn --workers=4 --bind=0.0.0.0:8000 app:app

DocuSnap-Backend should now be running and listening on port 5000 (Flask development server) or port 8000 (Gunicorn). The examples below use port 8000; if you started the development server, use port 5000 instead.
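Gunicorn and the Flask development server take a moment to start. When scripting a deployment or running the examples below, a standard-library helper like this sketch can wait until the port accepts TCP connections. It only shows that something is listening on the port, not that the application is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means the port is accepting
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 8000) returns True once Gunicorn is accepting connections.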

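Basic Usage

1. API Endpoints

DocuSnap-Backend provides the following main API endpoints:

/api/process_document: Process document images
/api/process_form: Process form images
/api/process_form_filling: Process form auto-filling requests
/api/task_status: Query task status and results

All API requests and responses must be end-to-end encrypted. As in the client example below, the client encrypts the JSON payload with a random AES key, encrypts that AES key with the server's RSA public key, sends a SHA-256 hash of the plaintext payload as the signature, and decrypts responses with the same AES key.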

2. Client Example

The following Python example submits a document image and polls for the result:

import base64
import hashlib
import json
import time

import requests
from Crypto.Cipher import AES, PKCS1_OAEP
from Crypto.PublicKey import RSA
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

BASE_URL = "http://localhost:8000"  # use port 5000 for the Flask development server

# Load the server's RSA public key
with open('public_key.pem', 'rb') as f:
    public_key = RSA.import_key(f.read())

# Generate a random 128-bit AES session key and encrypt it with RSA-OAEP
aes_key = get_random_bytes(16)
cipher_rsa = PKCS1_OAEP.new(public_key)
encrypted_aes_key = base64.b64encode(cipher_rsa.encrypt(aes_key)).decode('utf-8')


def encrypt_request(payload):
    """Serialize the payload, hash it with SHA-256, and AES-CBC-encrypt it into a request envelope."""
    data_json = json.dumps(payload)
    signature = hashlib.sha256(data_json.encode('utf-8')).hexdigest()
    iv = get_random_bytes(16)
    cipher_aes = AES.new(aes_key, AES.MODE_CBC, iv)
    ciphertext = cipher_aes.encrypt(pad(data_json.encode('utf-8'), AES.block_size))
    return {
        "encrypted_data": base64.b64encode(iv + ciphertext).decode('utf-8'),  # IV is prepended
        "encrypted_key": encrypted_aes_key,
        "signature": signature,
    }


def decrypt_response(response_json):
    """Decrypt the "encrypted_data" field of a response; the first 16 bytes are the IV."""
    raw = base64.b64decode(response_json["encrypted_data"])
    iv, ciphertext = raw[:16], raw[16:]
    cipher_aes = AES.new(aes_key, AES.MODE_CBC, iv)
    return json.loads(unpad(cipher_aes.decrypt(ciphertext), AES.block_size))


# Prepare request data: a list of Base64-encoded images
with open('document.jpg', 'rb') as f:  # replace with the path to your image
    image_b64 = base64.b64encode(f.read()).decode('utf-8')

response = requests.post(
    f"{BASE_URL}/api/process_document",
    json=encrypt_request({"images": [image_b64]}),
    timeout=30,
)

if response.status_code == 202:
    # Asynchronous task created successfully
    task_id = decrypt_response(response.json())["task_id"]
    print(f"Task created with ID: {task_id}")

    # Poll the task status until it completes or fails
    while True:
        status_response = requests.post(
            f"{BASE_URL}/api/task_status",
            json=encrypt_request({"task_id": task_id}),
            timeout=30,
        )
        if status_response.status_code != 200:
            print(f"Failed to get task status: {status_response.status_code}")
            break

        status_result = decrypt_response(status_response.json())
        if status_result["status"] == "completed":
            print("Task completed!")
            print(f"Result: {status_result['result']}")
            break
        elif status_result["status"] == "error":
            print(f"Task failed: {status_result.get('error', 'Unknown error')}")
            break
        else:
            print(f"Task status: {status_result['status']}")
            time.sleep(2)  # Wait 2 seconds before querying again
else:
    print(f"Request failed: {response.status_code}")
    print(response.text)
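The polling loop in the example above waits indefinitely and always sleeps 2 seconds. If you need an upper bound, a hypothetical helper like the one below adds a deadline and exponential backoff. It assumes, as the example does, that "completed" and "error" are the terminal status values; get_status stands for any zero-argument function you write that sends the encrypted /api/task_status request and returns the decrypted dict.

```python
import time


def poll_until_done(get_status, timeout=300.0, interval=2.0, max_interval=30.0):
    """Call get_status() until it returns a terminal status or `timeout` seconds elapse.

    The wait between calls doubles after each non-terminal answer, capped at max_interval.
    Raises TimeoutError if the task is still running when the deadline would be passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = get_status()
        if result.get("status") in ("completed", "error"):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"task still {result.get('status')!r} after {timeout} s")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)
```

Backoff keeps the number of status requests low for long-running OCR and LLM tasks while still returning quickly for short ones.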

3. Web Interface

DocuSnap-Backend also provides a simple Web interface for testing and demonstration:

Access http://localhost:8000/ocr in your browser
Upload image files
Select processing type (document, form, or form filling)
Click the submit button
View processing results

Common Issues

1. OCR Service Connection Failure

Issue: The application cannot connect to the OCR service.

Solution:

Confirm that the OCR service is running
Check that the OCR service URL configuration is correct
Verify network connectivity and firewall settings
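To narrow down which of these causes applies, the sketch below (standard library only) checks the configured OCR URL in stages: whether it parses, whether the host name resolves, and whether a TCP connection to the port succeeds. Pass it the same URL you configured, for example http://localhost:5001/process for the example service in step 6. A successful TCP connection does not prove that the /process endpoint works; for that, send a test image.

```python
import socket
from urllib.parse import urlsplit


def diagnose_url(url, timeout=3.0):
    """Check a service URL stage by stage (parse, DNS, TCP) and return a short verdict."""
    parts = urlsplit(url)
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:  # e.g. a non-numeric port
        return f"malformed URL: {url!r}"
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return f"malformed URL: {url!r}"
    try:
        socket.getaddrinfo(parts.hostname, port)
    except socket.gaierror:
        return f"cannot resolve host {parts.hostname!r}: check the configured URL"
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return f"TCP connection to {parts.hostname}:{port} succeeded"
    except OSError as exc:
        return f"cannot connect to {parts.hostname}:{port} ({exc}): is the service running? firewall?"
```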

2. LLM API Call Failure

Issue: Calls to the Zhipu AI API fail.

Solution:

Check that the API key is correct
Verify the network connection
Confirm that the API usage quota has not been exceeded

3. End-to-End Encryption Issues

Issue: Encryption or decryption fails.

Solution:

Confirm that the key file paths are configured correctly
Verify the key file permissions
Check the client's encryption and decryption implementation
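On Linux and macOS, the private key should normally be readable only by its owner. The sketch below reports whether group or other users have any access to the file and, with fix=True, restricts it to mode 0600 (the same effect as chmod 600 private_key.pem). It assumes the backend runs as the user that owns private_key.pem; if it runs under a different account, mode 0600 will prevent it from reading the key.

```python
import os
import stat


def check_private_key_permissions(path="private_key.pem", fix=False):
    """Warn if group/others can access the key; with fix=True, chmod it to 0600.

    Returns the file's permission bits after any change (POSIX systems only).
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:  # any read/write/execute bit set for group or others
        print(f"{path}: mode {oct(mode)} allows access by other users")
        if fix:
            os.chmod(path, 0o600)  # owner read/write only
            mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode
```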

4. Performance Issues

Issue: The system responds slowly or requests time out.

Solution:

Increase the number of Gunicorn worker processes (--workers) or threads per worker (--threads)
Tune the concurrency settings of the OCR and LLM services
Consider upgrading hardware resources or moving to a higher-performance server
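Gunicorn can read its settings from a Python file, which is easier to maintain than a long command line. The values below are illustrative starting points, not tuned recommendations for DocuSnap-Backend: the worker formula is the rule of thumb from the Gunicorn documentation, and the longer timeout is an assumption based on OCR and LLM calls potentially being slow. Start Gunicorn with gunicorn -c gunicorn.conf.py app:app.

```python
# gunicorn.conf.py: illustrative starting point, not tuned values
import multiprocessing

bind = "0.0.0.0:8000"

# Worker processes: (2 x CPU cores) + 1 is Gunicorn's commonly cited starting point
workers = multiprocessing.cpu_count() * 2 + 1

# Threads per worker; a value above 1 makes Gunicorn use the gthread worker class
threads = 2

# Seconds before a silent worker is killed and restarted (Gunicorn's default is 30);
# raised here on the assumption that OCR and LLM calls may block a request for longer
timeout = 120
```

More workers use more memory, so on a machine near the 4 GB minimum, lower workers before raising threads.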

Next Steps

After completing the basic deployment and configuration, you can consider the following next steps:

Configure Nginx Reverse Proxy: Improve security and performance
Set up HTTPS: Obtain SSL certificates using Let's Encrypt
Implement Monitoring: Monitor system status using Prometheus and Grafana
Configure Log Management: Centralize log management using the ELK stack
Implement High Availability Deployment: Configure multi-instance deployment and load balancing

For more details, see the Deployment Architecture and the Scalability and Fault Tolerance documentation.