Skip to content

Technology Selection

The DocuSnap-Backend system employs a series of modern technologies and frameworks to achieve efficient, secure, and scalable document and form processing capabilities. This page provides detailed information about the system's technology stack choices.

Backend Framework

Flask (3.1.1)

Flask is a lightweight Python Web framework used to build the API service for DocuSnap-Backend.

Key Features: - Lightweight, flexible, and easily extensible - Simple and intuitive API design - Rich plugin ecosystem - Suitable for rapid prototype development and small to medium-sized applications

Application in the System: - Provides RESTful API interfaces - Handles HTTP requests and responses - Manages routing and request dispatching - Handles errors and exceptions

Database

SQLite3

SQLite is a lightweight embedded relational database used to store task status and processing results.

Key Features: - Zero configuration, self-contained, serverless - Lightweight, efficient, reliable - Supports standard SQL queries - Suitable as an embedded application database

Application in the System: - Stores task status and processing results - Serves as a caching system to reduce repeated computations - Manages temporary data storage - Supports task status queries and result retrieval

AI/LLM Service

Zhipu AI (zhipuai 2.1.5)

Zhipu AI is a Chinese AI service provider that offers Large Language Model APIs for document analysis and information extraction.

Key Features: - Provides high-quality Chinese LLM services - Supports complex natural language understanding and generation - Offers structured output capabilities - Supports custom prompt engineering

Application in the System: - Analyzes document structure and content - Extracts form fields and values - Understands document context and semantics - Generates structured processing results

OCR Service

CnOCR

CnOCR is an OCR tool optimized specifically for Chinese, used to convert images to text.

Key Features: - OCR engine optimized for Chinese - High accuracy text recognition - Supports multiple image formats - Open-source and customizable

Application in the System: - Converts document and form images to text - Recognizes form structure and fields - Supports batch image processing - Operates as an independent service interacting with the backend server

Security Encryption

RSA and AES Encryption

The system uses a hybrid encryption scheme with RSA and AES to secure data transmission.

Key Features: - RSA asymmetric encryption for key exchange - AES symmetric encryption for data encryption - SHA256 hash for data integrity verification - End-to-end encryption to protect sensitive data

Application in the System: - Secures API requests and responses - Encrypts sensitive document and form data - Verifies the integrity and authenticity of requests - Prevents data leakage and tampering

Concurrent Processing

Python Standard Library

The system uses concurrent processing tools from the Python standard library, such as threading and concurrent.futures.

Key Features: - Queue for thread-safe task queues - threading for creating and managing threads - ThreadPoolExecutor for parallel task processing - Semaphore for controlling concurrent access

Application in the System: - Implements producer-consumer pattern for task processing - Processes multiple images and documents in parallel - Controls concurrent requests to external services - Improves overall system processing efficiency

Development and Testing Tools

Other Tools and Libraries

The system also uses various auxiliary tools and libraries for development, testing, and deployment.

Main Components: - json for data serialization and deserialization - base64 for binary data encoding - logging for log recording and management - time and datetime for time management and calculations

These technical components collectively form the technology stack of DocuSnap-Backend, supporting the system's core functionality and performance requirements. Each component has been carefully selected to ensure system stability, security, and scalability.