Installation

# Basic installation with core functionality
pip install parseport

# Enable GPU support (requires NVIDIA GPU with CUDA 11.8)
pip install "parseport[gpu]"

# Install component bundles (optional)
pip install "parseport[detectors]" # All detection components
pip install "parseport[ocr]"       # All OCR components
pip install "parseport[all]"       # All components

pip install "parseport[all, gpu]" # All components + GPU support

# For development setup
pip install "parseport[dev]"       # All components + development tools
Note: The [gpu] bundle requires an NVIDIA GPU with CUDA 11.8 drivers installed. Installation will fail on systems without compatible GPU drivers.
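To confirm that the CUDA toolchain is actually visible to Python before requesting device='gpu', a quick sanity check against PaddlePaddle (assumed here to be the GPU backend pulled in by the [gpu] bundle, given the Paddle-based tools below) looks like this:
# Sanity check: verify the GPU build of PaddlePaddle (assumed backend of the
# [gpu] bundle) can see a CUDA device
import paddle

print(paddle.device.is_compiled_with_cuda())  # True if the GPU build is installed
print(paddle.device.cuda.device_count())      # number of visible CUDA devices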

Quick Start

Here’s a simple example of processing a PDF document:
from parseport.document.pdf_document import PDFDocument
from parseport.layout_parser.simple_layout_parser import SimpleLayoutParser
from parseport.tools.layout_detector.paddle_layout_detector import PaddleLayoutDetector
from parseport.tools.ocr.easyocr import EasyOCR

# Load a PDF document
doc = PDFDocument.from_path('path/to/document.pdf', include_visuals=True)

# Initialize layout detection
detector = PaddleLayoutDetector(device='gpu')
parser = SimpleLayoutParser(detector)

# Parse document layout
components = parser.parse_document(doc)

# Process components from a specific page
first_page_components = [block for block in components if block.page_num == 0]

# Visualize detected components
image = doc.render_image_with_component_blocks(0, first_page_components)

# Extract text from the components using a vision-language model
vlm_generator = OpenAIVLMGenerator()
reader = VLMDocumentReader(vlm_generator)
components = await reader.extract_texts(doc, components, show_progress=True)

# Format the extracted content into structured output
formatter = VLMFormatter(vlm_generator)
formatted_texts = await formatter.format_text(
    doc, components,
    "extra tables as JSON, and ignore all other content."
)
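Note that extract_texts and format_text are coroutines, so the last two steps must run inside an event loop. In a plain script you can wrap them in an async function and drive it with asyncio.run(); the sketch below reuses the doc, components, reader, and formatter objects created above:
import asyncio

async def run_vlm_steps():
    # Extract text from the detected components, then format the result
    extracted = await reader.extract_texts(doc, components, show_progress=True)
    formatted = await formatter.format_text(
        doc, extracted,
        "Extract tables as JSON, and ignore all other content."
    )
    return formatted

formatted_texts = asyncio.run(run_vlm_steps())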

Modules

Parseport’s core modules work together to process documents:
  1. Document - Handles document loading, page management, and rendering (PDFDocument)
  2. Layout Parser - Detects and extracts document components using layout detection models (SimpleLayoutParser)
  3. Reader - Extracts text from components using OCR or vision-language models (VLMDocumentReader)
  4. Formatter - Converts extracted components into structured output formats (VLMFormatter)
These modules support multiple implementations for layout detection (PaddleOCR, YOLO), OCR (EasyOCR, PaddleOCR), and vision-language models.

Tools

  1. Layout Detection
    • PaddleLayoutDetector: Uses PaddleOCR for layout analysis
    • YOLO-based detector (optional)
  2. OCR Engines
    • EasyOCR
    • PaddleOCR
    • RapidOCR
  3. Visual Language Models
    • OpenAI VLM Generator for advanced text extraction
    • Support for custom VLM implementations
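For example, the Paddle-based detector from Quick Start takes a device argument; on machines installed without the [gpu] bundle you can request a CPU device instead (the 'cpu' value is an assumption mirroring the device='gpu' form shown above):
from parseport.tools.layout_detector.paddle_layout_detector import PaddleLayoutDetector

# Run layout detection on the CPU; 'cpu' is an assumed device value
# (Quick Start only shows device='gpu')
detector = PaddleLayoutDetector(device='cpu')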

Custom Region Processing

from parseport.struct import BoundingBox

# Define region of interest
region_bbox = BoundingBox(x0=100, y0=100, x1=500, y1=400)

# Parse a specific page (the region filter is applied in the sketch below)
region_components = parser.parse_page(
    document=doc,
    page_num=0,
    dpi=300
)
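The region of interest can then be applied by keeping only the blocks whose bounding boxes fall inside region_bbox. A minimal sketch, assuming region_bbox is expressed in the same coordinate space as the detected blocks:
# Keep only blocks fully contained in the region of interest
def inside(block, region):
    b = block.bbox
    return (b.x0 >= region.x0 and b.y0 >= region.y0 and
            b.x1 <= region.x1 and b.y1 <= region.y1)

region_components = [b for b in region_components if inside(b, region_bbox)]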

Implementing Custom Detectors

You can create custom layout detectors by extending the BaseLayoutDetector class:
from typing import List

import numpy as np

from parseport.tools.layout_detector import BaseLayoutDetector
from parseport.struct import ComponentBlock, ComponentType, BoundingBox
from parseport.tools.registry import register_detector

@register_detector('my-custom-detector')
class MyCustomDetector(BaseLayoutDetector):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = None

    def _initialize(self):
        """Load models or initialize resources (called on first use)"""
        self.model = load_my_model()  # Your model loading logic

    def predict(self, img_array: np.ndarray, page_num: int, **kwargs) -> List[ComponentBlock]:
        """Run detection on the input image"""
        self._ensure_initialized()  # Always call this first

        # Your detection logic here
        detections = self.model.detect(img_array)

        # Convert detections to ComponentBlocks
        components = []
        for det in detections:
            components.append(ComponentBlock(
                type=det.type,  # a ComponentType value, e.g. TABLE, FIGURE, or IMAGE
                bbox=BoundingBox(
                    x0=det.x0, y0=det.y0,
                    x1=det.x1, y1=det.y1
                ),
                page_num=page_num,
                text=""
            ))
        return components
The detector can then be used like any built-in detector:
from parseport import create_detector

detector = create_detector('my-custom-detector')
parser = SimpleLayoutParser(detector)