Documentation Index
Fetch the complete documentation index at: https://parserport.outerport.com/llms.txt
Use this file to discover all available pages before exploring further.
Installation
# Basic installation with core functionality
pip install parseport
# Enable GPU support (requires NVIDIA GPU with CUDA 11.8)
pip install "parseport[gpu]"
# Install component bundles (optional)
pip install "parseport[detectors]" # All detection components
pip install "parseport[ocr]" # All OCR components
pip install "parseport[all]" # All components
pip install "parseport[all, gpu]" # All components + GPU support
# For development setup
pip install "parseport[dev]" # All components + development tools
Note: The [gpu] bundle requires an NVIDIA GPU with CUDA 11.8 drivers installed. Installation will fail on systems without compatible GPU drivers.
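Before installing the [gpu] bundle, it can help to confirm the NVIDIA driver is reachable. A minimal pre-flight sketch (it only checks that the nvidia-smi CLI runs, not that the CUDA 11.8 runtime specifically is installed):

```python
import shutil
import subprocess

def cuda_driver_present() -> bool:
    """Rough pre-flight check before installing parseport[gpu]."""
    # No nvidia-smi on PATH means no usable NVIDIA driver
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # nvidia-smi exits non-zero when it cannot talk to the driver
        subprocess.run(["nvidia-smi"], capture_output=True, check=True)
        return True
    except subprocess.CalledProcessError:
        return False
```

On machines where this returns False, install the CPU-only package instead.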
Quick Start
Here’s a simple example of processing a PDF document:
from parseport.document.pdf_document import PDFDocument
from parseport.layout_parser.simple_layout_parser import SimpleLayoutParser
from parseport.tools.layout_detector.paddle_layout_detector import PaddleLayoutDetector
from parseport.tools.ocr.easyocr import EasyOCR
# Also import OpenAIVLMGenerator, VLMDocumentReader and VLMFormatter
# (their module paths depend on your parseport version)
# Load a PDF document
doc = PDFDocument.from_path('path/to/document.pdf', include_visuals=True)
# Initialize layout detection
detector = PaddleLayoutDetector(device='gpu')
parser = SimpleLayoutParser(detector)
# Parse document layout
components = parser.parse_document(doc)
# Process components from a specific page
first_page_components = [block for block in components if block.page_num == 0]
# Visualize detected components on the first page
image = doc.render_image_with_component_blocks(0, first_page_components)
# Extract text from the components with a vision-language model
vlm_generator = OpenAIVLMGenerator()
reader = VLMDocumentReader(vlm_generator)
components = await reader.extract_texts(doc, components, show_progress=True)
# Format the extracted text into structured output
formatter = VLMFormatter(vlm_generator)
formatted_texts = await formatter.format_text(
    doc, components,
    "extract tables as JSON, and ignore all other content."
)
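extract_texts and format_text are coroutines, so in a plain script (as opposed to a notebook, where top-level await works) they need an event loop. A minimal sketch of the pattern, with stub coroutines standing in for the actual parseport calls:

```python
import asyncio

# Stub coroutines standing in for reader.extract_texts / formatter.format_text
async def extract_texts(doc, components):
    await asyncio.sleep(0)  # placeholder for real async I/O
    return [f"text:{c}" for c in components]

async def format_text(doc, components, instruction):
    await asyncio.sleep(0)
    return {"instruction": instruction, "blocks": components}

async def pipeline():
    # Chain the two async stages exactly as in the quick-start example
    components = await extract_texts("doc", ["title", "table"])
    return await format_text("doc", components, "extract tables as JSON")

result = asyncio.run(pipeline())
```

In a real script you would build the document, detector, and parser first, then run the async reader/formatter stages inside one such coroutine.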
Modules
Parseport’s core modules work together to process documents:
- Document - Handles document loading, page management, and rendering (PDFDocument)
- Layout Parser - Detects and extracts document components using layout detection models (SimpleLayoutParser)
- Reader - Extracts text from components using OCR or vision-language models (VLMDocumentReader)
- Formatter - Converts extracted components into structured output formats (VLMFormatter)
These modules support multiple implementations for layout detection (PaddleOCR, YOLO), OCR (EasyOCR, PaddleOCR), and vision-language models.
- Layout Detection
  - PaddleLayoutDetector: Uses PaddleOCR for layout analysis
  - YOLO-based detector (optional)
- OCR Engines
  - EasyOCR
  - PaddleOCR
  - RapidOCR
- Visual Language Models
  - OpenAI VLM Generator for advanced text extraction
  - Support for custom VLM implementations
Custom Region Processing
from parseport.struct import BoundingBox
# Define region of interest
region_bbox = BoundingBox(x0=100, y0=100, x1=500, y1=400)
# Parse a specific region of a page
# (the keyword for passing the region may vary by parseport version)
region_components = layout_parser.parse_page(
    document=document,
    page_num=0,
    bbox=region_bbox,
    dpi=300
)
Implementing Custom Detectors
You can create custom layout detectors by extending the BaseLayoutDetector class:
from typing import List

import numpy as np

from parseport.tools.layout_detector import BaseLayoutDetector
from parseport.struct import ComponentBlock, ComponentType, BoundingBox
from parseport.tools.registry import register_detector
@register_detector('my-custom-detector')
class MyCustomDetector(BaseLayoutDetector):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = None

    def _initialize(self):
        """Load models or initialize resources (called on first use)"""
        self.model = load_my_model()  # Your model loading logic

    def predict(self, img_array: np.ndarray, page_num: int, **kwargs) -> List[ComponentBlock]:
        """Run detection on the input image"""
        self._ensure_initialized()  # Always call this first
        # Your detection logic here
        detections = self.model.detect(img_array)
        # Convert detections to ComponentBlocks
        components = []
        for det in detections:
            components.append(ComponentBlock(
                type=det.type,  # or TABLE, FIGURE, IMAGE
                bbox=BoundingBox(
                    x0=det.x0, y0=det.y0,
                    x1=det.x1, y1=det.y1
                ),
                page_num=page_num,
                text=""
            ))
        return components
The detector can then be used like any built-in detector:
from parseport import create_detector
detector = create_detector('my-custom-detector')
parser = SimpleLayoutParser(detector)