
LayoutParser, DocumentReader and Formatter that can be used to build custom pipelines. It comes with a few off-the-shelf models for layout detection and OCR, and also encourages bringing in custom fine-tuned models to support special needs.
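As a rough illustration of how three such stages might compose into a pipeline, here is a minimal, self-contained sketch. It does not use this library's API: `Region`, `toy_layout`, `toy_reader`, `toy_formatter` and `run_pipeline` are all hypothetical names made up for this example, and the "layout detection" and "OCR" steps are trivial string operations standing in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """Hypothetical stand-in for a detected layout region."""
    kind: str  # e.g. "title" or "paragraph"
    raw: str   # stand-in for the image crop a real OCR model would read


# Each stage is a plain callable, so a custom model can be swapped in.
LayoutStage = Callable[[str], List[Region]]
ReadStage = Callable[[Region], str]
FormatStage = Callable[[List[Region], List[str]], str]


def toy_layout(page: str) -> List[Region]:
    # Toy "layout detection": first non-empty line is the title,
    # every other non-empty line is a paragraph.
    lines = [ln for ln in page.splitlines() if ln.strip()]
    return [
        Region("title" if i == 0 else "paragraph", ln)
        for i, ln in enumerate(lines)
    ]


def toy_reader(region: Region) -> str:
    # Toy "OCR": just strip surrounding whitespace.
    return region.raw.strip()


def toy_formatter(regions: List[Region], texts: List[str]) -> str:
    # Render titles as Markdown headings, paragraphs as plain text.
    parts = [
        f"# {text}" if region.kind == "title" else text
        for region, text in zip(regions, texts)
    ]
    return "\n\n".join(parts)


def run_pipeline(
    page: str,
    layout: LayoutStage,
    reader: ReadStage,
    formatter: FormatStage,
) -> str:
    regions = layout(page)                # 1. find regions
    texts = [reader(r) for r in regions]  # 2. read each region
    return formatter(regions, texts)      # 3. assemble the output


result = run_pipeline(
    "  Annual Report \n\nRevenue grew.\n",
    toy_layout, toy_reader, toy_formatter,
)
print(result)
```

Because each stage is passed in as an argument, replacing one of them (for example, a custom fine-tuned reader) does not require touching the other two.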
Check out the Tutorial to get started.
Also take a look at:
Overview for the motivation behind this library.
Document Ingestion for an in-depth dive into document ingestion.
Open Source Tools for a list of open source tools that are helpful for building a custom document ingestion pipeline.