MinerU
kitjesen
Introduction
MinerU is a model that converts PDF documents into Markdown and supports both Chinese and English. It performs tasks such as layout analysis, mathematical formula recognition, and table structure reconstruction.
Architecture
MinerU employs a multi-model architecture, combining different models for specific tasks:
- Layout: Document layout analysis using Detectron2.
- MFD: Mathematical formula detection with a custom CNN.
- MFR: Mathematical formula recognition using a BERT-based model.
- TabRec: Table recognition and reconstruction with a T5-based approach.
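To make the division of labor concrete, here is a minimal sketch of how four such stages could be chained. It does not use MinerU's actual code or models: `Region`, `analyze_layout`, `detect_formulas`, `recognize_formula`, `reconstruct_table`, and `convert_page` are hypothetical names, and each function body is a trivial string heuristic standing in for the real sub-model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    kind: str   # "text", "formula", or "table"
    raw: str    # placeholder for the cropped page content


def analyze_layout(page: List[str]) -> List[Region]:
    """Stand-in for the Layout model: label rows that contain tabs as tables."""
    return [Region("table" if "\t" in raw else "text", raw) for raw in page]


def detect_formulas(regions: List[Region]) -> List[Region]:
    """Stand-in for MFD: relabel text regions that contain '=' as formulas."""
    return [
        Region("formula", r.raw) if r.kind == "text" and "=" in r.raw else r
        for r in regions
    ]


def recognize_formula(region: Region) -> str:
    """Stand-in for MFR: wrap the raw content as display math."""
    return f"$${region.raw.strip()}$$"


def reconstruct_table(region: Region) -> str:
    """Stand-in for TabRec: turn tab-separated cells into one Markdown row."""
    cells = [cell.strip() for cell in region.raw.split("\t")]
    return "| " + " | ".join(cells) + " |"


# Route each region kind to the sub-model responsible for it.
RECOGNIZERS: Dict[str, Callable[[Region], str]] = {
    "text": lambda region: region.raw.strip(),
    "formula": recognize_formula,
    "table": reconstruct_table,
}


def convert_page(page: List[str]) -> str:
    """Run layout analysis, formula detection, then per-region recognition."""
    regions = detect_formulas(analyze_layout(page))
    return "\n\n".join(RECOGNIZERS[r.kind](r) for r in regions)
```

Keeping the recognizers in a lookup table means a single stage can be swapped out without touching the others, which is one plausible reason to prefer a multi-model design over a monolithic one.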
Training
The model was trained using datasets of academic papers, textbooks, and technical documents. The training process involved:
- Pre-training of individual sub-models.
- Joint training for optimization.
- End-to-end fine-tuning.
Evaluation results indicate:
- Text recognition accuracy: 95%
- Formula recognition accuracy: 90%
- Table reconstruction accuracy: 85%
Guide: Running Locally
To run MinerU locally, follow these steps:
- Set up the environment with the required hardware and software:
  - Hardware Requirements:
    - RAM: 16GB+
    - GPU: 8GB+ VRAM (consider cloud services such as AWS or Google Cloud for GPU access)
    - Storage: 5GB
  - Software Requirements:
    - Python >= 3.7
    - PyTorch >= 1.9.0
    - transformers library >= 4.28.0
    - detectron2
- Use the following Python code snippet to convert a PDF to Markdown:
    from transformers import pipeline
    converter = pipeline("document-conversion", model="kitjesen/MinerU")
    markdown = converter("document.pdf")
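Before running the conversion, it can help to confirm that the software requirements from the setup step are met. The script below is a minimal sketch that uses only the standard library. The helper names (`parse_version`, `check_package`, `check_environment`) are made up for this example, and the distribution names `torch`, `transformers`, and `detectron2` are assumed to be how the packages are installed. It relies on `importlib.metadata`, so the script itself needs Python 3.8 or newer.

```python
import re
import sys
from importlib import metadata  # standard library in Python 3.8+

# Minimum versions taken from the requirements list; None means "any version".
REQUIREMENTS = {
    "torch": "1.9.0",
    "transformers": "4.28.0",
    "detectron2": None,
}
MIN_PYTHON = (3, 7)


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '1.9' compares equal to '1.9.0'
        parts.append(0)
    return tuple(parts)


def check_package(name, minimum):
    """Return 'ok', 'missing', or 'too old' for one distribution name."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return "missing"
    if minimum is not None and parse_version(installed) < parse_version(minimum):
        return "too old"
    return "ok"


def check_environment():
    """Map each requirement to its status, including the interpreter itself."""
    report = {"python": "ok" if sys.version_info[:2] >= MIN_PYTHON else "too old"}
    for name, minimum in REQUIREMENTS.items():
        report[name] = check_package(name, minimum)
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

The version parser is deliberately simple and compares only the leading numeric components. For pre-release or otherwise unusual version strings, a dedicated version-parsing library would be the safer choice.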
License
MinerU is licensed under the Apache-2.0 License, allowing for both personal and commercial use.