Nougat Base Model

Introduction

The base-sized Nougat model transcribes scientific PDFs into Markdown. It was introduced in the paper "Nougat: Neural Optical Understanding for Academic Documents" by Blecher et al. The model was released by Meta AI (formerly Facebook AI) and is hosted on the Hugging Face model hub.

Architecture

Nougat uses the Donut architecture: a Swin Transformer serves as the vision encoder and an mBART model serves as the text decoder. The model predicts Markdown directly from the pixels of rendered PDF page images, with no separate OCR step.

Training

The model is trained to convert PDF page images into Markdown text autoregressively. The Swin Transformer encodes the visual input, and the mBART decoder generates the corresponding text one token at a time, conditioning each token on the image features and on the tokens produced so far.
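
The autoregressive loop described above can be illustrated with a minimal greedy-decoding sketch. This is not Nougat's actual decoder: `toy_scores`, `VOCAB`, and `TARGET` are hypothetical stand-ins for the mBART decoder and its vocabulary, and `image_features` stands in for the Swin encoder output. The sketch only shows how each new token depends on the tokens generated before it.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical four-token vocabulary, for illustration only.
VOCAB = ["<s>", "</s>", "# ", "Title"]
TARGET = [2, 3, 1]  # "# ", "Title", "</s>"


def toy_scores(image_features: Optional[object], tokens: Sequence[int]) -> List[float]:
    """Stand-in for the text decoder: scores every vocabulary entry
    given the image features and the tokens generated so far."""
    step = len(tokens) - 1  # tokens generated so far, excluding <s>
    want = TARGET[min(step, len(TARGET) - 1)]
    return [1.0 if i == want else 0.0 for i in range(len(VOCAB))]


def greedy_decode(
    next_token_scores: Callable[[Optional[object], Sequence[int]], List[float]],
    image_features: Optional[object],
    bos_id: int,
    eos_id: int,
    max_len: int = 32,
) -> List[int]:
    """Generate tokens one at a time, always taking the highest-scoring one."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(image_features, tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:  # stop once the end-of-sequence token appears
            break
    return tokens


tokens = greedy_decode(toy_scores, image_features=None, bos_id=0, eos_id=1)
markdown = "".join(VOCAB[t] for t in tokens[1:-1])  # drop <s> and </s>
print(markdown)  # prints "# Title"
```

In a real model the scoring function would be a forward pass through the decoder, and generation would typically be handled by the library's `generate` method rather than a hand-written loop.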

Guide: Running Locally

Prerequisites: Ensure you have Python and PyTorch installed on your system.
Install Hugging Face Transformers: Use pip to install the library:
```
pip install transformers
```
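Download the Model: The weights are hosted on the Hugging Face model hub and are downloaded and cached the first time the model is loaded with `from_pretrained`.
Run the Model: Load the model with the Transformers library and apply it to page images rendered from the PDF to transcribe them to Markdown.
Recommended Hardware: A GPU is recommended for reasonable inference speed. If none is available locally, a cloud GPU instance from a provider such as AWS, Google Cloud, or Azure is an option.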
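
The steps above can be sketched as a single helper function. This is a minimal sketch under several assumptions not stated in this guide: the hub identifier is assumed to be `facebook/nougat-base`, the PDF page is assumed to have been rendered to an image file beforehand, Pillow is assumed to be installed, and the post-processing step may additionally require the `nltk` and `python-Levenshtein` packages. The function name `transcribe_page` and its parameters are hypothetical.

```python
def transcribe_page(
    image_path: str,
    model_id: str = "facebook/nougat-base",  # assumed hub identifier
    max_new_tokens: int = 1024,
) -> str:
    """Transcribe one rendered PDF page image to Markdown text."""
    # Heavy dependencies are imported inside the function so that merely
    # defining it does not require them to be installed.
    import torch
    from PIL import Image
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    processor = NougatProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()

    # Resize and normalize the page image into the tensor the encoder expects.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Autoregressive generation; unknown tokens are disallowed in the output.
    with torch.no_grad():
        outputs = model.generate(
            pixel_values.to(device),
            min_length=1,
            max_new_tokens=max_new_tokens,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
        )

    text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    # Clean up the raw generation; may need nltk and python-Levenshtein.
    return processor.post_process_generation(text, fix_markdown=False)
```

Calling `transcribe_page("page_1.png")` (a hypothetical file name) would return the Markdown for that page. For a multi-page PDF, one would render each page to an image and call the function per page, ideally loading the processor and model once rather than on every call.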

License

The Nougat model is released under the CC BY-NC 4.0 license, which permits non-commercial use, distribution, and reproduction in any medium, provided appropriate credit is given to the original work.
