D R T o1 7 B G G U F

QuantFactory

Introduction

DRT-o1 is a project aimed at integrating long thought reasoning into neural machine translation (MT). This model emphasizes translating English sentences with complex structures, such as similes and metaphors, using a multi-agent framework. The approach includes three agents—a translator, an advisor, and an evaluator—to synthesize MT samples. The project uses backbones like Llama-3.1-8B-Instruct and Qwen2.5-7B-Instruct for training models such as DRT-o1-7B, DRT-o1-8B, and DRT-o1-14B.

Architecture

The DRT-o1 models are built on various instruct backbones, including Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, and Qwen2.5-14B-Instruct. The architecture leverages the capacity of these models to incorporate deep reasoning techniques into MT tasks, specifically targeting complex English sentences.

Training

DRT-o1 models were trained using a custom framework with three agents to synthesize translation samples, resulting in a dataset of 22,264 samples. The training utilized different backbone models to explore the potential of long thought reasoning in MT, without aiming to outperform existing solutions like OpenAI's O1.

Guide: Running Locally

  1. Install Dependencies: Use transformers from Hugging Face to load the model and tokenizer.

    pip install transformers
    
  2. Load the Model:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "Krystalan/DRT-o1-7B"
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  3. Setup Inference: Prepare and tokenize the input prompt, then generate translations.

    prompt = "Please translate the following text from English to Chinese:\n[Your English text here]"
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    
    generated_ids = model.generate(**model_inputs, max_new_tokens=2048)
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)
    
  4. Recommend Cloud GPUs: Consider using cloud services like AWS EC2 or Google Cloud Platform for accessing GPUs, especially for large models to handle computational demands efficiently.

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (cc-by-nc-sa-4.0).

More Related APIs in Text Generation