D R T o1 14 B
KrystalanIntroduction
The DRT-o1 project explores the integration of long thought reasoning into neural machine translation (MT) through the development of models like DRT-o1-7B, DRT-o1-8B, and DRT-o1-14B. These models utilize mined English sentences featuring similes or metaphors and a multi-agent framework to synthesize MT samples. The goal is to advance research in MT by applying deep reasoning techniques.
Architecture
DRT-o1 models are built on backbones such as Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct, and Qwen2.5-14B-Instruct. The architecture involves a translator, an advisor, and an evaluator, which collaborate to generate translations through long thought reasoning. This approach is designed to enhance the depth and quality of translations.
Training
The training process involves synthesizing 22,264 samples and leveraging the strengths of different backbone models. This process aims to improve the model's ability to translate complex sentences by employing a chain-of-thought reasoning methodology.
Guide: Running Locally
To run DRT-o1 models locally, follow these steps:
- Install Dependencies: Ensure you have Python and the Hugging Face Transformers library installed.
- Load the Model:
from transformers import AutoModelForCausalLM, AutoTokenizer model_name = "Krystalan/DRT-o1-7B" model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto") tokenizer = AutoTokenizer.from_pretrained(model_name)
- Prepare Input:
prompt = "Please translate the following text from English to Chinese:\n[Your text here]"
- Generate Output:
messages = [ {"role": "system", "content": "You are a philosopher skilled in deep thinking, accustomed to exploring complex problems with profound insight."}, {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) model_inputs = tokenizer([text], return_tensors="pt").to(model.device) generated_ids = model.generate(**model_inputs, max_new_tokens=2048) response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] print(response)
For optimal performance, use cloud GPUs from providers like AWS, GCP, or Azure.
License
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (cc-by-nc-sa-4.0) license.