Tinyllama 2 1B MiniGuanaco GGUF

TheBloke

Introduction

The Tinyllama 2 1B MiniGuanaco model, created by Odunusi Abraham Ayoola, is distributed here in GGUF, a model format introduced by the llama.cpp team as the successor to GGML. GGUF is compatible with a wide range of clients and libraries.

Compatibility

GGUF is supported by a variety of systems, including llama.cpp, text-generation-webui, KoboldCpp, LM Studio, LoLLMS Web UI, Faraday.dev, ctransformers, llama-cpp-python, and candle. This compatibility allows the Tinyllama model to be used across different platforms, many of which offer GPU acceleration.
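
Among the libraries listed above, llama-cpp-python loads a GGUF file directly from Python. A minimal sketch, assuming the Q4_K_M file from this repo has already been downloaded into the working directory (the download steps are covered below):

    from llama_cpp import Llama

    # Load the local GGUF file; n_gpu_layers=0 runs fully on CPU.
    llm = Llama(
        model_path="./tinyllama-2-1b-miniguanaco.Q4_K_M.gguf",
        n_ctx=2048,       # context window, matching the -c 2048 used later
        n_gpu_layers=32,  # offload layers to the GPU when one is available
    )
    out = llm("### Human: Say hello.\n### Assistant:", max_tokens=64)
    print(out["choices"][0]["text"])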

Quantization

The Tinyllama 2 1B MiniGuanaco model is provided in several quantized variants to balance performance and memory usage. The options range from 2-bit to 8-bit, letting users pick the trade-off between model size and output quality that suits their hardware. The k-quant variants build on the GGML_TYPE_Q2_K, GGML_TYPE_Q3_K, GGML_TYPE_Q4_K, GGML_TYPE_Q5_K, and GGML_TYPE_Q6_K block types.
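
Because the repository ships one file per quantization method, the available variants can be enumerated with the huggingface_hub library. A quick sketch (file names follow TheBloke's usual <model>.<quant>.gguf convention; verify against the actual repo listing):

    from huggingface_hub import list_repo_files

    # Print every GGUF file, i.e. every quantization variant, in the repo.
    for name in sorted(list_repo_files("TheBloke/Tinyllama-2-1b-miniguanaco-GGUF")):
        if name.endswith(".gguf"):
            print(name)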

Guide: Running Locally

Basic Steps

  1. Download the model: Use the huggingface-hub Python library to download the specific model file you need (a Python alternative to the CLI is sketched after this list). For example:

    pip3 install huggingface-hub
    huggingface-cli download TheBloke/Tinyllama-2-1b-miniguanaco-GGUF tinyllama-2-1b-miniguanaco.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
    
  2. Run using llama.cpp: Ensure you have the latest version of llama.cpp (commit d0cee0d or later), and execute the model with:

    ./main -ngl 32 -m tinyllama-2-1b-miniguanaco.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Human: {prompt}\n### Assistant:"
    
  3. Python integration: Use the ctransformers library to run the model from Python (a streaming variant is sketched after this list):

    from ctransformers import AutoModelForCausalLM
    
    # Downloads the GGUF file on first use; set gpu_layers=0 for CPU-only.
    llm = AutoModelForCausalLM.from_pretrained("TheBloke/Tinyllama-2-1b-miniguanaco-GGUF",
                                               model_file="tinyllama-2-1b-miniguanaco.Q4_K_M.gguf",
                                               model_type="llama", gpu_layers=50)
    print(llm("AI is going to"))
    
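As noted in step 1, the download can also be done from Python instead of the CLI. A minimal sketch using huggingface-hub's hf_hub_download, with the same repo and file names as above:

    from huggingface_hub import hf_hub_download

    # Fetch the single GGUF file into the current directory.
    path = hf_hub_download(
        repo_id="TheBloke/Tinyllama-2-1b-miniguanaco-GGUF",
        filename="tinyllama-2-1b-miniguanaco.Q4_K_M.gguf",
        local_dir=".",
    )
    print(path)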

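For the ctransformers integration in step 3, tokens can also be streamed as they are generated instead of waiting for the full completion. A brief sketch reusing the llm object from step 3:

    # Stream tokens as they are produced.
    for token in llm("AI is going to", stream=True):
        print(token, end="", flush=True)
    print()
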
Cloud GPUs

For enhanced performance, consider running on a cloud GPU instance. llama.cpp and its bindings support CUDA (NVIDIA), ROCm (AMD), and Metal (Apple) acceleration, so pick the backend that matches your hardware; the -ngl and gpu_layers options shown above control how many layers are offloaded to the GPU.

License

The Tinyllama 2 1B MiniGuanaco model is available under an unspecified license. Users should verify and comply with any applicable terms and conditions.
