NeverendingStory-Q8_0-GGUF

Aleteian

Introduction

The NeverendingStory-Q8_0-GGUF model is a GGUF-format conversion of the original Aleteian/NeverendingStory model. The conversion was performed with llama.cpp via ggml.ai's GGUF-my-repo space on Hugging Face. For additional details on the original model, refer to its model card.
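
The conversion itself is automated by the GGUF-my-repo space, but it can also be reproduced locally with llama.cpp's conversion script. A minimal sketch, assuming a recent llama.cpp checkout (where the script is named convert_hf_to_gguf.py and accepts --outtype q8_0) and a local copy of the original model weights:

    git clone https://github.com/ggerganov/llama.cpp
    pip install -r llama.cpp/requirements.txt
    # Convert the original Hugging Face weights to an 8-bit GGUF file
    python llama.cpp/convert_hf_to_gguf.py ./NeverendingStory \
      --outtype q8_0 --outfile neverendingstory-q8_0.gguf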

Architecture

The model uses the transformers library and carries the tags mergekit, merge, llama-cpp, and gguf-my-repo, indicating that the original model was produced as a mergekit merge. It relies on llama.cpp for inference, supporting both CLI and server-based operation.

Training

Details of the training process are not provided in this documentation; users are encouraged to refer to the original model card for comprehensive information on the training methodology.

Guide: Running Locally

Installing llama.cpp

  1. Install llama.cpp using Homebrew (works on macOS and Linux):
    brew install llama.cpp
    
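To verify the installation, check that the binaries are on the PATH (the --version flag is available in recent llama.cpp builds):

    llama-cli --version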

Running the CLI

  • Execute the CLI with the following command:
    llama-cli --hf-repo Aleteian/NeverendingStory-Q8_0-GGUF --hf-file neverendingstory-q8_0.gguf -p "The meaning to life and the universe is"
    
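The -p prompt above is only a starting string. Generation length and sampling can be tuned with standard llama.cpp flags; the values below are illustrative, not recommendations (-n caps the number of generated tokens, -c sets the context size, --temp the sampling temperature):

    llama-cli --hf-repo Aleteian/NeverendingStory-Q8_0-GGUF --hf-file neverendingstory-q8_0.gguf \
      -p "The meaning to life and the universe is" -n 256 -c 2048 --temp 0.8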

Running the Server

  • Start the server with:
    llama-server --hf-repo Aleteian/NeverendingStory-Q8_0-GGUF --hf-file neverendingstory-q8_0.gguf -c 2048
    
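Once running, the server listens on http://localhost:8080 by default and exposes an HTTP API. A minimal sketch of a request against the server's /completion endpoint (endpoint and JSON fields as documented for llama-server; n_predict caps the number of generated tokens):

    curl http://localhost:8080/completion \
      -H "Content-Type: application/json" \
      -d '{"prompt": "The meaning to life and the universe is", "n_predict": 128}'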

Additional Steps

  1. Clone the llama.cpp repository from GitHub:

    git clone https://github.com/ggerganov/llama.cpp
    
  2. Navigate into the llama.cpp directory and build it using the LLAMA_CURL=1 flag, adding hardware-specific flags if necessary (e.g., LLAMA_CUDA=1 for Nvidia GPUs on Linux; a GPU build sketch follows this list):

    cd llama.cpp && LLAMA_CURL=1 make
    
  3. Run inference using the compiled binaries:

    ./llama-cli --hf-repo Aleteian/NeverendingStory-Q8_0-GGUF --hf-file neverendingstory-q8_0.gguf -p "The meaning to life and the universe is"
    

    or

    ./llama-server --hf-repo Aleteian/NeverendingStory-Q8_0-GGUF --hf-file neverendingstory-q8_0.gguf -c 2048
    
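For the GPU build mentioned in step 2, the hardware flag is added to the same make invocation. A minimal sketch for an Nvidia GPU on Linux, assuming the Makefile-based build current at the time of this conversion (newer llama.cpp versions build with CMake and use GGML_CUDA=1 instead):

    cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make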

Cloud GPUs

For enhanced performance, consider running llama.cpp on GPU instances from cloud providers such as AWS, Google Cloud, or Azure.

License

The licensing details for the NeverendingStory-Q8_0-GGUF model are not explicitly provided in this summary. Users should review the original model card and associated repositories for any licensing information.
