Using Ollama to Run LLMs Locally

Tired of cloud-based AI services that compromise your privacy and rack up subscription costs? Discover how to run powerful language models directly on your own computer with Ollama. This comprehensive guide will show you how to unlock local AI capabilities, giving you complete control over your data and interactions—no internet connection required. By Eric Van de Kerckhove.


Custom Models and GGUF Files

If you need models that aren't available in the official library (yet), you can use models from Hugging Face, a popular model repository. Note that these models won't work out of the box, but once you download a model in Safetensors or GGUF format, you can run it with Ollama by creating a Modelfile for it.

I won’t go into detail here, but you can find instructions on creating custom models on Ollama’s model import documentation page. In a nutshell, here are the steps you need to take:

  • Download a model file in either Safetensors or GGUF format
  • Create a new file named Modelfile (no extension) with the following content: FROM /path/to/your/model.safetensors or FROM /path/to/your/model.gguf
  • Run the following command from the directory where you created the Modelfile: ollama create name-of-your-model
  • Run the model: ollama run name-of-your-model

For more detailed information on creating the Modelfile and its syntax, refer to the Modelfile documentation.
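
For reference, here's what a minimal Modelfile might look like. The FROM path is the same placeholder used in the steps above, and the PARAMETER and SYSTEM lines are optional extras you can tweak or drop:

# Point FROM at the GGUF or Safetensors file you downloaded
FROM /path/to/your/model.gguf

# Optional: adjust sampling and set a system prompt
PARAMETER temperature 0.7
SYSTEM "You are a helpful assistant."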

Using a Web UI

While the command line interface is functional, you might prefer a more visual and user-friendly interface similar to ChatGPT. This is where Open WebUI comes in.
Open WebUI is a web-based chat interface for Ollama and other LLM runners.

To install Open WebUI, make sure you have Docker installed, then navigate to the Open WebUI GitHub repository. Scroll down to the How to Install section and follow the instructions for Docker.

For example, use the command below if you have an Nvidia GPU:

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

This will download and install Open WebUI on your local machine in a Docker container.
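
If you don't have an Nvidia GPU, the CPU-only variant drops the --gpus flag and uses the main image tag instead of cuda; double-check the repository's install section for the exact command:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main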

Open WebUI install via Docker

Once installed, navigate to http://localhost:3000 in your web browser to access Open WebUI. The first time you do this, you’ll get an intro screen with a Get started button.

Get started

Click on the button to get a form for creating your Admin account. Fill in a name, email address and a password to start using Open WebUI.

Open WebUI account creation

Once you’ve done that, you can chat with any of the models you’ve installed on your local machine as if you were talking to ChatGPT, Claude or Gemini!
For example, here I ask deepseek-r1 to write a Python script that counts the number of words in a text file.

Open WebUI chat

First, the model needs to think for a while, since deepseek-r1 is a reasoning model.

Thinking

After that, I get the response, just like ChatGPT would.

Solution

Pretty cool, right? You can use Open WebUI to experiment with LLMs locally in an interactive and user-friendly way.

Using the Ollama API in Your Code

As a final cherry on top, I’ll share with you how you can use Ollama’s API to integrate LLMs into your own applications. For this example, I’ll be using Python.

Note: Not familiar with Python yet? I have some great news: we have a Python for AI crash course you can check out!

To get started, create a new Python script and import the necessary libraries:

import requests
import json

Make sure you have the requests library installed. You can install it with the command below:

pip install requests

Next, add some variables to your script to store your API endpoint and the model you want to use:

OLLAMA_API = "http://localhost:11434/api/generate" # Endpoint
LLM_NAME = "deepseek-r1:8b" # Model
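
Before adding more code, you can sanity-check that the Ollama server is reachable by hitting the same endpoint with curl. This assumes you've already pulled deepseek-r1:8b:

curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:8b", "prompt": "Why is the sky blue?", "stream": false}'

If you get back a JSON object with a response field, the API is up and running.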

Then, create a function to send prompts to the model:

# Send prompt to LLM
def ask_llm(prompt, model):
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }

    response = requests.post(OLLAMA_API, json=data)

    if response.status_code == 200:
        return response.json()["response"]
    else:
        return f"Error: {response.status_code}"

This function takes a prompt and a model name as input and sends them to the Ollama API to generate a response. If the request is successful, it returns the generated text. If not, it returns an error message.
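
You may have noticed the script imports json without using it yet. It comes into play if you set "stream" to True, in which case Ollama returns one JSON object per line as tokens are generated. Here's a minimal sketch of a streaming version of the same function, assuming you simply want to stitch the chunks back into a single string:

# Stream the response instead of waiting for the full answer
def ask_llm_streaming(prompt, model):
    data = {
        "model": model,
        "prompt": prompt,
        "stream": True
    }

    with requests.post(OLLAMA_API, json=data, stream=True) as response:
        if response.status_code != 200:
            return f"Error: {response.status_code}"

        chunks = []
        for line in response.iter_lines():
            if line:  # Skip empty keep-alive lines
                chunks.append(json.loads(line).get("response", ""))
        return "".join(chunks)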

Now add the main function to your script:

if __name__ == "__main__":
    question = "What is the capital of Belgium?"
    model = LLM_NAME

    print(f"Asking: {question}")
    answer = ask_llm(question, model)
    print("\nResponse:")
    print(answer)

This calls the function you just added to send the prompt to the model and print the response.
Finally, run the script. You should get an output like this:

Asking: What is the capital of Belgium?

Response:
<think>

</think>

The capital of Belgium is Brussels.

You can extend this to build more complex applications, like chatbots, content generators, or AI assistants.
For even more inspiration, check out our Text Generation with OpenAI module. It covers how to use the OpenAI API with Python, but many of the concepts apply to Ollama as well.
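
If the chatbot route sounds appealing, Ollama also exposes a chat endpoint at http://localhost:11434/api/chat that accepts a list of messages instead of a single prompt, which makes multi-turn conversations easier. Here's a rough sketch that mirrors the ask_llm function above; the history list is just an illustration of how you'd keep context between turns:

OLLAMA_CHAT_API = "http://localhost:11434/api/chat"  # Chat endpoint

# Send the whole conversation history and return the assistant's reply
def chat_llm(messages, model):
    data = {
        "model": model,
        "messages": messages,
        "stream": False
    }

    response = requests.post(OLLAMA_CHAT_API, json=data)

    if response.status_code == 200:
        return response.json()["message"]["content"]
    else:
        return f"Error: {response.status_code}"

history = [{"role": "user", "content": "What is the capital of Belgium?"}]
reply = chat_llm(history, LLM_NAME)
history.append({"role": "assistant", "content": reply})
print(reply)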

Where to Go From Here?

Congratulations on setting up Ollama and running LLMs locally! You now have a powerful AI system running on your own hardware, with complete control over your data and interactions. As you continue to explore and work with local LLMs, the resources linked throughout this tutorial, such as the Modelfile documentation, the Open WebUI repository, and our Text Generation with OpenAI module, will help you go further.

Remember that running LLMs locally is a trade-off between convenience and performance. While cloud-based models might offer more capabilities, local models provide privacy, no ongoing costs, and the ability to work offline.

I hope this tutorial has helped you get started with running LLMs locally using Ollama. Feel free to share your experiences and questions in the comments section below!
