Since the release of ChatGPT, the world has become captivated by Generative AI.

At work, my team has been developing a Generative AI chatbot for business. I have documented the development process through a series of blog posts (links below).

However, when working individually or testing, it can be useful to be able to run a large language model locally (offline, with no Internet connection required).

I have been testing two options, LM Studio and ollama. Although LM Studio offers a robust user interface and direct access to models from Hugging Face, I have settled on ollama (for now).

ollama is a lightweight application (Mac and Linux, with Windows coming soon) for running and managing large language models via the command line (e.g., Terminal). It is fast and easy to use, and it supports a wide range of popular models, all discoverable via the ollama website.

When using a Mac, I recommend Apple Silicon (M1/M2/M3), with the following memory configurations:

  • 3b models generally require at least 8GB of RAM
  • 7b models generally require at least 16GB of RAM
  • 13b models generally require at least 32GB of RAM
  • 70b models generally require at least 64GB of RAM
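
If you are unsure how much memory your Mac has, the following standard macOS command will report the total physical memory (the value is in bytes):

sysctl -n hw.memsize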

I am fortunate enough to have a MacBook Pro with an M3 Max (16C CPU / 40C GPU) and 128GB of memory. However, I generally use 7b and 13b models, which deliver good performance and an “acceptable” resource impact (leaving enough memory for other tasks).

Models installed locally are stored at the following location (macOS):

~/.ollama/models/blobs/


The models can be quite large. For example, the 13b llama2 model requires 7.4GB of storage.
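
To see how much storage the locally installed models are consuming, the standard du utility can be pointed at this directory:

du -sh ~/.ollama/models/blobs/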

ollama has a wide range of models available; to get started, I recommend the following three.

  • llama2: An open source model released by Meta. Trained on 2 trillion tokens.
  • codellama: A model for generating and discussing code.
  • starcoder: A code generation model trained on 80+ programming languages.
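
Any of these can be downloaded ahead of time (without starting a session) using the pull command. For example:

ollama pull llama2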

To get started with ollama, simply download and install the application. You can then interact with the application and/or models using the following commands and options.
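
On a Mac, this is a standard application download. On Linux, ollama provides an install script; at the time of writing, the command below (taken from the ollama website, which should be checked for the current version before piping anything to your shell) performs the install:

curl -fsSL https://ollama.com/install.sh | sh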

Commands

The following commands can be used to interact with ollama.

To start ollama, run the following command:

ollama serve
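
With the server running, ollama also exposes a REST API on localhost port 11434. For example, assuming the llama2 model has already been installed, a prompt can be submitted with curl:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'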


To run a model, use the following command:

ollama run <model>
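
For example, to start an interactive chat session with llama2 (the model is downloaded automatically on first run):

ollama run llama2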


To pass a prompt as an argument, run the following command:

ollama run <model> "<prompt>"
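
For example, to get a one-off response without starting an interactive session:

ollama run llama2 "Why is the sky blue?"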

To list all locally installed models, including their details, run the following command:

ollama list


To remove a local model, run the following command:

ollama rm <model>
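
For example, to delete the local copy of llama2:

ollama rm llama2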


Options

The following options can be used from within a running model session.

To set session variables, use the following option:

/set
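
For example, model parameters such as the sampling temperature can be adjusted for the current session (entering /set on its own lists the available settings):

/set parameter temperature 0.7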


To show model information, use the following option:

/show
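
For example, to display the current model's parameters (entering /show on its own lists the available subcommands):

/show parameters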


To exit the session and stop the model, use the following option:

/bye


Conclusion

If you are interested in Generative AI, I would highly recommend ollama for testing, development and learning.

Be sure to keep a close eye on the ollama blog for updates and tutorials, as well as their Reddit community. I would also recommend reviewing the README documentation from Jeffrey Morgan on the ollama GitHub repository.