ollama

Since the release of ChatGPT, the world has become captivated by Generative AI.

At work, my team has been developing a Generative AI chatbot for business. I have documented the development process through a series of blog posts (links below).

However, when working individually or testing, it can be useful to run a large language model locally (offline, no Internet connection required).

I have been testing two options, LM Studio and ollama. Although LM Studio offers a robust user interface and direct access to models from Hugging Face, I have settled on ollama (for now).

ollama is a very lightweight application (Mac and Linux, Windows coming soon) for running and managing large language models via the command line (e.g., Terminal). It is fast and very easy to use, supporting a wide range of popular models, discoverable via the ollama website.

When using a Mac, I recommend Apple Silicon (M1/M2/M3), with enough memory for the model sizes you plan to run:

3b models generally require at least 8GB of RAM
7b models generally require at least 16GB of RAM
13b models generally require at least 32GB of RAM
70b models generally require at least 64GB of RAM
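
As a rough illustration, the guideline above can be expressed as a small shell lookup. This is only a sketch: `min_ram_gb` is a hypothetical helper name, not part of ollama, and the figures are simply the ones listed above.

```shell
# min_ram_gb is a hypothetical helper (not an ollama command).
# It maps a model size tag to the minimum RAM in GB from the list above.
min_ram_gb() {
  case "$1" in
    3b)  echo 8 ;;
    7b)  echo 16 ;;
    13b) echo 32 ;;
    70b) echo 64 ;;
    *)   echo "unknown model size: $1" >&2; return 1 ;;
  esac
}

# Example: print the guideline for a 13b model.
min_ram_gb 13b
```

On macOS, `sysctl -n hw.memsize` reports the installed memory in bytes, which can be compared against these figures.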

I am fortunate enough to have a MacBook Pro with an M3 Max (16C CPU / 40C GPU) and 128GB of memory. However, I generally use 7b and 13b models, which deliver good performance and an “acceptable” resource impact (leaving enough memory for other tasks).

Once installed, a model is stored at the following location (macOS):

~/.ollama/models/blobs/

The models can be quite large. For example, the 13b llama2 model requires 7.4GB of storage.
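
To keep an eye on how much space the installed models take up, something like the following could be used. It is a minimal sketch: `models_disk_usage` is a hypothetical helper, and the default path assumes the macOS location mentioned above.

```shell
# models_disk_usage is a hypothetical helper (not an ollama command).
# It prints the total size of a model directory in human-readable form,
# defaulting to the macOS location mentioned above.
models_disk_usage() {
  dir="${1:-$HOME/.ollama/models}"
  if [ -d "$dir" ]; then
    # du prints "<size><TAB><path>"; keep only the size column.
    du -sh "$dir" | cut -f1
  else
    echo "no model directory at $dir" >&2
    return 1
  fi
}

# Example:
#   models_disk_usage
```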

ollama has a wide range of models available. However, I recommend the following three.

llama2: An open source model released by Meta. Trained on 2 trillion tokens.
codellama: A model for generating and discussing code.
starcoder: A code generation model trained on 80+ programming languages.

To get started with ollama, simply download and install the application. You can then interact with the application and/or models using the following commands and options.

Commands

The following commands can be used to interact with ollama.

To start ollama, run the command:

ollama serve

To run a model, run the command:

ollama run <model>

To pass a prompt as an argument, run the following command:

ollama run <model> "<prompt>"
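
Because the prompt is just a command-line argument, it can be assembled by the shell, for example from the contents of a file. The sketch below assumes the model is already installed; `summarize_file` is a hypothetical helper name, and the prompt wording is only an illustration.

```shell
# summarize_file is a hypothetical helper (not an ollama command).
# It passes a file's contents to a model as a single prompt argument.
summarize_file() {
  model="$1"
  file="$2"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  ollama run "$model" "Summarize this file: $(cat "$file")"
}

# Example (requires ollama and the llama2 model to be installed):
#   summarize_file llama2 README.md
```

Very large files may exceed the model's context window, so this works best with short documents.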

To list all locally installed models, including their details, run the following command:

ollama list
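
The output of this command can also be used in scripts, for example to check whether a model is already installed. This is a sketch under an assumption about the output format: a header row followed by one row per model, with the model name (such as llama2:latest) in the first column. `model_installed` is a hypothetical helper name.

```shell
# model_installed is a hypothetical helper (not an ollama command).
# Assumption: `ollama list` prints a header row, then one row per model
# with the name (for example llama2:latest) in the first column.
model_installed() {
  ollama list | awk -v name="$1" '
    NR > 1 {
      split($1, parts, ":")              # separate the name from its tag
      if ($1 == name || parts[1] == name) found = 1
    }
    END { exit found ? 0 : 1 }'
}

# Example:
#   model_installed llama2 && echo "llama2 is installed"
```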

To remove a local model, run the following command:

ollama rm <model>

Options

The following options can be entered at the interactive prompt while a model is running.

To set session variables, run the option:

/set

To show model information, run the option:

/show

To exit/stop the model, run the option:

/bye

Conclusion

If you are interested in Generative AI, I would highly recommend ollama for testing, development and learning.

Be sure to keep a close eye on the ollama blog for updates and tutorials, as well as their Reddit community. I would also recommend reviewing the README documentation from Jeffrey Morgan on the ollama GitHub repository.


LifeinTECH


Matthew Bull
