At work, my team has been developing a Generative AI chatbot for business. I have documented the development process in a series of blog posts (listed below).
- OpenAI ChatGPT
- AI - Rise of the Machines
- Generative AI for Business
- Prompt Engineering
- Generative AI for Business - Update
- Generative AI - Embeddings
- Generative AI - Context
- Integrated Generative AI
- Generative AI - Enhancements
However, when working individually or testing, it can be useful to run a large language model locally (offline, with no Internet connection required).
ollama is a very lightweight application (Mac and Linux, Windows coming soon) for running and managing large language models from the command line (e.g., Terminal). It is fast and easy to use, and it supports a wide range of popular models, which can be browsed on the ollama website.
When using a Mac, I recommend Apple Silicon (M1/M2/M3), with the following memory configuration depending on model size:
- 3b models generally require at least 8GB of RAM
- 7b models generally require at least 16GB of RAM
- 13b models generally require at least 32GB of RAM
- 70b models generally require at least 64GB of RAM
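The memory guidelines above can be turned into a quick lookup. The following is a minimal sketch, not part of ollama: `largest_model_for_ram` is a hypothetical helper that takes installed RAM in whole GB and prints the largest model size tier whose minimum from the list above is met. Treat the result as an upper bound, since the figures are minimums and other applications need memory too.

```shell
#!/bin/sh
# Hypothetical helper (not part of ollama): map installed RAM, in whole GB,
# to the largest model size tier allowed by the rough guidelines above.
largest_model_for_ram() {
  local ram_gb="$1"
  if [ "$ram_gb" -ge 64 ]; then
    echo "70b"
  elif [ "$ram_gb" -ge 32 ]; then
    echo "13b"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "7b"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "3b"
  else
    echo "none"   # below the smallest tier listed above
  fi
}
```

On macOS, `sysctl -n hw.memsize` reports physical memory in bytes, so `largest_model_for_ram $(( $(sysctl -n hw.memsize) / 1073741824 ))` gives a suggestion for the current machine.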
I am fortunate enough to have a MacBook Pro with an M3 Max (16-core CPU / 40-core GPU) and 128GB of memory. However, I generally use 7b and 13b models, which deliver good performance with an “acceptable” resource impact (leaving enough memory for other tasks).
Models installed locally are stored at the following location (macOS):
The models can be quite large. For example, the 13b llama2 model requires 7.4GB of storage.
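Given downloads of this size, it can help to confirm there is room before fetching a model. This is a minimal sketch under stated assumptions: `has_free_space` is a hypothetical helper, it relies on the POSIX `df -Pk` output format (column 4 holds available 1024-byte blocks), and it treats 1 GB as 1024 × 1024 KB.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the filesystem holding DIR
# has at least NEED_GB gigabytes free. NEED_GB may be fractional, e.g. 7.4.
has_free_space() {
  local dir="$1" need_gb="$2" avail_kb
  # POSIX `df -Pk`: header line, then one line per filesystem;
  # column 4 is the number of available 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  # awk does the (possibly fractional) comparison; exit 0 when there is room.
  awk -v avail="$avail_kb" -v need="$need_gb" \
    'BEGIN { exit !(avail >= need * 1024 * 1024) }'
}
```

For example, assuming the models directory lives on the same volume as your home directory, `has_free_space "$HOME" 7.4 && echo "enough room for the 13b llama2 model"` checks space for the model mentioned above.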
ollama offers a wide range of models, but I recommend starting with the following three:
- llama2: An open-source model released by Meta, trained on 2 trillion tokens.
- codellama: A model for generating and discussing code.
- starcoder: A code generation model trained on 80+ programming languages.
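To fetch all three recommended models in one go, a small loop over `ollama pull` is enough. This is a sketch, assuming the ollama CLI is installed and on your PATH; `pull_models` is a hypothetical wrapper, and `ollama pull` downloads a model without starting an interactive session.

```shell
#!/bin/sh
# Hypothetical wrapper: download each model named on the command line,
# stopping at the first failure. Assumes the ollama CLI is on PATH.
pull_models() {
  local model
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found on PATH" >&2
    return 1
  fi
  for model in "$@"; do
    echo "Pulling ${model}..."
    ollama pull "$model" || return 1
  done
}

# Usage: pull_models llama2 codellama starcoder
```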
To get started with ollama, simply download and install the application. You can then interact with the application and its models using the following commands and options.
The following commands can be used to interact with ollama.
To start ollama, run the command:
To run a model, run the command:
ollama run <model>
To pass a prompt as an argument, run the following command:
ollama run <model> "<prompt>"
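Because `ollama run` accepts the prompt as an argument and exits after answering, it is easy to script. The following sketch sends the same prompt to several local models so their answers can be compared side by side; `compare_models` is a hypothetical helper, and it assumes each model has already been installed.

```shell
#!/bin/sh
# Hypothetical helper: send one prompt to several local models in turn,
# using the `ollama run <model> "<prompt>"` form.
compare_models() {
  local prompt="$1" model
  shift
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found on PATH" >&2
    return 1
  fi
  for model in "$@"; do
    echo "=== ${model} ==="
    ollama run "$model" "$prompt"
  done
}

# Usage: compare_models "Explain recursion in one sentence." llama2 codellama
```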
To list all locally installed models, including their details, run the following command:
To remove a local model, run the following command:
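The list and remove commands can be combined to free up space safely. This is a sketch under an assumption worth checking against your version: it expects `ollama list` to print a header row followed by one model per row, with the model name (including its tag, e.g. `llama2:13b`) in the first column. The three functions are hypothetical helpers, not part of ollama.

```shell
#!/bin/sh
# Hypothetical helpers built on `ollama list` and `ollama rm`.

# Print the names of installed models, skipping the header row.
installed_models() {
  ollama list | awk 'NR > 1 { print $1 }'
}

# Succeed if the exact name (e.g. llama2:13b) is installed.
is_installed() {
  installed_models | grep -qx "$1"
}

# Remove a model only if it is actually installed.
remove_if_installed() {
  if is_installed "$1"; then
    ollama rm "$1"
  else
    echo "$1 is not installed" >&2
    return 1
  fi
}

# Usage: remove_if_installed llama2:13b
```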
The following options can be used within an interactive model session (started with ollama run <model>).
To set session variables, use the option:
To show model information, use the option:
To exit/stop the model, use the option:
If you are interested in Generative AI, I would highly recommend ollama for testing, development and learning.
Be sure to keep a close eye on the ollama blog for updates and tutorials, as well as their Reddit community. I would also recommend reviewing the README documentation from Jeffrey Morgan on the ollama GitHub repository.