I tried running Meta’s Llama 3 locally

2024.05.08

Introduction

On 18th April 2024, Meta announced Llama 3, their newest state-of-the-art openly available LLM. It comes in 8B and 70B parameter sizes. Since it is openly available, I was able to run it on my local machine and test it. Here's how I did it.

Installing Ollama

Ollama is a cross-platform tool that lets users run LLMs locally. It takes away the complexity of finding, downloading, and configuring models, making them much simpler to run. It is available for macOS, Linux, and Windows (preview only), and the desktop tool can be downloaded from their website.

I am using an Apple MacBook Pro, so I downloaded the macOS version of the tool. Once the tool was installed on my system, it prompted me to install the CLI utility, which goes by the same name.

Alternatively, you can install the CLI utility directly using Homebrew.

brew install ollama
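
If you are on Linux instead, Ollama's documentation (at the time of writing) also provides a one-line install script:

curl -fsSL https://ollama.com/install.sh | sh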

Verifying the installation

Once installed, you can verify the installation by running the ollama command with no arguments.

If the installation was successful, the command prints a help message listing the available subcommands.
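
For a quicker sanity check, the CLI should also accept a version flag (assuming you are on a reasonably recent release):

ollama --version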

Running Meta Llama3

With Ollama in place, there is no need to perform any additional configuration before running the Llama 3 model.

First, we need to start Ollama. If you want to start it without the desktop application, execute the following command in the terminal.

ollama serve

This starts up the Ollama server. Alternatively, you can simply launch the Ollama desktop application.
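
If you want to confirm the server is actually listening, it binds to port 11434 by default; assuming the default settings, a simple request should come back with a short "Ollama is running" message.

curl http://localhost:11434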

Once Ollama is up and running, you can run Llama 3 by executing the following command.

ollama run llama3

If llama3 is not already present on the system, this command downloads it from the Ollama model library. Once the download is complete, it starts the model.
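
If you would rather download the model ahead of time, or try the larger variant from Meta's announcement, the model library also exposes explicit tags (assuming the 70B build is still published under this name and your machine has enough memory for it):

ollama pull llama3
ollama run llama3:70b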

You can now interact with Llama 3 in a chat format. Since the model weights are downloaded and run entirely on your machine, you can even use it offline!
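
As a side note, the chat prompt is not the only way to talk to the model: the running Ollama server also exposes a local HTTP API. A minimal sketch, assuming the default port (11434) and that llama3 has already been downloaded:

curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?"}'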

My Experience with Llama 3 on Ollama

I have tried running the model on three different machines:

  1. 2020 Intel i5 MacBook Pro with 16 GB of memory
  2. 2023 M2 MacBook Air with 16 GB of memory
  3. 2023 M3 MacBook Pro with 16 GB of memory

In my experience, the model's latency has been between 5 and 10 seconds on both the M2 and M3 Apple silicon MacBooks. However, on the M2 MacBook Air I noticed that while the LLM was generating a response, other applications would lag significantly.

On the 2020 Intel i5 MacBook Pro, the experience was quite different: latency was between 20 and 30 seconds, and the system lagged significantly or became unresponsive until the model had generated its entire response.