# LLMs on a Companion Computer

## Ollama

Ollama is a framework that simplifies running large language models (LLMs) locally, giving you privacy and control over your data and your interactions with the models.

Download Ollama and see the quickstart docs for basic commands.
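
Beyond the CLI, Ollama also provides an official Python client (installed with `pip install ollama`). A minimal sketch of pulling a model and sending it one prompt, assuming the Ollama server is already running locally:

```python
# Minimal sketch using the official Ollama Python client
# (pip install ollama; assumes the Ollama server is running locally).
import ollama

# Download the model if it is not already present (like `ollama pull`).
ollama.pull("gemma3:1b")

# Send a single chat prompt and print the reply.
response = ollama.chat(
    model="gemma3:1b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```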

## Gemma 3

See the Gemma 3 models available with Ollama. The model tags used in the benchmark below can be pulled ahead of time, as in the sketch that follows; the `-it-qat` suffix selects the instruction-tuned, quantization-aware-trained (QAT) builds, which are intended to reduce memory use.
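
```python
# Sketch: pre-download the Gemma 3 tags benchmarked below.
# ollama.pull skips layers that are already on disk.
import ollama

for tag in ["gemma3:1b", "gemma3:1b-it-qat", "gemma3:4b", "gemma3:4b-it-qat"]:
    ollama.pull(tag)
```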

### Gemma 3 Model Performance on Raspberry Pi Compute Module 4

Hardware: Raspberry Pi Compute Module 4 (8GB RAM, 32GB eMMC storage)
Tool: Ollama

| Model | Prompt | Total Duration | Load Duration | Prompt Eval Count | Prompt Eval Duration | Prompt Eval Rate | Eval Count | Eval Duration | Eval Rate |
|---|---|---|---|---|---|---|---|---|---|
| gemma3:1b-it-qat | "Why is the sky blue?" | 2m 25.26s | 122.09ms | 16 tokens | 1.01s | 15.82 tokens/s | 452 tokens | 2m 24.13s | 3.14 tokens/s |
| gemma3:1b-it-qat | "Write a python function to average all the values in an array." | 3m 16.13s | 116.17ms | 504 tokens | 35.89s | 14.04 tokens/s | 491 tokens | 2m 38.26s | 3.10 tokens/s |
| gemma3:4b-it-qat | "Why is the sky blue?" | 12m 12.76s | 117.63ms | 16 tokens | 2.03s | 7.87 tokens/s | 618 tokens | 12m 10.60s | 0.85 tokens/s |
| gemma3:4b-it-qat | "Write a python function to average all the values in an array." | Incomplete* | - | - | - | - | - | - |
| gemma3:1b | "Why is the sky blue?" | 2m 2.10s | 172.88ms | 15 tokens | 1.43s | 10.51 tokens/s | 468 tokens | 2m 0.50s | 3.88 tokens/s |
| gemma3:1b | "Write a python function to average all the values in an array." | 2m 31.24s | 161.40ms | 505 tokens | 2.06s | 245.14 tokens/s | 567 tokens | 2m 28.91s | 3.81 tokens/s |
| gemma3:4b | "Why is the sky blue?" | 8m 26.39s | 166.98ms | 15 tokens | 0.74s | 20.16 tokens/s | 651 tokens | 8m 25.48s | 1.29 tokens/s |
| gemma3:4b | "Write a python function to average all the values in an array." | 10m 29.78s | 175.75ms | 688 tokens | 8.34s | 82.51 tokens/s | 755 tokens | 10m 21.12s | 1.22 tokens/s |

\*Test was stopped prematurely due to excessive processing time and overheating of the Raspberry Pi (70°C+).
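
The columns above are Ollama's own timing metrics (the same figures printed by `ollama run --verbose`), and they are also returned as fields in the JSON response of the local REST API. A minimal sketch of collecting them programmatically, assuming the default server address `http://localhost:11434`; Ollama reports durations in nanoseconds, so a rate such as Eval Rate is simply `eval_count / eval_duration * 1e9`:

```python
# Minimal sketch: collect Ollama's timing metrics for one prompt over the
# local REST API (assumes a server on the default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:1b",
        "prompt": "Why is the sky blue?",
        "stream": False,
    },
)
data = resp.json()

NS = 1e9  # Ollama reports all durations in nanoseconds
print(f"total duration:    {data['total_duration'] / NS:.2f} s")
print(f"load duration:     {data['load_duration'] / NS:.2f} s")
print(f"prompt eval count: {data['prompt_eval_count']} tokens")
print(f"prompt eval rate:  {data['prompt_eval_count'] / data['prompt_eval_duration'] * NS:.2f} tokens/s")
print(f"eval count:        {data['eval_count']} tokens")
print(f"eval rate:         {data['eval_count'] / data['eval_duration'] * NS:.2f} tokens/s")
```

As a sanity check against the first row of the table: 452 tokens over an eval duration of 2m 24.13s works out to 452 / 144.13 ≈ 3.14 tokens/s, matching the reported Eval Rate.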