LLMs on a Companion Computer
Ollama
Ollama is a framework that simplifies running large language models (LLMs) locally on your computer, allowing for privacy and control over your data and interactions with the models.
Download ollama and see the quickstart docs for basic commands
Gemma 3
See the Gemma3 models available with Ollama.
Gemma 3 Model Performance on Raspberry Pi Compute Module 4
Hardware: Raspberry Pi Compute Module 4 (8GB RAM, 32GB eMMC Storage) Tool: Ollama
Model | Prompt | Total Duration | Load Duration | Prompt Eval Count | Prompt Eval Duration | Prompt Eval Rate | Eval Count | Eval Duration | Eval Rate |
---|---|---|---|---|---|---|---|---|---|
gemma3:1b-it-qat | "Why is the sky blue?" | 2m 25.26s | 122.09ms | 16 tokens | 1.01s | 15.82 tokens/s | 452 tokens | 2m 24.13s | 3.14 tokens/s |
"Write a python function to average all the values in an array." | 3m 16.13s | 116.17ms | 504 tokens | 35.89s | 14.04 tokens/s | 491 tokens | 2m 38.26s | 3.10 tokens/s | |
gemma3:4b-it-qat | "Why is the sky blue?" | 12m 12.76s | 117.63ms | 16 tokens | 2.03s | 7.87 tokens/s | 618 tokens | 12m 10.60s | 0.85 tokens/s |
"Write a python function to average all the values in an array." | Incomplete* | - | - | - | - | - | - | - | |
gemma3:1b | "Why is the sky blue?" | 2m 2.10s | 172.88ms | 15 tokens | 1.43s | 10.51 tokens/s | 468 tokens | 2m 0.50s | 3.88 tokens/s |
"Write a python function to average all the values in an array." | 2m 31.24s | 161.40ms | 505 tokens | 2.06s | 245.14 tokens/s | 567 tokens | 2m 28.91s | 3.81 tokens/s | |
gemma3:4b | "Why is the sky blue?" | 8m 26.39s | 166.98ms | 15 tokens | 0.74s | 20.16 tokens/s | 651 tokens | 8m 25.48s | 1.29 tokens/s |
"Write a python function to average all the values in an array." | 10m 29.78s | 175.75ms | 688 tokens | 8.34s | 82.51 tokens/s | 755 tokens | 10m 21.12s | 1.22 tokens/s |
Test was stopped prematurely due to excessive processing time and overheating of the Raspberry Pi (70°C+).