local llm tools · featured
Ollama
Run large language models locally with a single command. Ollama handles model downloads, quantization, and an OpenAI-compatible API — the fastest way to run Llama, Mistral, Gemma, and Phi locally.
## what it does
What it does
Ollama makes running local LLMs as simple as ollama run llama3. It handles model downloads (with content verification), GGUF quantization selection, GPU offloading, and serves a local OpenAI-compatible REST API on port 11434.
With 95,000+ GitHub stars and millions of downloads, it’s the dominant local LLM runtime.
Installation
# macOS / Linux — one-line installer:
curl -fsSL https://ollama.com/install.sh | sh
# macOS via Homebrew:
brew install ollama
# Windows: download installer from https://ollama.com/download
# Docker:
docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
Running models
# Pull and run (interactive):
ollama run llama3.3
# Pull only:
ollama pull mistral
# List installed models:
ollama list
# Remove a model:
ollama rm llama3.3
OpenAI-compatible API
Ollama exposes an API compatible with OpenAI’s client libraries:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
model="llama3.3",
messages=[{"role": "user", "content": "Hello!"}]
)
This means you can swap Ollama in for OpenAI in any existing project by just changing the base URL.
Popular models
| Model | Size | Use case |
|---|---|---|
llama3.3 | 70B (Q4) | General purpose, best quality |
mistral | 7B | Fast, good for most tasks |
gemma3 | 4B / 12B | Google’s efficient model |
phi4 | 14B | Strong reasoning |
deepseek-r1 | 7B–671B | Reasoning / coding |
llava | 7B / 13B | Vision + text (multimodal) |
Hardware requirements
- Minimum: 4GB RAM — runs small quantized models (Phi 3 Mini, Gemma 2B)
- Comfortable: 16GB unified memory — most 7B–13B models run fast
- Fast: M3 Max / M4 Pro, NVIDIA RTX 3090+ — 70B models usable
GPU offloading is automatic — Ollama detects your GPU and offloads as many layers as VRAM allows.
## platforms
macoslinuxwindowsdocker
4GB RAM minimum (quantized). 8GB+ recommended for 7B models. M1/M2/M3 Mac or NVIDIA GPU for fast inference.
## embed this badge
cache ✓ in cache.directory
