Running Ollama Locally with Remocode
Ollama lets you run AI models entirely on your own machine. No API keys, no cloud services, no data leaving your computer. This is the most private way to use AI coding assistance in Remocode.
Why Run Models Locally?
- Privacy — your code never leaves your machine
- No API costs — once installed, inference is free
- No rate limits — run as many queries as your hardware allows
- Offline capability — works without an internet connection
- Full control — choose exactly which model version to run
Prerequisites
- macOS 12 Monterey or newer
- At least 8 GB of RAM (16 GB recommended for larger models)
- 10-50 GB of free disk space, depending on model choices
- Remocode installed and signed in
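If you are unsure whether your Mac meets the RAM requirement, a quick shell check can report it. This is a sketch: `hw.memsize` is the macOS sysctl key, and the `/proc/meminfo` branch is only a Linux fallback so the snippet runs anywhere.

```shell
# Report installed RAM in GB (hw.memsize on macOS; /proc/meminfo fallback on Linux).
if sysctl -n hw.memsize >/dev/null 2>&1; then
  RAM_GB=$(( $(sysctl -n hw.memsize) / 1073741824 ))
else
  RAM_GB=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1048576 ))
fi
echo "Installed RAM: ${RAM_GB} GB"

# Free disk space on the home volume (Ollama stores models under ~/.ollama by default).
df -h "$HOME" | tail -1
```

If the reported RAM is 8 GB, plan on the smaller models listed later in this guide.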
Step 1: Install Ollama
Download and install Ollama from the official site:
# Option 1: Download from the website
# Visit https://ollama.com and download the macOS installer
# Option 2: Install via Homebrew
brew install ollama

After installation, verify it is working:
ollama --version

Step 2: Start the Ollama Server
Ollama runs as a background service. Start it with:
ollama serve

If you installed via the macOS app, it starts automatically. The server listens on http://localhost:11434 by default.
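Before wiring up Remocode, you can confirm the server is actually reachable. A sketch using Ollama's `/api/tags` endpoint, which lists downloaded models and is a cheap way to probe the API:

```shell
# Probe the local Ollama server on its default port.
OLLAMA_URL="http://localhost:11434"
if curl -sf "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  STATUS="running"
else
  STATUS="not running"
fi
echo "Ollama server is $STATUS at $OLLAMA_URL"
```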
Step 3: Pull Models
Download the models you want to use. Remocode supports five Ollama models:
# Llama 3.2 — Meta's latest open model, strong all-rounder
ollama pull llama3.2
# Mistral — excellent at code and instruction following
ollama pull mistral
# Code Llama — specialized for code generation and understanding
ollama pull codellama
# Qwen 3.5 — strong multilingual and coding capabilities
ollama pull qwen3.5
# DeepSeek V3 — powerful open-source reasoning model
ollama pull deepseek-v3

Each model download is between 4 and 30 GB, depending on the model and quantization level. Ensure you have sufficient disk space.
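The five pulls above can be scripted in one pass. A sketch that skips the (multi-gigabyte) downloads when the `ollama` binary is not on PATH:

```shell
# The five models this guide covers, pulled in one loop.
MODELS="llama3.2 mistral codellama qwen3.5 deepseek-v3"

for m in $MODELS; do
  if command -v ollama >/dev/null 2>&1; then
    ollama pull "$m" || echo "pull failed for $m"   # real download, 4-30 GB each
  else
    echo "skipping $m (ollama not on PATH)"
  fi
done
```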
Step 4: Verify Models Are Available
ollama list

This shows all downloaded models with their sizes and modification dates.
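For scripted setups (dotfiles, pre-flight checks), the same verification can be automated. A sketch, with llama3.2 as an example model name:

```shell
# Check whether a specific model is already downloaded.
MODEL="llama3.2"
if command -v ollama >/dev/null 2>&1 && ollama list 2>/dev/null | grep -q "$MODEL"; then
  MSG="$MODEL is available"
else
  MSG="$MODEL not found; run: ollama pull $MODEL"
fi
echo "$MSG"
```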
Step 5: Configure Remocode
- Press `⌘⇧A` to open the AI panel
- Click the ⚙ Settings gear icon
- Navigate to the Provider tab
- Select Ollama as the provider
- No API key is needed — Remocode connects to the local Ollama server automatically
- Choose your default model
- Click Save
Model Comparison for Local Use
| Model | Size | RAM Required | Strengths |
|-------|------|--------------|-----------|
| Llama 3.2 | ~5 GB | 8 GB | General-purpose, good at code |
| Mistral | ~4 GB | 8 GB | Instruction following, concise |
| Code Llama | ~4-13 GB | 8-16 GB | Code-specific training |
| Qwen 3.5 | ~5 GB | 8 GB | Multilingual, strong reasoning |
| DeepSeek V3 | ~8-20 GB | 16 GB | Deep reasoning, complex tasks |
Performance Tips
For the best experience:
- Close unnecessary applications to free up RAM for the model
- Use smaller models (Mistral, Llama 3.2) if you have 8 GB of RAM
- Use larger models (DeepSeek V3) only if you have 16 GB or more
- Keep Ollama running in the background — cold starts add latency to the first request
Optimizing response speed:
# Pre-load a model into memory for faster first responses
ollama run llama3.2 --keepalive 60m

This keeps the model loaded in memory for 60 minutes even when idle, eliminating cold-start delays.
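Ollama also exposes keep-alive control outside the CLI flag: a server-wide `OLLAMA_KEEP_ALIVE` environment variable and a per-request `keep_alive` field in the REST API. A sketch (the prompt text and duration are arbitrary examples):

```shell
# Server-wide default: the Ollama server reads this at startup,
# so restart it after exporting.
export OLLAMA_KEEP_ALIVE=60m

# Per-request: the keep_alive field on /api/generate overrides the default.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "warm up", "keep_alive": "60m"}' \
  >/dev/null 2>&1 || echo "server not running; warm-up request skipped"
```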
Troubleshooting
"Connection refused" error in Remocode:
- Ensure Ollama is running: ollama serve
- Check that port 11434 is not blocked by a firewall
- Verify with: curl http://localhost:11434/api/tags
Slow responses:
- Your model may be too large for your available RAM
- Try a smaller model or close other applications
- Check CPU usage with Activity Monitor
Model not appearing in Remocode:
- Run ollama list to confirm the model is downloaded
- Restart Remocode after pulling new models
- Ensure the model names match what Remocode expects
Out of memory errors:
- Switch to a smaller model
- Close memory-intensive applications (browsers with many tabs, IDEs)
- Consider a smaller or more heavily quantized variant, for example: ollama pull llama3.2:1b
Recommended Setup for Different Hardware
8 GB RAM Mac:
- Use Mistral or Llama 3.2 as your primary model
- Avoid running other heavy applications simultaneously
16 GB RAM Mac:
- Use DeepSeek V3 or Code Llama for complex tasks
- Llama 3.2 for everyday quick interactions
- Comfortable running alongside an IDE and browser
32 GB+ RAM Mac:
- Run any model without constraints
- Keep multiple models loaded for instant switching
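The tiers above reduce to a simple rule of thumb. A sketch that encodes this guide's recommendations (model names as used throughout; the 16 GB threshold comes from the tiers above):

```shell
# Map installed RAM (GB) to a default model, following this guide's tiers.
pick_model() {
  if [ "$1" -ge 16 ]; then
    echo "deepseek-v3"   # 16 GB+: the large reasoning model is comfortable
  else
    echo "llama3.2"      # 8 GB: stick to the small all-rounder
  fi
}

echo "8 GB  -> $(pick_model 8)"    # prints llama3.2
echo "32 GB -> $(pick_model 32)"   # prints deepseek-v3
```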
Ollama with Remocode gives you a fully private, fully local AI coding assistant. The trade-off versus cloud providers is that local models are generally less capable than frontier models like Claude Opus 4.6 or GPT-5.4 — but for many coding tasks, they are more than sufficient, and the privacy and cost benefits are significant.
Ready to try Remocode?
Start with a 7-day Pro trial — no credit card required. Download now and start coding with AI from anywhere.
Download Remocode for macOS