# LLMFlow

> Local-first LLM observability tool. Track costs, tokens, and latency for OpenAI, Anthropic, Gemini, and 10+ providers.
> One command to run. Zero config. No signup required.

LLMFlow is a **proxy + dashboard** for LLM API calls. It sits between your application and the LLM provider, logging every request with cost calculations, token counts, and latency metrics.

Designed for solo developers, hobbyists, and anyone who wants visibility into their AI usage without paying for SaaS observability tools.

## Quick Start

```bash
# Start LLMFlow (choose one)
npx llmflow
docker run -p 3000:3000 -p 8080:8080 helgesverre/llmflow
git clone https://github.com/HelgeSverre/llmflow && cd llmflow && npm install && npm start

# Point your SDK at the proxy
# Python: client = OpenAI(base_url="http://localhost:8080/v1")
# JS: const client = new OpenAI({ baseURL: "http://localhost:8080/v1" })

# View dashboard
open http://localhost:3000
```

## Architecture Overview

```
  Your Application
         │
         ▼
┌──────────────────┐
│  LLMFlow Proxy   │ ← Port 8080 (configurable)
│  - Intercepts    │
│  - Logs request  │
│  - Forwards      │
│  - Logs response │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐     ┌──────────────────┐
│  LLM Provider    │     │ LLMFlow Dashboard│ ← Port 3000
│  (OpenAI, etc.)  │     │ - View traces    │
└──────────────────┘     │ - Search/filter  │
                         │ - Cost analytics │
                         │ - OTLP ingestion │
                         └──────────────────┘
                                  │
                                  ▼
                         ┌──────────────────┐
                         │      SQLite      │
                         │   ~/.llmflow/    │
                         │     data.db      │
                         └──────────────────┘
```

## Integration Methods

### Method 1: Proxy Mode (Recommended)

Point your LLM SDK's base URL at LLMFlow. Works with any OpenAI-compatible client.

| Provider | Base URL |
|----------|----------|
| OpenAI | `http://localhost:8080/v1` |
| Anthropic | `http://localhost:8080/anthropic/v1` |
| Gemini | `http://localhost:8080/gemini/v1` |
| Ollama | `http://localhost:8080/ollama/v1` |
| Groq | `http://localhost:8080/groq/v1` |
| Mistral | `http://localhost:8080/mistral/v1` |
| Azure OpenAI | `http://localhost:8080/azure/v1` |
| Cohere | `http://localhost:8080/cohere/v1` |
| Together | `http://localhost:8080/together/v1` |
| OpenRouter | `http://localhost:8080/openrouter/v1` |
| Perplexity | `http://localhost:8080/perplexity/v1` |

**Python example:**

```python
from openai import OpenAI

# Before: client = OpenAI()
# After:
client = OpenAI(base_url="http://localhost:8080/v1")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
```

**JavaScript example:**

```javascript
import OpenAI from 'openai';

const client = new OpenAI({ baseURL: 'http://localhost:8080/v1' });

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello' }]
});
```

**Provider header override:**

```bash
# Use X-LLMFlow-Provider header to override provider detection
curl http://localhost:8080/v1/chat/completions \
  -H "X-LLMFlow-Provider: groq" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -d '{"model":"llama-3.1-8b-instant","messages":[{"role":"user","content":"Hi"}]}'
```
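The per-provider base URLs in the table map to full request paths the same way an OpenAI-compatible SDK builds them. As a sketch, here is the header-override request above rewritten against the Groq path route, assuming the `/groq/v1` route accepts the same OpenAI-style request body (the model name is reused from the example above):

```bash
# Same request as above, routed by path instead of the X-LLMFlow-Provider header
curl http://localhost:8080/groq/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.1-8b-instant","messages":[{"role":"user","content":"Hi"}]}'
```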
### Method 2: OpenTelemetry / OTLP

Send traces from LangChain, LlamaIndex, or any OTLP-instrumented application.

**OTLP Endpoints:**

- Traces: `POST http://localhost:3000/v1/traces`
- Logs: `POST http://localhost:3000/v1/logs`
- Metrics: `POST http://localhost:3000/v1/metrics`

**Python with OpenLLMetry:**

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Export spans to LLMFlow's OTLP trace endpoint
exporter = OTLPSpanExporter(endpoint="http://localhost:3000/v1/traces")
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
```

**JavaScript with OpenTelemetry:**

```javascript
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const exporter = new OTLPTraceExporter({ url: 'http://localhost:3000/v1/traces' });
```

### Method 3: Passthrough Mode (AI CLI Tools)

For CLI tools that speak a provider's native API format (not OpenAI-compatible), use the passthrough routes; tools that already speak the OpenAI format, such as Aider, can point at the regular proxy URL instead.

| Tool | Configuration |
|------|---------------|
| Claude Code | `export ANTHROPIC_BASE_URL=http://localhost:8080/passthrough/anthropic` |
| Aider | `aider --openai-api-base http://localhost:8080/v1` |
| Codex CLI | Set `endpoint = "http://localhost:3000/v1/logs"` in `~/.codex/config.toml` |

**Passthrough routes:**

- `/passthrough/anthropic/*` → api.anthropic.com
- `/passthrough/gemini/*` → generativelanguage.googleapis.com
- `/passthrough/openai/*` → api.openai.com
- `/passthrough/helicone/*` → oai.helicone.ai

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PROXY_PORT` | `8080` | Proxy server port |
| `DASHBOARD_PORT` | `3000` | Dashboard and OTLP ingestion port |
| `DATA_DIR` | `~/.llmflow` | SQLite database directory |
| `MAX_TRACES` | `10000` | Maximum traces to retain (older traces are pruned) |
| `VERBOSE` | `0` | Enable verbose logging (0 or 1) |

### Provider API Keys

Set these if you want LLMFlow to supply the API key when forwarding requests; otherwise pass the key in the `Authorization` header of each request.

| Variable | Provider |
|----------|----------|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GOOGLE_API_KEY` or `GEMINI_API_KEY` | Google Gemini |
| `GROQ_API_KEY` | Groq |
| `MISTRAL_API_KEY` | Mistral |
| `COHERE_API_KEY` | Cohere |
| `TOGETHER_API_KEY` | Together AI |
| `OPENROUTER_API_KEY` | OpenRouter |
| `PERPLEXITY_API_KEY` | Perplexity |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI |
| `AZURE_OPENAI_RESOURCE` | Azure OpenAI resource name |

### OTLP Export (Optional)

Forward traces to external observability backends:

| Variable | Description |
|----------|-------------|
| `OTLP_EXPORT_ENDPOINT` | Export endpoint (e.g., `http://localhost:4318/v1/traces`) |
| `OTLP_EXPORT_HEADERS` | Auth headers (e.g., `Authorization=Bearer xxx`) |
| `OTLP_EXPORT_BATCH_SIZE` | Batch size (default: 100) |
| `OTLP_EXPORT_FLUSH_INTERVAL` | Flush interval in ms (default: 5000) |

Supported backends: Jaeger, Phoenix (Arize), Langfuse, Opik (Comet), Grafana Tempo.
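For example, to mirror everything LLMFlow records into a local OTLP collector such as Jaeger, set the export variables before starting the server. This is a minimal sketch using the documented variables; the endpoint shown is the example value from the table above:

```bash
# Forward ingested traces to an OTLP/HTTP collector (e.g., Jaeger on its default 4318 port)
export OTLP_EXPORT_ENDPOINT=http://localhost:4318/v1/traces
# export OTLP_EXPORT_HEADERS="Authorization=Bearer xxx"   # only if the backend requires auth
npx llmflow
```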
## API Endpoints

### Dashboard API

| Endpoint | Description |
|----------|-------------|
| `GET /api/traces` | List traces with filters |
| `GET /api/traces/:id` | Get trace details |
| `GET /api/traces/:id/tree` | Get hierarchical span tree |
| `GET /api/traces/export` | Export traces as JSON/JSONL |
| `GET /api/logs` | List OTLP logs |
| `GET /api/metrics` | List OTLP metrics |
| `GET /api/stats` | Aggregate statistics |
| `GET /api/health` | Health check |
| `GET /api/health/providers` | Check provider API key validity |
| `GET /api/analytics/token-trends` | Token usage over time |
| `GET /api/analytics/cost-by-tool` | Cost breakdown by tool |
| `GET /api/analytics/cost-by-model` | Cost breakdown by model |

### Query Parameters

| Parameter | Description |
|-----------|-------------|
| `q` | Full-text search |
| `model` | Filter by model name |
| `status` | Filter by status (`success` or `error`) |
| `date_from` | Start timestamp (milliseconds) |
| `date_to` | End timestamp (milliseconds) |
| `limit` | Results per page (default: 50) |
| `offset` | Pagination offset |

### Custom Tags

Add tags to traces via header for filtering:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "X-LLMFlow-Tag: user:alice, env:prod, feature:chat" \
  -d '{"model":"gpt-4o-mini","messages":[...]}'
```

## Span Types

LLMFlow recognizes these span types from OpenTelemetry semantic conventions:

| Type | Description | Use Case |
|------|-------------|----------|
| `llm` | LLM API call | Chat completions, embeddings |
| `trace` | Root span | Workflow entry point |
| `agent` | Agent execution | ReAct loops, tool-using agents |
| `chain` | Chain step | LangChain chains, pipelines |
| `tool` | Tool call | Function calls, API calls |
| `retrieval` | Vector search | RAG retrieval, document lookup |
| `embedding` | Embedding generation | Text to vector |
| `custom` | Custom span | Application-specific |

## Workflow Examples

### Workflow 1: Track OpenAI Costs in Python App

```bash
# Terminal 1: Start LLMFlow
npx llmflow
```

```python
# app.py
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1")

# All calls now tracked
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
# View at http://localhost:3000
```

### Workflow 2: Track LangChain with OpenLLMetry

```bash
# Terminal 1: Start LLMFlow
npx llmflow
```

```python
# langchain_app.py
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from traceloop.sdk import Traceloop

# Initialize with LLMFlow as exporter
Traceloop.init(
    exporter=OTLPSpanExporter(endpoint="http://localhost:3000/v1/traces")
)

# Now use LangChain normally - traces sent to LLMFlow
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")
chain = ChatPromptTemplate.from_template("Tell me about {topic}") | llm
result = chain.invoke({"topic": "Python"})
```

### Workflow 3: Track Claude Code Usage

```bash
# Terminal 1: Start LLMFlow
npx llmflow

# Terminal 2: Configure and run Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8080/passthrough/anthropic
claude
```

All Claude Code requests are now logged in the LLMFlow dashboard.
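To confirm the traffic is being captured without opening the dashboard, you can also query the dashboard API from the section above. A quick sketch (the `jq` formatting is optional):

```bash
# Most recent traces (limit is a documented query parameter)
curl "http://localhost:3000/api/traces?limit=5" | jq

# Aggregate cost and token statistics
curl http://localhost:3000/api/stats | jq
```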
### Workflow 4: Multiple Providers in One App

```python
from openai import OpenAI

# OpenAI
openai_client = OpenAI(base_url="http://localhost:8080/v1")

# Anthropic (via OpenAI-compatible endpoint)
anthropic_client = OpenAI(
    base_url="http://localhost:8080/anthropic/v1",
    api_key="your-anthropic-key"
)

# Ollama (local)
ollama_client = OpenAI(
    base_url="http://localhost:8080/ollama/v1",
    api_key="not-needed"
)

# All three tracked in same dashboard
```

### Workflow 5: Export Traces for Analysis

```bash
# Export last 1000 traces as JSON
curl "http://localhost:3000/api/traces/export?limit=1000" > traces.json

# Export as JSONL
curl "http://localhost:3000/api/traces/export?format=jsonl" > traces.jsonl

# Export filtered by model
curl "http://localhost:3000/api/traces/export?model=gpt-4o" > gpt4_traces.json

# Export filtered by date range
curl "http://localhost:3000/api/traces/export?date_from=1700000000000&date_to=1700100000000" > traces.json
```

### Workflow 6: Check Provider Health

```bash
# Check if all configured API keys are valid
curl http://localhost:3000/api/health/providers | jq

# Response:
# {
#   "summary": "3/4 providers healthy",
#   "providers": {
#     "openai": { "status": "ok", "latency_ms": 245 },
#     "anthropic": { "status": "ok", "latency_ms": 312 },
#     "gemini": { "status": "unconfigured" },
#     "ollama": { "status": "ok", "latency_ms": 12 }
#   }
# }
```

## Troubleshooting

### Connection Refused

```bash
# Check if LLMFlow is running
curl http://localhost:3000/api/health

# Check if ports are in use
lsof -i :3000
lsof -i :8080
```

### No Traces Appearing

1. Verify the SDK is pointing at the proxy: `base_url="http://localhost:8080/v1"`
2. Check proxy logs: `VERBOSE=1 npx llmflow` (see the sketch below)
3. Ensure requests complete (don't cancel mid-stream)
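If the steps above don't reveal the problem, a quick end-to-end check isolates whether requests reach the proxy at all. This is a sketch, not an exhaustive diagnostic; it assumes an OpenAI key is available and only reuses endpoints documented above:

```bash
# Terminal 1: run the proxy with verbose logging
VERBOSE=1 npx llmflow

# Terminal 2: send one request through the proxy
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'

# Then confirm a trace was recorded
curl "http://localhost:3000/api/traces?limit=1"
```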
### Provider Authentication Errors

1. Set the API key in the environment: `export OPENAI_API_KEY=sk-...`
2. Or pass the key in the `Authorization` header with each request
3. Check health: `curl http://localhost:3000/api/health/providers`

### Database Issues

```bash
# Database location
ls ~/.llmflow/data.db

# Reset database (loses all data)
rm ~/.llmflow/data.db
npx llmflow
```

## Documentation

- [README.md](https://github.com/HelgeSverre/llmflow/blob/main/README.md): Quick start and overview
- [docs/guides/ai-cli-tools.md](https://github.com/HelgeSverre/llmflow/blob/main/docs/guides/ai-cli-tools.md): Claude Code, Codex, Aider setup
- [docs/guides/observability-backends.md](https://github.com/HelgeSverre/llmflow/blob/main/docs/guides/observability-backends.md): Jaeger, Langfuse, Phoenix export

## Source Code

- [server.js](https://github.com/HelgeSverre/llmflow/blob/main/server.js): Main proxy and dashboard server
- [db.js](https://github.com/HelgeSverre/llmflow/blob/main/db.js): SQLite database operations
- [pricing.js](https://github.com/HelgeSverre/llmflow/blob/main/pricing.js): Cost calculation for 2000+ models
- [providers/](https://github.com/HelgeSverre/llmflow/tree/main/providers): Provider implementations
- [otlp.js](https://github.com/HelgeSverre/llmflow/blob/main/otlp.js): OTLP trace ingestion
- [otlp-logs.js](https://github.com/HelgeSverre/llmflow/blob/main/otlp-logs.js): OTLP log ingestion
- [otlp-metrics.js](https://github.com/HelgeSverre/llmflow/blob/main/otlp-metrics.js): OTLP metrics ingestion
- [otlp-export.js](https://github.com/HelgeSverre/llmflow/blob/main/otlp-export.js): Export to external backends
- [public/](https://github.com/HelgeSverre/llmflow/tree/main/public): Dashboard frontend

## Distribution

- **npm**: `npx llmflow` or `npm install -g llmflow`
- **Docker**: `docker run -p 3000:3000 -p 8080:8080 helgesverre/llmflow`
- **Source**: `git clone https://github.com/HelgeSverre/llmflow && npm install && npm start`

## License

MIT License - https://github.com/HelgeSverre/llmflow/blob/main/LICENSE