Monitoring Token Usage with the Real-Time Audit Dashboard¶
Overview¶
The Real-Time Token Audit dashboard gives you full transparency into AI “fuel” consumption. Monitor exactly how many tokens are being consumed, by which features, and by which models, with hard-limit protection to ensure there are no surprise invoices.
In this tutorial, you’ll learn:
- How to navigate the Token Audit dashboard
- How to read token consumption reports
- How to set and manage hard-limit protection
- How to optimize token usage across your projects
- How to interpret the LiteLLM-powered analytics
The Token Audit Dashboard¶
┌────────────────────────────────────────────────────────────────┐
│ Token Audit Dashboard – Project: My SaaS App                   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│ Monthly Budget: $150.00        Used: $87.50 (58%)              │
│ ████████████████████████████░░░░░░░░░░░░░░░░░░░░               │
│                                                                │
│ ┌─────────────┐   ┌─────────────┐   ┌─────────────┐            │
│ │ Tasks Today │   │ Avg Tokens  │   │  Cost/Task  │            │
│ │     12      │   │    8,450    │   │    $7.29    │            │
│ └─────────────┘   └─────────────┘   └─────────────┘            │
│                                                                │
│ Model Breakdown:                                               │
│   Claude 4.5    ████████████           42% ($36.75)            │
│   GPT-5         ███████████            38% ($33.25)            │
│   Gemini 3 Pro  ████                   14% ($12.25)            │
│   DeepSeek      ██                      6% ($5.25)             │
│                                                                │
│ Recent Tasks:                                                  │
│   ✅ Create auth endpoint     12,300 tokens    $9.84           │
│   ✅ Fix pagination bug        4,200 tokens    $3.36           │
│   🔄 Build dashboard UI       18,500 tokens   $14.80 (in prog) │
│   ⏳ Write unit tests          6,800 tokens    $5.44 (queued)  │
│                                                                │
│ [Set Hard Limit]   [Export Report]   [View Details]            │
└────────────────────────────────────────────────────────────────┘
Step 1: Navigate the Dashboard¶
Overview Section¶
| Metric | Description |
|---|---|
| Monthly Budget | Your configured token spending cap |
| Used | Current consumption and percentage |
| Remaining | Budget left for the billing cycle |
| Days Left | Days remaining in the current cycle |
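The Overview metrics are simple derivations from your budget and the billing cycle. A minimal sketch, assuming the billing cycle aligns with the calendar month (the dashboard computes these server-side; the function below is illustrative only):

```python
import calendar
from datetime import date

def budget_summary(monthly_budget: float, used: float, today: date) -> dict:
    """Derive the Overview metrics from a budget cap and current spend.

    Assumes the billing cycle is the calendar month; a real account may
    have a different cycle anchor.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return {
        "remaining": round(monthly_budget - used, 2),
        "pct_used": round(used / monthly_budget * 100),
        "days_left": days_in_month - today.day,
    }

budget_summary(150.00, 87.50, date(2025, 1, 18))
# e.g. {'remaining': 62.5, 'pct_used': 58, 'days_left': 13}
```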
Summary Cards¶
| Card | Description |
|---|---|
| Tasks Today | Number of AI tasks submitted today |
| Avg Tokens/Task | Average token consumption per task |
| Cost/Task | Average cost per AI task |
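The summary cards are averages over the day’s submissions. The sketch below shows the arithmetic using hypothetical `(tokens, cost)` pairs; it is not the console’s actual data model:

```python
def summary_cards(tasks):
    """Compute the three summary cards from a list of (tokens, cost)
    pairs for today's submissions. Illustrative only."""
    n = len(tasks)
    if n == 0:
        return {"tasks_today": 0, "avg_tokens": 0, "cost_per_task": 0.0}
    total_tokens = sum(t for t, _ in tasks)
    total_cost = sum(c for _, c in tasks)
    return {
        "tasks_today": n,
        "avg_tokens": round(total_tokens / n),
        "cost_per_task": round(total_cost / n, 2),
    }

# Using two tasks from the Recent Tasks list above:
summary_cards([(12_300, 9.84), (4_200, 3.36)])
# {'tasks_today': 2, 'avg_tokens': 8250, 'cost_per_task': 6.6}
```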
Model Breakdown¶
Shows token consumption by model:
- Visual bar chart: Proportional usage per model
- Percentage: Share of total consumption
- Dollar amount: Cost attributed to each model
Recent Tasks¶
Lists your most recent AI tasks with:
- Status: Completed (✅), In Progress (🔄), Queued (⏳), Failed (❌)
- Token count: Total tokens consumed
- Cost: Dollar amount spent
Step 2: Set Hard-Limit Protection¶
Configure Hard Limits¶
- Click “Set Hard Limit”
- Choose your limit type:
- Monthly cap: Maximum spend per billing cycle
- Daily cap: Maximum spend per day
- Per-task cap: Maximum spend per individual task
- Set the amount (e.g., $150/month)
- Choose the action when limit is reached:
- Pause: Stop all AI tasks until you manually resume
- Alert: Send notification but continue processing
- Click “Save”
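Conceptually, a hard limit is a cap plus an action. The sketch below models the decision the platform makes when spend crosses a cap; the type and field names are assumptions for illustration, not the real API:

```python
from dataclasses import dataclass

@dataclass
class HardLimit:
    # Hypothetical model of a configured limit (stored server-side).
    limit_type: str  # "monthly" | "daily" | "per_task"
    amount: float    # dollar cap
    action: str      # "pause" | "alert"

def check_limit(limit: HardLimit, spend: float) -> str:
    """Return what happens at the current spend level."""
    if spend < limit.amount:
        return "ok"
    # Cap reached: Pause mode stops new tasks, Alert mode only notifies.
    return "pause_tasks" if limit.action == "pause" else "alert_only"

check_limit(HardLimit("monthly", 150.0, "pause"), 151.20)  # -> "pause_tasks"
```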
Limit Notifications¶
You’ll receive notifications when:
| Threshold | Notification |
|---|---|
| 50% used | Informational: “You’ve used 50% of your monthly budget” |
| 75% used | Warning: “You’ve used 75%; consider reviewing task priorities” |
| 90% used | Alert: “You’ve used 90%; approaching your hard limit” |
| 100% reached | Action: Tasks paused (if Pause mode) or final alert (if Alert mode) |
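The table above maps straightforwardly to a threshold lookup: a given usage percentage triggers the highest threshold it has crossed. A minimal sketch (the level labels are illustrative):

```python
# Thresholds from the table above, checked highest first.
THRESHOLDS = [(100, "action"), (90, "alert"), (75, "warning"), (50, "info")]

def notification_level(pct_used: float):
    """Return the highest threshold crossed, or None below 50%."""
    for threshold, level in THRESHOLDS:
        if pct_used >= threshold:
            return level
    return None

notification_level(58)   # -> "info"   (matches the 58% shown on the dashboard)
notification_level(92)   # -> "alert"
```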
Step 3: Read Detailed Reports¶
Task-Level Report¶
Click on any task to see detailed token usage:
Task: "Create user authentication endpoint"
Status: Completed
Duration: 4 minutes
Token Breakdown:
├── Claude 4.5:    2,400 tokens (Architecture design)
├── GPT-5:         8,200 tokens (Code implementation)
├── Gemini 3 Pro:  1,800 tokens (Architecture review)
├── DeepSeek:        900 tokens (Code refactoring)
└── Total:        13,300 tokens ($10.64)
Quality Gate:
├── Static Analysis: PASS (Score: A)
├── Vulnerability Scan: PASS (0 issues)
├── Unit Tests: PASS (87% coverage)
└── Style Validation: PASS (auto-fixed 2 issues)
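You can sanity-check a task report yourself: the per-model counts should sum to the total, and the report above implies a blended rate of about $0.80 per 1,000 tokens (13,300 tokens for $10.64). That rate is inferred from this one example; real per-model pricing differs.

```python
# Per-model token counts from the task report above.
breakdown = {
    "Claude 4.5": 2_400,
    "GPT-5": 8_200,
    "Gemini 3 Pro": 1_800,
    "DeepSeek": 900,
}

total_tokens = sum(breakdown.values())
# Blended rate inferred from the report; an assumption, not a price list.
total_cost = total_tokens / 1_000 * 0.80

print(total_tokens, f"${total_cost:.2f}")  # 13300 $10.64
```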
Project-Level Report¶
- Go to Analytics β Project Report
- Select date range
- View:
- Total tokens consumed across all tasks
- Cost breakdown by model, task type, and team member
- Trend analysis: Consumption over time
- Top consumers: Most expensive tasks and features
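The project-level cost breakdown is an aggregation over task records. If you export the raw data (see Step 5), you can reproduce a by-model rollup yourself; the record fields below (`model`, `tokens`, `cost`) are assumed for illustration, not the documented export schema:

```python
from collections import defaultdict

# Hypothetical exported task records.
records = [
    {"model": "Claude 4.5", "tokens": 2_400, "cost": 1.92},
    {"model": "GPT-5", "tokens": 8_200, "cost": 6.56},
    {"model": "Claude 4.5", "tokens": 1_100, "cost": 0.88},
]

# Roll tokens and cost up per model, like the Model Breakdown chart.
by_model = defaultdict(lambda: {"tokens": 0, "cost": 0.0})
for r in records:
    by_model[r["model"]]["tokens"] += r["tokens"]
    by_model[r["model"]]["cost"] += r["cost"]
```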
Step 4: Optimize Token Usage¶
Identify High-Consumption Areas¶
Look for patterns in your reports:
| Pattern | Possible Cause | Solution |
|---|---|---|
| One task uses 5x more tokens than average | Overly broad task description | Break into smaller, focused tasks |
| Claude 4.5 consumption is high | Using Claude for simple tasks | Let AI Factory auto-select optimal model |
| Refactoring tasks cost too much | DeepSeek not being used | Check model preference settings |
| Token usage spikes on certain days | Batch task submissions | Spread submissions across the week |
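The first pattern in the table is easy to automate against exported data. A minimal sketch that flags tasks consuming more than five times the average (the threshold factor is the table’s rule of thumb, not a platform setting):

```python
def flag_outliers(token_counts, factor=5):
    """Return indexes of tasks whose token use exceeds `factor` times
    the average across all tasks."""
    if not token_counts:
        return []
    avg = sum(token_counts) / len(token_counts)
    return [i for i, t in enumerate(token_counts) if t > factor * avg]

# Nine ordinary tasks and one runaway task: only the last is flagged.
flag_outliers([1_000] * 9 + [50_000])  # -> [9]
```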
Optimization Strategies¶
- Write precise task descriptions: Vague descriptions lead to more token consumption
- Break large tasks into smaller ones: Easier to estimate and control cost
- Use appropriate model preferences: Don’t force expensive models for simple tasks
- Review and adjust limits monthly: Based on actual usage patterns
- Monitor weekly, not daily: Look for trends, not daily fluctuations
Step 5: Export Reports¶
Export Options¶
- Click “Export Report”
- Choose format:
- CSV: For spreadsheet analysis
- PDF: For sharing with stakeholders
- JSON: For programmatic processing
- Select date range
- Choose data to include:
- Task details
- Model breakdown
- Cost attribution
- Quality gate results
- Download the report
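The JSON format is the one to pick for programmatic processing. A sketch of reading an export with the standard library; the structure shown is an assumption, so inspect a real export for the actual schema:

```python
import json

# Hypothetical export payload; field names are illustrative.
report = json.loads("""
{
  "date_range": {"from": "2025-01-01", "to": "2025-01-31"},
  "tasks": [
    {"name": "Create auth endpoint", "tokens": 12300, "cost": 9.84},
    {"name": "Fix pagination bug", "tokens": 4200, "cost": 3.36}
  ]
}
""")

total_cost = sum(t["cost"] for t in report["tasks"])
print(f"${total_cost:.2f} across {len(report['tasks'])} tasks")  # $13.20 across 2 tasks
```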
Best Practices¶
Budget Planning¶
- Start conservative: Set a lower limit and increase as you understand usage
- Review monthly: Adjust based on actual consumption patterns
- Plan for spikes: Account for larger features or bug-fixing sprints
- Separate projects: Set individual limits per project for better control
Cost Optimization¶
- Use the AI Factory’s auto-routing: It selects the most cost-effective model
- Batch similar tasks: Reduces context-switching overhead
- Review failed tasks: Failed tasks still consume tokens, so improve task descriptions
- Leverage DeepSeek: For refactoring and cleanup, it’s the most cost-efficient
Team Management¶
- Set per-developer limits: If multiple team members submit tasks
- Share usage reports: Keep the team informed about consumption
- Educate on efficiency: Train team members to write effective task descriptions
- Review weekly as a team: Discuss consumption patterns and optimization opportunities
What’s Next?¶
- Learn about Private AI Gateway
- Explore Understanding the AI Factory
- Read about Automated QA & Security Guardrails
Need Help?¶
- Documentation: docs.4geeks.io
- Community: community.4geeks.io
- Support: Available through the console dashboard
Still have questions? Ask the community.