
Building a Personal AI Assistant Like Jarvis: A Complete Technical Guide

Parijat Software Team

Voice AI Expert

February 1, 2026

If you grew up watching Iron Man, you probably dreamed of having your own Jarvis. An AI that just... gets you. One that listens when you need it, stays quiet when you don't, and actually does useful things instead of just answering trivia questions.

We recently built exactly that. A personal voice AI assistant that runs from the desktop, manages Gmail and Google Calendar through natural conversation, searches the web for real-time information, and responds to wake words like the sci-fi assistants we all imagined.

This project started as a prototype to validate features for Marfa, a personal AI assistant you can access over a phone call. But it quickly became something we use every day. We're open-sourcing the code so you can build your own. Or reach out if you want us to build something custom for your use case.

Here's how it all works under the hood.

See it in action:

Watch the demo


The Tech Stack

Building a production-quality voice AI assistant requires getting several pieces right. Here's the stack we chose and why:

Voice AI Framework: LiveKit Agents

LiveKit is the backbone of this project. It's an open-source framework for building real-time voice and video AI agents, the same infrastructure that powers ChatGPT's Advanced Voice Mode.

What makes LiveKit valuable is how it handles the hard parts of voice AI: turn detection, interruption handling, streaming responses, and WebRTC connectivity. This lets you focus on assistant logic instead of audio engineering.

Speech-to-Text: Deepgram Nova-2

For transcription, we're using Deepgram's Nova-2 model. Latency is crucial for conversational AI: once transcription takes more than about 300ms, interactions start to feel sluggish. Nova-2 delivers accurate transcription fast enough for real-time conversation.

Large Language Models: Multi-Provider Fallback

The assistant uses a fallback adapter pattern with multiple LLM providers:

  • OpenAI GPT-4.1: Primary model
  • Anthropic Claude Haiku: Fast fallback with caching
  • Google Gemini 2.5 Flash: Additional redundancy

This architecture provides reliability (if one provider has issues, others take over) and lets us experiment with different models for different interaction types.
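
To make the fallback idea concrete, here is a minimal sketch of the pattern in plain Python. It is not the adapter used in the project: the `Provider` type, `complete_with_fallback`, and the two stand-in providers are hypothetical names for illustration, and a real setup would wrap each vendor's SDK call instead.

```python
from typing import Callable, Sequence

# Hypothetical provider type: takes a prompt, returns a completion string.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an error."""


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, reply) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real adapter would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def steady_fallback(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = complete_with_fallback(
    "What's on my calendar?",
    [("primary", flaky_primary), ("fallback", steady_fallback)],
)
print(used, reply)
```

Ordering the list by preference keeps the primary model in charge whenever it is healthy, and the collected error messages make it easier to see why a fallback was used.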

Text-to-Speech: Cartesia Sonic-3

For voice output, Cartesia's Sonic-3 provides natural-sounding speech with low latency. We've also tested ElevenLabs for projects requiring more voice customization.

Real-Time Web Search: Perplexity API

This is what makes the assistant actually useful day-to-day. When you ask "what's the weather?" or "what happened in the news?", the assistant queries Perplexity's Search API for real-time information. No stale training data, just current information.

Gmail & Calendar Integration: Composio

Composio handles OAuth and API integration with Google services. Through Composio, the assistant can:

  • Fetch and search emails
  • Send emails and manage drafts
  • Create, update, and delete calendar events
  • Find free time slots for scheduling
  • Read today's agenda

Key Features That Make It Production-Ready

Wake Word Activation

The assistant doesn't continuously process everything it hears. It stays idle until it detects its wake word, configurable to any name you want.

wakeupKeywords = [
    self.assistant_name,
    "hey " + self.assistant_name.lower(),
    "hello " + self.assistant_name.lower(),
]

After ~10 seconds of silence, the assistant enters an "away" state. It's still listening for the wake word, but won't process background audio. This saves compute costs and prevents false activations.
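
The wake word and "away" behavior can be sketched as a small gate that decides whether a transcript should reach the LLM. This is an illustration under assumptions, not the project's code: `WakeWordGate`, `should_process`, and the injectable `clock` are hypothetical, and matching is a plain substring check on the lowercased transcript.

```python
import time

AWAY_AFTER_SECONDS = 10.0  # the ~10 second silence window described above


class WakeWordGate:
    """Decides whether a transcript should be processed (hypothetical helper)."""

    def __init__(self, assistant_name: str, clock=time.monotonic):
        name = assistant_name.lower()
        self.wake_phrases = [f"hey {name}", f"hello {name}", name]
        self.clock = clock
        self.last_active = None  # None means the assistant starts in the away state

    def is_away(self) -> bool:
        if self.last_active is None:
            return True
        return self.clock() - self.last_active > AWAY_AFTER_SECONDS

    def should_process(self, transcript: str) -> bool:
        text = transcript.lower()
        if self.is_away() and not any(p in text for p in self.wake_phrases):
            return False  # away and no wake word: ignore background speech
        self.last_active = self.clock()  # wake word heard or conversation ongoing
        return True
```

Passing the clock in as a parameter keeps the timeout logic testable without real waiting.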

Voice-Controlled Mute

Privacy matters. Say "mute" and the assistant goes completely silent. It won't respond to anything except "unmute" or the wake word. Essential for calls, meetings, or whenever you need it to stay quiet.
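
A rough sketch of the mute logic, assuming it is applied to each final transcript before anything is sent to the LLM. `MuteGate` and `allow` are hypothetical names, and the exact-match check on "mute" and "unmute" is a simplification.

```python
class MuteGate:
    """Tracks the mute state from voice commands (hypothetical helper)."""

    def __init__(self, assistant_name: str):
        self.assistant_name = assistant_name.lower()
        self.muted = False

    def allow(self, transcript: str) -> bool:
        """Return True if the assistant may respond to this transcript."""
        text = transcript.lower().strip(" .!?")
        if self.muted:
            # While muted, only "unmute" or the wake word restores responses.
            if text == "unmute" or self.assistant_name in text:
                self.muted = False
                return True
            return False
        if text == "mute":
            self.muted = True
            return False  # go silent immediately, with no spoken reply
        return True
```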

Dynamic Tool Loading

This is where the architecture gets interesting. Instead of loading every capability at startup (which bloats the context window and slows responses), the assistant detects user intent and loads only the relevant tools on demand.

# Dynamically load tools based on detected intent
if user_message:
    intent_tools = self.tool_manager.get_tools_for_intent(user_message)

Ask about your calendar? Calendar tools load. Ask about email? Gmail tools load. Tools that go unused for 3 conversation turns unload automatically. It's like garbage collection for AI capabilities.

The intent detection uses keyword matching against categories:

CALENDAR_KEYWORDS = [
    "calendar", "schedule", "meeting", "appointment",
    "event", "reschedule", "free time", "availability"
]

EMAIL_KEYWORDS = [
    "email", "mail", "inbox", "message", "send",
    "reply", "draft", "unread"
]
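
Putting the keyword lists and the three-turn unload rule together, a manager could look roughly like the sketch below. The keyword lists are copied from above; the class layout, the `end_turn` method, and the category names are assumptions for illustration and may differ from the repo's `dynamic_tool_manager.py`.

```python
UNLOAD_AFTER_TURNS = 3  # tools idle for this many turns are dropped

# Keyword lists from the article; the category names are illustrative.
INTENT_KEYWORDS = {
    "calendar": ["calendar", "schedule", "meeting", "appointment",
                 "event", "reschedule", "free time", "availability"],
    "email": ["email", "mail", "inbox", "message", "send",
              "reply", "draft", "unread"],
}


class ToolManager:
    """Loads tool categories by keyword match and unloads idle ones (sketch)."""

    def __init__(self):
        # loaded category -> number of turns since it was last matched
        self.idle_turns: dict[str, int] = {}

    def get_tools_for_intent(self, user_message: str) -> set[str]:
        text = user_message.lower()
        matched = {cat for cat, words in INTENT_KEYWORDS.items()
                   if any(word in text for word in words)}
        for cat in matched:
            self.idle_turns[cat] = 0  # load, or reset the idle counter
        return matched

    def end_turn(self, used: set[str]) -> None:
        """Call once per completed turn with the categories matched that turn."""
        for cat in list(self.idle_turns):
            if cat in used:
                continue
            self.idle_turns[cat] += 1
            if self.idle_turns[cat] >= UNLOAD_AFTER_TURNS:
                del self.idle_turns[cat]  # unload

    @property
    def loaded(self) -> set[str]:
        return set(self.idle_turns)
```

Substring matching is cheap and adds no latency, but it can over-match (for example, "event" also matches "eventually"), so a stricter version might match on word boundaries.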

Pre-Action Acknowledgment

When executing time-consuming operations (creating events, sending emails, searching), the assistant acknowledges immediately with natural phrases like "Let me handle that" or "Working on it" before the action completes. Small detail, but it makes interactions feel responsive.
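
One way to get this effect is to start the slow operation first and speak the acknowledgment while it runs. The sketch below uses plain `asyncio`; `run_with_ack`, `speak`, and `create_event` are hypothetical stand-ins, not functions from the project or from LiveKit.

```python
import asyncio
import random

ACK_PHRASES = ["Let me handle that.", "Working on it."]  # phrases quoted above


async def run_with_ack(action, speak, rng=random):
    """Speak a short acknowledgment, then return the slow action's result."""
    task = asyncio.create_task(action())   # start the slow work right away
    await speak(rng.choice(ACK_PHRASES))   # acknowledge while it runs
    return await task


async def demo():
    spoken = []

    async def speak(text):  # stand-in for the TTS call
        spoken.append(text)

    async def create_event():  # stand-in for a slow tool call
        await asyncio.sleep(0.05)
        return "event created"

    result = await run_with_ack(create_event, speak)
    return spoken, result
```

Running `asyncio.run(demo())` returns the single phrase that was spoken together with the action's result.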


Architecture Overview

voice_ai_assistant/
├── src/
│   ├── agent.py                    # Core assistant logic
│   ├── services/
│   │   └── perplexity_service.py   # Web search integration
│   ├── tools/
│   │   ├── web_search_tools.py     # Perplexity wrapper
│   │   ├── composio_tools_dynamic.py   # Gmail/Calendar tools
│   │   └── dynamic_tool_manager.py     # Intent-based loading
│   └── utils/
│       └── instructions.py         # System prompts
├── .env.example
└── pyproject.toml

The Assistant class extends LiveKit's Agent base class, overriding key methods:

  • stt_node: Intercepts transcription to handle wake word detection and mute state
  • llm_node: Manages dynamic tool loading based on conversation intent
  • on_user_turn_completed: Cleans up unused tools after each turn

Getting Started

Prerequisites

  • Python 3.10+
  • LiveKit account (cloud or self-hosted)
  • API keys: Deepgram, OpenAI/Anthropic/Google, Cartesia or ElevenLabs, Perplexity
  • Composio account with Gmail and Google Calendar connected

Installation

git clone https://github.com/MarfaAI/voice_ai_assistant.git
cd voice_ai_assistant
uv sync

Configure .env.local based on .env.example:

LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
COMPOSIO_API_KEY=your_composio_key
PERPLEXITY_API_KEY=your_perplexity_key
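
Before the first run, it can help to check that nothing is missing. The helper below is a small sketch that assumes the variables are already loaded into the process environment; `missing_env_vars` is a hypothetical name, and the list only covers the variables shown above, so the repo's .env.example may require more (for example, provider API keys).

```python
import os

# Variable names taken from the listing above.
REQUIRED_VARS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "COMPOSIO_API_KEY",
    "PERPLEXITY_API_KEY",
]


def missing_env_vars(env=None) -> list[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` at startup and exiting with a clear message when the list is non-empty is usually friendlier than a failed API call a few seconds into the first conversation.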

Running

python -m src.agent dev

You can find more details on project setup in the repo's README.


Example Interactions

Calendar:

  • "What's on my calendar today?"
  • "Schedule a meeting with the team tomorrow at 2pm"
  • "When am I free this week?"
  • "Cancel my 3 o'clock"

Email:

  • "Any new emails?"
  • "Read my latest email from Sarah"
  • "Send an email to the team about the project update"
  • "Do I have anything from legal?"

Real-Time Information:

  • "What's the weather?"
  • "Latest news about AI"
  • "Current Bitcoin price"
  • "Who won the Lakers game?"

Extending to Phone Calls

This implementation runs as a console app, but adding phone support is straightforward with LiveKit's telephony stack.

The path:

  1. Provision a phone number (Twilio, Vonage, etc.)
  2. Configure SIP trunking with LiveKit
  3. Route incoming calls to your agent

Your personal Jarvis becomes reachable from any phone. We haven't included telephony in the open-source version, but we've implemented this for client projects. Reach out if you need phone-enabled voice AI.


Want Something Like This For Your Business?

This open-source project demonstrates what's possible with modern voice AI infrastructure. But every business has different needs:

  • Custom integrations: CRM, ERP, internal tools, databases
  • Industry-specific knowledge: Healthcare, legal, finance, real estate
  • Multi-language support: Agents that work across languages
  • Phone/SMS channels: Customer-facing voice bots
  • Compliance requirements: HIPAA, SOC 2, data residency

We build production voice AI systems. From prototype to deployment, we handle the complexity so you get a solution that actually works.

Check out Marfa to try a personal AI assistant over the phone, or get in touch to discuss your project.


Code: github.com/MarfaAI/voice_ai_assistant