generated from nhcarrigan/template
feat: add events and talks pages
Security Scan and Upload / Security & DefectDojo Upload (push) Successful in 1m1s
Security Scan and Upload / Security & DefectDojo Upload (push) Successful in 1m1s
This commit is contained in:
@@ -0,0 +1,522 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<title>Berkeley AI Hackathon: Voice AI Agent Workshop</title>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<meta
|
||||
name="description"
|
||||
content="Build a fully functional voice AI agent from scratch. The full workshop guide: concept, architecture, setup, and three hands-on modifications."
|
||||
/>
|
||||
<script
|
||||
src="https://cdn.nhcarrigan.com/headers/index.js"
|
||||
async
|
||||
defer
|
||||
></script>
|
||||
<style>
|
||||
.talk-meta {
|
||||
font-size: 0.9rem;
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.5em 1.5em;
|
||||
margin-bottom: 1.5em;
|
||||
}
|
||||
|
||||
.talk-meta span {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: 0.35em;
|
||||
}
|
||||
|
||||
.talk-links {
|
||||
margin-bottom: 1.5em;
|
||||
}
|
||||
|
||||
.talk-links a {
|
||||
margin-right: 1em;
|
||||
}
|
||||
|
||||
hr {
|
||||
border: 1px solid var(--witch-plum);
|
||||
margin: 2em 0;
|
||||
}
|
||||
|
||||
.is-dark hr {
|
||||
border-color: var(--witch-rose);
|
||||
}
|
||||
|
||||
section {
|
||||
margin-bottom: 2em;
|
||||
}
|
||||
|
||||
blockquote {
|
||||
border-left: 3px solid var(--witch-rose);
|
||||
margin: 1.5em 0;
|
||||
padding: 0.5em 0 0.5em 1.25em;
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
.is-dark blockquote {
|
||||
border-left-color: var(--witch-mauve);
|
||||
}
|
||||
|
||||
.callout {
|
||||
background: rgba(212, 165, 199, 0.08);
|
||||
border: 1px solid var(--witch-plum);
|
||||
border-radius: 10px;
|
||||
padding: 1.25em 1.5em;
|
||||
margin: 1.25em 0;
|
||||
}
|
||||
|
||||
.is-dark .callout {
|
||||
border-color: var(--witch-rose);
|
||||
}
|
||||
|
||||
.callout p:last-child {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
.loop-diagram {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.5em;
|
||||
margin: 1.5em 0;
|
||||
font-size: 1rem;
|
||||
}
|
||||
|
||||
.loop-step {
|
||||
background: rgba(212, 165, 199, 0.12);
|
||||
border: 1px solid var(--witch-plum);
|
||||
border-radius: 8px;
|
||||
padding: 0.6em 1em;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.is-dark .loop-step {
|
||||
border-color: var(--witch-rose);
|
||||
}
|
||||
|
||||
.loop-arrow {
|
||||
font-size: 1.2rem;
|
||||
opacity: 0.6;
|
||||
}
|
||||
|
||||
.step-list {
|
||||
counter-reset: steps;
|
||||
list-style: none;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
.step-list li {
|
||||
counter-increment: steps;
|
||||
display: flex;
|
||||
gap: 1em;
|
||||
align-items: flex-start;
|
||||
margin-bottom: 1.25em;
|
||||
}
|
||||
|
||||
.step-list li::before {
|
||||
content: counter(steps);
|
||||
background: var(--witch-plum);
|
||||
color: var(--witch-moon);
|
||||
border-radius: 50%;
|
||||
min-width: 1.8em;
|
||||
height: 1.8em;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
font-size: 0.9rem;
|
||||
flex-shrink: 0;
|
||||
}
|
||||
|
||||
.is-dark .step-list li::before {
|
||||
background: var(--witch-rose);
|
||||
color: var(--witch-black);
|
||||
}
|
||||
|
||||
.mod-block {
|
||||
background: rgba(212, 165, 199, 0.08);
|
||||
border: 1px solid var(--witch-plum);
|
||||
border-radius: 10px;
|
||||
padding: 1.25em 1.5em;
|
||||
margin: 1.25em 0;
|
||||
}
|
||||
|
||||
.is-dark .mod-block {
|
||||
border-color: var(--witch-rose);
|
||||
}
|
||||
|
||||
.mod-block h3 {
|
||||
margin-top: 0;
|
||||
}
|
||||
|
||||
.mod-block p:last-child {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
code {
|
||||
background: rgba(0, 0, 0, 0.08);
|
||||
border-radius: 4px;
|
||||
padding: 0.15em 0.4em;
|
||||
font-family: monospace;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
|
||||
.is-dark code {
|
||||
background: rgba(255, 255, 255, 0.08);
|
||||
}
|
||||
|
||||
pre {
|
||||
background: rgba(0, 0, 0, 0.08);
|
||||
border-radius: 8px;
|
||||
padding: 1em 1.25em;
|
||||
overflow-x: auto;
|
||||
font-family: monospace;
|
||||
font-size: 0.9em;
|
||||
line-height: 1.6;
|
||||
}
|
||||
|
||||
.is-dark pre {
|
||||
background: rgba(255, 255, 255, 0.05);
|
||||
}
|
||||
|
||||
.back-link {
|
||||
font-size: 0.9rem;
|
||||
display: block;
|
||||
margin-top: 2em;
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<main>
|
||||
<h1>Voice AI Agent Workshop</h1>
|
||||
<div class="talk-meta">
|
||||
<span>
|
||||
<i class="fas fa-calendar-alt" aria-hidden="true"></i>
|
||||
20–21 June 2026
|
||||
</span>
|
||||
<span>
|
||||
<i class="fas fa-map-marker-alt" aria-hidden="true"></i>
|
||||
UC Berkeley Campus, Berkeley, CA
|
||||
</span>
|
||||
<span>
|
||||
<i class="fas fa-user-tag" aria-hidden="true"></i>
|
||||
Speaker & Sponsor (Berkeley AI Hackathon)
|
||||
</span>
|
||||
</div>
|
||||
<div class="talk-links">
|
||||
<a href="https://ai.hackberkeley.org" target="_blank" rel="noopener noreferrer">
|
||||
<i class="fas fa-external-link-alt" aria-hidden="true"></i>
|
||||
Berkeley AI Hackathon
|
||||
</a>
|
||||
<a href="https://console.deepgram.com" target="_blank" rel="noopener noreferrer">
|
||||
<i class="fas fa-external-link-alt" aria-hidden="true"></i>
|
||||
Deepgram Console ($200 free credit)
|
||||
</a>
|
||||
</div>
|
||||
|
||||
<p>
|
||||
Voice AI is having a moment — and it's more accessible than you might think. In this
|
||||
workshop, we'll build a fully functional voice AI agent from scratch, using real-time
|
||||
speech-to-text, a large language model for reasoning, and text-to-speech to talk back. By
|
||||
the end, you'll have a working agent on your laptop that you can drop straight into a project.
|
||||
</p>
|
||||
|
||||
<blockquote>
|
||||
Talk to your hackathon project in 40 minutes.
|
||||
</blockquote>
|
||||
|
||||
<p>No prior voice AI experience needed. If you've worked with an API before, you're good to go.</p>
|
||||
|
||||
<hr />
|
||||
|
||||
<section>
|
||||
<h2>How a Voice Agent Works</h2>
|
||||
<p>
|
||||
A voice agent is three pieces wired together in a loop. Understanding this loop is the whole
|
||||
conceptual foundation. Everything else is implementation detail.
|
||||
</p>
|
||||
<div class="loop-diagram">
|
||||
<div class="loop-step">
|
||||
<strong>Ear</strong><br />
|
||||
Speech-to-Text
|
||||
</div>
|
||||
<span class="loop-arrow">→</span>
|
||||
<div class="loop-step">
|
||||
<strong>Brain</strong><br />
|
||||
LLM
|
||||
</div>
|
||||
<span class="loop-arrow">→</span>
|
||||
<div class="loop-step">
|
||||
<strong>Mouth</strong><br />
|
||||
Text-to-Speech
|
||||
</div>
|
||||
<span class="loop-arrow" style="transform: rotate(90deg); display: block;">→</span>
|
||||
</div>
|
||||
<p>
|
||||
<strong>The ear</strong> captures your audio and transcribes it in real time using
|
||||
speech-to-text. Latency here is everything: if your STT is slow, the whole agent feels
|
||||
sluggish. Deepgram's STT runs at sub-300ms end-to-end, which is fast enough to feel like
|
||||
a real conversation.
|
||||
</p>
|
||||
<p>
|
||||
<strong>The brain</strong> receives the transcript and decides what to say. This is your
|
||||
LLM: it reasons over the conversation history and your system prompt, generates a response,
|
||||
and can call any functions you've given it access to. It can look things up, run code,
|
||||
fetch data — anything you wire in.
|
||||
</p>
|
||||
<p>
|
||||
<strong>The mouth</strong> takes the LLM's text response and streams it back as audio. Fast
|
||||
streaming matters here too: you want audio to start playing before the full response is
|
||||
generated, or the agent feels like it's thinking too hard.
|
||||
</p>
|
||||
<p>
|
||||
With Deepgram, all three pieces run over a single WebSocket connection. You're not juggling
|
||||
three separate APIs — it's one socket, one loop, about 80 lines of code to start.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<h2>Design Decisions That Actually Matter</h2>
|
||||
<p>
|
||||
Most tutorials skip these. They're the difference between an agent that feels like a demo
|
||||
and one that feels like a tool.
|
||||
</p>
|
||||
|
||||
<h3>Interruption Handling</h3>
|
||||
<p>
|
||||
In a real conversation, you don't wait for the other person to finish before you start
|
||||
talking. Your agent shouldn't either. When the user starts speaking mid-response, the agent
|
||||
needs to stop its output and listen. Getting this wrong is the fastest way to make an agent
|
||||
feel robotic.
|
||||
</p>
|
||||
<p>
|
||||
Deepgram's Voice Agent API handles this for you. The WebSocket connection detects voice
|
||||
activity on both ends and manages the interrupt logic. You don't have to implement it
|
||||
yourself — just don't override it.
|
||||
</p>
|
||||
|
||||
<h3>Conversation State</h3>
|
||||
<p>
|
||||
Every turn in the conversation needs to be threaded correctly. The LLM needs the full
|
||||
context of what's been said so far — by both sides — to give coherent responses. This
|
||||
means maintaining a message history and passing it in with every LLM call.
|
||||
</p>
|
||||
<p>
|
||||
Keep your context window in mind. Very long conversations will eventually push older turns
|
||||
out of the context. A simple solution: keep the system prompt, the last N turns, and let
|
||||
the rest roll off. The agent won't remember everything, but the conversation will stay
|
||||
coherent.
|
||||
</p>
|
||||
|
||||
<h3>Latency</h3>
|
||||
<p>
|
||||
The number that matters is end-to-end: from when the user stops speaking to when audio
|
||||
starts playing back. Under 500ms feels conversational. Over 1 second feels like a phone
|
||||
call with bad signal.
|
||||
</p>
|
||||
<p>
|
||||
Three places latency hides: STT processing time, LLM first-token time, and TTS start time.
|
||||
Streaming helps with all three. Start playing audio as soon as the first TTS chunk arrives.
|
||||
Choose an LLM model that prioritises speed over capability for conversational use — a
|
||||
smaller, faster model is often better here than a smarter, slower one.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<hr />
|
||||
|
||||
<section>
|
||||
<h2>Getting Set Up</h2>
|
||||
|
||||
<div class="callout">
|
||||
<p><strong>Before you start:</strong></p>
|
||||
<ul>
|
||||
<li>Node 20+ or Python 3.11+ installed</li>
|
||||
<li>A code editor</li>
|
||||
<li>A <a href="https://console.deepgram.com" target="_blank" rel="noopener noreferrer">free Deepgram account</a> — includes $200 credit on sign-up, no credit card needed</li>
|
||||
</ul>
|
||||
</div>
|
||||
|
||||
<p>Five steps to a running agent:</p>
|
||||
<ol class="step-list">
|
||||
<li>
|
||||
<div>
|
||||
Sign up at <a href="https://console.deepgram.com" target="_blank" rel="noopener noreferrer">console.deepgram.com</a>.
|
||||
Your account comes with $200 in free credit — more than enough for this workshop and a weekend of hacking.
|
||||
</div>
|
||||
</li>
|
||||
<li>
|
||||
<div>
|
||||
Grab your API key from the console. It lives under API Keys in the left sidebar. Copy it somewhere safe.
|
||||
</div>
|
||||
</li>
|
||||
<li>
|
||||
<div>
|
||||
Clone the starter repo. We'll walk through this together in the workshop, but the pattern is:
|
||||
<pre>git clone <starter-repo-url>
|
||||
cd <repo-name></pre>
|
||||
</div>
|
||||
</li>
|
||||
<li>
|
||||
<div>
|
||||
Add your API key to the environment. Create a <code>.env</code> file in the project root:
|
||||
<pre>DEEPGRAM_API_KEY=your_key_here</pre>
|
||||
</div>
|
||||
</li>
|
||||
<li>
|
||||
<div>
|
||||
Install dependencies and run it:
|
||||
<pre># Node
|
||||
npm install && npm start
|
||||
|
||||
# Python
|
||||
uv venv && uv pip install -r requirements.txt
|
||||
python main.py</pre>
|
||||
</div>
|
||||
</li>
|
||||
</ol>
|
||||
<p>
|
||||
If it's working, you'll see a connection message in the terminal and the agent will greet
|
||||
you when you speak. If it's not working, come find us at the booth.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<hr />
|
||||
|
||||
<section>
|
||||
<h2>Make It Yours — Three Modifications</h2>
|
||||
<p>
|
||||
The starter app works, but it's generic. These three modifications are where the session
|
||||
becomes yours. Each one takes about five to seven minutes. Do them in order, or skip to
|
||||
whichever interests you most.
|
||||
</p>
|
||||
|
||||
<div class="mod-block">
|
||||
<h3>Modification A: Change the Personality</h3>
|
||||
<p>
|
||||
The agent's system prompt is what defines who it is. Find the <code>system_prompt</code>
|
||||
variable in the starter code — it'll look something like this:
|
||||
</p>
|
||||
<pre>system_prompt = "You are a helpful assistant."</pre>
|
||||
<p>
|
||||
Replace it with something more specific. Give the agent a name, a role, a point of view.
|
||||
For example:
|
||||
</p>
|
||||
<pre>system_prompt = """You are Alf, a friendly AI assistant at a hackathon.
|
||||
You give encouragement, suggest project ideas, and answer questions
|
||||
about voice AI. You're enthusiastic but concise - people are busy building."""</pre>
|
||||
<p>
|
||||
Restart the agent and talk to it. The change is immediate. The system prompt is the
|
||||
entire personality — there's no training, no fine-tuning. You just wrote it.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<div class="mod-block">
|
||||
<h3>Modification B: Swap the Voice</h3>
|
||||
<p>
|
||||
Deepgram's TTS has a full voice catalogue. Find the voice configuration in the starter
|
||||
code — it'll be a single string like <code>"aura-asteria-en"</code>. Change it to any
|
||||
other voice from the catalogue.
|
||||
</p>
|
||||
<p>
|
||||
A few to try:
|
||||
</p>
|
||||
<ul>
|
||||
<li><code>aura-asteria-en</code> — warm, conversational</li>
|
||||
<li><code>aura-orion-en</code> — deep, authoritative</li>
|
||||
<li><code>aura-luna-en</code> — clear, neutral</li>
|
||||
<li><code>aura-zeus-en</code> — bold, energetic</li>
|
||||
</ul>
|
||||
<p>
|
||||
Restart and talk to your agent. It's a one-line change and the agent sounds entirely
|
||||
different. This is the modification that gets the strongest reaction.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<div class="mod-block">
|
||||
<h3>Modification C: Add a Custom Function</h3>
|
||||
<p>
|
||||
This is the one that makes you realise you can hook it to anything. Function calling lets
|
||||
the agent invoke Python (or JS) functions you write, then incorporate the result into its
|
||||
response naturally.
|
||||
</p>
|
||||
<p>
|
||||
Here's a simple example: a function that returns a random project idea when the agent
|
||||
is asked for inspiration.
|
||||
</p>
|
||||
<pre>import random
|
||||
|
||||
def get_project_idea():
|
||||
ideas = [
|
||||
"A voice-controlled to-do list that reads back your tasks",
|
||||
"An AI study buddy that quizzes you out loud",
|
||||
"A real-time translator that speaks back in the target language",
|
||||
"A voice journalling app that summarises your entries",
|
||||
]
|
||||
return random.choice(ideas)</pre>
|
||||
<p>
|
||||
Register this function with the agent and tell the LLM when to use it via the system
|
||||
prompt: <em>"When asked for a project idea, call get_project_idea() and share the result."</em>
|
||||
</p>
|
||||
<p>
|
||||
Now ask your agent for a project idea. Watch it call the function and weave the result
|
||||
into a natural spoken response. This is the "aha" moment: the agent can call your own
|
||||
code, your own APIs, your own data. The voice interface is just the front door.
|
||||
</p>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<hr />
|
||||
|
||||
<section>
|
||||
<h2>What You Can Build From Here</h2>
|
||||
<p>
|
||||
Now that you have a working, personalised, function-capable voice agent, here's what that
|
||||
unlocks for your hackathon project:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
<strong>Accessibility layer</strong> — add voice input and output to any existing
|
||||
interface. Users who can't type or read small text get a completely different experience.
|
||||
</li>
|
||||
<li>
|
||||
<strong>In-game NPC</strong> — drop the agent into a game as a character that actually
|
||||
talks back. Hook the function calling to your game state so it knows what's happening.
|
||||
</li>
|
||||
<li>
|
||||
<strong>Voice-controlled developer tool</strong> — talk to your build process, your
|
||||
deploy pipeline, your monitoring dashboard. Voice is an unusually good interface for
|
||||
things you want to do hands-free.
|
||||
</li>
|
||||
<li>
|
||||
<strong>Multilingual support</strong> — Deepgram's STT handles dozens of languages.
|
||||
The LLM can respond in whatever language the user speaks. Global voice interface, almost for free.
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<h2>Get Help</h2>
|
||||
<p>Building something? Stuck on something? Here's where to find us:</p>
|
||||
<ul>
|
||||
<li><strong>At the event:</strong> the Deepgram booth — we're here all weekend</li>
|
||||
<li><strong>Docs:</strong> <a href="https://developers.deepgram.com" target="_blank" rel="noopener noreferrer">developers.deepgram.com</a></li>
|
||||
<li><strong>Community:</strong> <a href="https://deepgram.com/discord" target="_blank" rel="noopener noreferrer">Deepgram's Discord</a></li>
|
||||
</ul>
|
||||
<p>
|
||||
The Deepgram challenge prize this weekend goes to the team that builds the most creative
|
||||
voice-powered experience. Come say hi, show us what you're building, and let us know if you
|
||||
want feedback on your voice integration before judging.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<hr />
|
||||
<a class="back-link" href="/">
|
||||
<i class="fas fa-arrow-left" aria-hidden="true"></i>
|
||||
Back to all talks
|
||||
</a>
|
||||
</main>
|
||||
</body>
|
||||
</html>
|
||||
Reference in New Issue
Block a user