feat: add events and talks pages

2026-06-24 20:20:11 -07:00
parent 995b523afd
commit 97fce9945d
6 changed files with 1792 additions and 0 deletions
@@ -0,0 +1,522 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <title>Berkeley AI Hackathon: Voice AI Agent Workshop</title>
+    <meta charset="utf-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1" />
+    <meta
+      name="description"
+      content="Build a fully functional voice AI agent from scratch. The full workshop guide: concept, architecture, setup, and three hands-on modifications."
+    />
+    <script
+      src="https://cdn.nhcarrigan.com/headers/index.js"
+      async
+      defer
+    ></script>
+    <style>
+      .talk-meta {
+        font-size: 0.9rem;
+        display: flex;
+        flex-wrap: wrap;
+        gap: 0.5em 1.5em;
+        margin-bottom: 1.5em;
+      }
+
+      .talk-meta span {
+        display: inline-flex;
+        align-items: center;
+        gap: 0.35em;
+      }
+
+      .talk-links {
+        margin-bottom: 1.5em;
+      }
+
+      .talk-links a {
+        margin-right: 1em;
+      }
+
+      hr {
+        border: 1px solid var(--witch-plum);
+        margin: 2em 0;
+      }
+
+      .is-dark hr {
+        border-color: var(--witch-rose);
+      }
+
+      section {
+        margin-bottom: 2em;
+      }
+
+      blockquote {
+        border-left: 3px solid var(--witch-rose);
+        margin: 1.5em 0;
+        padding: 0.5em 0 0.5em 1.25em;
+        font-style: italic;
+      }
+
+      .is-dark blockquote {
+        border-left-color: var(--witch-mauve);
+      }
+
+      .callout {
+        background: rgba(212, 165, 199, 0.08);
+        border: 1px solid var(--witch-plum);
+        border-radius: 10px;
+        padding: 1.25em 1.5em;
+        margin: 1.25em 0;
+      }
+
+      .is-dark .callout {
+        border-color: var(--witch-rose);
+      }
+
+      .callout p:last-child {
+        margin-bottom: 0;
+      }
+
+      .loop-diagram {
+        display: flex;
+        align-items: center;
+        justify-content: center;
+        flex-wrap: wrap;
+        gap: 0.5em;
+        margin: 1.5em 0;
+        font-size: 1rem;
+      }
+
+      .loop-step {
+        background: rgba(212, 165, 199, 0.12);
+        border: 1px solid var(--witch-plum);
+        border-radius: 8px;
+        padding: 0.6em 1em;
+        text-align: center;
+      }
+
+      .is-dark .loop-step {
+        border-color: var(--witch-rose);
+      }
+
+      .loop-arrow {
+        font-size: 1.2rem;
+        opacity: 0.6;
+      }
+
+      .step-list {
+        counter-reset: steps;
+        list-style: none;
+        padding: 0;
+      }
+
+      .step-list li {
+        counter-increment: steps;
+        display: flex;
+        gap: 1em;
+        align-items: flex-start;
+        margin-bottom: 1.25em;
+      }
+
+      .step-list li::before {
+        content: counter(steps);
+        background: var(--witch-plum);
+        color: var(--witch-moon);
+        border-radius: 50%;
+        min-width: 1.8em;
+        height: 1.8em;
+        display: flex;
+        align-items: center;
+        justify-content: center;
+        font-size: 0.9rem;
+        flex-shrink: 0;
+      }
+
+      .is-dark .step-list li::before {
+        background: var(--witch-rose);
+        color: var(--witch-black);
+      }
+
+      .mod-block {
+        background: rgba(212, 165, 199, 0.08);
+        border: 1px solid var(--witch-plum);
+        border-radius: 10px;
+        padding: 1.25em 1.5em;
+        margin: 1.25em 0;
+      }
+
+      .is-dark .mod-block {
+        border-color: var(--witch-rose);
+      }
+
+      .mod-block h3 {
+        margin-top: 0;
+      }
+
+      .mod-block p:last-child {
+        margin-bottom: 0;
+      }
+
+      code {
+        background: rgba(0, 0, 0, 0.08);
+        border-radius: 4px;
+        padding: 0.15em 0.4em;
+        font-family: monospace;
+        font-size: 0.9em;
+      }
+
+      .is-dark code {
+        background: rgba(255, 255, 255, 0.08);
+      }
+
+      pre {
+        background: rgba(0, 0, 0, 0.08);
+        border-radius: 8px;
+        padding: 1em 1.25em;
+        overflow-x: auto;
+        font-family: monospace;
+        font-size: 0.9em;
+        line-height: 1.6;
+      }
+
+      .is-dark pre {
+        background: rgba(255, 255, 255, 0.05);
+      }
+
+      .back-link {
+        font-size: 0.9rem;
+        display: block;
+        margin-top: 2em;
+      }
+    </style>
+  </head>
+  <body>
+    <main>
+      <h1>Voice AI Agent Workshop</h1>
+      <div class="talk-meta">
+        <span>
+          <i class="fas fa-calendar-alt" aria-hidden="true"></i>
+          20&ndash;21 June 2026
+        </span>
+        <span>
+          <i class="fas fa-map-marker-alt" aria-hidden="true"></i>
+          UC Berkeley Campus, Berkeley, CA
+        </span>
+        <span>
+          <i class="fas fa-user-tag" aria-hidden="true"></i>
+          Speaker &amp; Sponsor (Berkeley AI Hackathon)
+        </span>
+      </div>
+      <div class="talk-links">
+        <a href="https://ai.hackberkeley.org" target="_blank" rel="noopener noreferrer">
+          <i class="fas fa-external-link-alt" aria-hidden="true"></i>
+          Berkeley AI Hackathon
+        </a>
+        <a href="https://console.deepgram.com" target="_blank" rel="noopener noreferrer">
+          <i class="fas fa-external-link-alt" aria-hidden="true"></i>
+          Deepgram Console ($200 free credit)
+        </a>
+      </div>
+
+      <p>
+        Voice AI is having a moment &mdash; and it's more accessible than you might think. In this
+        workshop, we'll build a fully functional voice AI agent from scratch, using real-time
+        speech-to-text, a large language model for reasoning, and text-to-speech to talk back. By
+        the end, you'll have a working agent on your laptop that you can drop straight into a project.
+      </p>
+
+      <blockquote>
+        Talk to your hackathon project in 40 minutes.
+      </blockquote>
+
+      <p>No prior voice AI experience needed. If you've worked with an API before, you're good to go.</p>
+
+      <hr />
+
+      <section>
+        <h2>How a Voice Agent Works</h2>
+        <p>
+          A voice agent is three pieces wired together in a loop. Understanding this loop is the whole
+          conceptual foundation. Everything else is implementation detail.
+        </p>
+        <div class="loop-diagram">
+          <div class="loop-step">
+            <strong>Ear</strong><br />
+            Speech-to-Text
+          </div>
+          <span class="loop-arrow">&rarr;</span>
+          <div class="loop-step">
+            <strong>Brain</strong><br />
+            LLM
+          </div>
+          <span class="loop-arrow">&rarr;</span>
+          <div class="loop-step">
+            <strong>Mouth</strong><br />
+            Text-to-Speech
+          </div>
+          <span class="loop-arrow" style="transform: rotate(90deg); display: block;">&rarr;</span>
+        </div>
+        <p>
+          <strong>The ear</strong> captures your audio and transcribes it in real time using
+          speech-to-text. Latency here is everything: if your STT is slow, the whole agent feels
+          sluggish. Deepgram's STT runs at sub-300ms end-to-end, which is fast enough to feel like
+          a real conversation.
+        </p>
+        <p>
+          <strong>The brain</strong> receives the transcript and decides what to say. This is your
+          LLM: it reasons over the conversation history and your system prompt, generates a response,
+          and can call any functions you've given it access to. It can look things up, run code,
+          fetch data &mdash; anything you wire in.
+        </p>
+        <p>
+          <strong>The mouth</strong> takes the LLM's text response and streams it back as audio. Fast
+          streaming matters here too: you want audio to start playing before the full response is
+          generated, or the agent feels like it's thinking too hard.
+        </p>
+        <p>
+          With Deepgram, all three pieces run over a single WebSocket connection. You're not juggling
+          three separate APIs &mdash; it's one socket, one loop, about 80 lines of code to start.
+        </p>
+      </section>
+
+      <section>
+        <h2>Design Decisions That Actually Matter</h2>
+        <p>
+          Most tutorials skip these. They're the difference between an agent that feels like a demo
+          and one that feels like a tool.
+        </p>
+
+        <h3>Interruption Handling</h3>
+        <p>
+          In a real conversation, you don't wait for the other person to finish before you start
+          talking. Your agent shouldn't either. When the user starts speaking mid-response, the agent
+          needs to stop its output and listen. Getting this wrong is the fastest way to make an agent
+          feel robotic.
+        </p>
+        <p>
+          Deepgram's Voice Agent API handles this for you. The WebSocket connection detects voice
+          activity on both ends and manages the interrupt logic. You don't have to implement it
+          yourself &mdash; just don't override it.
+        </p>
+
+        <h3>Conversation State</h3>
+        <p>
+          Every turn in the conversation needs to be threaded correctly. The LLM needs the full
+          context of what's been said so far &mdash; by both sides &mdash; to give coherent responses. This
+          means maintaining a message history and passing it in with every LLM call.
+        </p>
+        <p>
+          Keep your context window in mind. Very long conversations will eventually push older turns
+          out of the context. A simple solution: keep the system prompt, the last N turns, and let
+          the rest roll off. The agent won't remember everything, but the conversation will stay
+          coherent.
+        </p>
+
+        <h3>Latency</h3>
+        <p>
+          The number that matters is end-to-end: from when the user stops speaking to when audio
+          starts playing back. Under 500ms feels conversational. Over 1 second feels like a phone
+          call with bad signal.
+        </p>
+        <p>
+          Three places latency hides: STT processing time, LLM first-token time, and TTS start time.
+          Streaming helps with all three. Start playing audio as soon as the first TTS chunk arrives.
+          Choose an LLM model that prioritises speed over capability for conversational use &mdash; a
+          smaller, faster model is often better here than a smarter, slower one.
+        </p>
+      </section>
+
+      <hr />
+
+      <section>
+        <h2>Getting Set Up</h2>
+
+        <div class="callout">
+          <p><strong>Before you start:</strong></p>
+          <ul>
+            <li>Node 20+ or Python 3.11+ installed</li>
+            <li>A code editor</li>
+            <li>A <a href="https://console.deepgram.com" target="_blank" rel="noopener noreferrer">free Deepgram account</a> &mdash; includes $200 credit on sign-up, no credit card needed</li>
+          </ul>
+        </div>
+
+        <p>Five steps to a running agent:</p>
+        <ol class="step-list">
+          <li>
+            <div>
+              Sign up at <a href="https://console.deepgram.com" target="_blank" rel="noopener noreferrer">console.deepgram.com</a>.
+              Your account comes with $200 in free credit &mdash; more than enough for this workshop and a weekend of hacking.
+            </div>
+          </li>
+          <li>
+            <div>
+              Grab your API key from the console. It lives under API Keys in the left sidebar. Copy it somewhere safe.
+            </div>
+          </li>
+          <li>
+            <div>
+              Clone the starter repo. We'll walk through this together in the workshop, but the pattern is:
+              <pre>git clone &lt;starter-repo-url&gt;
+cd &lt;repo-name&gt;</pre>
+            </div>
+          </li>
+          <li>
+            <div>
+              Add your API key to the environment. Create a <code>.env</code> file in the project root:
+              <pre>DEEPGRAM_API_KEY=your_key_here</pre>
+            </div>
+          </li>
+          <li>
+            <div>
+              Install dependencies and run it:
+              <pre># Node
+npm install && npm start
+
+# Python
+uv venv && uv pip install -r requirements.txt
+python main.py</pre>
+            </div>
+          </li>
+        </ol>
+        <p>
+          If it's working, you'll see a connection message in the terminal and the agent will greet
+          you when you speak. If it's not working, come find us at the booth.
+        </p>
+      </section>
+
+      <hr />
+
+      <section>
+        <h2>Make It Yours &mdash; Three Modifications</h2>
+        <p>
+          The starter app works, but it's generic. These three modifications are where the session
+          becomes yours. Each one takes about five to seven minutes. Do them in order, or skip to
+          whichever interests you most.
+        </p>
+
+        <div class="mod-block">
+          <h3>Modification A: Change the Personality</h3>
+          <p>
+            The agent's system prompt is what defines who it is. Find the <code>system_prompt</code>
+            variable in the starter code &mdash; it'll look something like this:
+          </p>
+          <pre>system_prompt = "You are a helpful assistant."</pre>
+          <p>
+            Replace it with something more specific. Give the agent a name, a role, a point of view.
+            For example:
+          </p>
+          <pre>system_prompt = """You are Alf, a friendly AI assistant at a hackathon.
+You give encouragement, suggest project ideas, and answer questions
+about voice AI. You're enthusiastic but concise - people are busy building."""</pre>
+          <p>
+            Restart the agent and talk to it. The change is immediate. The system prompt is the
+            entire personality &mdash; there's no training, no fine-tuning. You just wrote it.
+          </p>
+        </div>
+
+        <div class="mod-block">
+          <h3>Modification B: Swap the Voice</h3>
+          <p>
+            Deepgram's TTS has a full voice catalogue. Find the voice configuration in the starter
+            code &mdash; it'll be a single string like <code>"aura-asteria-en"</code>. Change it to any
+            other voice from the catalogue.
+          </p>
+          <p>
+            A few to try:
+          </p>
+          <ul>
+            <li><code>aura-asteria-en</code> &mdash; warm, conversational</li>
+            <li><code>aura-orion-en</code> &mdash; deep, authoritative</li>
+            <li><code>aura-luna-en</code> &mdash; clear, neutral</li>
+            <li><code>aura-zeus-en</code> &mdash; bold, energetic</li>
+          </ul>
+          <p>
+            Restart and talk to your agent. It's a one-line change and the agent sounds entirely
+            different. This is the modification that gets the strongest reaction.
+          </p>
+        </div>
+
+        <div class="mod-block">
+          <h3>Modification C: Add a Custom Function</h3>
+          <p>
+            This is the one that makes you realise you can hook it to anything. Function calling lets
+            the agent invoke Python (or JS) functions you write, then incorporate the result into its
+            response naturally.
+          </p>
+          <p>
+            Here's a simple example: a function that returns a random project idea when the agent
+            is asked for inspiration.
+          </p>
+          <pre>import random
+
+def get_project_idea():
+    ideas = [
+        "A voice-controlled to-do list that reads back your tasks",
+        "An AI study buddy that quizzes you out loud",
+        "A real-time translator that speaks back in the target language",
+        "A voice journalling app that summarises your entries",
+    ]
+    return random.choice(ideas)</pre>
+          <p>
+            Register this function with the agent and tell the LLM when to use it via the system
+            prompt: <em>"When asked for a project idea, call get_project_idea() and share the result."</em>
+          </p>
+          <p>
+            Now ask your agent for a project idea. Watch it call the function and weave the result
+            into a natural spoken response. This is the "aha" moment: the agent can call your own
+            code, your own APIs, your own data. The voice interface is just the front door.
+          </p>
+        </div>
+      </section>
+
+      <hr />
+
+      <section>
+        <h2>What You Can Build From Here</h2>
+        <p>
+          Now that you have a working, personalised, function-capable voice agent, here's what that
+          unlocks for your hackathon project:
+        </p>
+        <ul>
+          <li>
+            <strong>Accessibility layer</strong> &mdash; add voice input and output to any existing
+            interface. Users who can't type or read small text get a completely different experience.
+          </li>
+          <li>
+            <strong>In-game NPC</strong> &mdash; drop the agent into a game as a character that actually
+            talks back. Hook the function calling to your game state so it knows what's happening.
+          </li>
+          <li>
+            <strong>Voice-controlled developer tool</strong> &mdash; talk to your build process, your
+            deploy pipeline, your monitoring dashboard. Voice is an unusually good interface for
+            things you want to do hands-free.
+          </li>
+          <li>
+            <strong>Multilingual support</strong> &mdash; Deepgram's STT handles dozens of languages.
+            The LLM can respond in whatever language the user speaks. Global voice interface, almost for free.
+          </li>
+        </ul>
+      </section>
+
+      <section>
+        <h2>Get Help</h2>
+        <p>Building something? Stuck on something? Here's where to find us:</p>
+        <ul>
+          <li><strong>At the event:</strong> the Deepgram booth &mdash; we're here all weekend</li>
+          <li><strong>Docs:</strong> <a href="https://developers.deepgram.com" target="_blank" rel="noopener noreferrer">developers.deepgram.com</a></li>
+          <li><strong>Community:</strong> <a href="https://deepgram.com/discord" target="_blank" rel="noopener noreferrer">Deepgram's Discord</a></li>
+        </ul>
+        <p>
+          The Deepgram challenge prize this weekend goes to the team that builds the most creative
+          voice-powered experience. Come say hi, show us what you're building, and let us know if you
+          want feedback on your voice integration before judging.
+        </p>
+      </section>
+
+      <hr />
+      <a class="back-link" href="/">
+        <i class="fas fa-arrow-left" aria-hidden="true"></i>
+        Back to all talks
+      </a>
+    </main>
+  </body>
+</html>