AI Engineer - Videos
How to Train Your Agent: Building Reliable Agents with RL — Kyle Corbitt, OpenPipe
Have you ever launched an awesome agentic demo, only to realize no amount of prompting will make it reliable enough to deploy in production? Agent reliability is a famously difficult problem to sol...
OpenThoughts: Data Recipes for Reasoning Models — Ryan Marten, Bespoke Labs
Peel back the curtain on state of the art model post-training through the story of OpenThinker, a SOTA small reasoning model (outperforming DeepSeek distill), built in the open. Learn about the dat...
Google Photos Magic Editor: GenAI Under the Hood of a Billion-User App - Kelvin Ma, Google Photos
Go behind the scenes of Google Photos' Magic Editor. Explore the engineering feats required to integrate complex CV and cutting-edge generative AI models into a seamless mobile experience. We'll di...
Dream Machine: Scaling to 1m users in 4 days — Keegan McCallum, Luma AI
Talking about Luma AI, our mission, and how our ML infrastructure enables SOTA multimodal model development. About Keegan McCallum: I'm Keegan McCallum, the Head of ML infrastructure at Luma AI. I ...
ComfyUI Full Workshop — first workshop from ComfyAnonymous himself!
Quick introduction to ComfyUI and what's new followed by a QA session. Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our ...
Design like Karpathy is watching — Zeke Sikelianos, Replicate
Legendary AI engineer and educator Andrej Karpathy recently blogged about his experiences building, deploying, and monetizing a vibe-coded web app called MenuGen. Let's dig into the challenges he f...
On Curiosity — Sharif Shameem, Lexica
Creating and sharing demos is the easiest way to influence the future. It gets people to think about what's possible. A good tech demo doesn't have to be fully fleshed out. It doesn't even have to ...
Real world MCPs in GitHub Copilot Agent Mode — Jon Peck, Microsoft
As developers, we don't spend most of our time vibe-coding prototypes. More often, we're adding features, squashing bugs, and building tests for existing apps across a wide variety of services and ...
The rise of the agentic economy on the shoulders of MCP — Jan Curn, Apify
Thanks to MCP and all the MCP server directories, agents can now autonomously discover new tools and other agents. This lays down the foundation for the future agentic economy, where businesses wil...
MCP is all you need — Samuel Colvin, Pydantic
Everyone is talking about agents, and right after that, they’re talking about agent-to-agent communications. Not surprisingly, various nascent, competing protocols are popping up to handle it. But...
Full Spec MCP: Hidden Capabilities of the MCP spec — Harald Kirschner, Microsoft/VSCode
The true power of Model Context Protocol emerges when clients and servers collaborate across the full spectrum of the specification. This talk presents practical examples of how VS Code's comprehen...
Shipping an Enterprise Voice AI Agent in 100 Days - Peter Bar, Intercom Fin
What does it take to go from blank page to live enterprise voice agent in 100 days? That’s the challenge we took on with Fin Voice at Intercom. Enterprise customer service demands high-quality, re...
The State of Generative Media - Gorkem Yurtseven, FAL
Generative AI is reshaping the creative landscape, enabling the production of images, audio, and video with unprecedented speed and sophistication. This session offers an in-depth exploration of th...
Teaching Gemini to Speak YouTube: Adapting LLMs for Video Recommendations to 2B+ DAU - Devansh Tandon
YouTube recommendations drive the majority of video watch time for billions of daily users. Traditionally powered by large embedding models (LEMs), we're undertaking a fundamental shift: rebuilding...
Transforming search and discovery using LLMs — Tejaswi & Vinesh, Instacart
Learn how Instacart uses cutting-edge LLMs to redefine search and product discovery. - Explore innovative solutions overcoming traditional search engine limitations for grocery shopping. - Discove...
Netflix's Big Bet: One model to rule recommendations: Yesu Feng, Netflix
Discuss the foundation model strategy for personalization at Netflix based on this post https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39 and recent developm...
360Brew: LLM-based Personalized Ranking and Recommendation - Hamed and Maziar, LinkedIn AI
We will give a talk about our journey of building a foundation model for solving ranking and recommendation tasks. About Hamed Firooz: Principal AI Scientist at LinkedIn Core AI. With 15 years in la...
What We Learned from Using LLMs in Pinterest — Mukuntha Narayanan, Han Wang, Pinterest
Pinterest Search integrates Large Language Models (LLMs) to enhance relevance scoring by combining search queries with rich multimodal content, including visual captions, link-based text, and user ...
ARC AGI-3: Interactive Reasoning Benchmarks for Measuring AGI — Greg Kamradt, ARC Prize Foundation
ARC Prize Foundation is building the North Star for AGI—rigorous, open benchmarks that track reasoning progress in modern AI. We'll show why static AGI evaluations are useful, but fall short when c...
RL for Autonomous Coding — Aakanksha Chowdhery, Reflection.ai
The models and techniques to build fully autonomous coding agents - not just coding copilots - are already here. In this talk, former Google DeepMind staff research scientist, now CEO of Reflection...
Recsys Keynote: Improving Recommendation Systems & Search in the Age of LLMs - Eugene Yan, Amazon
Recommendation systems and search have long adopted advances in language modeling, from early adoption of Word2vec for embedding-based retrieval to the transformative impact of GRUs, Transformers, ...
Benchmarks Are Memes: How What We Measure Shapes AI—and Us - Alex Duffy, Every.to
Benchmarks shape more than just AI models—they shape our future. The things we choose to measure become self-fulfilling prophecies, guiding AI toward specific abilities and, ultimately, defining hu...
Small AI Teams with Huge Impact — Vik Paruchuri, Datalab
We scaled Datalab 5x this year - to 7-figure ARR, with customers that include tier 1 AI labs. We train custom models for document intelligence (OCR, layout), with popular repos surya and marker. I...
Rethinking Team Building: how a 30-person Startup serves 50 Million Users — Grant Lee, Gamma
The central thesis of this talk is that in the rapidly evolving age of AI, startups and tech companies should reject the traditional "blitzscaling" model of hyper-growth and specialized roles. Inst...
Building a 10 person unicorn - Max Brodeur-Urbas, Gumloop
An overview of how Gumloop is scaling automation across companies like Instacart, Webflow and Shopify with fewer than 10 people. About Max Brodeur-Urbas: Ex-Microsoft engineer, started Gumloop in my...
Using OSS models to build AI apps with millions of users — Hassan El Mghari
In this talk, Hassan will go over how he builds open source AI apps that get millions of users like roomGPT.io 2.9 million users, restorePhotos.io 1.1 million users, Blinkshot.io 1 million visitors...
Bolt.new: How we scaled $0-20m ARR in 60 days, with 15 people — Eric Simons, Bolt
Tiny Teams are the future of how startups are built, and it all comes down to team culture, decision making, tooling choices, and endless grit. In this talk, Eric will share the high octane insigh...
Prompt Engineering and AI Red Teaming — Sander Schulhoff, HackAPrompt/LearnPrompting
Learn from the creator of Learn Prompting, the internet's 1st Prompt Engineering guide (released 2 months before ChatGPT), and HackAPrompt, the World's 1st AI Red Teaming competition. My talk will...
Survive the AI Knife Fight: Building Products That Win — Brian Balfour, Reforge
If you’ve ever been blocked by vague specs, shifting goals, or chasing “vibes,” things have only gotten messier in the age of AI. Everyone is obsessing over engineers doing PM work and PMs cranking...
Automating Escrow with USDC and AI - Corey Cooper, Circle
This workshop explores how USDC, AI, and smart contracts can streamline escrow by automating fund release based on task or process verification. By using AI to interpret off-chain signals such as d...
How LLMs work for Web Devs: GPT in 600 lines of Vanilla JS - Ishan Anand
Don't be intimidated. Modern AI can feel like magic, but underneath the hood are principles that web developers can understand, even if you don't have a machine learning background. In this worksho...
[Workshop] AI Pipelines and Agents in Pure TypeScript with Mastra.ai — Nick Nisi, Zack Proser
This hands-on workshop introduces Mastra.ai, a TypeScript framework that streamlines the development of agentic AI systems compared to traditional approaches using LangChain and vector databases. P...
AI Engineering with the Google Gemini 2.5 Model Family - Philipp Schmid, Google DeepMind
Hands-on workshop on learning to use Gemini 2.5 Pro in combination with agentic tooling and MCP servers. About Philipp Schmid: Philipp Schmid is a Senior AI Developer Relations Engineer at Google...
The New Code — Sean Grove, OpenAI
In an era where AI transforms software development, the most valuable skill isn't writing code - it's communicating intent with precision. This talk reveals how specifications, not prompts or code,...
Production software keeps breaking and it will only get worse — Anish Agarwal, Traversal.ai
Software is eating the world. AI is eating software. AI-powered SWE means a whole lot more software is going to be written that powers mission critical systems in the coming years, with hardly any ...
Thinking Deeper in Gemini — Jack Rae, Google DeepMind
Progress towards general intelligence has been marked by identifying fundamental intelligence bottlenecks within existing models and developing solutions that improve the architecture or training o...
A year of Gemini progress + what comes next — Logan Kilpatrick, Google DeepMind
Over the last year, Google and Gemini models have shown rapid progress across all dimensions (model, product, etc.). Let's highlight all the work that has happened, how we got the world's best models...
The Wild World of AI: 6 Months That Changed Everything
From pelicans on bicycles to $600 billion market crashes - discover the most insane AI developments of the past 6 months! 🤖🚲 #AI #MachineLearning #LLM #TechNews #AIRevolution #OpenAI #DeepSeek #Te...
2025 in LLMs so far, illustrated by Pelicans on Bicycles — Simon Willison
What's changed in the world of LLMs since the AIE World's Fair last year? A lot! I'll be taking full advantage of my role as a fiercely independent researcher to review the past 12 months of advan...
Trends Across the AI Frontier — George Cameron, ArtificialAnalysis.ai
The entire AI stack is developing faster than ever - from chips to infrastructure to models. How do you sort the signal from the noise? Artificial Analysis an independent benchmarking and insights ...
Training Agentic Reasoners — Will Brown, Prime Intellect
This talk will be a technical deep dive into RL for agentic reasoning via multi-turn tool calling, similar to OpenAI's o3 and Deep Research. In particular, we'll cover: - When, why, and how - GRPO...
New York Times' Connections: A Case Study on NLP in Word Games — Shafik Quoraishee, NYT Games
This session will examine the interplay between human intuition and artificial intelligence in puzzle-solving, using the popular New York Times Connections game as a practical case study. ...
Claude Code & the evolution of agentic coding — Boris Cherny, Anthropic
A ten-thousand-foot view of the coding space, the UX of coding, and the Claude Code team's approach. About Boris Cherny: Created Claude Code. Member of Technical Staff @Anthropic. Prev: Principal En...
12-Factor Agents: Patterns of reliable LLM applications — Dex Horthy, HumanLayer
Hi, I'm Dex. I've been hacking on AI agents for a while. I've tried every agent framework out there, from the plug-and-play crew/langchains to the "minimalist" smolagents of the world to t...
MCP Is Not Good Yet — David Cramer, Sentry
You’ve heard a lot about MCP, probably been given an AI mandate or two, and are trying to figure out what’s real and what’s make believe. This session will give practical advice for how you shoul...
Your Personal Open-Source Humanoid Robot for $8,999 — JX Mo, K-Scale Labs
Introducing developer-ready robots that are open-source, affordable, and easy to use. https://www.kscale.dev/ About Jingxiang Mo: Jingxiang Mo is a founding engineer at K-Scale Labs, where he lead...
The Build-Operate Divide: Bridging Product Vision and AI Operational Reality
Product leaders see AI possibilities. Operations teams see implementation chaos. That disconnect can kill promising AI features before they ever reach users. In this session, Chris Hernandez (Chim...
The New Lean Startup — Sid Bendre, Oleve
In this session, I will be presenting a case study of Oleve's journey, revealing how we've scaled a profitable multi-product portfolio with a tiny team. I'll walk you through the emergence of "tiny...
Conquering Agent Chaos — Rick Blalock, Agentuity
Agent deployments can be dicey, especially at first. This session goes over all the things that cause headaches with deployments, from serverless issues to networking issues, and how we fix them. ...
Optimizing inference for voice models in production - Philip Kiely, Baseten
How do you get time to first byte (TTFB) below 150 milliseconds for voice models -- and scale it in production? As it turns out, open-source TTS models like Orpheus have an LLM backbone that lets u...