Case study · 2025
AreYouAI
A Chrome extension that flags AI-generated videos on YouTube as they play — frame by frame. Built so my parents wouldn’t have to learn what a deepfake is to feel safe online.
1st place at the RUAI hackathon
5s frame extraction cadence during playback
1 GPU signed by Jensen Huang
The trigger
The hardest part of building for my parents isn’t the interface. It’s the literacy gap. They grew up trusting what they saw on a screen — and now the screen is generative.
My parents shouldn’t have to learn the word “deepfake” to feel safe online.
I wanted a tool that did the learning for them. Quiet, always-on, no permission to grant. Sits in the browser, watches what you watch, tells you when something feels off.
The shape
A Chrome extension that injects a single button below any YouTube player: Analyze video. Click it, and the extension starts pulling a frame every five seconds straight from the <video> element using the HTML5 Canvas API.
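The five-second cadence can be sketched as a small throttle. This is only an illustration of the timing logic, written in Python to match the backend; the extension itself would run the equivalent in its content script and draw the frame to a canvas. The `FrameSampler` class and its backward-seek rule are assumptions, not the shipped code.

```python
from dataclasses import dataclass
from typing import Optional

FRAME_INTERVAL_S = 5.0  # cadence stated in the case study


@dataclass
class FrameSampler:
    """Decides when the next frame is due, given the player's current time."""

    interval_s: float = FRAME_INTERVAL_S
    last_capture_s: Optional[float] = None  # playback time of the last captured frame

    def should_capture(self, current_time_s: float) -> bool:
        # Always capture the first frame seen.
        if self.last_capture_s is None:
            self.last_capture_s = current_time_s
            return True
        # Capture again after a full interval, or right after a backward seek
        # (assumed behaviour: a seek back restarts the cadence).
        elapsed = current_time_s - self.last_capture_s
        if elapsed >= self.interval_s or elapsed < 0:
            self.last_capture_s = current_time_s
            return True
        return False


sampler = FrameSampler()
ticks = [0.0, 1.0, 4.9, 5.0, 7.5, 10.2, 3.0]  # simulated playback times; 3.0 is a seek back
captured = [t for t in ticks if sampler.should_capture(t)]
```

Keying the throttle to playback time rather than wall-clock time means a paused video produces no new frames.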
Each frame goes to a small Python/FastAPI backend, which forwards it to NVIDIA’s NIM endpoint running nemotron-nano-12b-v2-vl — a vision-language model tuned for visual reasoning. The model returns a confidence score, the artifacts it caught, and a short reasoning string.
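One way the backend might turn the model's reply into a structured verdict is sketched below. It assumes the prompt asks the model for a JSON object; the field names (`is_ai_generated`, `confidence`, `artifacts`, `reasoning`) and the `parse_verdict` helper are hypothetical, chosen to mirror the three things the model is described as returning.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class FrameVerdict:
    is_ai_generated: bool
    confidence: float  # clamped to the range 0.0 to 1.0
    artifacts: List[str] = field(default_factory=list)
    reasoning: str = ""


def parse_verdict(model_text: str) -> FrameVerdict:
    """Parse the model's text reply into a FrameVerdict.

    Assumes the reply contains one JSON object, possibly with prose around it.
    """
    start = model_text.find("{")
    end = model_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(model_text[start : end + 1])
    # Clamp so a malformed score can never render as 170% in the UI.
    confidence = min(1.0, max(0.0, float(data.get("confidence", 0.0))))
    return FrameVerdict(
        is_ai_generated=bool(data.get("is_ai_generated", False)),
        confidence=confidence,
        artifacts=[str(a) for a in data.get("artifacts", [])],
        reasoning=str(data.get("reasoning", "")),
    )


# Hypothetical reply, with prose around the JSON object.
reply = (
    'Here is my analysis: {"is_ai_generated": true, "confidence": 0.87, '
    '"artifacts": ["lighting breaks", "edge errors"], '
    '"reasoning": "Shadows fall in two directions."} Hope that helps.'
)
verdict = parse_verdict(reply)
```

Slicing from the first `{` to the last `}` is a deliberately forgiving choice, since vision-language models often wrap structured output in extra text.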
The verdict surfaces in the UI in real time: a confidence percentage, a yes-or-no, a list of inconsistencies — facial artifacts, lighting breaks, edge errors, temporal jitter, synthetic textures — and a short, plain-language reason.
Design decisions
Three calls shaped the experience.
Frame-by-frame, not video-by-video. A video is a sequence of decisions. Some frames are pristine; some leak. Analyzing per frame gave the user a timeline of trust, not a single yes/no.
The button lives where attention already is. Below the player, in the same row as Like and Share. No popup, no overlay. The decision moment is when the user is already looking at the video — the interface meets them there.
The verdict carries a reason. A confidence score with no explanation is harder to trust than the thing it’s evaluating. We always surface what the model saw: which features looked synthetic, in language a non-technical viewer can hold.
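The "timeline of trust" from the first decision can be sketched as collapsing per-frame scores into flagged time ranges. This is a minimal illustration, not the extension's code: `TimedVerdict`, `flagged_segments`, and the 0.7 threshold are all assumed for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TimedVerdict:
    time_s: float      # playback time the frame was captured at
    confidence: float  # model's confidence that the frame is AI-generated, 0.0 to 1.0


def flagged_segments(
    verdicts: List[TimedVerdict], threshold: float = 0.7
) -> List[Tuple[float, float]]:
    """Collapse consecutive flagged frames into (start_s, end_s) segments."""
    segments: List[Tuple[float, float]] = []
    run_start: Optional[float] = None
    run_end: Optional[float] = None
    for v in sorted(verdicts, key=lambda v: v.time_s):
        if v.confidence >= threshold:
            if run_start is None:
                run_start = v.time_s  # a new flagged run begins here
            run_end = v.time_s
        elif run_start is not None:
            segments.append((run_start, run_end))  # a clean frame closes the run
            run_start = None
    if run_start is not None:
        segments.append((run_start, run_end))  # close a run that reaches the end
    return segments


timeline = [
    TimedVerdict(0, 0.10),
    TimedVerdict(5, 0.82),
    TimedVerdict(10, 0.91),
    TimedVerdict(15, 0.20),
    TimedVerdict(20, 0.75),
]
segments = flagged_segments(timeline)
```

With these sample scores, the frames at 5s and 10s merge into one segment and the frame at 20s stands alone, which is the kind of output a timeline view could draw under the player.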
The architecture
The Chrome extension uses Manifest V3: a service worker for state, a content script for the YouTube injection, and a popup UI for configuration. The backend is a thin FastAPI process: one analyze-frame endpoint, one analyze-batch endpoint, a health check.
The NIM model does the heavy lifting. The backend exists mostly to keep the API key off the client and to normalize image sizes before they hit the model. Everything else is glue.
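The two backend jobs named above can be sketched with the standard library alone. The size cap of 1024 pixels, the `NIM_API_KEY` variable name, and both helper functions are assumptions for illustration; the case study does not state the real values. The actual pixel resampling would need an imaging library and is left out.

```python
import os
from typing import Tuple

MAX_SIDE_PX = 1024  # assumed cap; the real limit is not stated in the case study


def fit_within(width: int, height: int, max_side: int = MAX_SIDE_PX) -> Tuple[int, int]:
    """Return dimensions scaled down to fit max_side, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # small frames pass through; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def load_api_key(env_var: str = "NIM_API_KEY") -> str:
    """Read the key from the server's environment so it never ships in the extension."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


target = fit_within(1920, 1080)  # a 1080p frame scaled to fit the assumed cap
```

Keeping the key in the server's environment is what makes the backend necessary at all: anything bundled into an extension can be read by whoever installs it.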
The hackathon
Built it over a weekend at RUAI. Won first place. The prize was a GPU signed by Jensen Huang.
The award mattered less than the room. Most of the judges asked the same first question: can you ship this to my mom?
Reflection
The instinct is to treat AI literacy as something users need to acquire. But literacy is a tax — and like every tax on attention, it falls hardest on the people we built the internet to include.
The job of the interface, in this era, is to absorb that tax. A parent shouldn’t need to learn what a deepfake is, any more than they should need to know how TLS works to use a bank app. The literacy moves into the tool.
What’s next
The hackathon version analyzes on click. The shipped version should analyze in the background, surface a quiet badge when something is off, and let the user ask why. The shift from analysis to ambient.
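One possible shape for the quiet badge is a rolling window that only lights up when several recent frames look synthetic, so a single noisy frame does not alarm anyone. This is a speculative sketch of a version that does not exist yet; the `AmbientBadge` class, the window of three frames, and the 0.7 threshold are all assumptions.

```python
from collections import deque
from typing import Deque


class AmbientBadge:
    """Shows a badge only after several recent frames look synthetic."""

    def __init__(self, window: int = 3, min_flagged: int = 2, threshold: float = 0.7) -> None:
        self.recent: Deque[bool] = deque(maxlen=window)  # oldest entries drop off automatically
        self.min_flagged = min_flagged
        self.threshold = threshold

    def update(self, confidence: float) -> bool:
        """Record one frame's confidence; return True if the badge should show."""
        self.recent.append(confidence >= self.threshold)
        return sum(self.recent) >= self.min_flagged


badge = AmbientBadge()
scores = [0.1, 0.9, 0.2, 0.8, 0.85, 0.1, 0.1]  # simulated per-frame confidences
states = [badge.update(c) for c in scores]
```

In this run the badge stays off through the first isolated spike, turns on once two of the last three frames are flagged, and clears again after the clean frames push the flagged ones out of the window.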