Frigate is great at detecting objects. It will tell you a person showed up at the front porch, or that a car pulled into the driveway. What it won’t tell you is whether that person is your neighbor waving hello or someone trying your door handle. For that, you need a brain behind the camera feed, not just a classifier. This workflow wires one up on my local n8n instance so everything stays private and runs fast.

The Pipeline
Everything starts with an MQTT message. Frigate publishes to frigate/reviews each time a review item starts, updates, or ends, and the final message arrives when the review is finalized. The payload includes the camera name, a list of detected object types, and an array of individual detection event IDs captured during the review window.
An n8n MQTT trigger catches that payload and passes it to the first Code node, Filter & Build. This node does two things: it drops anything that isn’t a finalized review, and it applies a per-camera allow-list of object types. The driveway camera cares about persons and cars. The front porch cares about packages and persons. Any review that doesn’t match its camera’s list gets dropped here, before anything downstream runs.
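The filter can be sketched as a standalone JavaScript function. This sketch assumes Frigate's reviews payload shape: a type of new, update, or end, with the review itself under after, including after.data.objects and after.data.detections. The camera keys, the ALLOW_LIST constant, and the filterAndBuild name are hypothetical. Inside n8n, the same body would read the parsed MQTT message from the trigger's output item rather than from a function argument.

```javascript
// Hypothetical per-camera allow-list; camera names are illustrative.
const ALLOW_LIST = {
  driveway: ["person", "car"],
  front_porch: ["package", "person"],
};

// Returns a trimmed review object, or null if the review should be dropped.
function filterAndBuild(payload, allowList = ALLOW_LIST) {
  // Frigate sends "new", "update", and "end" messages; only "end" is final.
  if (!payload || payload.type !== "end") return null;

  const review = payload.after;
  if (!review) return null;

  const allowed = allowList[review.camera];
  if (!allowed) return null; // camera has no allow-list configured

  // Keep only the object types this camera cares about.
  const objects = (review.data?.objects ?? []).filter((o) => allowed.includes(o));
  if (objects.length === 0) return null;

  return {
    reviewId: review.id,
    camera: review.camera,
    objects,
    detections: review.data?.detections ?? [],
    startTime: review.start_time, // Unix epoch seconds
    endTime: review.end_time,
  };
}
```

In an n8n Code node, the equivalent of returning null is returning an empty array, so no downstream node runs for that review.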
If the review passes the filter, the next node is Split Detections. It fans out one item per detection event ID, up to five, using plain JavaScript: a single map() over the ID array. Most n8n nodes operate on each incoming item individually, so that one map() call is what turns one review into up to five parallel snapshot fetches.
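The fan-out can be sketched as follows. The function returns an array of { json: ... } objects, which is the item shape n8n Code nodes emit. It assumes a review object with reviewId, camera, and detections fields; those field names, splitDetections, and the base URL are hypothetical. The /api/events/{id}/snapshot.jpg path is Frigate's per-event snapshot endpoint.

```javascript
const MAX_FRAMES = 5; // cap on fetched frames per review

// Turns one filtered review into up to MAX_FRAMES n8n-style items,
// one per detection event ID. Downstream nodes then run once per item.
function splitDetections(review, frigateBaseUrl) {
  return review.detections.slice(0, MAX_FRAMES).map((eventId, index) => ({
    json: {
      reviewId: review.reviewId,
      camera: review.camera,
      eventId,
      frameIndex: index, // preserves frame order for the later gather step
      snapshotUrl: `${frigateBaseUrl}/api/events/${encodeURIComponent(eventId)}/snapshot.jpg`,
    },
  }));
}
```

The Fetch Snapshot HTTP node can then take its URL from each item's snapshotUrl field.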
Fetch Snapshot hits the Frigate API once per event ID and returns each JPEG as binary data. Build Vision Body then re-aggregates the fan-out. It collects every fetched frame, loops through them to pull out the raw bytes, and base64-encodes each one. Frames that fail to fetch (a network timeout, a missing snapshot) are skipped silently rather than aborting the run.
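This sketch covers the gather-and-encode half of Build Vision Body. It is written against a hypothetical result shape: { eventId, bytes } for a successful fetch and { eventId, error } for a failed one, which is roughly what you get when the HTTP node is set to continue on error. In the real Code node the bytes would come from n8n's binary data helpers. Here a Node.js Buffer stands in for them, and collectFrames is an illustrative name.

```javascript
// Gathers fetched frames and base64-encodes them, skipping failures.
function collectFrames(results) {
  const frames = [];
  for (const r of results) {
    // Skip failed or empty fetches instead of aborting the whole run.
    if (r.error || !r.bytes || r.bytes.length === 0) continue;
    frames.push({
      eventId: r.eventId,
      base64: Buffer.from(r.bytes).toString("base64"),
    });
  }
  return frames;
}
```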
Once all usable frames are collected, the node assembles a structured prompt and packages everything into a single request body for Ollama.
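The request body itself is small. Ollama's /api/generate accepts a model name, a prompt, and an images array of base64 strings for vision models. Setting stream to false makes it return a single JSON object whose response field holds the generated text. The buildVisionBody name, and the choice to skip the call entirely when no frames survive, are assumptions of this sketch.

```javascript
// Builds the JSON body for Ollama's /api/generate endpoint.
function buildVisionBody(prompt, frames, model = "qwen3-vl:4b") {
  // Assumption of this sketch: with no usable frames, skip the model call.
  if (!frames || frames.length === 0) return null;
  return {
    model,
    prompt,
    images: frames.map((f) => f.base64), // plain base64, no data: URI prefix
    stream: false, // one JSON object back instead of a token stream
  };
}
```

The resulting object would be POSTed as JSON to http://<ollama-host>:11434/api/generate, where 11434 is Ollama's default port.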

The model is qwen3-vl:4b, running locally on my Nova server, an Ollama instance with two NVIDIA GPUs. The call goes to /api/generate, and the model’s answer comes back as plain text in the response field of the JSON reply. Prepare Row parses that text with two regular expressions, one for the threat level and one for the recommendation. It then converts the Frigate timestamps (Unix epoch seconds) to ISO 8601 and builds the database record.
One detail worth noting: Prepare Row doesn’t receive the review metadata as input. Instead, it back-references an earlier node directly. This n8n cross-node reference is necessary because the Ollama response carries only the model output, not the review metadata that has to go into the database alongside it. The finished record goes into a frigate_vision table in TimescaleDB via the Postgres node.
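Prepare Row's parsing could look something like this. The THREAT LEVEL and RECOMMENDATION labels, the NONE/LOW/MEDIUM/HIGH enum, and the column names are illustrative assumptions, not the article's actual prompt or frigate_vision schema. The review argument stands in for the metadata that the real node pulls from an earlier node by back-reference.

```javascript
// Hypothetical section labels; they must match whatever the prompt asks for.
const THREAT_RE = /THREAT LEVEL:\s*(NONE|LOW|MEDIUM|HIGH)\b/i;
const RECOMMENDATION_RE = /RECOMMENDATION:\s*(.+)/i; // rest of that line only

// Frigate timestamps are Unix epoch seconds (floats); convert to ISO 8601.
function epochToIso(seconds) {
  return seconds == null ? null : new Date(seconds * 1000).toISOString();
}

// Combines the model text with review metadata from an earlier node.
function prepareRow(modelText, review) {
  const threat = modelText.match(THREAT_RE);
  const rec = modelText.match(RECOMMENDATION_RE);
  return {
    review_id: review.reviewId,
    camera: review.camera,
    objects: review.objects,
    threat_level: threat ? threat[1].toUpperCase() : "UNKNOWN",
    recommendation: rec ? rec[1].trim() : null,
    start_time: epochToIso(review.startTime),
    end_time: epochToIso(review.endTime),
    raw_response: modelText, // keep the full text for later auditing
  };
}
```

Falling back to "UNKNOWN" instead of throwing means a malformed answer still produces a row that can be queried and reviewed later.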

Why This Approach Works
Running the vision model locally means no API costs, no images leaving the network, and no round trip to a cloud provider. The multi-frame analysis is what makes the biggest difference. A single snapshot often lacks the context to tell whether someone is passing through or lingering. Feeding Qwen two to five sequential frames and explicitly asking it to describe how behavior changes across them produces meaningfully better threat assessments than any single-image prompt.
The structured prompt took several iterations. Early versions returned freeform paragraphs that were interesting but difficult to parse reliably. Locking the model into labeled sections with enum-style values for threat level turned the output into something closer to structured data than prose, which is what an automated pipeline actually needs. The recommendation format makes the output readable without any further transformation and easy to summarize consistently.
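A prompt in the spirit described here might be assembled as follows. The section names and the four-value threat enum are hypothetical, since the actual prompt is not shown. The point is the shape: fixed labels, one value per label, and an explicit instruction to compare the frames rather than describe each one in isolation.

```javascript
// Hypothetical enum and section labels; the real prompt is not shown.
const THREAT_LEVELS = ["NONE", "LOW", "MEDIUM", "HIGH"];

// Builds a prompt that pins the model to labeled sections and an enum.
function buildPrompt(camera, objects, frameCount) {
  return [
    `You are reviewing ${frameCount} sequential frames from the "${camera}" camera.`,
    `Detected objects: ${objects.join(", ")}.`,
    "Describe how the subject's position and behavior change from frame to frame.",
    "Answer using exactly these labeled sections and nothing else:",
    "SUMMARY: <one or two sentences>",
    "BEHAVIOR: <what changes across the frames>",
    `THREAT LEVEL: <one of ${THREAT_LEVELS.join(", ")}>`,
    "RECOMMENDATION: <one short sentence>",
  ].join("\n");
}
```

Keeping the labels and enum values in one place, and reusing the same strings in the parsing regexes, stops the prompt and the parser from drifting apart as the prompt evolves.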
The whole workflow runs end-to-end in roughly 15 seconds on a single 4070 GPU. For a home security use case, that’s well within the window where the information is still actionable.

Where n8n Wasn’t Quite So Easy
Four of the eight nodes in this workflow are raw JavaScript Code nodes. That’s not because the problem is hard. It’s because n8n has no native primitives that handle scatter-gather, binary aggregation, local LLM integration, or structured output parsing the way this pipeline needs.
The best example is Build Vision Body. The fan-out from Split Detections produces multiple items, and Build Vision Body needs all of them before it can assemble the multi-frame request. n8n handles the gathering automatically, because a Code node in Run Once for All Items mode receives every incoming item in a single call. Getting the binary data out of each item, however, still means looping through them by hand.
The cross-node back-reference in Prepare Row follows a similar pattern. Because the Ollama response carries only resp.response, the metadata has to come from somewhere else. n8n lets you reference earlier nodes with $('Node Name').item.json. That only works when n8n can trace the current item back to a matching item in that node through its paired-item links.
n8n is still a great choice for this kind of self-hosted AI stack. Nothing here was impossible, and the MQTT trigger, HTTP request nodes, and Postgres insert all worked exactly as expected. But there’s a real gap between “possible” and “approachable.” This workflow required either a developer or quality code generation for four out of eight steps. That’s worth knowing before you start.
