My house is definitely smart to the extreme. Automation runs nearly every aspect of it, security is backed by AI, and dedicated servers handle the automation and AI workloads. One example of how I have already wired some of these things together is a Python service that uses Ollama and LLaVA as a false-positive checker for Frigate. That said, all of this private goodness is still connected to Amazon's Alexa ecosystem for voice. We use it constantly; it is set up as a gateway to Home Assistant, and that's the only skill configured. It's basically a front door to Home Assistant.
The thing is, I want it to do more. I want it to be smarter. I want it to know about more. That means it's time to make my own personal Jarvis. I'm not going to be that guy and call it Jarvis, though. I'm going with Nova so far.
The Idea
The core idea is that "Hey Nova, start a new project" would wake the edge device and send "Start a new project" as a command to the Nova service. The Nova service has command-based flow control that knows "start a new project" is an automation that creates a new folder in /Projects/ and adds a new markdown note.
This is where it gets interactive: Nova can ask "What should I say this project is for?" and capture my response in the new note. Then I can have a whole array of follow-ups ("add a new note," "create a task," "capture a photo of," etc.) that extend the project flow.
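As a sketch, the "start a new project" automation could be a small function on the Nova service. Everything here except the /Projects/ root is my placeholder: the `start_new_project` name, the note filename, and the note template are all hypothetical.

```python
from datetime import date
from pathlib import Path

def start_new_project(name: str, purpose: str = "",
                      root: Path = Path("/Projects")) -> Path:
    """Create <root>/<name>/ and seed it with a markdown note.

    `purpose` holds the captured answer to "What should I say this
    project is for?" — it lands in the body of the note.
    """
    folder = root / name
    folder.mkdir(parents=True, exist_ok=True)
    note = folder / "notes.md"  # hypothetical filename
    note.write_text(
        f"# {name}\n\nCreated: {date.today().isoformat()}\n\n{purpose}\n"
    )
    return note
```

Follow-up commands like "add a new note" would then append to, or create siblings of, that seed note inside the same project folder.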
I expect to have similar flows for getting content from notes, searching email, responding to texts, interacting with Home Assistant or summarizing my calendar.
Architecture
Edge: I want to interact with a device like an Echo, but homebrew. I am not trying to reinvent smart displays or anything. My prototype is a Raspberry Pi 4 (4 GB, Raspbian) with a USB conferencing speaker/mic combo I found on Amazon. The speaker sounds surprisingly good, and the mic has worked fine for testing on my desk so far. So what does it do?

- A Python app listens for the wake phrase "Hey Nova"
- Captures the phrase that follows and sends it to the Nova service
- Plays back the response from the Nova service
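Once a speech-to-text library has produced a transcript, the wake-phrase handling reduces to a small, testable function. This is just a sketch under that assumption: `extract_command` is a hypothetical helper, and the real app still needs audio capture, the HTTP call to the Nova service, and playback around it.

```python
from typing import Optional

WAKE_PHRASE = "hey nova"

def extract_command(transcript: str) -> Optional[str]:
    """Return the command that follows the wake phrase, or None if absent."""
    text = transcript.strip()
    if not text.lower().startswith(WAKE_PHRASE):
        return None
    # Drop the wake phrase plus any separating punctuation.
    command = text[len(WAKE_PHRASE):].strip(" ,.!?")
    return command or None
```

So "Hey Nova, start a new project" yields "start a new project", while ordinary conversation yields None and is ignored.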
Nova Service: Right now this is a pretty simple Python service. It's set up to deploy in Docker via Docker Compose, so it's simple to stand up. The service runs on my Gen AI Ubuntu server (64 GB, i7), which has an NVIDIA RTX 2080 Ti and an RTX 4070. The 4070 handles the Ollama load, and the 2080 Ti should be more than enough for a high-quality voice. So what will the service do?

- Listens for edge devices
- Passes the captured phrase to Flowise or n8n for logic
- Embeds and searches notes in a Qdrant vector database
- Generates the voice for responses to prompts
- Acts as a gateway to Home Assistant, Frigate, Ollama, ChatGPT, email, notes, SearX web search, and Surveillance Station
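A minimal sketch of the service's front door, using only the Python standard library: edge devices POST a captured phrase, and a toy dispatcher decides which flow should handle it. The flow names, the port, and the matching rules are all hypothetical stand-ins for the Flowise/n8n logic.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def route(phrase: str) -> str:
    """Toy dispatcher: map a captured phrase to a flow (names are made up)."""
    p = phrase.lower()
    if p.startswith("start a new project"):
        return "project-flow"
    if "light" in p or "switch" in p:
        return "home-assistant-flow"
    return "llm-flow"  # fall through to Ollama for open questions

class NovaHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        phrase = json.loads(self.rfile.read(length))["phrase"]
        reply = json.dumps({"flow": route(phrase)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

# To run: HTTPServer(("0.0.0.0", 8080), NovaHandler).serve_forever()
```

The real service would hand the routed phrase off to the matching automation and stream back audio instead of JSON, but the shape of the request/response loop is the same.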
The Gen AI server isn't doing the heavy lifting alone. The automation server is a 16-core Ryzen with 64 GB of RAM and an RTX 3060. The 3060 is mostly consumed by decoding the security video streams, but the CPU is nearly idle and should have more than enough headroom for executing the automations. With Home Assistant connected to nearly everything it can be, and Node-RED handy, it should be the perfect execution engine, with its own API ready to go.
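That Home Assistant API is its REST API, which accepts service calls like `light.turn_on` at `/api/services/<domain>/<service>` with a long-lived access token. A stdlib-only sketch, where the host and token values are assumptions you'd replace with your own:

```python
import json
import urllib.request

HA_URL = "http://homeassistant.local:8123"  # assumption: typical HA address
HA_TOKEN = "YOUR_LONG_LIVED_TOKEN"          # created on your HA profile page

def build_service_call(domain: str, service: str, entity_id: str):
    """Return the REST path and JSON body for a Home Assistant service call."""
    return f"/api/services/{domain}/{service}", {"entity_id": entity_id}

def call_service(domain: str, service: str, entity_id: str) -> bytes:
    """POST a service call, e.g. call_service("switch", "turn_off", "switch.fan")."""
    path, payload = build_service_call(domain, service, entity_id)
    req = urllib.request.Request(
        HA_URL + path,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {HA_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

Node-RED can of course do the same through its Home Assistant nodes; the point is that everything the execution engine needs is already exposed over HTTP.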
What Works Now?
Just two core flows, but without a private TTS service. I can call Nova, ask a question, send it to Ollama, and get back a voice response. I can also ask Nova to change the state of a handful of Home Assistant entities. The next step is to build a Coqui TTS service on the Gen AI server so the end-to-end flow is private.
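The Ollama leg of that flow is a plain HTTP call to Ollama's `/api/generate` endpoint. Another stdlib-only sketch; the server hostname and default model name here are assumptions:

```python
import json
import urllib.request

OLLAMA_URL = "http://gen-ai-server:11434"  # assumption: the Gen AI box's hostname

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to Ollama and return the generated text."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Once the Coqui service exists, the returned text would be handed to it for synthesis instead of leaving the house.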
Once I have that service up and running and the whole chain is private end to end, I'll integrate Flowise. I like the idea of being able to visually tweak and inspect these flows once the conversation chain gets more complicated. It may even be multiple Flowise agents that talk with each other; I'll have to get into that more soon.