This week, I decided it was high time to unleash a digital doppelgänger that could respond like me. I mean, who wouldn’t want a large language model (LLM) that can mimic your witty and sometimes questionably tasteful email style?
Usually, my experiments are done in an afternoon, but for this one I set aside a whole week—because training a custom AI to sound like me clearly needs more than a day’s worth of tinkering.
Data Gathering and Preprocessing
The first step was rummaging through my sent email archive, an endeavor that felt a bit like spring cleaning for my Gmail. After exporting all those messages, I wrote a Python script to systematically strip out signatures, junk text, and the occasional emoticon spree. Nothing too flashy, just your standard data cleaning.
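To give a flavor of that cleanup, here's a minimal sketch of the kind of script I mean. The signature markers and junk patterns below are illustrative guesses, not my actual rules; you'd tune them to whatever your own sent mail looks like.

```python
import re

# Hypothetical patterns -- adjust these to your own mail habits.
SIGNATURE_MARKERS = ("-- ", "Sent from my", "Best regards")
TRACKING_LINK = re.compile(r"https?://\S*(track|utm_)\S*", re.IGNORECASE)
EMOTICONS = re.compile(r"(:\)|:\(|:D|;\)|:P)+")

def clean_email(body: str) -> str:
    """Strip signatures, tracking links, and emoticon sprees from one message."""
    kept = []
    for line in body.splitlines():
        # Stop at the first line that looks like the start of a signature block.
        if any(line.startswith(m) for m in SIGNATURE_MARKERS):
            break
        line = TRACKING_LINK.sub("", line)
        line = EMOTICONS.sub("", line)
        kept.append(line.rstrip())
    # Collapse the runs of blank lines left behind by the removals.
    text = "\n".join(kept)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Run over every exported message, this leaves you with plain prose in your own voice, which is exactly what you want the model to learn from.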
Trust me, the time you spend cleaning up the data is well worth it when your model doesn’t spontaneously talk about some old Amazon tracking link in the middle of discussing weekend plans.
Model Selection and Fine-Tuning
Once I had my text in decent shape, I borrowed a pre-trained Transformer model. The idea is simple: take a model already trained on massive amounts of text, then fine-tune it on my email data so it sounds like me instead of your average GPT.
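For the curious, a fine-tuning run along those lines can be sketched with the Hugging Face `transformers` library. Everything here is a placeholder rather than my exact setup: the base model (`gpt2`), the input file name, and the hyperparameters are all illustrative.

```python
def emails_to_examples(emails, tokenizer, max_length=512):
    """Tokenize non-empty cleaned email bodies into causal-LM training examples."""
    return [tokenizer(text, truncation=True, max_length=max_length)
            for text in emails if text.strip()]

if __name__ == "__main__":
    # Heavy imports kept here so the helper above stays dependency-free.
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any causal LM works
    tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Hypothetical file of cleaned messages separated by blank lines.
    with open("cleaned_emails.txt") as f:
        emails = f.read().split("\n\n")

    # A plain list works as a map-style dataset for the Trainer's DataLoader.
    train_examples = emails_to_examples(emails, tokenizer)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="email-clone",
                               num_train_epochs=3,
                               per_device_train_batch_size=4),
        train_dataset=train_examples,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
```

The data collator with `mlm=False` is what makes this a next-token-prediction (causal LM) objective, i.e. "continue this email the way I would."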
Results and Hallucinations
Here’s where it got interesting, and by “interesting,” I mean “amusing and vaguely concerning.” The model actually did start writing in a voice that was suspiciously close to my own. Subject lines, quick one-liners, and even my overuse of “lol” were spot-on. However, locally fine-tuned models can still be fairly small, so every now and then it would generate responses that seemed to come from an alternate dimension. For instance, it once wrote about chinchilla grooming tips, even though I know I have never emailed anyone about that.
Evaluation and Sanity Checks
In practice, I’d write a tiny script that sends the model a prompt, like an email chain about scheduling a meeting, and see how it responds. If it suggested meeting at a time that defied the laws of physics or started quoting obscure movie lines, I’d dial back the training parameters or do some additional data cleaning. Let’s just say, “always read your outputs before hitting send.” Words to live by.
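The checks themselves can be automated to a degree. Here's a small sketch of the kind of sanity checker I mean, assuming some separate function produces the model's reply; the red-flag topics and the time check are illustrative, not an exhaustive list.

```python
import re

# Hypothetical topics I know I never email about -- extend to taste.
RED_FLAG_TOPICS = ("chinchilla", "alternate dimension")

def sanity_check(reply: str) -> list[str]:
    """Return a list of reasons a generated reply looks suspicious."""
    problems = []
    # Meeting times should exist on a 24-hour clock, not at "25:00".
    for hour, _minute in re.findall(r"\b(\d{1,2}):(\d{2})\b", reply):
        if int(hour) > 23:
            problems.append(f"impossible time {hour}:xx")
    # Flag topics the real me would never bring up.
    for topic in RED_FLAG_TOPICS:
        if topic in reply.lower():
            problems.append(f"off-topic mention of {topic!r}")
    return problems
```

An empty list doesn't mean the reply is safe to send, only that it cleared the cheap checks; a human still reads everything before anything leaves the outbox.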
Conclusion: “Yes, kinda, but I probably shouldn’t.”
So the short answer to “Should I trust an AI that writes like me?” is basically “probably not.” It’s a fun experiment that illustrates how personal text data and modern AI can blend together to produce spookily similar writing.
But it’s also a cautionary tale: local LLMs have come a long way but can still spin some truly bizarre tales. For now, I’ll keep using it as a parlor trick rather than a serious email-sending companion.