Jun 18, 2026 Productivity

Productivity Hacking With AI: How I Let a Model Reorder My To‑Do List

I stopped manually grooming my to-do list and let a model reorder it using my own data. Here is how that actually works day to day, and what broke.

Photo by Carl Heyerdahl / Unsplash

Why I Stopped Trusting My Gut About Priorities

I have a bad habit. I overestimate future me. Future Richard is always rested, focused, and totally fine with five deep work tasks in one afternoon.

Reality Richard likes coffee, context switching, and occasionally watching baseball highlights between commits.

So my to-do list started lying to me. Every morning I would drag tasks around, star some, snooze others. By lunch, the plan was already dead. By Friday, half the “high priority” items had quietly rolled all week like a sad Kanban zombie.

I wanted something stricter than my mood and more honest than my ambition. That usually means data. And lately, it also means an AI model on top of that data.

This is the story of how I stopped manually prioritizing tasks and let a small model reshuffle my list for me. No corporate SaaS, no magic productivity system. Just my own logs and a simple ranking pipeline.

The Real Question: What Do I Actually Finish?

Most people track tasks as if every checkbox is equal. They are not. Some tasks have a high chance of getting done once they hit the list. Others are basically fiction.

I wanted to answer a blunt question.

Given a task like this, at this time of day, on this kind of day, what is the actual probability that I will finish it?

That is all my system does. It predicts follow-through, not importance. That sounds subtle, but it changes everything.

Importance is theoretical. Follow-through is observable. My calendar either shows the work or it does not. My git history either has a commit or it does not.

The Data I Actually Collect (Nothing Fancy)

This runs on boring data. Nothing inspirational. Just what I already had scattered around.

Tasks in my main list app (Todoist for now, but the app does not really matter).
Timestamps for when I create, schedule, complete, or reschedule tasks.
Calendar events, mostly work blocks, calls, and baseball coaching.
Very rough energy markers from my biohacking experiments. Sleep score, HRV, and a simple 1–3 “felt like a zombie / normal / sharp” field in a daily note.

I ship all of that into a small database with a cron script. A simple PostgreSQL instance on a tiny server. No big data pipeline. No Kafka shrine in the living room.

Every night a job runs and builds a training set. The dataset is not large. It does not need to be. I am modeling one person: me.

How I Turn Tasks Into Features a Model Can Understand

A task like “Refactor auth flow” is not useful raw. The model needs structured inputs. So I turn each task into a feature vector that reflects my behavior.

Here is what I extract for each historical task:

Text features from the title and description. I embed the text to capture semantics. Coding task, writing task, admin, etc.
Project and labels. Client work, personal dev, content, finance, baseball. I treat these as categories.
Expected depth. Rough estimate. Is this deep work or shallow maintenance? I approximate it from historical tasks with similar text and duration.
Creation time. Hour of day, day of week.
Scheduled time. When I planned to do it.
Actual completion time, if done, and the time difference from scheduled time.
Reschedule count. How many times I kicked this down the road.
Energy state at the time. Sleep score, HRV bucket, my own 1–3 rating.
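Here is a minimal sketch of how one historical task could be flattened into a numeric row. It is an illustration, not the production script. The names `Task`, `PROJECTS`, and `build_features` are made up for the example, the scales for sleep score and HRV bucket are assumptions, and the text embedding is assumed to be computed elsewhere. Actual completion time is left out of the row on purpose, since in this sketch it feeds the label rather than the inputs.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical category list, based on the projects mentioned in the post.
PROJECTS = ["client", "personal_dev", "content", "finance", "baseball"]


@dataclass
class Task:
    project: str
    created_at: datetime
    scheduled_at: datetime
    reschedule_count: int
    sleep_score: float       # assumed 0-100 scale
    hrv_bucket: int          # assumed: 0 = low, 1 = normal, 2 = high
    self_rating: int         # 1 = zombie, 2 = normal, 3 = sharp
    is_deep_work: bool
    embedding: list[float]   # precomputed text embedding (assumed)


def build_features(task: Task) -> list[float]:
    """Flatten one task into a numeric row: categories, time, energy, text."""
    # One-hot encode the project so the model sees it as a category.
    one_hot = [1.0 if task.project == p else 0.0 for p in PROJECTS]
    numeric = [
        float(task.created_at.hour),
        float(task.created_at.weekday()),    # Monday = 0
        float(task.scheduled_at.hour),
        float(task.scheduled_at.weekday()),
        float(task.reschedule_count),
        task.sleep_score,
        float(task.hrv_bucket),
        float(task.self_rating),
        1.0 if task.is_deep_work else 0.0,
    ]
    return one_hot + numeric + list(task.embedding)
```

Every row has the same length and ordering, which is all a tree model needs.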

For each task, I compute a target label:

Completed within 24 hours of target day: 1
Not completed in that window: 0
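The labeling rule can be sketched in a few lines. This assumes one particular reading of the window: a task counts as done if it was completed no later than 24 hours after its scheduled timestamp, and early completions also count. The function name `completion_label` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed window: 24 hours after the scheduled timestamp.
WINDOW = timedelta(hours=24)


def completion_label(scheduled_at: datetime, completed_at: Optional[datetime]) -> int:
    """Return 1 if the task was finished inside the window, else 0."""
    if completed_at is None:
        return 0  # never completed
    return 1 if completed_at <= scheduled_at + WINDOW else 0
```

A task scheduled for Tuesday 10:00 and finished Wednesday 09:00 gets a 1. The same task finished Thursday gets a 0.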

That gives me a binary classification problem. Given a new task with a time slot and context, predict the chance that I will actually finish it in that window.

It is not pretty. But it is honest.

The Model: Boring On Purpose

Everyone loves to throw a giant language model at everything. I tried that first. It felt clever. It was also overkill and annoyingly slow for a morning routine.

What I run now is intentionally boring:

A simple gradient-boosted tree model (XGBoost-style) on top of numeric features.
Text goes through a small embedding model once, then I store the vectors.

The nightly training job pulls the last 6–9 months of tasks, rebuilds the feature table, and retrains from scratch. Fresh each night. No complex online learning.
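To show what "gradient boosted trees" means mechanically, here is a toy version in plain Python. It is not the model from this post and not XGBoost. It boosts one-split trees (stumps) on the log-loss gradient, using made-up data with two assumed features: scheduled hour and reschedule count. A real setup would use a library, which handles regularization, deeper trees, and speed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the single feature/threshold split that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            l_mean = sum(left) / len(left)
            r_mean = sum(right) / len(right)
            sse = (sum((r - l_mean) ** 2 for r in left)
                   + sum((r - r_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, l_mean, r_mean)
    if best is None:
        raise ValueError("no feature has more than one distinct value")
    return best[1:]


def train(X, y, rounds=50, learning_rate=0.3):
    """Retrain from scratch: start at the base rate, then add stumps."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # requires both classes in y
    scores = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of log loss: label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, l_val, r_val = fit_stump(X, residuals)
        stumps.append((j, thr, l_val, r_val))
        scores = [s + learning_rate * (l_val if row[j] <= thr else r_val)
                  for row, s in zip(X, scores)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * (l if row[j] <= thr else r)
                       for j, thr, l, r in stumps)
    return sigmoid(score)


# Made-up history: [scheduled hour, reschedule count] -> finished (1) or not (0).
X = [[9, 0], [10, 1], [11, 0], [14, 2], [10, 0], [16.5, 3], [17, 2], [18, 4]]
y = [1, 1, 1, 1, 1, 0, 0, 0]
model = train(X, y)
```

With this toy data, `predict_proba(model, [10, 0])` comes out high and `predict_proba(model, [17, 1])` comes out low, because every late-afternoon row in the history failed.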

The only role for an LLM is upstream. It does some light interpretation when a new task is created, for example:

Guessing whether a task is deep work, shallow work, or mechanical.
Suggesting an initial estimate of duration.
Sometimes rewriting my 3-word note into something slightly more actionable.

That LLM layer is optional. I kept it modular so I can swap models or turn it off. The prioritization model does not care. It just wants numbers.

Morning Workflow: How The List Reorders Itself

Here is what actually happens when I start my day in front of the laptop.

I have a simple script that runs with a single hotkey. It does the following.

Pull all tasks for the next 3 days from my task manager.
For each task, assign a candidate time window based on existing due dates and my calendar.
Fetch my current and forecasted energy state. Sleep, recovery, planned training.
Build the feature vector for each task + time window pair.
Ask the model: what is the probability that Richard finishes this task in that window?
Sort by that probability, with a few constraints.

Those constraints matter. This is the part that kept breaking:

Never schedule more than one deep work task per block.
Cap total deep work blocks per day, based on my recovery score.
Always keep one “easy win” near the top to build momentum.
Respect hard calendar deadlines, even if my behavior says I will procrastinate.
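The four constraints above can be sketched as a small post-processing pass over the model's scores. This is one possible arrangement, not the exact script. `Candidate`, `rank`, the block labels, and the choice to place the easy win in the second position are all assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    prob: float              # predicted completion probability
    block: str               # candidate time block, e.g. "Tue-AM"
    is_deep: bool = False
    hard_deadline: bool = False
    easy_win: bool = False


def rank(cands, max_deep_blocks, easy_win_slot=1):
    """Return (kept, deferred) after applying the constraints."""
    # Hard deadlines go first, whatever the predicted follow-through says.
    ordered = sorted(cands, key=lambda c: (not c.hard_deadline, -c.prob))
    kept, deferred, deep_blocks = [], [], set()
    for c in ordered:
        if c.is_deep and not c.hard_deadline:
            # One deep task per block, and a daily cap on deep blocks.
            if c.block in deep_blocks or len(deep_blocks) >= max_deep_blocks:
                deferred.append(c)
                continue
        if c.is_deep:
            deep_blocks.add(c.block)
        kept.append(c)
    # Pull the best easy win up near the top if it sits lower.
    for i, c in enumerate(kept):
        if c.easy_win:
            if i > easy_win_slot:
                kept.insert(easy_win_slot, kept.pop(i))
            break
    return kept, deferred


tasks = [
    Candidate("Refactor auth flow", 0.70, "Tue-AM", is_deep=True),
    Candidate("Write long-form post", 0.60, "Tue-AM", is_deep=True),
    Candidate("Design review notes", 0.55, "Tue-PM", is_deep=True),
    Candidate("Send invoice", 0.40, "Tue-PM", hard_deadline=True),
    Candidate("Reply to coach email", 0.50, "Tue-PM", easy_win=True),
    Candidate("Update dependencies", 0.65, "Tue-PM"),
]
kept, deferred = rank(tasks, max_deep_blocks=1)
```

In a real version, `max_deep_blocks` would come from the recovery score. Here it is just a number passed in.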

The output is a ranked list that gets pushed back into my task app as a reordered priority for the day, plus suggested time blocks in the calendar.

On a good day that takes less than 10 seconds. Then my list stops changing. No mid-day re-optimizing. No new model calls. Just execution.

Where The Machine Was Wrong (And I Was Worse)

The first month was messy. The model exposed two things very quickly.

First, I am terrible at estimating writing tasks. I gave anything that said “Write X” a far too optimistic slot. The model, trained on my failures, started ranking those lower in the day, especially if there was already a code-heavy morning.

Second, late afternoon deep work is mostly a fantasy for me. If a task landed after 16:00 that required actual thinking, odds of completion dropped hard.

The model learned that quickly. I kept fighting it. I would drag a “Write long-form post” task back into the 17:00 slot. Surprise. I did not do it. The task rolled. The data agreed with the algorithm, not with my self image.

That is the real benefit here. The system does not care who I think I am. It only cares what I do.

Using AI As a Mirror, Not a Manager

I do not want an AI to act like a boss. I want it to act like a brutally honest mirror with a calculator.

So instead of just blindly accepting the sorted list, I surface the predictions.

When I hover a task, I see a small line like:

"Historically, tasks like this at 10:00 on Tuesday have a 78 percent completion rate for you."

If I drag that task to 16:30, that might drop to 31 percent. The system updates the prediction instantly. It does not prevent me from doing it. It just shows the odds, based on my own history.
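A rough sketch of where a hover line like that could come from. This version uses plain bucket counting over history as a stand-in for the model's prediction, keyed by an assumed `(kind, weekday, hour)` tuple. The function names and the tuple layout are hypothetical.

```python
from collections import defaultdict

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def completion_rates(history):
    """history: iterable of (kind, weekday, hour, completed) tuples."""
    counts = defaultdict(lambda: [0, 0])  # key -> [done, total]
    for kind, weekday, hour, completed in history:
        counts[(kind, weekday, hour)][0] += int(completed)
        counts[(kind, weekday, hour)][1] += 1
    return {key: done / total for key, (done, total) in counts.items()}


def hover_text(rates, kind, weekday, hour):
    """Build the one-line summary shown when hovering a task."""
    rate = rates.get((kind, weekday, hour))
    if rate is None:
        return "No history for tasks like this in that slot yet."
    return (f"Historically, tasks like this at {hour:02d}:00 on {DAYS[weekday]} "
            f"have a {round(rate * 100)} percent completion rate for you.")
```

Dragging a task to a new slot then only means calling `hover_text` again with the new weekday and hour.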

This helps in two ways.

I feel less guilty when a task keeps slipping, because I can see the pattern. It is not just willpower, it is context.
I can intentionally swim against the data when I want to push a change, and then watch the numbers shift over time.

Productivity advice usually talks down to you. This feels more like a detailed logbook of my habits with a simple UI.

What Improved, In Actual Numbers

Opinion aside, I care about numbers. So I measured a few simple metrics before and after switching to this AI layer.

Daily completion rate for planned tasks went from roughly 55 percent to around 72 percent after 6 weeks.
Average reschedule count per task dropped from 2.4 to 1.1.
Deep work blocks actually used as planned increased by about 30 percent, based on calendar vs git / writing logs.
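For reference, the first two metrics are simple ratios. A minimal sketch, assuming each task is a dict with hypothetical keys `planned`, `completed`, and `reschedules`:

```python
def completion_rate(tasks):
    """Share of planned tasks that were actually completed."""
    planned = [t for t in tasks if t["planned"]]
    if not planned:
        return 0.0
    return sum(1 for t in planned if t["completed"]) / len(planned)


def average_reschedules(tasks):
    """Mean number of times a task was pushed to a later date."""
    if not tasks:
        return 0.0
    return sum(t["reschedules"] for t in tasks) / len(tasks)


# Made-up sample day, only to show the call shape.
day = [
    {"planned": True, "completed": True, "reschedules": 0},
    {"planned": True, "completed": True, "reschedules": 1},
    {"planned": True, "completed": False, "reschedules": 2},
    {"planned": True, "completed": True, "reschedules": 1},
    {"planned": False, "completed": True, "reschedules": 0},
]
```

Unplanned tasks are excluded from the completion rate, since the metric is about planned work.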

None of that is scientific. There was no control group. I did not randomize. This is just “old me vs current me with a model in the loop.”

Still, the difference feels strong enough that I am not going back to manual grooming.

What This Looks Like Under The Hood

I know developers read this and immediately think: where does this run, how brittle is it, and how much yak shaving is involved?

The answer: less than you think, more than a normal person would tolerate.

A small server running Postgres and a simple API in Node.
Cron jobs for nightly data sync and model retraining.
Python script for model training, using standard ML libraries.
A local CLI to trigger reprioritization, wired to a keyboard shortcut.

I experimented with serverless, but cold starts plus external APIs made it feel sluggish. I prefer a small permanent box with everything warm and ready.

The real work is not infra. It is cleaning the data. My task history was chaotic. Old labels, duplicate tasks, vague titles like “check that thing.”

The first week was basically archaeology. Fixing the past so future predictions had something solid to build on.

Stuff I Tried That Did Not Work

Some ideas sounded clever and turned out useless.

Sentiment analysis on task text. The model did not need to know if a task “sounded” negative. My behavior already encoded that.
Too many labels. I tried micro labels for every domain of my life. It added noise and maintenance overhead.
Hard enforcement. I tried locking the list so I could not move tasks. That backfired. I rebelled against my own system.

The pattern is simple. Anything that increased friction or required “perfect usage” died fast. Anything that quietly observed and adapted has survived.

Why This Feels Different From Normal Productivity Systems

Most productivity systems start with an ideal human and build from there. Time blocking assumes you can hold a rigid schedule. GTD assumes you will empty inboxes religiously.

I do not start there. I start from my own failure logs. The model watches what I actually do after I plan, then adjusts future plans around my patterns.

That feels healthier. Less guilt, more realism.

I am not trying to become a productivity robot. I am trying to route the limited high energy windows to the tasks that matter most, and then accept that the rest will always be a bit messy.

Where I Want To Take This Next

This system still feels like a prototype. Useful, but rough around the edges.

The next three experiments I want to run:

Multi-objective ranking. Blend follow-through probability with actual business impact, not just “do I finish it.”
Better energy modeling. Use more detailed biometric data from my biohacking stack, but keep the interface simple.
Micro feedback loops. After each block, a 5-second “Did this block go as planned?” input back into the model.
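The multi-objective idea is not built yet, so this is only one possible shape for it: a weighted blend of the predicted completion probability and an impact score, both assumed to be on a 0 to 1 scale. The `impact` values and the `weight` parameter are hypothetical.

```python
def blended_score(prob, impact, weight=0.5):
    """Mix follow-through probability with impact; weight=1.0 ignores impact."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * prob + (1.0 - weight) * impact


def rank_blended(tasks, weight=0.5):
    """tasks: list of (title, prob, impact) tuples, highest blended score first."""
    return sorted(tasks, key=lambda t: blended_score(t[1], t[2], weight), reverse=True)


# Made-up example: an easy low-impact task vs a harder high-impact one.
sample = [("Inbox cleanup", 0.9, 0.1), ("Client proposal", 0.5, 0.9)]
```

With `weight=1.0` the inbox cleanup wins on follow-through alone. With the default `0.5` the client proposal moves to the top.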

I like the idea of a personal model that ages with me. My life changes, my training changes, my work changes. I want the system to track those shifts without me designing a brand new productivity framework every year.

If You Want To Build Your Own

I will not pretend this is plug and play. It is opinionated, and it fits my life, not yours. But the core idea is portable.

Stop ranking tasks by how important they feel in the moment.
Start measuring which tasks you actually finish under which conditions.
Train a small model to predict that, then let it reorder your list with constraints that match your reality.

Use AI as a mirror. A data-heavy, mildly annoying mirror that keeps telling you, with receipts, who you really are at 16:30 on a Wednesday.

I find that strangely calming. The list is no longer a wish. It is a forecast. And most days, it is right.


by Richard Lemon


