The Patterns Nobody Told Us About

When we built MentalMather, we had hypotheses about how people would use it and what their data would show. Some hypotheses were confirmed. Others were wrong in interesting ways. And a few patterns emerged that we didn't expect at all. This article describes what we've learned from observing the aggregate patterns in cognitive performance data — not individual user data (which stays on each user's device), but the behavioral patterns and architectural insights that emerge from building and testing a daily cognitive measurement tool.

Division Is the Canary

Of the four arithmetic operations, division is the most sensitive to cognitive state. Users' addition and multiplication speeds tend to be relatively stable across days — these operations are primarily fact retrieval, and fact retrieval is robust to moderate variations in sleep, stress, and mood. But division speed — particularly for problems that require multi-step mental procedures rather than direct retrieval — fluctuates more widely and responds more dramatically to lifestyle variables.

This makes sense through the lens of working memory theory. Division problems like 156 ÷ 12 require holding the dividend, estimating a quotient, computing a partial product, comparing it to the dividend, and adjusting — a sequence of operations that heavily loads the central executive component of working memory. When working memory capacity is reduced by poor sleep, stress, or illness, division is the first operation to show it. The per-operation baseline was designed with exactly this sensitivity in mind — a blended score would dilute the division signal with the more stable addition and multiplication data.
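
To make that design choice concrete, here is a minimal sketch of what a per-operation baseline could look like: a slow-moving average of solve time kept separately for each operation, with today's deviation computed against that operation's own history. The EWMA approach, class name, and smoothing factor are illustrative assumptions rather than MentalMather's actual implementation.

```python
# Illustrative per-operation baseline: one exponentially weighted moving
# average (EWMA) of solve time per operation. The smoothing factor and
# structure are assumptions for illustration, not MentalMather's actual code.

ALPHA = 0.1  # small alpha = slow-moving, stable baseline

class PerOperationBaseline:
    def __init__(self):
        self.baseline = {}  # operation name -> EWMA of solve time (seconds)

    def update(self, operation: str, solve_time: float) -> None:
        prev = self.baseline.get(operation)
        self.baseline[operation] = (
            solve_time if prev is None else ALPHA * solve_time + (1 - ALPHA) * prev
        )

    def deviation(self, operation: str, solve_time: float) -> float:
        """Relative deviation of today's solve time from this operation's baseline."""
        prev = self.baseline.get(operation)
        return 0.0 if prev is None else (solve_time - prev) / prev

b = PerOperationBaseline()
b.update("division", 6.2)
b.update("addition", 1.4)
print(f"division: {b.deviation('division', 8.0):+.0%}")  # large swing on a rough day
print(f"addition: {b.deviation('addition', 1.5):+.0%}")  # barely moves
```

Because division is compared only against its own history, a slow division day registers as a clear signal instead of being averaged away by the more stable addition and multiplication times.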

The First Week Is Misleading

New users' scores improve rapidly during the first 5-7 days of use. This looks like dramatic cognitive improvement, and it's tempting to attribute it to "brain training." It's not. It's a combination of three factors: reactivation of dormant arithmetic pathways (most users haven't done timed mental math in years), familiarization with the app interface and problem format, and calibration of the baseline algorithm as it accumulates data about the individual user.

The genuine signal starts around week two, once the initial reactivation effect levels off and the baseline stabilizes. If you're running a self-experiment — testing whether a supplement or sleep change affects your cognition — the baseline period should start after this initial ramp-up, not during it. Any intervention tested in the first week will appear to "work" simply because of the practice effect.

The first week of data tells you how rusty you were. The second week starts telling you how sharp you are. Run experiments after the baseline stabilizes, not before.
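
In practice, honoring this is mostly a matter of discarding the ramp-up sessions before computing the pre-intervention baseline. Here is a minimal sketch, assuming the session history is available as (date, score) pairs; the seven-day cutoff mirrors the ramp-up window described above, and the function name is purely illustrative.

```python
from datetime import date, timedelta

def baseline_window(sessions, ramp_up_days=7):
    """Drop the first `ramp_up_days` of history so the practice effect
    doesn't masquerade as an intervention effect.

    `sessions` is assumed to be a list of (date, score) tuples, oldest first.
    """
    if not sessions:
        return []
    cutoff = sessions[0][0] + timedelta(days=ramp_up_days)
    return [(d, s) for d, s in sessions if d >= cutoff]

# Example: two weeks of history, only days 8-14 count toward the baseline.
history = [(date(2024, 1, 1) + timedelta(days=i), 60 + i) for i in range(14)]
print(len(baseline_window(history)), "of", len(history), "sessions kept")  # 7 of 14
```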

Consistency Beats Intensity

Users who complete the Sharpness Score assessment daily for 30+ consecutive days show a distinctly different data profile from users who engage sporadically. The consistent users' baselines stabilize, their day-to-day variation decreases (as the initial practice effect fades), and meaningful patterns in their data — time-of-day effects, sleep correlations, weekly rhythms — become statistically visible.

Sporadic users, by contrast, show higher variance and less interpretable data. If you take the assessment on Monday, skip Tuesday through Friday, and take it again on Saturday, the two data points are nearly useless for trend detection. The power of the metric comes from frequency, not intensity. A single 60-second session every day is far more informative than a 10-minute session once a week.
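
The statistical intuition can be made concrete with a back-of-the-envelope comparison. Assuming the same per-session noise on both schedules (a simplification made purely for illustration), daily measurement over a month pins down a trend line roughly twice as tightly as weekly measurement over the same month, and weekly measurement can never reveal a day-of-week pattern at all because it always lands on the same weekday.

```python
import math

def trend_standard_error(measurement_days, noise_sd=5.0):
    """Standard error of a least-squares trend fitted over the given days,
    assuming the same per-session noise on either schedule (an illustrative
    simplification)."""
    mean_day = sum(measurement_days) / len(measurement_days)
    spread = sum((d - mean_day) ** 2 for d in measurement_days)
    return noise_sd / math.sqrt(spread)

daily = list(range(30))          # a 60-second session every day for a month
weekly = list(range(0, 30, 7))   # one longer session a week, same month

print(f"daily:  trend SE = {trend_standard_error(daily):.3f}")   # ~0.11
print(f"weekly: trend SE = {trend_standard_error(weekly):.3f}")  # ~0.23
```

The precision gap is real but modest; the bigger loss is structural: anything that varies within the week becomes invisible to a once-a-week sampler.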

This is why the app is designed around a daily micro-habit rather than longer, less frequent training sessions. The measurement needs consistency to be meaningful. And consistency in a 60-second task is achievable in a way that consistency in a 15-minute task is not.

The Weekend Effect

One of the most consistent patterns in the data is what we call the "weekend effect." Users' Monday morning scores are measurably lower than their Wednesday or Thursday morning scores. The magnitude varies by individual, but the pattern is remarkably consistent across different user profiles.

The likely explanation is a combination of disrupted sleep schedules (later bedtimes and wake times on weekends), higher alcohol consumption on Friday and Saturday evenings, and the general break in routine that weekends represent. The caffeine and alcohol patterns track with this — Monday's cognitive performance often reflects Saturday night's choices more than Monday morning's preparation.
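
Users who want to check whether the pattern appears in their own history can do so with a few lines of analysis over locally stored scores. A minimal sketch, assuming the history is available as (date, score) pairs; the sample scores are made up for illustration and are not real user data.

```python
from collections import defaultdict
from datetime import date

def average_by_weekday(sessions):
    """Average locally stored (date, score) pairs by weekday.
    A weekend effect shows up as a lower mean for Monday."""
    totals = defaultdict(lambda: [0.0, 0])
    for d, score in sessions:
        totals[d.weekday()][0] += score   # weekday(): 0 = Monday ... 6 = Sunday
        totals[d.weekday()][1] += 1
    names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    return {names[w]: s / n for w, (s, n) in sorted(totals.items())}

# Made-up scores for illustration: Mondays sit visibly below midweek.
sample = [
    (date(2024, 3, 4), 61), (date(2024, 3, 6), 72), (date(2024, 3, 7), 74),
    (date(2024, 3, 11), 63), (date(2024, 3, 13), 73), (date(2024, 3, 14), 75),
]
print(average_by_weekday(sample))  # {'Mon': 62.0, 'Wed': 72.5, 'Thu': 74.5}
```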

For users who notice this pattern in their own data, it provides a concrete, actionable insight: if Monday is consistently your worst cognitive day, schedule your most demanding work for Tuesday through Thursday. Use Monday for administrative tasks, catch-up, and planning. The data doesn't change your schedule automatically, but it gives you information that makes intelligent scheduling possible.

Improvement Plateaus Are Normal

After the initial reactivation period (weeks 1-2), most users see a gradual improvement curve that flattens between week 4 and week 8. This plateau isn't a failure — it's the point at which practice has reactivated your existing arithmetic pathways and further improvement requires genuine cognitive adaptation rather than just de-rusting. The plateau represents your current true processing capacity, and the Sharpness Score baseline adjusts to reflect it. From this point forward, the score measures daily variation around your actual capacity, not improvement from rust.
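
One way to picture the transition is as a slope check over a recent window of scores: while the rolling trend is clearly positive, you are still de-rusting; once it flattens, deviations from the baseline become the real signal. The window size and threshold below are illustrative assumptions, not the app's actual criteria.

```python
def recent_trend(scores, window=14):
    """Least-squares slope (points per day) of the last `window` daily scores."""
    recent = scores[-window:]
    n = len(recent)
    mean_x = (n - 1) / 2
    mean_y = sum(recent) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(recent))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def has_plateaued(scores, window=14, threshold=0.1):
    """Illustrative rule: the improvement curve has flattened once the
    two-week trend falls below ~0.1 points per day. The threshold is an assumption."""
    return len(scores) >= window and abs(recent_trend(scores, window)) < threshold

still_improving = [60 + 0.8 * day for day in range(20)]
flattened = [75 + (-1) ** day * 0.5 for day in range(20)]
print(has_plateaued(still_improving))  # False: still climbing out of the rust
print(has_plateaued(flattened))        # True: only daily variation remains
```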

For users who came to the app hoping for continuous improvement, the plateau can feel disappointing. But the plateau is where the tool becomes most valuable — because now you're measuring real cognitive fluctuation, not practice effects. The signal gets cleaner as the noise fades.

What We Got Wrong

We assumed that competitive features — the challenge mode — would be the primary engagement driver. In practice, the daily Sharpness Score itself is what users return for. The number, compared to their own history, is more compelling than we expected. People want to know how they're doing today, relative to themselves. Competition with others is a secondary motivation.

We also assumed that users would primarily engage in the morning. Many do, but a significant segment uses the app in the evening — as a cognitive check-in that's qualitatively different from the morning use case. Morning users want to know "how sharp am I starting the day?" Evening users want to know "how much cognitive capacity do I have left?" Both are valid use cases that the same metric serves differently depending on when you measure.

What Comes Next

As the dataset grows, so does the potential for insight. Population-level patterns — anonymized, aggregated, and stripped of individual-identifying information — could eventually contribute to cognitive science research on daily performance variation, the effects of lifestyle variables on cognition, and the natural trajectories of cognitive aging. None of this requires compromising individual privacy. Aggregate statistical patterns can emerge from locally stored individual data without any single user's raw data ever leaving their device.
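
Mechanically, that could look something like each device computing a coarse, noised summary of its own history and sharing only that. The sketch below is a conceptual illustration of one possible approach, not a description of anything MentalMather does today; the choice of statistic, the noise scale, and the idea of uploading the summary are all assumptions.

```python
import random

def local_summary(scores, noise_sd=2.0):
    """Compute a coarse, noised summary on-device. Raw session history never
    leaves the phone; only this blurred aggregate would ever be shared.
    The statistic and noise scale are illustrative assumptions."""
    if not scores:
        return None
    mean_score = sum(scores) / len(scores)
    return {
        "mean_score": round(mean_score + random.gauss(0, noise_sd), 1),
        "n_sessions": len(scores),
    }

# Hypothetical flow: summarize locally, then share only the summary (if at all).
print(local_summary([68, 71, 64, 73, 70]))  # e.g. {'mean_score': 69.8, 'n_sessions': 5}
```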

For now, the most important insight from the data is the simplest one: daily cognitive performance varies more than most people realize, the variation is patterned rather than random, and the patterns are personal. Your brain has rhythms. It responds to what you eat, how you sleep, what you drink, and how you spend your time. The Sharpness Score doesn't create those patterns. It makes them visible — and visibility is the first step toward doing something about them.

Measure your own cognitive sharpness.

MentalMather gives you a daily Sharpness Score based on your speed, accuracy, and personal baseline.

Download Free →