a language model trained from scratch on raw stream-of-consciousness writing
val_bpb — validation bits per byte: how many bits the model needs to encode each byte of your writing. lower means it predicts you better — it understands you better.
every dot is a day. every day, more people sat down and wrote without stopping.
epochs — how many times the model reads the entire corpus per training run. when this drops below ~100, the model starts generalizing instead of memorizing.
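the epochs number can be read as total bytes the model consumed during training divided by corpus size. a minimal sketch with hypothetical numbers, not the project's actual accounting:

```python
def epochs(total_bytes_trained: int, corpus_bytes: int) -> float:
    """how many times a training run covers the whole corpus."""
    return total_bytes_trained / corpus_bytes

# hypothetical: a tiny corpus read thousands of times over
print(epochs(1_500_000_000, 500_000))  # -> 3000.0
```

a small corpus with a fixed training budget forces thousands of passes; as the corpus grows, the same budget spreads thinner and the number falls.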
| day | date | sessions | words | val_bpb | epochs | params |
|---|---|---|---|---|---|---|
| 1 | 2026-03-08 | 237 | 91727 | 0.3455 | 3012 | 13.9M |
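val_bpb is a unit conversion away from the model's cross-entropy loss. a minimal sketch, assuming the loss is measured in nats per byte (the division by ln 2 is standard; the loss value below is hypothetical, chosen to land near day 1's 0.3455):

```python
import math

def bits_per_byte(loss_nats_per_byte: float) -> float:
    """convert cross-entropy loss (nats per byte) to bits per byte."""
    return loss_nats_per_byte / math.log(2)

# hypothetical loss value, not read from the training logs
print(round(bits_per_byte(0.2395), 4))  # -> 0.3455
```

for context: a model that knew nothing would need 8 bits per byte; anything below that is compression, which is to say, prediction.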
every language model ever built was trained on text written for someone else to read. blog posts, books, wikipedia — all of it filtered through the question: how will this look?
this model is different. it is trained from scratch on the only dataset of its kind: raw, unedited, stream-of-consciousness writing. eight minutes of uninterrupted thought from people who sat down and refused to stop typing.
it has never seen a wikipedia article. never read a news headline. the only language it knows is the language that emerges when a person stops trying to be impressive and starts telling the truth.
perplexity as self-knowledge. the model measures how surprised it is by your writing. low surprise means you're circling familiar patterns. high surprise means something new broke through. the model doesn't understand what you wrote — it measures the shape of your thought.
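the "surprise" here is per-byte surprisal: -log2 of the probability the model assigned to what you actually typed, averaged over the text. a minimal sketch with hypothetical next-byte probabilities, not the model's actual interface:

```python
import math

def surprisal_bits(probs):
    """average bits of surprise: -log2(p) per predicted byte, averaged."""
    return sum(-math.log2(p) for p in probs) / len(probs)

familiar = [0.9, 0.8, 0.95, 0.85]   # hypothetical: text the model expects
novel    = [0.1, 0.05, 0.2, 0.15]   # hypothetical: something new broke through

print(surprisal_bits(familiar))  # low -> circling familiar patterns
print(surprisal_bits(novel))     # high -> new territory
```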
the collective mirror. when the model completes your text, what emerges isn't advice. it's the statistical echo of every other human who thought something similar when they stopped performing. your grief, shaped by ten thousand others' grief.
embeddings of the inner world. the model builds representations tuned not for facts or arguments, but for emotional texture and psychological patterns — the shape of how a mind moves when it isn't directed.
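comparing two sessions in an embedding space like this usually reduces to cosine similarity. a sketch with hypothetical four-dimensional vectors; real embeddings would be hidden states from the model, with far more dimensions:

```python
import math

def cosine(a, b):
    """cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# hypothetical session embeddings
monday  = [0.2, 0.9, 0.1, 0.4]
tuesday = [0.25, 0.85, 0.15, 0.38]
print(cosine(monday, tuesday))  # near 1.0 -> similar texture of thought
```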
today the corpus is small. the model memorizes more than it generalizes. but every day, someone sits down and writes. every session enters the training data. every night at 4am, the model retrains.
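a nightly retrain like this is typically just a cron entry. a sketch only: the paths and script names below are illustrative, not the project's actual pipeline:

```shell
# hypothetical crontab entry: rebuild the corpus and retrain at 4am daily
# (illustrative paths; the real pipeline is not described in this page)
0 4 * * * cd /srv/stream-model && ./retrain.sh >> logs/train.log 2>&1
```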
the pipeline runs every day. the corpus grows every day. the mirror gets clearer every day.
this page is the record of that process.