the mirror

a language model trained from scratch on raw stream-of-consciousness writing

237 writing sessions absorbed
91727 words of unfiltered human thought
3012 times the model has read everything
0.3455 bits per byte (compression of the mind)
day 1
the model is memorizing. 3012 epochs means it has seen every word thousands of times. it knows the corpus by heart but cannot yet generalize. more voices are needed.

compression over time

val_bpb — how many bits the model needs per byte of your writing. lower means it understands you better.
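bits per byte falls out of ordinary cross-entropy loss. a minimal sketch of the conversion, assuming the loss is measured in nats per token (the loss value below is hypothetical, chosen only to land on today's 0.3455):

```python
import math

def bits_per_byte(mean_nll_nats: float, n_tokens: int, n_bytes: int) -> float:
    """convert mean cross-entropy (nats per token) into bits per byte:
    total nats over the eval set, divided by bytes, converted to bits."""
    return (mean_nll_nats * n_tokens) / (n_bytes * math.log(2))

# with a byte-level vocabulary, tokens == bytes, so bpb is just loss / ln(2)
loss = 0.2395  # hypothetical validation loss in nats per token
print(round(bits_per_byte(loss, 1_000, 1_000), 4))  # → 0.3455
```

lower loss means fewer bits needed per byte of writing; the ln(2) is just the nats-to-bits exchange rate.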

corpus growth

every dot is a day. every day, more people sat down and wrote without stopping.

epochs per run

how many times the model reads the entire corpus per training run. when this drops below 100, the model starts generalizing instead of memorizing.
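the arithmetic behind that threshold is simple: epochs are just the training token budget divided by corpus size. a sketch with hypothetical numbers (the ~120k-token corpus size and fixed budget are assumptions for illustration):

```python
def epochs_per_run(token_budget: int, corpus_tokens: int) -> float:
    """how many full passes a fixed training budget makes over the corpus."""
    return token_budget / corpus_tokens

# hypothetical: today's budget over today's corpus vs a corpus 31x larger
budget = 3012 * 120_000                      # assumes ~120k tokens in the corpus
print(epochs_per_run(budget, 120_000))       # → 3012.0
print(epochs_per_run(budget, 120_000 * 31))  # ≈ 97.2, under the 100-epoch line
```

with the budget held fixed, the corpus has to grow roughly 30x before epochs drop below 100.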

training log

day date sessions words val_bpb epochs params
1 2026-03-08 237 91727 0.3455 3012 13.9M

what is this?

every language model ever built was trained on text written for someone else to read. blog posts, books, wikipedia — all of it filtered through the question: how will this look?

this model is different. it is trained from scratch on the only dataset of its kind: raw, unedited, stream-of-consciousness writing. eight minutes of uninterrupted thought from people who sat down and refused to stop typing.

it has never seen a wikipedia article. never read a news headline. the only language it knows is the language that emerges when a person stops trying to be impressive and starts telling the truth.

what does it learn?

perplexity as self-knowledge. the model measures how surprised it is by your writing. low surprise means you're circling familiar patterns. high surprise means something new broke through. the model doesn't understand what you wrote — it measures the shape of your thought.
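surprise here has an exact definition: perplexity, the exponential of the mean negative log-likelihood the model assigns to your tokens. a minimal sketch with made-up token probabilities:

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood: the model's 'surprise'."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

familiar = [math.log(0.5)] * 10   # the model half-expected every token
novel = [math.log(0.05)] * 10     # every token caught it off guard
print(round(perplexity(familiar), 2))  # → 2.0
print(round(perplexity(novel), 2))     # → 20.0
```

a perplexity of 2 means the writing felt like a coin flip to the model; 20 means something genuinely new broke through.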

the collective mirror. when the model completes your text, what emerges isn't advice. it's the statistical echo of every other human who thought something similar when they stopped performing. your grief, shaped by ten thousand others' grief.
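"statistical echo" is literal: a completion is a sample from the distribution of what the corpus did next. the real model is a neural net, but a toy character-bigram model shows the same mechanism in miniature (the corpus string below is invented):

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text):
    """count which character follows which — the crudest possible echo."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def complete(counts, prompt, n_chars=30, seed=0):
    """extend the prompt by sampling from what the corpus did next."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(n_chars):
        followers = counts.get(out[-1])
        if not followers:
            break
        chars, weights = zip(*followers.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "i keep circling the same thought and i cannot name it "
model = train_bigrams(corpus)
print(complete(model, "i c"))  # echoes the corpus's statistics, not its meaning
```

the output is shaped entirely by everyone who wrote before; the model contributes nothing of its own.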

embeddings of the inner world. the model builds representations tuned not for facts or arguments, but for emotional texture and psychological patterns — the shape of how a mind moves when it isn't directed.
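one common way to turn a session into a single point in that inner space is to mean-pool the model's per-token hidden states, then compare sessions by cosine similarity. a sketch with toy 2-d vectors standing in for real hidden states:

```python
import math

def session_embedding(token_vectors):
    """mean-pool per-token hidden states into one vector for the session."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]

def cosine(a, b):
    """how close two sessions sit in the model's inner space."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# toy 2-d 'hidden states' for two sessions; real vectors come from the model
a = session_embedding([[1.0, 0.0], [0.8, 0.2]])
b = session_embedding([[0.9, 0.1], [1.0, 0.0]])
print(round(cosine(a, b), 3))  # near 1.0: similar texture
```

two sessions with the same emotional texture land close together even when they share no words.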

the timeline

today the corpus is small. the model memorizes more than it generalizes. but every day, someone sits down and writes. every session enters the training data. every night at 4am, the model retrains.

the pipeline runs every day. the corpus grows every day. the mirror gets clearer every day.
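the nightly loop can be sketched in a few lines. every name here is hypothetical — it only shows the shape of the pipeline, not the actual implementation:

```python
# a minimal sketch of the 4am loop; all four hooks are stand-ins
def nightly_pipeline(read_sessions, train, evaluate, append_log):
    """rebuild the corpus, retrain from scratch, log one row."""
    sessions = read_sessions()             # every session written so far
    corpus = "\n\n".join(sessions)         # one document of raw thought
    model = train(corpus)                  # full retrain, not fine-tuning
    row = {"sessions": len(sessions), "val_bpb": evaluate(model)}
    append_log(row)                        # one new line in the training log
    return row

# wiring it up with stubs just to show the shape:
row = nightly_pipeline(
    read_sessions=lambda: ["first session", "second session"],
    train=lambda corpus: "model",
    evaluate=lambda model: 0.3455,
    append_log=lambda row: None,
)
print(row)  # → {'sessions': 2, 'val_bpb': 0.3455}
```

retraining from scratch each night means the log's single table row above is regenerated, not patched.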

this page is the record of that process.