What it does
A data analysis of 80+ Book of the Month picks from 2018 to 2025, joining hand-curated pick history with Open Library metadata in DuckDB, then surfacing patterns in editorial taste through SQL window functions and Python visualization. Covers genre mix shifts, debut author trends, pick velocity, and author demographics across eight years of curation.
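A minimal sketch of what the core query looks like, assuming illustrative file and column names (picks.csv, openlibrary.parquet, pick_date, genre); the real schema differs, but the shape is the same: load both sources into DuckDB, join on title, then compute per-year genre shares with a window function.

```python
# Sketch of the join + window-function step. File and column names are
# assumptions for illustration, not the project's actual schema.
import duckdb

con = duckdb.connect()

df = con.execute("""
    WITH joined AS (
        SELECT p.title,
               p.author,
               p.pick_date,
               p.genre,
               ol.first_publish_date
        FROM read_csv_auto('picks.csv') AS p
        LEFT JOIN read_parquet('openlibrary.parquet') AS ol
               ON lower(p.title) = lower(ol.title)
    )
    SELECT *,
           -- share of each genre within its pick year
           COUNT(*) OVER (PARTITION BY year(pick_date), genre) * 1.0
             / COUNT(*) OVER (PARTITION BY year(pick_date)) AS genre_share
    FROM joined
    ORDER BY pick_date
""").df()
```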
Why I built it
Curation is a data problem. BOTM has built a whole brand on "trusted picks," but what does that actually mean in the data? I wanted to see whether eight years of picks reveal a consistent editorial fingerprint or whether the taste has quietly drifted, because that fingerprint is essentially their retention strategy. If subscribers keep trusting the pick, they keep paying; if the picks start feeling random, they churn. It's the same instinct that makes me want to look at downstream data at work: the pattern is usually already there, you just have to ask the right question of the right table.
The interesting bit
The genre mix turned out to be more stable than I expected, which actually makes business sense: consistency builds the trust that drives renewals. What shifted was pick velocity. The average time between a book's publication date and its BOTM selection dropped noticeably after 2021, meaning they started chasing cultural momentum harder, probably because a buzzy pick converts better on social. The other finding worth noting: BOTM picks debut authors at a higher rate than the broader literary fiction market, and that rate has held steady across the years. That's not just an editorial choice; it's a differentiation play. You can't get that pick anywhere else.
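The velocity metric itself is simple once the dates are joined. A sketch of the pandas version, with assumed column names (pick_date, publish_date) standing in for the real ones:

```python
# Pick velocity: average days between a book's publication and its BOTM
# selection, grouped by pick year. Column names are assumptions.
import pandas as pd

picks = pd.read_csv("picks.csv", parse_dates=["pick_date", "publish_date"])
picks["days_to_pick"] = (picks["pick_date"] - picks["publish_date"]).dt.days

velocity = (
    picks.groupby(picks["pick_date"].dt.year)["days_to_pick"]
         .mean()
         .rename("avg_days_pub_to_pick")
)
print(velocity)
```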
Stack
Python, DuckDB, Pandas, Matplotlib, Seaborn, Open Library API, gender-guesser, SQL with CTEs and window functions. Pipeline runs end-to-end with `make run`.
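For the author-demographics piece, gender-guesser classifies first names only, so it is a rough proxy rather than ground truth. A sketch of how it slots in; the author column name is an assumption:

```python
# Rough gender classification from author first names using gender-guesser.
import gender_guesser.detector as gender

detector = gender.Detector()

def author_gender(full_name: str) -> str:
    """Classify likely gender from the first name of an author string."""
    first_name = full_name.split()[0]
    # returns one of: 'male', 'female', 'mostly_male', 'mostly_female',
    # 'andy' (ambiguous), or 'unknown'
    return detector.get_gender(first_name)

print(author_gender("Taylor Jenkins Reid"))
```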