The Open Syllabus Project

I’ve been volunteering for the Open Syllabus Project on and off since the Spring of

  1. I attended a hackathon called GLAMhacks (Galleries, Libraries, Archives, and Museums) at the University of Pennsylvania with my friend Rita Zevallos. We decided to resuscitate a corpus of 1M syllabi that was missing most of its cached syllabus content, with only URLs and other metadata left. Over the next two days we poached the Starbucks wifi across the street to download as many of the syllabi as we could from the Wayback Machine.

Rita built a Rails app on top of the corpus, and we planned to call it… (drumroll)… Sylla-search. It was a stupid name, so we thought harder and considered calling it “The Open Syllabus Project”. The name proved fortuitous, because we discovered that a team out of Columbia University had been working on the exact same project. Their progress was a direct superset of ours: they had already taken Dan Cohen’s 1M syllabi and combined them with a variety of dumps, gifts, and scrapes, with the goal of giving researchers metadata access to what the global syllabi-scape looked like. I joined them as a volunteer. Since then, they’ve hired a fantastic programmer to create a variety of analytics, visualizations, and other interfaces to the underlying data.

Some of the things that I’ve done with them in the intervening time:

  • A machine learning classifier for whether a document is a syllabus or not: notebook code

Classifier ROC comparison
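
For readers who want a feel for what a syllabus-or-not classifier and an ROC comparison involve, here is a minimal sketch. It is not the code from the notebook: it swaps in a tiny multinomial naive Bayes model written with only the Python standard library, and the names (`tokenize`, `NaiveBayes`, `roc_auc`) and the four toy training documents are all made up for illustration. The area under the ROC curve is computed as the fraction of (syllabus, non-syllabus) pairs that the model ranks in the right order.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, keep letters, split on the rest.
    return "".join(c if c.isalpha() else " " for c in text.lower()).split()


class NaiveBayes:
    """Two-class multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        return self

    def score(self, doc):
        # Log-odds that the document is a syllabus (1) rather than not (0).
        log_odds = math.log(self.n_docs[1] / self.n_docs[0])
        v = len(self.vocab)
        for tok in tokenize(doc):
            if tok not in self.vocab:
                continue  # words never seen in training carry no evidence
            p1 = (self.counts[1][tok] + 1) / (self.totals[1] + v)
            p0 = (self.counts[0][tok] + 1) / (self.totals[0] + v)
            log_odds += math.log(p1 / p0)
        return log_odds


def roc_auc(scores, labels):
    # Chance that a random positive outscores a random negative; ties count half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example (1 = syllabus, 0 = not a syllabus).
train_docs = [
    "Course syllabus: weekly readings, grading policy, office hours",
    "Syllabus for Introduction to Biology: required textbook and exam schedule",
    "Department news: faculty member wins research award",
    "Campus map and parking information for visitors",
]
train_labels = [1, 1, 0, 0]
test_docs = [
    "History 101 syllabus with grading and readings",
    "Visitors parking news",
]
test_labels = [1, 0]

model = NaiveBayes().fit(train_docs, train_labels)
scores = [model.score(d) for d in test_docs]
auc = roc_auc(scores, test_labels)
print(scores, auc)
```

Comparing classifiers by ROC then comes down to running several models over the same held-out documents and comparing their `roc_auc` values. With real data you would want far more than a handful of examples on each side.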