Putting the World's Cultural Heritage Online With Crowdsourcing
Why is eating a toad for breakfast like raw OCR text?
By the end of this session you will know the answer to this important question. And you will also learn …
+ what crowdsourcing is.
+ who contributes to crowdsourcing projects at cultural heritage organizations (CHOs).
+ where to find crowdsourcing projects at CHOs, especially historical digital newspaper collections.
+ how you can contribute to crowdsourcing projects.
+ why you should use crowdsourced data for your own genealogical research.
+ why the proverb “if at first you don’t succeed, try, try again” doesn’t apply to skydiving.
Following the splash made by the National Library of Australia’s Trove with its crowdsourced newspaper OCR text correction and tagging, more and more cultural heritage organizations have begun to use crowdsourcing for projects that would otherwise have been expensive (transcription of manuscripts and records, correction of OCR text to high accuracy) or computationally impossible (tagging images and articles with noisy text). Trove was not the first use of crowdsourcing for cultural heritage content (that distinction belongs to Project Gutenberg / Distributed Proofreaders), but it was the first to use crowdsourcing for a mass digitization project. In this class we will briefly examine crowdsourcing, a few cultural heritage projects that use it, its economics, and the motivations of the crowd.
Frederick Zarndt has worked with historic and contemporary newspaper, journal, magazine, book, and records digitisation since computer speeds, software, technology, storage, and costs first made it practical. He worked with the Library of Congress on its pilot implementation of the National Digital Newspaper Program (NDNP) (2003), with the University of Utah since the beginning of its newspaper digitisation program (2002), with the New Zealand National Library on its Papers Past and Parliamentary Papers digitisation projects (2006), with the Singapore National Library Board on its historic and born-digital newspaper conversion projects (2006), with the National Library of Australia and the State Library of Victoria on the Australian Newspapers Digitisation Program (2008), and with many other institutions both small and large. Frederick has experience in every aspect of digitisation projects, including project requirements development, project management, conversion operations (both in-house and outsourced), acceptance testing, and software development for production and delivery of digital data.
Frederick is the current chair of the IFLA Newspapers Section (the first non-librarian to serve as chair). He presently works as a technical, business development, and sales consultant for Digital Divide Data (since 2008), Content Conversion Specialists (since 2005), and DL Consulting (since 2001). Previously he was President of Planman Consulting North America, a subsidiary of Planman Technologies. Until 2005 he was Chief Technology Officer and one of the co-founders of iArchives / Footnote. While he was CTO at iArchives, his engineering team created a custom genealogical records data entry application for FamilySearch.org, which is today used by over 780,000 “crowdsource” volunteers worldwide.
Frederick has 25+ years of experience in software development and is a member of ACM and IEEE and a Certified Software Development Professional (CSDP). He is also a member of ALA, IFLA, and SLA. Frederick has master’s degrees in Computer Science and Physics.