Last month I asked for advice on automating a huge bunch of data transcription. At that point, my knowledge of OCR was that OCR exists. Now I’ve learned enough to know that OCR won’t solve my data transcription woes.[1] The forms my data are entered on differ from year to year and sometimes even from site to site. The forms are a mixture of handwritten and printed information, and the printed parts are often modified by hand. I didn’t find anything I could use out of the box or that looked easy for me to modify to digitize my data. If I’d had 10 or 100 times as much data, it might have been worth working harder to develop an automated solution. In the meantime, my supervisor hired an undergraduate to help me.
If you’re in a similar boat, but your supervisor is less awesome, you may want to consider Mechanical Turk or inviting your hipster friends to super underground data transcription parties.
[1] Thanks @davidjayharris for pointing me in the right direction!