My research depends on data that many, many people have collected over many, many years. A lot of it is still on the data collection sheets used in the field and has been sitting ignored in filing cabinets. It is absolutely fantastic that people have been willing to dig up and share this stuff with me. Hopefully by the end of my project I’ll have a great big data package to publish on Dryad! Then whenever anyone needs this kind of data, they won’t have to waste time digging through decades of old files.
Now that I’ve got the data, I need to analyze it. And to do that I need to get these handwritten data into a format I can feed to R. Before I talk some poor undergrad into helping me out, I thought I’d look into some kind of automated solution. My knowledge of OCR at this point is “sometimes some program can read text in an image.” Any advice? You can see a sample of what I’m working with in the image above.