Kaci Nash, M.A.

the past in progress

(DayOfDH) Processing Texts: Making Documents Machine Readable


I am participating in Day of DH 2014 this year and maintaining a blog on their website. Cross-posting my entries here, because hey! I’m actually blogging.

As planned, I have spent the greater part of the day organizing the approximately 2,000 freedom petition photographs I took at the National Archives into a coherent filing system, organized by term, category of filing, and case number, and documenting the image numbers in the spreadsheet I maintained as I was photographing. I think I am a little over half done with this process.

Though I still have about an hour and a half left to dedicate to the D.C. courts project today, I am turning my attention to my other project: Locating Lord Greystoke. Right now we are in the process of building two corpora of texts: one that is large and inclusive and will be used in our text analysis efforts, and a second, smaller one of key documents that will be featured on the project’s website.

The document that I am working with now has been reviewed by the project leader, historian Jeannette Jones, who has pulled out selected passages from the text and made note of the people, places, and concepts she wants to be called out on the website. An undergraduate student also working on the project has already run the document through an OCR program, the output of which I will mark up in TEI. The notes on the document prepared by Dr. Jones indicate what will make it into the <profileDesc> element in the TEI header, which items she wants to appear in the site’s Encyclopedia and thus need to be encoded in the text, and which places are going to appear as mapping points for this particular document.

At the moment, the website’s documents are indexed in Solr and transformed by Cocoon, but we are looking into migrating over to a different framework in the very near future. You can view a draft of this process in action at the project’s website, where we have set up a proof of concept using minimal documents and our first pass at the project’s mapping interface.
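To give a rough sense of what "machine readable" buys us once people and places are encoded, here is a minimal sketch in Python. It is not the project's actual pipeline (that runs through Solr and Cocoon). The sample document, the names in it, and the helper `tagged_names` are all invented for illustration, and I am assuming the standard TEI `<persName>` and `<placeName>` elements are used for the markup.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; every TEI element lives in it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical sample document</title></titleStmt>
      <publicationStmt><p>Illustration only</p></publicationStmt>
      <sourceDesc><p>Invented for this sketch</p></sourceDesc>
    </fileDesc>
    <profileDesc>
      <textClass><keywords><term>exploration</term></keywords></textClass>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <p><persName>Jane Doe</persName> sailed from
      <placeName>Example Port</placeName> to <placeName>Sample Bay</placeName>,
      then back to <placeName>Example Port</placeName>.</p>
    </body>
  </text>
</TEI>"""


def tagged_names(tei_xml, tag):
    """Return the unique contents of one TEI element type, in document order.

    Only the <text> portion is searched, so header metadata is ignored.
    """
    root = ET.fromstring(tei_xml)
    seen = []
    for element in root.iterfind(f".//tei:text//tei:{tag}", TEI_NS):
        # Join nested text and collapse line breaks and extra spaces.
        name = " ".join("".join(element.itertext()).split())
        if name and name not in seen:
            seen.append(name)
    return seen


people = tagged_names(SAMPLE, "persName")    # candidates for an encyclopedia
places = tagged_names(SAMPLE, "placeName")   # candidates for mapping points
print(people)
print(places)
```

Because the names are wrapped in elements rather than sitting in plain OCR output, a few lines of code can pull out every person or place in a document, which is the kind of thing an index or a map interface needs.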

A look at my screen: Dr. Jones’ notes; Oxygen, which I use to encode the XML document; and the Google Spreadsheet that is serving as a working bibliography of our project documents.