OCR sometimes misses really important context.
Make use of derived data, e.g. age at marriage, gender differences, mortality rates, migration maps, and last names.
Don’t spend all your time cleaning data.
Try a sequence of things, not one big idea.
Don’t expect existing tools to do the whole job. At the same time, building new tools is not straightforward either.
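As a small illustration of the derived-data tip above, here is a minimal Python sketch that computes age at marriage from a birth date and a marriage date. The records, names, and field names are invented for illustration and do not come from any real dataset; real archival dates are often partial or uncertain and would need more care.

```python
from datetime import date

# Hypothetical records: names, dates, and field names are invented for illustration.
records = [
    {"name": "A. Example", "born": date(1850, 3, 14), "married": date(1872, 6, 1)},
    {"name": "B. Example", "born": date(1848, 11, 2), "married": date(1870, 5, 20)},
]


def age_at(event: date, born: date) -> int:
    """Return the whole years elapsed between a birth date and an event date."""
    years = event.year - born.year
    # Subtract one if the birthday has not yet occurred in the event year.
    if (event.month, event.day) < (born.month, born.day):
        years -= 1
    return years


# Derived field: age at marriage for each record.
ages = [age_at(r["married"], r["born"]) for r in records]

for record, age in zip(records, ages):
    print(record["name"], age)
```

The same pattern (two recorded fields combined into one derived field) would extend to things like age at death when estimating mortality rates.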
Sample Digital Humanities Projects
ORBIS by Stanford. A Kayak-style travel planner for the Roman era.
Old Bailey Online. Human transcriptions of criminal proceedings.
Sylvia Beach’s Bookstore in Paris. A publisher for the outcasts. Attaching metadata is hard.
Source: SoECS meeting