OCR sometimes misses really important context.
Make use of derived data, e.g. age at marriage, gender differences, mortality rates, migration maps, and last names.
Don’t spend all your time cleaning data.
Try a sequence of things, not one big idea.
Don’t expect existing tools to do the whole job. At the same time, building new tools is not straightforward either.
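As a minimal sketch of the derived-data tip above, the snippet below computes age at marriage from birth and marriage dates and averages it by gender. The records, field names, and the helper functions `age_at` and `mean_age_at_marriage` are all hypothetical and invented for illustration; real archival data would need its own parsing and handling of missing values.

```python
from datetime import date
from statistics import mean

# Hypothetical records, invented for illustration only.
records = [
    {"name": "Anna", "gender": "F", "born": date(1850, 3, 14), "married": date(1871, 6, 1)},
    {"name": "Josef", "gender": "M", "born": date(1846, 11, 2), "married": date(1871, 6, 1)},
    {"name": "Maria", "gender": "F", "born": date(1855, 8, 30), "married": date(1874, 8, 29)},
]


def age_at(event: date, born: date) -> int:
    """Return the number of completed years between birth and an event."""
    # Subtract one year if the birthday had not yet occurred in the event year.
    before_birthday = (event.month, event.day) < (born.month, born.day)
    return event.year - born.year - before_birthday


def mean_age_at_marriage(rows):
    """Group the derived ages by gender and average each group."""
    by_gender = {}
    for row in rows:
        age = age_at(row["married"], row["born"])
        by_gender.setdefault(row["gender"], []).append(age)
    return {gender: mean(ages) for gender, ages in by_gender.items()}


if __name__ == "__main__":
    print(mean_age_at_marriage(records))
```

The same pattern (derive a value per record, then group and summarize) would carry over to the other derived measures in the list, such as mortality rates.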
Sample Digital Humanities Projects
ORBIS by Stanford ....