I collect, digitise, and publish scans of public transport-related materials on wikibus.org.
So far I have done all the work myself, and I'm now looking to hand off the clean-up of the scanned images and the text recognition (OCR).
For each scan I expect that:
- pages are cropped and straightened
- facing pages are stitched together when it makes sense
- the image is retouched where necessary: stains removed, colors improved, etc.
- all text is searchable (as a text layer under the original images)
Documents come in various languages: mostly English, German, Polish, and French, but many others too.
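To illustrate the text recognition step, here is a minimal sketch, assuming the open-source OCRmyPDF command-line tool and the Tesseract language packs for the four main languages. The function names and file names are hypothetical, and this is not necessarily how the example PDF was produced; cropping, stitching, and retouching are not covered.

```python
import shutil
import subprocess
from pathlib import Path

# Tesseract language codes for English, German, Polish, and French.
# Assumption: the matching language packs are installed.
LANGUAGES = ["eng", "deu", "pol", "fra"]


def build_ocr_command(raw_scan, output_pdf, languages=LANGUAGES):
    """Build an OCRmyPDF call that straightens pages and adds a text layer."""
    return [
        "ocrmypdf",
        "--language", "+".join(languages),  # e.g. "eng+deu+pol+fra"
        "--deskew",                         # straighten slightly rotated pages
        str(raw_scan),
        str(output_pdf),
    ]


def run_ocr(raw_scan, output_pdf, languages=LANGUAGES):
    """Run the command; fails loudly if OCRmyPDF is not on the PATH."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed")
    subprocess.run(build_ocr_command(raw_scan, output_pdf, languages), check=True)


if __name__ == "__main__":
    # Hypothetical file names, printed only to show the resulting command.
    print(" ".join(build_ocr_command(Path("raw-scan.pdf"), Path("searchable.pdf"))))
```

Other languages can be added by passing a longer list of Tesseract codes.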
I have attached a (compressed) example of a raw scan.
And here's the PDF after processing: https://www.wikibus.org/library/brochure/6183