Bookscanner: Difference between revisions
From Wiki-Fou
(Copy workflow PDF to wiki page)
(Update version 5)
Revision as of 04:29, 18 April 2017
There are many ways to scan; this is the current state of the art in Calafou.
Scanning
The amount of work in the postproduction phase depends on the quality of the images you capture in the scanning phase!
- Setting up the cameras: the most important part.
Caveats:
- the camera should face the page at a right angle
- the whole page should be in the image
- camera settings: full automatic, perhaps with manual focus
- back up and empty the SD cards in the cameras
- most subtle mistake: one camera sees letters bigger than the other camera
- Push the big button on the scanner to scan.
- you may have to press a finger on the side of the plexiglass that is closer to you when it is , because the plexiglass is not always at exactly the same angle as the book pages
- Download the images from the SD cards and put the scanner to sleep.
- from the camera on the left, copy the images to a folder called
- from the camera on the right, copy the images to a folder called
- upload the two folders to the ftp://seldon.calafou/HackTheBiblio/scanning/$bookname--$yourname/ folder
- remember to delete the pictures from the SD cards and put the cards back into the cameras, and consider putting the camera batteries on charge
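The upload step could be scripted roughly as follows. This is only a sketch: it assumes curl is installed (it is not in the dependency list) and that the server accepts uploads without a login. The function name upload_scans and the DRY_RUN switch are invented for this example, and the folder names are passed as arguments because they are up to you.

```shell
#!/bin/sh
# upload_scans BOOKNAME YOURNAME DIR...
# Uploads every regular file in each DIR to
#   ftp://seldon.calafou/HackTheBiblio/scanning/BOOKNAME--YOURNAME/<DIR name>/
# Set DRY_RUN=1 to print the curl commands instead of running them.
upload_scans() {
    bookname=$1
    yourname=$2
    shift 2
    base="ftp://seldon.calafou/HackTheBiblio/scanning/$bookname--$yourname"
    for dir in "$@"; do
        for f in "$dir"/*; do
            [ -f "$f" ] || continue   # skip subfolders and empty globs
            # The trailing slash makes curl keep the local file name;
            # --ftp-create-dirs creates missing remote folders.
            ${DRY_RUN:+echo} curl --ftp-create-dirs -T "$f" "$base/$(basename "$dir")/"
        done
    done
}

# Example (dry run first to check the target paths):
#   DRY_RUN=1; upload_scans mybook myname folder-a folder-b
```

Running it once with DRY_RUN=1 is a cheap way to check the target paths before anything is sent.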
Dependencies
Using an up-to-date Debian operating system, you can install the following programs for the postproduction steps:
- scantailor
- gprename
- pdftk
- tesseract-ocr
- tesseract-ocr-eng
- tesseract-ocr-spa
- calibre
You can install all these programs with the following invocation:
sudo apt install scantailor gprename pdftk tesseract-ocr tesseract-ocr-eng tesseract-ocr-spa calibre
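To see which of these packages are still missing before installing, something like the following may help. It is a minimal sketch for Debian-based systems; the helper name is_installed is made up for this example.

```shell
#!/bin/sh
# Report which packages from the dependency list are not installed yet.
# is_installed is a helper invented for this sketch; it asks dpkg for
# the package status and succeeds only for fully installed packages.
is_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

missing=""
for pkg in scantailor gprename pdftk tesseract-ocr tesseract-ocr-eng tesseract-ocr-spa calibre; do
    is_installed "$pkg" || missing="$missing $pkg"
done

if [ -n "$missing" ]; then
    echo "Missing packages:$missing"
else
    echo "All postproduction dependencies are installed."
fi
```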
Postproduction
You start with two folders containing files with names like IMG_1234.JPG.
The basic workflow is like this:
- [program] 1.jpg, 2.jpg, 1.tif, 2.tif, 1.pdf, 2.pdf, book.pdf
- calibre http://libgen.org/book/index.php?md5=B6916395FDE00D91DB4F52DCB8F069BF
- etc.
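The tif to pdf to book.pdf part of this workflow might look like the sketch below. It assumes the .tif pages already exist (for example, exported from scantailor) and that tesseract does the tif to pdf step, which the list above does not state. The function name ocr_and_merge and the DRY_RUN switch are invented for this example.

```shell
#!/bin/sh
# ocr_and_merge TIF_DIR OUTPUT_PDF [LANG]
# Runs tesseract on every .tif in TIF_DIR to get one searchable PDF per
# page, then joins the page PDFs into OUTPUT_PDF with pdftk.
# LANG is a tesseract language code (eng by default, spa for Spanish).
# Set DRY_RUN=1 to print the commands instead of running them.
ocr_and_merge() {
    tif_dir=$1
    output=$2
    lang=${3:-eng}
    set --                      # reuse the positional parameters as the PDF list
    for tif in "$tif_dir"/*.tif; do
        [ -e "$tif" ] || continue
        base=${tif%.tif}
        # "tesseract in.tif outbase -l LANG pdf" writes outbase.pdf
        ${DRY_RUN:+echo} tesseract "$tif" "$base" -l "$lang" pdf
        set -- "$@" "$base.pdf"
    done
    if [ "$#" -eq 0 ]; then
        echo "no .tif files found in $tif_dir" >&2
        return 1
    fi
    ${DRY_RUN:+echo} pdftk "$@" cat output "$output"
}

# Example:
#   ocr_and_merge ./out ./book.pdf spa
```

The pages are merged in the shell's glob order, which is alphabetical, so 10.tif would come before 2.tif; zero-padded names such as 0002.tif avoid that.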
There are some bash one-liners which can be useful (on Debian-based systems):
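As a starting point for renaming, the sketch below interleaves the images from the two camera folders into one numbered sequence. It rests on several assumptions: each camera took exactly one picture per page turn, the first folder holds the odd pages and the second the even pages (swap the arguments if your setup is the other way around), and the camera file names sort in shooting order. The function name interleave is invented for this example.

```shell
#!/bin/sh
# interleave ODD_DIR EVEN_DIR OUT_DIR
# Copies ODD_DIR/*.JPG to OUT_DIR/0001.jpg, 0003.jpg, ... and
# EVEN_DIR/*.JPG to OUT_DIR/0002.jpg, 0004.jpg, ...
# The originals are copied, not moved, so nothing is lost if the
# page order turns out to be wrong.
interleave() {
    odd_dir=$1
    even_dir=$2
    out_dir=$3
    mkdir -p "$out_dir"

    n=1
    for f in "$odd_dir"/*.JPG; do
        [ -e "$f" ] || continue
        cp "$f" "$out_dir/$(printf '%04d' "$n").jpg"
        n=$((n + 2))
    done

    n=2
    for f in "$even_dir"/*.JPG; do
        [ -e "$f" ] || continue
        cp "$f" "$out_dir/$(printf '%04d' "$n").jpg"
        n=$((n + 2))
    done
}

# Example:
#   interleave ./folder-a ./folder-b ./pages
```

Zero-padded names keep the pages in order in file managers and in later glob-based steps. If a camera's counter wrapped around during the session (for example from IMG_9999.JPG to IMG_0001.JPG), the alphabetical order no longer matches the shooting order and the files need sorting by hand first.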
FIXME: we can probably write a script to rename the files properly.
You may consider spreading the word on relevant mailing lists, social media, etc.
Bibliography
About our book scanner
Principal sources