Revision as of 18:19, 30 May 2017

There are many ways to scan; this is the current state of the art in Calafou.

Scanning

The amount of work in the postproduction phase depends on how good the images from the scanning phase are!

Setting up the cameras: the most important part.

Caveats:

each camera should face its page at a right angle
the whole page should be in the image
camera settings: fully automatic, perhaps with manual focus
back up and empty the SD cards in the cameras
the most subtle mistake: one camera captures the letters at a larger size than the other camera

Push the big button on the scanner to scan.

you may have to press a finger on the side of the plexiglass that is closest to you when it is “down”, because the plexiglass does not always sit at exactly the same angle as the book pages

Download the images from the SD cards and put the scanner to sleep.

from the camera on the left, copy the images to a folder called “odd”
from the camera on the right, copy the images to a folder called “even”
now upload the two folders to the ftp://seldon.calafou/HackTheBiblio/scanning/$bookname--$yourname/ folder
remember to delete the pictures from the SD cards and put the cards back into the cameras, and maybe put the camera batteries on charge
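
The copy step above can be sketched as a small shell function. This is only an illustration: the function name import_card and the mount paths in the usage comments are assumptions, so check where your system actually mounts the cards.

```shell
# import_card CARD DEST: copy every IMG_*.JPG found under CARD into DEST.
# CARD is the directory where the SD card is mounted (hypothetical, system dependent).
import_card() {
    local card="$1" dest="$2"
    mkdir -p "$dest"
    # Cameras usually nest pictures in subfolders, so search the whole card.
    find "$card" -type f -name 'IMG_*.JPG' -exec cp -p {} "$dest"/ \;
}

# Usage (the mount points are placeholders):
#   import_card /media/$USER/LEFT_CARD  odd
#   import_card /media/$USER/RIGHT_CARD even
```

The function only copies; deleting the pictures from the cards stays a manual step, so nothing is lost if a copy goes wrong.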

Dependencies

Using an up-to-date Debian operating system, you can install the following programs for the postproduction steps:

scantailor
gprename
pdftk
tesseract-ocr
tesseract-ocr-eng
tesseract-ocr-spa
calibre

You can install all these programs with the following invocation:

sudo apt install scantailor gprename pdftk tesseract-ocr \
         tesseract-ocr-eng tesseract-ocr-spa calibre
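
To check that everything is in place before starting, something like the following may help. It is a minimal sketch: missing_tools is a made-up helper name, and the command names in the usage comment are the ones the packages above are expected to provide.

```shell
# missing_tools NAME...: print each command name that is not found on the PATH.
missing_tools() {
    local t
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || echo "$t"
    done
}

# Usage: prints nothing when all the tools are installed.
#   missing_tools scantailor gprename pdftk tesseract calibre
```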

Postproduction

You start with two folders of files with names like IMG_1234.JPG.

The basic workflow is like this:

[program] ➔ [output]
gprename ➔ 1.jpg, 2.jpg, …
scantailor ➔ 1.tif, 2.tif, …
tesseract ➔ 1.pdf, 2.pdf, …
pdftk ➔ book.pdf
calibre ➔ book.epub
libgen.org ➔ http://libgen.org/book/index.php?md5=B6916395FDE00D91DB4F52DCB8F069BF
etc.

Here are some bash one-liners which can be useful (on Debian-based systems):

FIXME we can probably write a script to rename the files properly… but for now, in gprename select the “Numerical” tab, start = 1 for right-pages and 2 for left-pages, always step = 2.
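
One possible shape for such a renaming script is sketched below. It is not a tested replacement for gprename: the function name renumber is invented here, and it assumes the files are named IMG_*.JPG and that sorted name order equals shooting order (which breaks if the camera counter wrapped around during the session).

```shell
# renumber DIR START: rename DIR/IMG_*.JPG to START.jpg, START+2.jpg, ...
# taking the files in sorted name order.
renumber() {
    local dir="$1" n="$2" f
    for f in "$dir"/IMG_*.JPG; do
        [ -e "$f" ] || continue              # the pattern matched nothing
        if [ -e "$dir/$n.jpg" ]; then
            echo "refusing to overwrite $dir/$n.jpg" >&2
            return 1
        fi
        mv -- "$f" "$dir/$n.jpg"
        n=$((n + 2))
    done
}

# Usage, matching the gprename settings above (start 1 or 2, step 2):
#   renumber odd 1     # 1.jpg, 3.jpg, 5.jpg, ...
#   renumber even 2    # 2.jpg, 4.jpg, 6.jpg, ...
```
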
You can rotate the images appropriately (scantailor calls this “fix orientation”) in the left/right folders before you import them. This is probably faster than doing it in scantailor, although scantailor offers the same operation in a more user-friendly way.
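```
sudo apt-get install imagemagick
# mogrify overwrites the files in place, so keep a backup of the originals
cd left
mogrify -verbose -rotate 270 *   # 270 degrees clockwise
cd ../right
mogrify -verbose -rotate 90 *    # 90 degrees clockwise
```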

Run Optical Character Recognition (OCR) on all images in the folder:

time for i in *.tif; do b=$(basename "$i" .tif); tesseract -l spa "$i" "$b" pdf; done

Merge all the PDF files in the folder into a single file:
```
pdftk *pdf cat output book.pdf
```
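
One caveat with the merge above: the shell expands *pdf in alphabetical order, so 10.pdf comes before 2.pdf and the pages end up shuffled once a book has ten pages or more. A possible workaround is sketched here; numeric_pdfs is a made-up helper name, and it assumes the page files are named purely with numbers, as in the workflow above.

```shell
# numeric_pdfs: list the page PDFs of the current folder in numeric order
# (1.pdf, 2.pdf, ..., 10.pdf). Anything else, such as book.pdf, is left out.
numeric_pdfs() {
    printf '%s\n' *.pdf | grep -E '^[0-9]+\.pdf$' | sort -n
}

# Usage:
#   pdftk $(numeric_pdfs) cat output book.pdf
```

Because book.pdf is filtered out, rerunning the merge does not feed an earlier result back into the new file.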
Export the PDF metadata to a text file for editing:
```
pdftk book.pdf dump_data output report.txt
```

Import the metadata from report.txt back into the PDF:

pdftk book.pdf update_info report.txt output bookcopy.pdf

Distribution

Think about how people who would be interested in this book could find out about it!

Repositories:

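General “educational materials”: Library Genesis (https://libgen.io/)
Academic radical: Aaaaarg (https://aaaaarg.org/)
Artist radical: Monoskop (https://monoskop.org/)
Anarchist (including fanzines): Anarchist Library (https://theanarchistlibrary.org/special/index)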
There are many Zine Libraries you can find on the Internet…

You may consider spreading the word on relevant mailing lists, social media, etc.

Biblio-graphy

About our book scanner

English

Spanish

Principal sources

Reading And Leading With One Laptop Per Child

