Bookscanner: Difference between revisions

From Wiki-Fou

Revision as of 19:22, 16 October 2018

Bookscanning in Calafou:

Building a "New Standard Scanner"

We built this: https://forum.diybookscanner.org/viewtopic.php?f=1&t=333

We are trying to build something similar during the Kunlabora event in Calafou: https://calafou.org/en/content/kunlabora-ephimeral-projects-kooperative

We decided to build a simpler, more reproducible and more portable scanner: https://forum.diybookscanner.org/viewtopic.php?f=1&t=333

In theory for the simple scanner it should be possible to get all the ingredients from a hardware shop ("ferreteria"). We got most of the materials from Bauhaus in Barcelona (Zona Franca). TODO: list of materials, sources and prices.

We are following this design: http://www.diybookscanner.org/forum/viewtopic.php?f=1&t=333

Base / Base / Base

Column / Columna

Cradle / Cuna

Platen / Platina

CHDK & Raspberry Pi

planning

List of parts

From Bauhaus, Barcelona (Zona Franca):

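- 2x Tablero MDF 600x400x10cm (MDF board)
- 2x Tablero MDF 600x400x16cm (MDF board)
- Varilla roscada (threaded rod)
- 2x Tira LED 1 nice (LED strip)
- Multienchufa 5 tomas (power strip, 5 sockets)
- Base 4 tomas 1.5 mts (power strip, 4 sockets, 1.5 m cable)
- Tuerca hexagonal (hex nut)
- Guia corredera (sliding rail)
- 2x Arandela ancha (wide washer)
- Aldabon BPF 185 M (latch)
- Tornillo mad.cab.pl. (flat-head wood screw)
- Tornillo cab. avell. (countersunk-head screw)
- Tornillo p/metal (screw for metal)
- Tuerca palomilla (wing nut)
- Guia cajones 10kg (drawer slide, 10 kg)
- Spax cab. red zinc (Spax round-head screw, zinc-plated)
- Tirador dorado (golden handle)
- 4x Abrazaderas (clamps)
- Colgador acero negro (black steel hanger)
- Mosqueton-fw (carabiner)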

Other things from other shops:

- 3x Cable USB 2.0 male-female @ Worten Sant Antoni
- 2x USB-A -> USB-B (mini) cables @ Tienda de Cables
- 2x Manfrotto brazo flexible (flexible arm) MF237 (reference no. AFP018036) @ FotoK, Ronda Universitat
- 2 x digital cameras (see below for background information): Canon IXUS 175 (Powershot / ELPH 180)
  - https://www.amazon.es/Canon-IXUS-175-compacta-estabilizador/dp/B01A8QU70I/
  - http://chdk.wikia.com/wiki/ELPH180
  - https://www.canon-europe.com/for_home/product_finder/cameras/digital_camera/ixus/ixus_175/specification.aspx
- 2 x gooseneck camera stand ("magic wand")
- 2 x plexiglass sheets from a company in Igualada that J. found.

For electronic parts a good shop in Barcelona is Diotronic: https://diotronic.com/

Budget

The Kunlabora event was four days of collaborative construction. About six people worked on building the scanner. We had a budget of 500 EUR for raw materials, taken from the participation fee of the event; these are the receipts for what we spent.

Receipts:

   25    2x plexiglass [J] @ Expocryl, Igualada
   18    2x SD cards @ Life Informatica, Barcelona
   50    2x magic wand camera stand @ FotoK, Barcelona
   42.2  Nuts/bolts/etc. @ Bauhaus, Zona Franca, Barcelona
   108.8 Nuts/bolts/etc. @ Bauhaus, Zona Franca, Barcelona
   10    2x USB cable male/female @ Worten Sant Antoni, Barcelona
   5     2x mini USB (USB-B) cable @ Tienda Cables, Barcelona
   178   2x Canon IXUS 175 compact digital camera @ Amazon.es
   60    4x Listones @ Fustes Fargas, Barcelona
   ============================================
   497

Donations:

   70    Raspberry Pi 3 (Model B), 2x SD cards, 5V charger, rulers [M] @ Barcelona

We also used nuts and bolts, etc. found in the workshop of Calafou (in Catalunya, near Barcelona). The conclusion is that it should be possible to build this scanner for about 500-600 EUR anywhere in (Western) Europe. We did not use all the materials we bought, and if we did it again we would try to use real wood instead of MDF, because the latter proved very fragile. However, this could make the platen heavier, and it is already quite heavy.
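The totals above can be re-checked with a short shell sketch. The amounts are copied from the receipts and donations lists; the function name sum_eur is only an illustration and is not part of any tool we used.

```shell
# sum_eur AMOUNT...
# Adds up the amounts given as arguments and prints the total with two decimals.
# LC_ALL=C keeps awk using "." as the decimal separator.
sum_eur() {
    printf '%s\n' "$@" |
        LC_ALL=C awk '{ sum += $1 } END { printf "%.2f\n", sum }'
}

# Receipts, in the order listed above (EUR).
sum_eur 25 18 50 42.2 108.8 10 5 178 60

# Receipts total plus the donated parts.
sum_eur 497 70
```

The first call prints the receipts total and the second adds the donated parts, which is where the 500-600 EUR estimate comes from.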

Cameras

Summary of research about cameras for book scanners:

Basically there are three categories of cameras that can be used for book scanners (from cheapest to most expensive).

1. Remote control support

The cheapest option is any camera with remote trigger support, so we can take pictures without pushing the button on the camera. This is important because pressing the button can physically knock the camera out of position.

2. CHDK firmware

The middle category is cameras compatible with CHDK. CHDK is third-party open-source firmware that allows cameras to be customised. It runs on Canon PowerShot cameras, Canon's cheaper compact digital camera line. We had 200 EUR in the budget for cameras, so we went with this option.

3. Magic Lantern support

Magic Lantern is a more advanced third-party open-source firmware. However, it only works with Canon DSLR cameras (cameras that have a reflex mirror, so you look at the shot through a small viewfinder before taking the picture; they usually have big lenses). The scanner we have now uses Canon 1100D cameras, which are the cheapest type supported by Magic Lantern, but they still cost a few hundred euros.

Pictures Day 1 (temporary)

Scanning

The amount of work in the postproduction phase depends on the quality of the images you make in the scanning phase!

  1. Setting up the cameras (calibration): the most important part.

Caveats:

  • open the book right in the middle (at the central page) to calibrate the cameras.
  • each camera should look at its page at a right angle. Make sure the cameras are parallel to the angled sides of the cradle.
  • the whole page should be in the image
  • check if the pages fold/curve; if so, place something underneath to straighten them (like a sponge, or another book...)
  • camera settings: fully automatic, perhaps with manual focus.
  • back up and empty the SD cards in the cameras
  • most subtle mistake: one camera sees the letters bigger than the other camera does
  • use a post-it or similar to mark the exact position of the book in relation to the lower edge of the cradle, to ensure it remains in the same position throughout the scanning.
  2. Push the big button on the scanner to scan.
  • you may have to press a finger on the side of the plexiglass closest to you when it is “down”, because the plexiglass is not always at exactly the same angle as the book pages
  3. Download the images from the SD cards and put the scanner to sleep.
  • from the camera on the left, copy the images to a folder called “odd”
  • from the camera on the right, copy the images to a folder called “even”
  • upload the two folders to the ftp://seldon.calafou/HackTheBiblio/scanning/$bookname--$yourname/ folder
  • remember to delete the pictures from the SD cards and put the cards back in the cameras, and maybe put the camera batteries on charge
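The copying in step 3 can be sketched as a small shell function. This is only a sketch: the function name collect_scans and the mount points in the usage line are hypothetical, and it assumes both SD cards are mounted on the computer at the same time.

```shell
# collect_scans LEFT_CARD RIGHT_CARD DEST
# Copies every JPG found on the left camera's card into DEST/odd and
# every JPG found on the right camera's card into DEST/even.
# The body runs in a subshell, so the variables do not leak out.
collect_scans() (
    left_card=$1
    right_card=$2
    dest=$3
    mkdir -p "$dest/odd" "$dest/even" || exit 1
    # -iname matches .JPG and .jpg; cp -p keeps the original timestamps.
    find "$left_card"  -type f -iname '*.jpg' -exec cp -p {} "$dest/odd/"  \;
    find "$right_card" -type f -iname '*.jpg' -exec cp -p {} "$dest/even/" \;
)
```

It could be called as collect_scans /media/pi/LEFT_SD /media/pi/RIGHT_SD ~/scanning/bookname--yourname (paths made up for the example). The cards are only read, so deleting the pictures afterwards remains a manual step.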

Dependencies

There are many ways to scan; this is the current state of the art in Calafou.

Using an up-to-date Debian operating system, you can install the following programs for the postproduction steps:

  • scantailor
  • gprename
  • pdftk
  • tesseract-ocr
  • tesseract-ocr-eng
  • tesseract-ocr-spa
  • calibre

You can install all these programs with the following invocation:

sudo apt install scantailor gprename pdftk tesseract-ocr \
         tesseract-ocr-eng tesseract-ocr-spa calibre

Postproduction

You start with two folders with files like IMG_1234.JPG

The basic workflow is like this:

  1. [program] ➔ [output]
  2. gprename ➔ 1.jpg, 2.jpg, …
  3. scantailor ➔ 1.tif, 2.tif, …
  4. tesseract ➔ 1.pdf, 2.pdf, …
  5. pdftk ➔ book.pdf
  6. calibre ➔ book.epub
  7. libgen.org ➔ http://libgen.org/book/index.php?md5=B6916395FDE00D91DB4F52DCB8F069BF
  8. etc.

There are some bash oneliners which can be useful (on Debian based systems):

  1. FIXME we can probably write a script to rename the files properly… but for now, in gprename select the “Numerical” tab, with start = 1 for right-hand pages and start = 2 for left-hand pages, and always step = 2.
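A minimal sketch of what such a renaming script could look like, doing the same as the gprename settings (start at 1 or 2, step 2). The function name renumber is made up for this example. It assumes the camera file names (IMG_1234.JPG) sort in shooting order, which stops being true if the counter wraps around from 9999 during a session.

```shell
# renumber DIR START
# Renames the *.JPG files in DIR, in name order, to START.jpg,
# START+2.jpg, START+4.jpg, ...
# The body runs in a subshell, so the cd does not affect the caller.
renumber() (
    cd "$1" || exit 1
    n=$2
    for f in *.JPG; do
        [ -e "$f" ] || continue   # no match: the pattern stays literal, skip it
        mv -- "$f" "$n.jpg"
        n=$((n + 2))
    done
)
```

With the folder names from the scanning steps this would be run as renumber odd 1 and renumber even 2, assuming the “odd” folder holds the right-hand pages. Try it on a copy of the folders first.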

  2. You can rotate the images to the correct orientation (called “fix orientation” in scantailor) in the left/right folders before you import them. I think this is faster than doing it in scantailor, although scantailor offers the same operation in a more user-friendly way.

    sudo apt-get install imagemagick
    cd left
    mogrify -verbose -rotate 270 *
    cd ../right
    mogrify -verbose -rotate 90 *
  3. Does Optical Character Recognition (OCR) on all images in folder:

    time for i in *.tif; do b=$(basename "$i" .tif); tesseract -l spa "$i" "$b" pdf; done
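  4. Merges all the pdf files in the folder into one single file, in page-number order (a plain *pdf pattern would put 10.pdf before 2.pdf, so ls -v is used to sort numerically):

    pdftk $(ls -v [0-9]*.pdf) cat output book.pdf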
  5. Exports the pdf metadata to a text file, to edit:

    pdftk book.pdf dump_data output report.txt
  6. Imports the metadata of report.txt back on the pdf:

    pdftk book.pdf update_info report.txt output bookcopy.pdf
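Instead of editing the whole report.txt by hand, a small metadata file can also be generated. The sketch below prints Title and Author entries in the InfoBegin / InfoKey / InfoValue layout that dump_data writes. The function name pdf_info and the file name meta.txt are made up for this example, and we assume (check it on a copy) that update_info leaves the keys that are not listed unchanged.

```shell
# pdf_info TITLE AUTHOR
# Prints a Title and an Author entry in the layout used by
# "pdftk dump_data", ready to be fed to "pdftk update_info".
pdf_info() {
    printf 'InfoBegin\nInfoKey: Title\nInfoValue: %s\n' "$1"
    printf 'InfoBegin\nInfoKey: Author\nInfoValue: %s\n' "$2"
}
```

Usage would be something like pdf_info "Book title" "Author name" > meta.txt followed by pdftk book.pdf update_info meta.txt output bookcopy.pdf. For titles with accented characters, pdftk also has dump_data_utf8 and update_info_utf8, which read and write the same layout in UTF-8.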

Distribution

Think about how people who would be interested in this book could know about it!

Repositories:

You may consider spreading the word on relevant mailing lists, social media, etc.

Raspberry Pi for the bookscanner

At the moment we use an electromechanical button documented on the CHDK website. It works well but does not do much more than trigger the two cameras to shoot at the same time. There are many more possibilities for optimising the bookscanning process. Many ideas start with connecting the cameras to a small computer such as the Raspberry Pi. We made some experiments and tests with this setup, which are documented below:

It runs Raspbian.

CHDKPTP

In /home/pi/chdkptp there are precompiled binaries of CHDKPTP downloaded from here.

CHDKPTP is used for remote control of cameras running CHDK firmware.

Our setup has two modes of work:

  • a mechanical "button" which triggers both cameras to capture a photo
  • (in progress) a Raspberry Pi mode where the bookscanner operator uses the Raspberry Pi to capture photos, preview them in real time and transfer them already renamed for the next step in Scantailor.

When a camera is connected, this line lists information about it:

sudo ./chdkptp.sh -elist

For one of the cameras, this is what it lists:

-1:Canon IXUS 175 b=001 d=030 v=0x4a9 p=0x32c1 s=8B20D62641B041BAA3E1D597D560D110

An example of capturing a picture from commandline (once in /home/pi/chdkptp/):

sudo ./chdkptp.sh -e"connect -d=030" -erec -eshoot

The line above connects to the camera at -d=030, puts it into rec mode (if it is not already) and captures a photo, saving it to the SD card in the camera. To bypass the SD card altogether, replace -eshoot with -eremoteshoot. In that case ./chdkptp.sh saves the photos into the directory it was called from.
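The d= value has to be read from the -elist output each time; we assume it is the USB device number, which would change when a camera is plugged in again. The sketch below extracts it and prints the matching shoot command for each camera. The helper names list_devs and shoot_cmds are made up for this example, and only the chdkptp options already shown above are used.

```shell
# list_devs
# Reads "chdkptp -elist" output on stdin and prints the d= value
# of each listed camera, one per line.
list_devs() {
    sed -n 's/.* d=\([0-9][0-9]*\) .*/\1/p'
}

# shoot_cmds
# Reads device numbers on stdin and prints, for each one, the same
# shoot command as in the example above. Nothing is executed here.
shoot_cmds() {
    while read -r dev; do
        printf 'sudo ./chdkptp.sh -e"connect -d=%s" -erec -eshoot\n' "$dev"
    done
}
```

From /home/pi/chdkptp/ this could be used as sudo ./chdkptp.sh -elist | list_devs | shoot_cmds, which only prints the commands so that they can be checked before running them. The cameras would then shoot one after the other, not at the same moment as with the mechanical button.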

ZeroTier

The Raspberry Pi was added to the public ZeroTier network 565799d8f6ebf1a8 with this command:

sudo zerotier-cli join 565799d8f6ebf1a8

and it got a static IP address (in the 565799d8f6ebf1a8 network):

192.168.192.171/24

The public keys of maxigas and marcell are added to /home/pi/.ssh/authorized_keys.

Ideas for building Voja's scanner

Our first idea was to reproduce Voja's scanner, but we had to realise that we are not Voja: his scanner is a beautiful and unique piece of engineering, and consequently hard to reproduce.

Here are two links to the public documentation of our scanner, built by Voja Antonic:

https://www.memoryoftheworld.org/blog/2012/10/28/our-beloved-bookscanner-2/

https://hackaday.io/project/5604-diy-book-scanner

The electronics are not really documented (which means that they are hard to reproduce) and they are built from basic parts (which means that it takes a lot of time to put them together). So we are brainstorming about an Arduino-based solution instead. Arduino is a general-purpose programmable microcontroller board that already has many of the functions/parts we need built in. The idea is that this makes it easier for us to build the scanner and for others to reproduce it. We also have more experience working with Arduino than with only basic electronic components.

We will also try to use cheaper cameras in order to bring down the budget.

Biblio-graphy

About our first bookscanner

English

Spanish

Principal sources

Reading And Leading With One Laptop Per Child

next steps