Omnius

From Wiki-Fou
Revision as of 20:17, 26 December 2019 by Maxigas

Omnius is the main server providing services on the Calafou internal network (".calafou").

Services

Ideally, a short description of services should be available at http://omnius.calafou (this address is only accessible from the local network, e.g. if you are physically in Calafou or use a VPN to connect to the Calafou network).

apt-cacher-ng (OLD)

"A cache for your Linux distro: the more you use it, the faster you will download your packages."

You can use apt-cacher-ng to install and upgrade Debian and Ubuntu packages on your computer faster than usual while saving some bandwidth for the community. It stores every requested package on its local disk and serves later requests for the same package from there. So if you or somebody else has already installed or upgraded the package you want, apt-cacher-ng delivers it faster and without downloading it from the Internet again. This is most useful in workshops, when a group of people wants to install a specific package at the same time, but it is also worth using day to day.

How do I configure my computer to use apt-cacher-ng on omnius?

The instructions are here: http://omnius.calafou:3142/
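
For orientation, an apt-cacher-ng client is typically set up by pointing apt at the cache as an HTTP proxy. The following is a minimal sketch, assuming the cache still listens on port 3142 as in the URL above. The file name 01proxy is a common convention rather than something this page confirms, and the helper name apt_proxy_line is made up for illustration.

```shell
#!/bin/sh
# Print the apt configuration line that routes HTTP package downloads
# through an apt-cacher-ng proxy. Host and port default to the address
# given above; 'apt_proxy_line' is a hypothetical helper name.
apt_proxy_line() {
    host="${1:-omnius.calafou}"
    port="${2:-3142}"
    printf 'Acquire::http::Proxy "http://%s:%s";\n' "$host" "$port"
}

# Usage (as root): write the line to an apt.conf.d snippet, then refresh.
#   apt_proxy_line > /etc/apt/apt.conf.d/01proxy
#   apt-get update
```

Removing the snippet file again restores direct downloads, which is handy when you leave the local network.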

Software setup

Omnius runs the current Debian GNU/Linux stable (codename jessie).

One big change in this Debian version is that systemd is used to manage services.
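
In practice this means services are inspected and controlled with systemctl and journalctl. A small sketch, assuming the unit is called apt-cacher-ng (the name the Debian package normally uses); service_state is a hypothetical helper, not something installed on omnius.

```shell
#!/bin/sh
# Print "<unit>: <state>" for one systemd unit, e.g. "apt-cacher-ng: active".
# 'service_state' is a hypothetical helper; 'systemctl is-active' is the
# real command underneath.
service_state() {
    unit="$1"
    # is-active exits non-zero for anything but "active", so ignore the
    # exit status and keep only the printed state.
    state="$(systemctl is-active "$unit" 2>/dev/null)" || true
    printf '%s: %s\n' "$unit" "${state:-unknown}"
}

# Usage:
#   service_state apt-cacher-ng
#   systemctl restart apt-cacher-ng      # restart a service
#   journalctl -u apt-cacher-ng -n 50    # last 50 log lines of a service
```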

Hardware setup

How are the disks connected?

Omnius has an old motherboard with no SATA support. As a workaround, a RAID controller card is installed in a PCI slot. However, the RAID controller card driver does not work in Debian Jessie, so we use software RAID instead. The card's only function is to provide 4 SATA ports where we can connect hard drives.

The only disk that is connected directly to the motherboard through an IDE cable (not SATA) is the operating system disk.

Which disks are connected?

There are many disks in omnius. The best way to get an overview is to run `lsblk --fs -o +SIZE`. It gives output similar to this:

root@omnius:~# lsblk --fs -o +SIZE
NAME                    FSTYPE            LABEL             UUID                                   MOUNTPOINT          SIZE
sda                                                                                                                  931.5G
└─sda1                  linux_raid_member seldon:alexandria 18a48690-b180-cb51-582e-9ab45af523be                     931.5G
  └─md127               crypto_LUKS                         eb337def-eca9-4d6e-8614-b65ea58b4266                       1.8T
    └─alexandria        ext4                                b95eeee3-d8ca-4b90-a060-70073dd7a116   /var/alexandria     1.8T
sdb                                                                                                                  931.5G
└─sdb1                  linux_raid_member seldon:alexandria 18a48690-b180-cb51-582e-9ab45af523be                     931.5G
  └─md127               crypto_LUKS                         eb337def-eca9-4d6e-8614-b65ea58b4266                       1.8T
    └─alexandria        ext4                                b95eeee3-d8ca-4b90-a060-70073dd7a116   /var/alexandria     1.8T
sdc                                                                                                                  931.5G
└─sdc1                  ext4                                e81c5f79-bbba-4eb2-9e85-0fb3c3110b6f   /srv/istanbul     931.5G
sdd                                                                                                                  465.8G
└─sdd1                  ext3                                ca5cb667-3fc3-4e97-93ef-467f4e9b04c8   /mnt/externaldisk 465.8G
sde                                                                                                                  465.8G
└─sde1                  ext4                                ff50c30a-1688-46df-b736-21a28e56450e   /mnt/tmp          465.8G
sdf                                                                                                                   74.5G
├─sdf1                  ext2                                a20aee5f-b77c-4677-acd5-9e1f651233ff   /boot               243M
├─sdf2                                                                                                                   1K
└─sdf5                  crypto_LUKS                         7a1612aa-40d5-4157-a4d7-1ffccdc487dc                      74.3G
  └─sda5_crypt          LVM2_member                         TOH5Xi-JSBJ-05UP-bQpy-xkG6-nZls-Zne8gF                    74.3G
    ├─omnius--vg-root   ext4                                86be33dd-17b7-4e0b-b5f2-dd885fd6189e   /                  72.3G
    └─omnius--vg-swap_1 swap                                118669de-118b-4681-bffb-d852a6842820   [SWAP]                2G
root@omnius:~# 

  1. Operating system on /dev/sdf:
    1. /dev/sdf1 is /boot.
    2. /dev/sdf2 does not seem to be used; maybe it was planned to be part of a RAID1 volume for the root file system. (A 1K partition like this is also what lsblk typically shows for an extended partition that merely contains a logical partition such as /dev/sdf5.)
    3. /dev/sdf5 holds the encrypted root file system (/) and the swap space.
  2. apt-cacher (proxy for caching Debian packages) on /dev/sde.
    1. /dev/sde1 is a file system with all the old packages we were caching (so, useless).
  3. pxe (network booting for installing Linux on machines that are connected to the local network)
    1. /dev/sdc1 mounted on /srv/istanbul.
  4. Alexandria (mainly media files like films and music) on /dev/sda (this is in RAID0 with the next disk).
    1. /dev/sda1 is part of the RAID volume /dev/md127 (also called /dev/md/alexandria).
  5. Alexandria (mainly media files like films and music) on /dev/sdb (this is in RAID0 with the previous disk).
    1. /dev/sdb1 is part of the RAID volume /dev/md127 (also called /dev/md/alexandria).

Nota bene: The last few lines of /etc/fstab show that some directories on /srv/istanbul are mounted on /var/alexandria/!
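
To see those mounts without reading the whole file, the bind entries can be filtered out of /etc/fstab. This is a sketch under the assumption that the directories are attached with the "bind" (or "rbind") mount option, which this page does not confirm; list_bind_mounts is a hypothetical helper name.

```shell
#!/bin/sh
# List bind mounts declared in an fstab file, one "source -> target" per line.
# 'list_bind_mounts' is a hypothetical helper name.
list_bind_mounts() {
    fstab="${1:-/etc/fstab}"
    # Skip comments and short lines; field 4 holds the mount options.
    awk '$1 !~ /^#/ && NF >= 4 && $4 ~ /(^|,)r?bind(,|$)/ {
        print $1 " -> " $2
    }' "$fstab"
}

# Usage:
#   list_bind_mounts              # reads /etc/fstab
#   list_bind_mounts /tmp/fstab   # or any other fstab-style file
```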

How healthy are the disks?

The smartmontools package in Debian provides the smartctl command to check disk health. Modern hard disks support the SMART standard: the disk keeps a log of errors and compares its measurements with the ideal performance specified by the vendor. This is useful for finding out when a disk is getting old and starting to make errors. A disk that has been making errors for some time can die easily.

root@omnius:~# for disk in sda sdb sdc sdd sde sdf ; do smartctl -x /dev/$disk > /root/reinstall/smartinfo/$disk.txt ; done

root@omnius:~/reinstall/smartinfo# rgrep Raw_Read_Error_Rate *txt
sda.txt:  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
sdb.txt:  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
sdc.txt:  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    4
sdd.txt:  1 Raw_Read_Error_Rate     PO-R--   100   100   050    -    0
sde.txt:  1 Raw_Read_Error_Rate     POSR--   100   253   006    -    0
sdf.txt:  1 Raw_Read_Error_Rate     POSR--   108   092   006    -    66333935
root@omnius:~/reinstall/smartinfo# 

It is also possible to run different kinds of health checks on the disk using the same command. The three adjacent numeric columns in the above output range from 001 to 254, where 254 is the best and 001 is the worst. The first column is the current health value, the second is the worst health value ever measured, and the third is the manufacturer-assigned limit at which the disk should be replaced. In the output above, the first and second columns are larger than the third for every disk, which means that Raw Read Error Rate is within healthy limits on all the disks.
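
The comparison described above can be automated. The sketch below assumes input in the same rgrep format as the listing (file name, attribute ID, attribute name, flags, then the three columns); smart_verdict is a hypothetical helper name, and "FAIL" only means that a value is at or below the limit, not that the disk is certain to die.

```shell
#!/bin/sh
# Read rgrep-style SMART attribute lines on stdin, for example
#   sda.txt:  1 Raw_Read_Error_Rate  POSR-K  200  200  051  -  0
# and report whether the current and worst values are above the limit.
# Fields: 1=file 2=id 3=attribute 4=flags 5=value 6=worst 7=threshold.
# 'smart_verdict' is a hypothetical helper name.
smart_verdict() {
    awk '{
        value = $5 + 0; worst = $6 + 0; thresh = $7 + 0
        if (value > thresh && worst > thresh) verdict = "OK"
        else verdict = "FAIL"
        printf "%s %s value=%d worst=%d threshold=%d %s\n",
               $1, $3, value, worst, thresh, verdict
    }'
}

# Usage:
#   rgrep Raw_Read_Error_Rate /root/reinstall/smartinfo/*txt | smart_verdict
```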

Where are the disks connected?

The box of omnius has a lot of space for hard disks:

  1. EMPTY
  2. EMPTY
  3. alexandria
  4. alexandria
  5. EMPTY
  6. EMPTY
  7. EMPTY
  8. omnius-os
  9. EMPTY
  10. EMPTY
  11. apt-cacher

BIOS problems

Blinking cursor: After the message "Successfully installed BIOS" the screen goes black and there is only a blinking cursor. The solution to this problem is to turn off the "BBS support" option in the RAID controller menu, in the SATA configuration section (enter with Control-A when booting).

Specifications

1GB RAM: omnius has 1GB of RAM. The motherboard has 4 slots, which are divided into two banks. Each bank has to hold an identical amount of RAM. At the moment only the first bank is used, with two 512MB modules (PC2100) installed in it.

2x2.66GHz CPU: It seems that omnius has two single-core Intel(R) Xeon(TM) 2.66GHz CPUs.

(2x)1TB HDD: alexandria has two 1TB HDDs in RAID0, so they effectively look like a 2TB disk.

More details

Power supply: ATX, with at least 4 SATA connectors. The motherboard connector is a 4x2 pin connector. At the moment we don't use the other cables on the power supply.

NICs: There are two Ethernet sockets, one 10/100Mbit and one 1Gbit. The first is turned off in the BIOS; the second is used as the primary network interface (e.g. eth0).

Backups

1. RAID1 for alexandria (BROKEN: now it is a RAID0)

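In the original RAID1 setup, alexandria was automatically mirrored to a second disk, so if one disk failed it would keep working without interruption. The current RAID0 provides no such redundancy: if either disk fails, the whole volume is lost.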

2. Offsite backup for alexandria (BROKEN: NAS failed)

The backup runs every day at 3am using a program called "restic". It writes to hypatia, a NAS (Network Attached Storage) located far from the hacklab.

LUKS

There are three ways to boot omnius, that is, to supply the LUKS passphrase that unlocks its encrypted root file system at boot:

Manual

Going to the hacklab and typing in the passphrase using the monitor and keyboard that is connected to omnius.

Semi-automatic

Using a bash script executed from another computer on the local network.

This works because an ssh server is baked into the initrd of omnius (the initial RAM disk that runs at boot time, before the encrypted root file system is unlocked).

Why can this fail?

  1. Network problems: the two computers cannot ping each other.
  2. SSH is not available in the omnius initrd: ssh is no longer in the initrd of omnius because of some upgrades.
  3. SSH is not available on the other computer: ssh is not installed, try "apt-get install openssh-client".
  4. Password incorrect: the script has an old password.
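
The first three causes can be checked from the other computer before running the unlock script. The following is a rough sketch, not the script used at the hacklab: check_unlock_path is a hypothetical helper, the default host name omnius.calafou is an assumption based on the service URL, and ssh-keyscan is used only to see whether any ssh server answers. It cannot detect the fourth cause, an outdated password.

```shell
#!/bin/sh
# Walk through the first three failure causes listed above for a given host.
# Prints one line and returns non-zero at the first failed check.
# 'check_unlock_path' is a hypothetical helper; the default host name is an
# assumption, adjust it to whatever the unlock script uses.
check_unlock_path() {
    host="${1:-omnius.calafou}"

    # Cause 3: no ssh client on this computer.
    if ! command -v ssh >/dev/null 2>&1; then
        echo "ssh client missing: try apt-get install openssh-client"
        return 1
    fi
    # Cause 1: the two computers cannot ping each other.
    if ! ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        echo "network problem: cannot ping $host"
        return 1
    fi
    # Cause 2: no ssh server answering on omnius (not in the initrd any more).
    # ssh-keyscan prints the host keys only if an ssh server responds.
    if [ -z "$(ssh-keyscan -T 5 "$host" 2>/dev/null)" ]; then
        echo "ssh problem: $host answers ping but no ssh server responds"
        return 1
    fi
    echo "ok: $host is reachable and an ssh server responds"
}

# Usage:
#   check_unlock_path omnius.calafou
```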

Automatic (BROKEN: Raspi needs fixing)

Mandotron is another computer (a Raspberry Pi) which should under normal circumstances boot omnius.

This works because mandos-client is baked into the initrd of omnius (the initial RAM disk that runs at boot time, before the encrypted root file system is unlocked), and the mandos server is installed and configured on the other machine (mandotron).

Why can this fail?

  1. mandos has a timeout: if it cannot see omnius for some time, it will refuse to serve the LUKS passphrase. The timer can be reset manually by logging in to mandotron. Check the mandos documentation.
  2. mandotron itself is not working: the other machine with mandos is not online. The most common problem with a Raspberry Pi is that a power surge can leave the SD card that holds the file system in an inconsistent state. Try pulling out the SD card and running "fsck" on it from another computer.