Omnius
Latest revision as of 07:40, 10 January 2020
R.I.P. and thank you, Omnius.
Long live Essun!!
From here on, this is deprecated.
Omnius WAS the main server providing services on the calafou internal network (".calafou").
Services
Ideally, a short description of services should be available at http://omnius.calafou (this address is only accessible from the local network, e.g. if you are physically in Calafou or use a VPN to connect to the Calafou network).
apt-cacher-ng (OLD)
"A cache for your Linux distro: the more you use it, the faster you will download your packages."
You can use apt-cacher-ng to install or upgrade Debian and Ubuntu packages on your computer faster than usual, while saving a little bandwidth for the community. It stores every package that people request on its local disk, and when a requested package is already there, it serves it from the disk. So if you or somebody else has already installed the package you want, apt-cacher-ng serves it to you faster and without downloading it again from the Internet. This is most useful in workshops, when a group of people wants to install a specific package at the same time, but it is also good to use it in your everyday life.
How do I configure my computer to use the apt-cacher-ng on omnius?
The instructions are here: http://omnius.calafou:3142/
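The instructions page on omnius is the authoritative reference. As a minimal sketch of what such a client configuration usually looks like, the shell function below writes an apt proxy setting. It assumes apt-cacher-ng listens on port 3142 of omnius.calafou, which is the port in the URL above. The file name 02proxy and the function name are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: point apt at the apt-cacher-ng proxy on omnius.
# Assumptions: the proxy answers at omnius.calafou:3142 (port taken from the
# instructions URL); the file name 02proxy is arbitrary, since apt reads the
# files in its apt.conf.d directory.
write_apt_proxy_conf() {
  conf_dir="${1:-/etc/apt/apt.conf.d}"   # target directory (default: apt's own)
  printf 'Acquire::http::Proxy "http://omnius.calafou:3142";\n' \
    > "$conf_dir/02proxy"
}
```

Run it as root. The setting only affects http:// sources. When you are away from Calafou, delete the file again, because apt cannot download from those sources while the proxy is unreachable.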
Software setup
Omnius runs Debian GNU/Linux jessie, which was the current stable release when this page was written.
One big change in this Debian version is that systemd is used to manage services.
Hardware setup
How are the disks connected?
Omnius has an old motherboard with no SATA support. As a workaround, a RAID controller card is installed in a PCI slot. However, the driver for this card does not work in Debian jessie, so we use software RAID instead. The only function of the RAID controller card is to provide 4 SATA ports for connecting hard drives.
The only disk that is connected directly to the motherboard through an IDE cable (not SATA) is the operating system disk.
Which disks are connected?
There are many disks in omnius. The best way to get an overview is to run `lsblk --fs` (here with `-o +SIZE` to add a size column), which gives output similar to this:
root@omnius:~# lsblk --fs -o +SIZE
NAME                      FSTYPE            LABEL             UUID                                   MOUNTPOINT        SIZE
sda                                                                                                                    931.5G
└─sda1                    linux_raid_member seldon:alexandria 18a48690-b180-cb51-582e-9ab45af523be                     931.5G
  └─md127                 crypto_LUKS                         eb337def-eca9-4d6e-8614-b65ea58b4266                     1.8T
    └─alexandria          ext4                                b95eeee3-d8ca-4b90-a060-70073dd7a116   /var/alexandria   1.8T
sdb                                                                                                                    931.5G
└─sdb1                    linux_raid_member seldon:alexandria 18a48690-b180-cb51-582e-9ab45af523be                     931.5G
  └─md127                 crypto_LUKS                         eb337def-eca9-4d6e-8614-b65ea58b4266                     1.8T
    └─alexandria          ext4                                b95eeee3-d8ca-4b90-a060-70073dd7a116   /var/alexandria   1.8T
sdc                                                                                                                    931.5G
└─sdc1                    ext4                                e81c5f79-bbba-4eb2-9e85-0fb3c3110b6f   /srv/istanbul     931.5G
sdd                                                                                                                    465.8G
└─sdd1                    ext3                                ca5cb667-3fc3-4e97-93ef-467f4e9b04c8   /mnt/externaldisk 465.8G
sde                                                                                                                    465.8G
└─sde1                    ext4                                ff50c30a-1688-46df-b736-21a28e56450e   /mnt/tmp          465.8G
sdf                                                                                                                    74.5G
├─sdf1                    ext2                                a20aee5f-b77c-4677-acd5-9e1f651233ff   /boot             243M
├─sdf2                                                                                                                 1K
└─sdf5                    crypto_LUKS                         7a1612aa-40d5-4157-a4d7-1ffccdc487dc                     74.3G
  └─sda5_crypt            LVM2_member                         TOH5Xi-JSBJ-05UP-bQpy-xkG6-nZls-Zne8gF                   74.3G
    ├─omnius--vg-root     ext4                                86be33dd-17b7-4e0b-b5f2-dd885fd6189e   /                 72.3G
    └─omnius--vg-swap_1   swap                                118669de-118b-4681-bffb-d852a6842820   [SWAP]            2G
root@omnius:~#
- Operating system on /dev/sdf: /dev/sdf1 is /boot. /dev/sdf2 is only 1K in size. It is the extended partition that contains the logical partition /dev/sdf5 (the usual layout created by the Debian installer), so it holds no data of its own. /dev/sdf5 holds the encrypted root file system (/) and the swap file system, as two LVM logical volumes.
- apt-cacher (proxy for caching Debian packages) on /dev/sde. /dev/sde1 is a file system with all the old packages we were caching (so, useless).
- pxe (network booting for installing Linux on machines that are connected to the local network) on /dev/sdc1, mounted on /srv/istanbul.
- Alexandria (mainly media files like films and music) on /dev/sda (this is in RAID0 with the next disk). /dev/sda1 is part of the RAID volume /dev/md127 (also called /dev/md/alexandria).
- Alexandria (mainly media files like films and music) on /dev/sdb (this is in RAID0 with the previous disk). /dev/sdb1 is part of the RAID volume /dev/md127 (also called /dev/md/alexandria).
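Alexandria depends on the software RAID volume /dev/md127, so it can help to check the volume's RAID level as the kernel reports it in /proc/mdstat. The function below is a minimal sketch that prints the level of one array. The function name is hypothetical, and `mdadm --detail /dev/md127` gives the full picture.

```shell
#!/bin/sh
# Sketch: print the RAID level (raid0, raid1, ... or linear) of an md array
# as listed in /proc/mdstat. Prints nothing if the array is not listed.
# Usage: md_level md127 [mdstat-file]
md_level() {
  awk -v md="$1" '
    # Array lines look like: "md127 : active raid0 sdb1[1] sda1[0]"
    $1 == md && $2 == ":" {
      for (i = 3; i <= NF; i++)
        if ($i ~ /^raid[0-9]+$/ || $i == "linear") { print $i; exit }
    }' "${2:-/proc/mdstat}"
}
```

According to the notes above, `md_level md127` should print raid0 on omnius. A restored mirror would show raid1.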
Nota bene: the last few lines of /etc/fstab show that some directories on /srv/istanbul are mounted on /var/alexandria/!
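If those fstab entries are bind mounts (an assumption, because the exact lines are not reproduced here), a small helper like the following sketch can list them. The function name and the example entry in the comment are hypothetical.

```shell
#!/bin/sh
# Sketch: list bind mounts declared in an fstab file as "source -> target".
# Assumption: the entries use the "bind" (or "rbind") mount option, e.g. the
# hypothetical line
#   /srv/istanbul/films  /var/alexandria/films  none  bind  0  0
list_bind_mounts() {
  awk '
    # Skip comments and short lines; match bind/rbind in the options field.
    $1 !~ /^#/ && NF >= 4 && $4 ~ /(^|,)r?bind(,|$)/ { print $1 " -> " $2 }
  ' "${1:-/etc/fstab}"
}
```

`findmnt` then shows which of these are actually mounted at the moment.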
How healthy are the disks?
The smartmontools package in Debian provides the smartctl command for checking disk health. Modern hard disks support the SMART standard (Self-Monitoring, Analysis and Reporting Technology). The disk keeps a log of errors and other health attributes and compares them with thresholds set by the vendor. This is useful for finding out when a disk is getting old and starts to make mistakes. A disk that keeps making mistakes can fail at any time.
root@omnius: for disk in sda sdb sdc sde sdf ; do smartctl -x /dev/$disk > /root/reinstall/smartinfo/$disk.txt ; done
root@omnius:~/reinstall/smartinfo# rgrep Raw_Read_Error_Rate *txt
sda.txt:  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
sdb.txt:  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
sdc.txt:  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    4
sdd.txt:  1 Raw_Read_Error_Rate     PO-R--   100   100   050    -    0
sde.txt:  1 Raw_Read_Error_Rate     POSR--   100   253   006    -    0
sdf.txt:  1 Raw_Read_Error_Rate     POSR--   108   092   006    -    66333935
root@omnius:~/reinstall/smartinfo#
The same command can also run different kinds of health checks on the disk, for example a short or long self-test with `smartctl -t short` or `smartctl -t long`. The three adjacent numeric columns in the output above (VALUE, WORST and THRESH in the smartctl output) range from 001 to 254, where 254 is the best and 001 is the worst. The first column is the current health value, the second is the worst value ever measured, and the third is the manufacturer-assigned threshold at which the disk should be replaced. In the output above, the numbers in the first and second columns are bigger than the number in the third column for every disk. This means that the Raw Read Error Rate is within healthy limits on all the disks.
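The rule described above can be applied to every attribute at once. The sketch below reads `smartctl -x` output, whose attribute rows use the FLAGS format shown above. It flags any attribute whose current or worst value is not above the threshold. The function name is hypothetical, and `smartctl -H` still gives the drive's own overall verdict.

```shell
#!/bin/sh
# Sketch: apply the "VALUE and WORST must be above THRESH" rule to every
# attribute row of `smartctl -x` output read from stdin.
# Output: ATTRIBUTE_NAME VALUE WORST THRESH OK|CHECK
smart_check() {
  awk '
    # Attribute rows: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE,
    # where FLAGS is 6 characters such as POSR-K.
    $1 ~ /^[0-9]+$/ && length($3) == 6 && $3 ~ /^[-POSRCK]+$/ && NF >= 7 {
      ok = ($4 + 0 > $6 + 0) && ($5 + 0 > $6 + 0)
      print $2, $4, $5, $6, (ok ? "OK" : "CHECK")
    }'
}
```

For example, `smartctl -x /dev/sda | smart_check` prints one line per attribute. Any attribute marked CHECK is worth a closer look.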
Where are the disks connected?
The omnius case has space for many hard disks. Its bays are used as follows:
- EMPTY
- EMPTY
- alexandria
- alexandria
- EMPTY
- EMPTY
- EMPTY
- omnius-os
- EMPTY
- EMPTY
- apt-cacher
BIOS problems
Blinking cursor: after the message "Successfully installed BIOS", the screen goes black and only a blinking cursor is shown. The solution is to turn off the "BBS support" option in the RAID controller menu, in the SATA configuration section (press Control-A during boot to enter this menu).
Specifications
1GB RAM: omnius has 1GB of RAM. The motherboard has 4 slots divided into two banks, and each bank must hold an identical amount of RAM. At the moment only the first bank is used, with two 512MB PC2100 modules installed in it.
2x2.66GHz CPU: omnius seems to have two single-core Intel(R) Xeon(TM) 2.66GHz CPUs.
(2x)1TB HDD: alexandria has two 1TB HDDs in RAID0, so they effectively look like a 2TB disk.
More details
Power supply: ATX, with at least 4 SATA connectors. The motherboard connector is a 4x2-pin connector. At the moment we don't use the other cables on the power supply.
NICs: There are two ethernet sockets, one 10/100Mbit and another 1Gbit. The first is turned off in BIOS, the other is used as the primary network interface (e.g. eth0).
Backups
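1. RAID1 for alexandria (BROKEN: now it is a RAID0)
With RAID1, alexandria is automatically mirrored to a second disk, so if one disk fails, the volume keeps working without interruption. The current RAID0 has no redundancy: if either disk fails, the data on alexandria is lost.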
2. Offsite backup for alexandria (BROKEN: NAS failed)
Every day at 3am, alexandria is backed up with the backup program "restic" to hypatia, a NAS (Network Attached Storage) located far from the hacklab.
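A nightly restic run like the one described could look like the following sketch. Everything specific here is an assumption and not the original setup: the SFTP user and repository path on hypatia, the password file location, the script path and the cron entry. Only restic's `-r`, `--password-file` and `backup` usage is standard. The RESTIC variable allows a dry run that only prints the command.

```shell
#!/bin/sh
# Sketch of a nightly alexandria backup with restic.
# Hypothetical values: repository on hypatia over SFTP and password file path.
RESTIC="${RESTIC:-restic}"
ALEXANDRIA_REPO="${ALEXANDRIA_REPO:-sftp:backup@hypatia:/restic/alexandria}"
ALEXANDRIA_PASSWORD_FILE="${ALEXANDRIA_PASSWORD_FILE:-/root/.restic-password}"

backup_alexandria() {
  # Create a new snapshot of /var/alexandria in the repository.
  "$RESTIC" -r "$ALEXANDRIA_REPO" --password-file "$ALEXANDRIA_PASSWORD_FILE" \
    backup /var/alexandria
}

# To run it every day at 3am, a hypothetical /etc/cron.d entry could be:
# 0 3 * * * root /usr/local/sbin/backup-alexandria
```

`RESTIC=echo backup_alexandria` prints the command that would run, which is handy for checking the settings without touching the repository.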
LUKS
The disks of omnius are encrypted with LUKS, so a passphrase must be provided every time it boots. There are three ways to boot omnius:
Manual
Go to the hacklab and type in the passphrase using the monitor and keyboard connected to omnius.
Semi-automatic
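Run a bash script from another computer on the local network.
This works because an SSH server is built into the initrd of omnius. The initrd (initial RAM disk) is a small file system that is loaded into memory and runs early in the boot process, before the encrypted root file system is unlocked.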
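The original script is not reproduced here, but a minimal sketch of the idea could look like the following. It assumes that the SSH server in the initrd accepts root logins at omnius.calafou. It also assumes the initrd provides Debian's `cryptroot-unlock` helper and that this helper reads the passphrase from standard input. Older initrds may need a different remote command, so UNLOCK_CMD can be overridden. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: unlock omnius remotely while its initrd waits for the LUKS passphrase.
# Assumptions: SSH server in the initrd reachable as root@omnius.calafou, and a
# remote unlock command (cryptroot-unlock by default) that reads stdin.
OMNIUS_HOST="${OMNIUS_HOST:-omnius.calafou}"
UNLOCK_CMD="${UNLOCK_CMD:-cryptroot-unlock}"
SSH="${SSH:-ssh}"

unlock_omnius() {
  printf 'LUKS passphrase for %s: ' "$OMNIUS_HOST" >&2
  # Hide the typed passphrase when reading from a terminal.
  if [ -t 0 ]; then stty -echo; fi
  IFS= read -r passphrase
  if [ -t 0 ]; then stty echo; printf '\n' >&2; fi
  # Send the passphrase over SSH instead of storing it in the script.
  printf '%s\n' "$passphrase" | "$SSH" "root@$OMNIUS_HOST" "$UNLOCK_CMD"
  status=$?
  unset passphrase
  return "$status"
}
```

Asking for the passphrase at run time, instead of keeping it in the script, avoids the problem of a script that still contains an old password.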
Why can this fail?
- Network problems: the two computers cannot ping each other.
- SSH is not available in the omnius initrd: after some upgrades, ssh is no longer included in the initrd of omnius.
- SSH is not available on the other computer: ssh is not installed; try "apt-get install openssh-client".
- Incorrect passphrase: the script contains an old passphrase.
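The first failure causes in this list can be checked before trying to unlock. The following is a minimal sketch with a hypothetical function name and default host. It only tests that an ssh client is installed and that omnius answers a ping, so passing the check does not prove that SSH is running in the initrd.

```shell
#!/bin/sh
# Sketch: pre-flight checks for the semi-automatic unlock.
# PING and SSH can be overridden (e.g. for testing); the host is an assumption.
OMNIUS_HOST="${OMNIUS_HOST:-omnius.calafou}"
PING="${PING:-ping}"
SSH="${SSH:-ssh}"

preflight_unlock() {
  # Cause: ssh not installed on this computer.
  if ! command -v "$SSH" >/dev/null 2>&1; then
    echo "ssh client missing: try apt-get install openssh-client" >&2
    return 2
  fi
  # Cause: network problems (one ping, waiting at most 2 seconds).
  if ! "$PING" -c 1 -W 2 "$OMNIUS_HOST" >/dev/null 2>&1; then
    echo "cannot ping $OMNIUS_HOST: check the network" >&2
    return 1
  fi
  echo "ok: $OMNIUS_HOST answers and ssh is installed"
}
```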
Automatic (BROKEN: Raspi needs fixing)
Mandotron is another computer (a Raspberry Pi) which, under normal circumstances, should boot omnius automatically.
This works because the mandos client is built into the initrd of omnius (the initial RAM disk that runs early in the boot process), and the mandos server is installed and configured on the other machine (mandotron).
Why can this fail?
- mandos has a timeout: if it cannot see omnius for some time, it refuses to serve the LUKS passphrase. The timer can be reset manually by logging in to mandotron; see the mandos documentation for details.
- mandotron itself is not working: the machine running mandos is not online. The most common problem with a Raspberry Pi is that a power surge can leave the SD card that holds its file system in an inconsistent state. Try taking out the SD card and running "fsck" on it from another computer.
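Running fsck on a mounted file system can make the damage worse, so a small wrapper that refuses to do so can help. In the minimal sketch below, the function name is hypothetical. You pass the SD card partition yourself, because names such as /dev/mmcblk0p2 or /dev/sdX2 depend on the computer the card is plugged into.

```shell
#!/bin/sh
# Sketch: check a Raspberry Pi SD card partition from another computer,
# refusing to run while that partition is mounted.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"
FSCK="${FSCK:-fsck}"

fsck_sd_partition() {
  part="$1"
  if [ -z "$part" ]; then
    echo "usage: fsck_sd_partition /dev/PARTITION" >&2
    return 2
  fi
  # Is the partition listed as a mounted source device?
  if awk -v dev="$part" '$1 == dev { found = 1 } END { exit !found }' "$MOUNTS_FILE"; then
    echo "$part is mounted; unmount it before running fsck" >&2
    return 1
  fi
  # -f forces a full check even if the file system is marked clean.
  "$FSCK" -f "$part"
}
```

Run it as root, once for each partition on the card.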