
Like aa-notify, which shows a notification on my desktop when AppArmor denies something.

Jacob Vlijm
  • 83,767
Artyom
  • 1,723
  • Data and/or metadata checksum failure. I am using an HDD which may be failing soon; I have backups. As data and metadata get checksummed on the fly with btrfs, I want to be alerted if there are bit flips or any other hidden corruption in my data. I've added this comment in response to the request for more info by @solsTiCe. – Artyom Oct 09 '16 at 19:19
  • I don't know which process checks for corruption. The kernel btrfs module itself? I think it may report the failures in syslog. Trying to find out where now. – Artyom Oct 09 '16 at 19:24
  • If we can trigger errors anywhere, the rest is easy. – Jacob Vlijm Oct 09 '16 at 19:26
  • According to this the errors are printed to dmesg. – Artyom Oct 09 '16 at 19:28
  • Seems totally usable. The only issue is that checking dmesg is very fast right after clearing (0.001 sec), but takes a bit longer once the computer has been running for a long time without a restart. Also, clearing requires root privileges. Would clearing it from a background process bother you? The content can, however, be stored in a file until restart. – Jacob Vlijm Oct 09 '16 at 19:47
  • I don't know much about dmesg or about the concept of clearing dmesg. Do I have to clear it? And a new idea: maybe btrfs also triggers something on checksum errors. It may be easier to act on or display that trigger. – Artyom Oct 09 '16 at 19:55
  • Oh, now I understand you. It should work, but something that can alert in real time would be better, as one can also run 'btrfs scrub', which checks the integrity, from cron. Real time would also make it easier to find out which file has the corruption. – Artyom Oct 09 '16 at 20:11
  • Yep, as in a few seconds. – Artyom Oct 09 '16 at 20:18
  • @JacobVlijm The script is working nicely, and it is even customizable for ufw firewall events. Thank you! – Artyom Oct 10 '16 at 17:21

1 Answer


Running the script below in the background will both show a notification:


...and add the error message with time stamp to the log file (~/btrfs_errors.txt), looking like:

Mon Oct 10 08:25:12 2016
BTRFS error (device md2): csum failed ino 7551 off 2310144 csum 623932426 expected csum 3810482428

The script

#!/usr/bin/env python3
import subprocess
import time
import os

log = os.path.join(os.environ["HOME"], "btrfs_errors.txt")

while True:
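    # Read the kernel ring buffer and clear it in the same step (needs root)
    msginfo = subprocess.check_output(["dmesg", "--read-clear"]).decode("utf-8")
    # Compare case-insensitively: the kernel logs the prefix as "BTRFS error"
    match = [l for l in msginfo.splitlines()
             if "btrfs" in l.lower() and "error" in l.lower()]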
    if match:
        with open(log, "a+") as out:
            out.write(time.ctime()+"\n")
            for l in match:
                # One message per line, so several errors do not run together
                out.write(l+"\n")
            out.write("\n")
        subprocess.Popen(["notify-send", "-i", "gnome-disks",
                 "BTRFS error","Please see ~/btrfs_errors.txt for details"])

    time.sleep(4)

How to use

  1. Copy the script above into an empty file and save it as check_btrfs. Put it in a location where it cannot be edited without administrator privileges, such as /usr/local/bin, and make it executable(!).
  2. Add the script to the sudoers file, as described e.g. here. This is necessary because the script is run with sudo: it reads from dmesg and clears dmesg's history after reading, so the amount of output to read does not keep growing, and clearing dmesg needs root privileges.
    This means, however, that if you also need the output of dmesg for other purposes, we need a workaround. If so, please mention it.
  3. Test-run it with the command:

    sudo check_btrfs
    

    if you copied it into a directory in $PATH. If not, include the path to the script.

  4. If all works fine, add it to Startup Applications: Dash > Startup Applications > Add. Add the command:

    sudo check_btrfs
    

Since I cannot test it with "real" btrfs errors, I tested it by checking for other events (plugging in a USB stick). In the final script, I used the message format from your link. The script looks for lines containing both of the strings btrfs and error.
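
As a minimal sketch of that check, the filter can be pulled out into a small function and tried against the sample line from the log above. The function name btrfs_error_lines and the USB line in the sample are made up for this illustration, and the comparison is case-insensitive on the assumption that the kernel writes the prefix in capitals, as in the sample.

```python
def btrfs_error_lines(dmesg_text):
    """Return the lines of dmesg_text that mention both btrfs and error.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    return [
        line for line in dmesg_text.splitlines()
        if "btrfs" in line.lower() and "error" in line.lower()
    ]


if __name__ == "__main__":
    # The first line is an invented non-btrfs message; the second is the
    # sample from the log file shown earlier.
    sample = (
        "usb 1-1: new high-speed USB device\n"
        "BTRFS error (device md2): csum failed ino 7551 off 2310144 "
        "csum 623932426 expected csum 3810482428\n"
    )
    for line in btrfs_error_lines(sample):
        print(line)
```

Only the second sample line passes the filter, since the first contains neither string.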

What it does

The script:

  • reads the output of dmesg once every four seconds
  • in case of an error, shows a notification and adds the error to the file ~/btrfs_errors.txt
  • clears the history of dmesg to keep the script "low on juice".
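
The three steps can be sketched as a single cycle that is testable without root or a desktop session. This is only an illustration, not the script itself: poll_once, read_messages and notify are hypothetical names, and the reader and notifier are passed in as plain callables, so a fake message source can stand in for dmesg.

```python
import os
import tempfile
import time


def poll_once(read_messages, notify, log_path):
    """Run one cycle: read, filter, log and notify. Returns the matched lines.

    read_messages and notify are hypothetical callables supplied by the
    caller; nothing here touches dmesg or notify-send directly.
    """
    text = read_messages()
    matches = [l for l in text.splitlines()
               if "btrfs" in l.lower() and "error" in l.lower()]
    if matches:
        # Append a time stamp, the matching lines and a blank separator line
        with open(log_path, "a") as out:
            out.write(time.ctime() + "\n")
            out.write("\n".join(matches) + "\n\n")
        notify("BTRFS error", "Please see {} for details".format(log_path))
    return matches


if __name__ == "__main__":
    # Fake source standing in for the dmesg output
    fake = lambda: "BTRFS error (device md2): csum failed ino 7551\n"
    log_path = os.path.join(tempfile.mkdtemp(), "btrfs_errors.txt")
    poll_once(fake, lambda title, body: print(title, "-", body), log_path)
    with open(log_path) as f:
        print(f.read())
```

In the real loop, read_messages would wrap the dmesg --read-clear call and notify would wrap notify-send, with time.sleep(4) between cycles.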
Jacob Vlijm
  • 83,767
  • Thank you! It is just what I need, what btrfs users need. Something like this would be nice in the Ubuntu repos. I don't know if I am using the dmesg output, but I think one can still access it via journalctl. I could see the logs after I ran dmesg --clear. – Artyom Oct 10 '16 at 17:02
  • Just trying to improve it, as the script may be a base for a bigger script. I think the notification may also be achieved without clearing dmesg; as dmesg logs are timestamped, the script may remember where it last read and continue from there. [25436.495200] [UFW BLOCK] new line [25436.495237] [UFW BLOCK] – Artyom Oct 10 '16 at 17:13
  • Hi @Artyom The stamps do not prevent the whole data cluster from being read. If you need it however, we can easily write it to a file, to be cleared on startup of the script. I did find a command to read/clear in one step though (how could I overlook that in man dmesg?). Will edit tomorrow or late today. – Jacob Vlijm Oct 10 '16 at 18:27
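
The timestamp idea from the comments above could be sketched roughly as follows. This is an assumption-laden illustration, not part of the answer: new_lines is a hypothetical helper, it assumes the default "[seconds.microseconds]" prefix that dmesg prints, and, as noted in the reply, the whole buffer is still read on every pass. Messages that share the exact stamp of the last line seen would be skipped, and the stamps restart after a reboot.

```python
import re

# Default dmesg prefix, e.g. "[25436.495200] " (seconds since boot)
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def new_lines(dmesg_text, last_seen):
    """Return (lines newer than last_seen, updated last_seen).

    Hypothetical helper; lines without a leading stamp are skipped.
    """
    fresh = []
    newest = last_seen
    for line in dmesg_text.splitlines():
        m = STAMP.match(line)
        if not m:
            continue
        stamp = float(m.group(1))
        if stamp > last_seen:
            fresh.append(line)
            newest = max(newest, stamp)
    return fresh, newest


if __name__ == "__main__":
    first = "[25436.495200] [UFW BLOCK]\n"
    second = first + "[25436.495237] [UFW BLOCK]\n"
    lines, last = new_lines(first, 0.0)
    print(lines)
    # Second pass over a longer buffer: only the newer line is returned
    lines, last = new_lines(second, last)
    print(lines)
```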