The BCM2708 has a very useful hardware watchdog. Here it is used to reset the Raspberry Pi if something hangs.

What is a watchdog?

The Linux kernel can reset the system if serious problems are detected. This can be implemented via special watchdog hardware... There needs to be a daemon that tells the kernel the system is working fine. If the daemon stops doing that, the system is reset. The watchdog is such a daemon. It opens /dev/watchdog, and keeps writing to it often enough to keep the kernel from resetting, at least once per minute. Each write delays the reboot time another minute. After a minute of inactivity the watchdog hardware will cause the reset... The watchdog daemon can be stopped without causing a reboot if the device /dev/watchdog is closed correctly.

linux manpage watchdog
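
The behaviour described in the manpage excerpt can be sketched in a few lines of shell. This is only an illustration, not how the real watchdog daemon is implemented: the function name `pet_watchdog` and its arguments are made up for this example. It assumes the usual Linux convention that any write to the device resets the timer and that writing the character `V` just before closing ("magic close") disarms it, which many drivers honour unless `nowayout` is set.

```shell
#!/bin/sh
# pet_watchdog DEVICE COUNT INTERVAL  (hypothetical helper, for illustration)
# Keeps DEVICE open, writes one byte every INTERVAL seconds, COUNT times,
# then writes 'V' before closing so that the timer is disarmed again.
pet_watchdog() {
    dev=$1
    count=$2
    interval=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.' >&3        # each write postpones the reset
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V' >&3            # "magic close": ask the driver to disarm
    } 3>"$dev"                    # fd 3 stays open for the whole loop
}
```

Running `pet_watchdog /dev/watchdog 6 5` as root would arm the real hardware timer for about 30 seconds. If the script is killed in between, or the module was loaded with nowayout=1, the board resets, so try it against an ordinary file first.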

How to enable it?

We can load the watchdog kernel module with sudo modprobe bcm2708_wdog. To have it loaded again after every reboot, add the line bcm2708_wdog to the /etc/modules file.

/etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
# Parameters can be specified after the module name.

i2c-dev
bcm2708_wdog


Do not reboot until the full configuration is done!
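
Before moving on, it may help to confirm that the module really is listed as an active (uncommented) line. The following is a minimal sketch; the helper name `module_listed` is an assumption of this example and not part of any package.

```shell
#!/bin/sh
# module_listed NAME [FILE]  (hypothetical helper, for illustration)
# Succeeds if NAME appears as an active line in FILE (default /etc/modules).
# Lines starting with '#' do not match, because the name has to come first.
module_listed() {
    grep -Eq "^[[:space:]]*$1([[:space:]]|\$)" "${2:-/etc/modules}"
}

# Example use:
# module_listed bcm2708_wdog && echo "will be loaded at boot"
```

Whether the module is loaded right now is a separate question; `lsmod | grep bcm2708_wdog` answers that one.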

Install the required packages

sudo apt-get install watchdog chkconfig

Then edit the file /etc/default/watchdog so that it looks like mine.

/etc/default/watchdog
# Start watchdog at boot time? 0 or 1
run_watchdog=0
# Load module before starting watchdog
watchdog_module="bcm2708_wdog"
# Specify additional watchdog options here (see manpage).


Consider setting `run_watchdog=0` while testing. With a wrong configuration you can quickly end up in an endless reboot loop! But do not forget to set it to 1 once you have verified that everything works, since otherwise the daemon is not started at boot and the whole setup has no effect.
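
Because it is easy to forget which state the switch is in, a small check can be handy. This sketch assumes that /etc/default/watchdog is plain shell syntax, as in the listing above. The function name `watchdog_enabled_at_boot` is hypothetical.

```shell
#!/bin/sh
# watchdog_enabled_at_boot [FILE]  (hypothetical helper, for illustration)
# Sources FILE (default /etc/default/watchdog) in a subshell so that no
# variables leak out, and succeeds only if run_watchdog is set to 1.
watchdog_enabled_at_boot() {
    ( . "${1:-/etc/default/watchdog}" && [ "${run_watchdog:-0}" = "1" ] )
}

# Example use:
# watchdog_enabled_at_boot || echo "watchdog will NOT start at boot"
```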

Enable the watchdog service by executing

sudo chkconfig watchdog on
sudo /etc/init.d/watchdog start

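Then create or edit another config file, /etc/modprobe.d/watchdog.conf, so that it passes the options nowayout=1 and heartbeat=15 to bcm2708_wdog. With nowayout=1 the watchdog cannot be stopped once it has been started, and heartbeat=15 sets the hardware timeout to 15 seconds.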

/etc/modprobe.d/watchdog.conf
options bcm2708_wdog nowayout=1 heartbeat=15
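
To read a single option back out of this file, for instance to compare the heartbeat with the timeout used by the daemon, something like the following could be used. It is a sketch under the assumption that the file contains simple `options MODULE key=value ...` lines. The helper name `module_option` is made up for this example.

```shell
#!/bin/sh
# module_option MODULE OPTION FILE  (hypothetical helper, for illustration)
# Prints the value of OPTION from an "options MODULE ..." line in FILE.
module_option() {
    awk -v mod="$1" -v opt="$2" '
        $1 == "options" && $2 == mod {
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")      # kv[1] = key, kv[2] = value
                if (kv[1] == opt) print kv[2]
            }
        }' "$3"
}

# Example use:
# module_option bcm2708_wdog heartbeat /etc/modprobe.d/watchdog.conf
```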


Finally, the file /etc/watchdog.conf contains the configuration of the daemon. (Lines starting with # are commented out.) It is advised to make a backup of the file first.

/etc/watchdog.conf
#ping           = 193.224.75.125
#ping           = 172.26.1.255
#interface      = eth0
file            = /var/log/syslog
change          = 3605


# Uncomment to enable test. Setting one of these values to '0' disables it.
# These values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher than 25)
#max-load-1     = 7
#max-load-5     = 6
#max-load-15    = 5

# Note that this is the number of pages!
# To get the real size, check how large the pagesize is on your machine.
min-memory      = 10000

#repair-binary      = /usr/sbin/repair
#repair-timeout     = 
#test-binary        = 
#test-timeout       =

watchdog-device = /dev/watchdog
watchdog-timeout   = 10

# Defaults compiled into the binary
#temperature-device =
#max-temperature    = 120

# Defaults compiled into the binary
#admin          = root
#interval       = 1
#logtick                = 1
#log-dir        = /var/log/watchdog

realtime        = yes
priority        = 1

#pidfile     = /var/run/sshd.pid

As you can see, only a few lines are active (not commented out). Monitoring the pidfile of the ssh daemon does not seem to work very reliably: it caused a reboot after 3-4 weeks. I could not find out the reason, so it is deactivated. You have to verify that /var/log/syslog is written to at least once an hour (change = 3605 seconds). This should be no problem if you have some cron jobs installed or the system is otherwise doing something, but it might be a problem when running a bare Raspbian where nothing is scheduled. In that case, consider modifying this criterion.
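
One way to check the syslog condition by hand is to compare the modification time of the file with the configured limit. This is a rough sketch that assumes GNU `stat` and `date`, as shipped with Raspbian. The helper name `file_is_fresh` is hypothetical.

```shell
#!/bin/sh
# file_is_fresh FILE MAX_AGE  (hypothetical helper, for illustration)
# Succeeds if FILE was modified within the last MAX_AGE seconds.
# Returns 2 if the modification time cannot be read.
file_is_fresh() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 2   # GNU stat: mtime as epoch seconds
    [ $((now - mtime)) -le "$2" ]
}

# Example use, with the limit taken from "change = 3605":
# file_is_fresh /var/log/syslog 3605 || echo "syslog is stale"
```

Running this a few times over a quiet day gives a feeling for how close the system comes to the limit before the daemon is switched on for real.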

Furthermore, please test your setup twice, since if something goes wrong and you have set run_watchdog=1 in the /etc/default/watchdog file, you might end up in an endless reboot loop. (In such a case, simply take out the SD card, put it in a card reader and edit the mentioned file. There is no way around it.)

For me the watchdog has proven highly reliable for almost 8 months (at the time of writing this article).


