ologgerd daemon – 11gR2

In the past few weeks I have been involved in upgrading RAC as well as non-RAC databases, and with every upgrade I learn something new 🙂
After the upgrade of one of the development databases, which is a 2-node RAC, EMGC started showing Swap Utilization at 99.99%. The two nodes are:
node 1 –> az8500
node 2 –> az8501

The “top” command showed:

top - 08:39:20 up 12 days, 10:13,  7 users,  load average: 1.51, 1.21, 1.16
Tasks: 233 total,   1 running, 232 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.4% us,  1.3% sy,  0.0% ni, 92.4% id,  0.5% wa,  0.1% hi,  0.4% si
Mem:  16319928k total, 16237460k used,    82468k free,    97948k buffers
Swap:  6289384k total,  6288964k used,      420k free,  5998720k cached
 7414 root      RT   0 13.3g 8.5g  56m S 53.8 54.6   1447:04 ologgerd
 7062 root      RT   0  109m  84m  54m S  3.7  0.5 223:40.41 osysmond.bin
[[email protected] ~]# ps -ef | grep 7414
root      7414     1 14 Feb21 ?        1-05:14:29 /u01/app/grid/11.2.0/bin/ologgerd -m az8500 -r -d /u01/app/grid/11.2.0/crf/db/az8501
root     18569 15708  0 04:47 pts/8    00:00:00 grep 7414
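
To confirm which process is actually holding the swap, rather than inferring it from the memory columns in top, the per-process VmSwap field can be read from /proc. This is only a minimal sketch: it assumes a Linux kernel that exposes VmSwap in /proc/<pid>/status (older kernels do not), and the function names are hypothetical helpers, not part of any Oracle tooling.

```shell
#!/bin/sh
# swap_kb_from_status FILE: print the VmSwap value (in kB) found in a
# /proc/<pid>/status-style file, or 0 if the field is absent.
swap_kb_from_status() {
    awk '/^VmSwap:/ { v = $2 } END { print v + 0 }' "$1"
}

# top_swap_users [N]: list the N processes using the most swap (default 5)
# as "kB pid name", largest first.
top_swap_users() {
    for f in /proc/[0-9]*/status; do
        [ -r "$f" ] || continue
        kb=$(swap_kb_from_status "$f" 2>/dev/null) || continue
        name=$(awk '/^Name:/ { print $2; exit }' "$f" 2>/dev/null)
        pid=${f#/proc/}
        pid=${pid%/status}
        echo "$kb $pid $name"
    done | sort -rn | head -n "${1:-5}"
}
```

Running `top_swap_users 3` as root would list the three largest swap consumers on the node.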

The question that comes to mind is: what is this daemon, and what does it do?
From 11gR2, Oracle introduced a new resource, “ora.crf”, which is run by the “orarootagent” agent with “root” as the owner. This resource in turn spawns the osysmond process, which spawns the ologgerd daemon, one daemon per cluster node. More or less, it seems to be an implementation of the IPD/OS (Instantaneous Problem Detector) Cluster Health Monitor tool, which was available for CRS 10gR2 and above.

$ crsctl stat res ora.crf -init -t
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
Cluster Resources
ora.crf
      1        ONLINE  ONLINE       az8501
[[email protected] ~]# ps -ef | grep osysmond
oracle   24181 19835  0 04:02 pts/5    00:00:00 grep osysmond
root     24813     1  0 Feb21 ?        03:46:44 /u01/app/oracle/grid/11.2.0/bin/osysmond.bin
[[email protected] ~]# ps -ef | grep ologgerd
oracle   24246 19835  0 04:03 pts/5    00:00:00 grep ologgerd
root      7414     1 14 Feb21 ?        1-05:14:29 /u01/app/grid/11.2.0/bin/ologgerd -m az8500 -r -d /u01/app/grid/11.2.0/crf/db/az8501

The two daemons are:
a.) osysmond –> the monitoring and OS metric collection daemon, which runs on every node.
b.) ologgerd –> follows a master/standby paradigm when the cluster has more than one node. The master manages the OS metric database, which is a Berkeley DB (BDB) based database, and interacts with the standby to maintain a replica of the master's metrics.
So, osysmond collects the OS metrics and sends the data to ologgerd. ologgerd receives the information from all the nodes and persists it in the Berkeley DB based database.
The crf folder in $GRID_HOME has 2 subfolders:

az8501:/u01/app/grid/11.2.0/crf> ls -lrt
total 6
drwxr-x---  3 root dba 3 Feb 17 17:27 db
drwxr-x---  3 root dba 5 Mar  2 16:54 admin

For the ologgerd daemon, the directory specified by “-d” ($GRID_HOME/crf/db/) is the location where the ologgerd process stores its logging information on the server. A number of *.bdb (Berkeley DB) files and a few other files can be found in this directory. “-r” indicates that the daemon is the replica.
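
Since the *.bdb files are where the collected metrics accumulate, it can be useful to check how much space they take. The following is a rough sketch only; `bdb_usage_bytes` is a hypothetical helper name, and the directory to pass in is whatever “-d” points to on your node.

```shell
#!/bin/sh
# bdb_usage_bytes DIR: print the total size in bytes of all *.bdb files
# found under DIR (0 if there are none).
bdb_usage_bytes() {
    find "$1" -type f -name '*.bdb' -exec wc -c {} + |
        awk '$2 != "total" { sum += $1 } END { printf "%.0f\n", sum }'
}
```

For example, `bdb_usage_bytes /u01/app/grid/11.2.0/crf/db/az8501` would report the repository size for the daemon shown above (the directory is readable by root and the dba group only).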
The admin folder ($GRID_HOME/crf/admin) contains the crf(hostname).cfg and crf(hostname).ora files.
The crf(hostname).ora file shows –

HOSTS=az8500,az8501  --> Hostnames of the cluster nodes
MASTER=az8500 -->   Hostname for Master daemon
MYNAME=az8501 -->   Server's Hostname
BDBLOC=/u01/app/oracle/grid/11.2.0/crf/db/az8500   --> Location of BDB (Berkeley DB)
az8500 1= localhost.localdomain 0
az8500 16020
az8500 23188
az8501 1= localhost.localdomain 0
az8501 16020
az8501 16021
BDBSIZE=12623 --> host ip of the master daemon
REPLICA=az8501  --> Hostname of Replica daemon
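
To pull a single setting such as MASTER or REPLICA out of this file without reading through it by hand, something like the following could be used. It is a sketch that assumes the simple KEY=value layout shown above; `crf_get` is a hypothetical helper, and the "-->" annotations it strips are the ones added in this post (they are not part of the real file).

```shell
#!/bin/sh
# crf_get KEY FILE: print the value of KEY from a crf(hostname).ora-style
# file. A trailing "--> comment" annotation, if present, is dropped.
crf_get() {
    awk -F= -v key="$1" '
        $1 == key {
            val = substr($0, length(key) + 2)   # text after "KEY="
            sub(/[ \t]*-->.*/, "", val)         # strip annotation
            print val
            exit
        }' "$2"
}
```

For example, `crf_get MASTER "$GRID_HOME/crf/admin/crfaz8501.ora"` would print the current master host, assuming the file name follows the crf(hostname).ora pattern.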

Now, back to the swap utilization issue. As it was a dev box, we decided to kill the process and see what happens.

[[email protected] ~]# kill -9 7414
[[email protected] ~]# ps -ef | grep ologgerd
root      7414     1 14 Feb21 ?        1-05:14:49 [ologgerd]
root     18971 15708  0 04:51 pts/8    00:00:00 grep ologgerd

After a few seconds, the daemon re-spawned:

[[email protected] ~]# ps -ef | grep ologgerd
root     19558     1  4 04:54 ?        00:00:01 /u01/app/grid/11.2.0/bin/ologgerd -M -d /u01/app/grid/11.2.0/crf/db/az8501
root     19585 15708  0 04:54 pts/8    00:00:00 grep ologgerd
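
Killing the process with -9 relies on the agent to re-spawn it. A gentler alternative is to stop and start the ora.crf resource through crsctl, using the same "-init" form as the status command shown earlier. The wrapper below is only a sketch: `restart_crf` and the CRSCTL override are hypothetical conveniences, and it assumes you run it as root with crsctl from the Grid home on your PATH.

```shell
#!/bin/sh
# restart_crf: stop and then start the ora.crf resource on the local node.
# Set CRSCTL to override the binary, e.g. CRSCTL=echo for a dry run that
# only prints the arguments that would be passed.
restart_crf() {
    crsctl_bin=${CRSCTL:-crsctl}
    "$crsctl_bin" stop res ora.crf -init &&
        "$crsctl_bin" start res ora.crf -init
}
```

Running `CRSCTL=echo restart_crf` first shows what would be executed before touching the resource.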

Interestingly, the ologgerd daemon on az8501, which earlier was the replica, has now become the master. The content of crf(hostname).ora has changed as well:

az8500 1= localhost.localdomain 0
az8500 16020
az8500 23188
az8501 1= localhost.localdomain 0
az8501 16020
az8501 16021
BDBSIZE=12623 --> IP of the new Master Daemon

The above shows that the master daemon is now running on node 2 (with the IP of node 2), that the master process on node 1 died (shown by DEAD), and STATE=mutated. The swap was released and there were no more alerts 🙂
On node 1:

[[email protected] ~]# ps -ef | grep ologgerd
root     4620  4422  0 12:16 pts/1    00:00:00 grep ologgerd
root     11122     1  0 05:01 ?        00:01:00 /u01/app/grid/11.2.0/bin/ologgerd -m az8501 -r -d /u01/app/grid/11.2.0/crf/db/az8500
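
The command lines in the listings above encode each daemon's role: "-M" for the master, and "-m <master host> -r" for a replica. A small sketch that classifies an ologgerd command line this way is below. Note that this reading of the flags is inferred from the ps output in this post, not taken from Oracle documentation, and `ologgerd_role` is a hypothetical helper.

```shell
#!/bin/sh
# ologgerd_role ARGS...: classify an ologgerd command line as
# "master", "replica of <host>", or "unknown".
# Flag meanings are inferred from the ps listings in this post.
ologgerd_role() {
    role=unknown
    master=
    while [ $# -gt 0 ]; do
        case $1 in
            -M) role=master ;;
            -m) master=${2-}; [ $# -ge 2 ] && shift ;;  # next word is the master host
            -r) role=replica ;;
        esac
        shift
    done
    if [ "$role" = replica ] && [ -n "$master" ]; then
        echo "replica of $master"
    else
        echo "$role"
    fi
}
```

It can be fed directly from ps, for example `ologgerd_role $(ps -o args= -p 11122)` for the process shown above.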

2 thoughts on “ologgerd daemon – 11gR2”

  1. Can ologgerd be kept down permanently or deleted? Someone ( did that. The worry I have is that when you patch the GI and database next time, maybe the patching process will throw errors if it sees some component not running. That’s what happened to us when we kept mgmtdb down.

    1. Hi Yong,
      Thank you for visiting the blog. I do not work on Oracle databases anymore. But what is your actual issue? Why do you want to shut down the process? Which Oracle version is the database running?
