XFS filesystem full without an obvious reason?

I ran into an interesting problem: two XFS volumes were full on an Oracle Linux 8 system. I was helping with a database upgrade issue when on-screen messages popped up saying “/var/log/sssd/” was missing. Apparently sssd was trying to write its logs and could not access the directory.

I started looking around and saw that two XFS partitions were full:

root@DBTSVM1:/var/log# df -h -T
Filesystem    Type  Size  Used Avail Use% Mounted on
/dev/sda5     xfs   10G   10G   20K 100% /var/log
/dev/sda6     xfs   10G   10G   51M 100% /var/log/audit

As usual, I used “du -sh *” to see if there were any large files:

root@DBTSVM1:/var/log# du -sh *
0       audit
292K    dnf.librepo.log
740K    dnf.log
24K     dnf.rpm.log
4.0K    hawkey.log-20220213
56K     pcp
0       sa

Strangely, there were only a handful of files: no /var/log/messages, no /var/log/secure, and nothing at all under /var/log/audit. Whatever was filling the two filesystems was not visible to du.
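The mismatch can be made explicit by comparing df's "used" figure with what du can find on the same filesystem. This is a minimal sketch; space_gap is my own hypothetical helper name, not a standard tool. A small gap is normal because df also counts filesystem metadata such as the XFS log, but a gap close to the size of the volume means space is held by something du cannot see.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare df's "used" figure for the filesystem holding
# DIR with what du finds by walking that filesystem's directory tree.
# Prints three numbers in KiB: df_used du_total gap
space_gap() {
    local dir="$1" df_kib du_kib
    # -P: one line per filesystem; -k: 1K blocks; column 3 is "Used"
    df_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $3 }')
    # -s: one total; -k: 1K blocks; -x: stay on this filesystem
    # (here /var/log/audit is a separate mount); read errors are ignored
    du_kib=$(du -skx "$dir" 2>/dev/null | cut -f1)
    echo "$df_kib $du_kib $((df_kib - du_kib))"
}
```

Usage: `space_gap /var/log`. On the system above, df reported about 10G used while du found only about 1 MB, so the gap would have been almost the whole volume.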

xfs_info didn’t show anything unusual in the geometry of either filesystem:

root@DBTSVM1:/var/log/pcp# xfs_info /dev/sda6
meta-data=/dev/sda6              isize=512    agcount=4, agsize=655360 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2621440, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@DBTSVM1:/var/log/pcp# xfs_info /dev/sda5
meta-data=/dev/sda5              isize=512    agcount=4, agsize=655360 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2621440, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

I could not reboot the server because it was in use. I tried to remount the two filesystems but got “target is busy”, because various services were still using them for logging. But where were the logs?

Then I used lsof to see which processes were using the two filesystems, and it revealed the problem. Bingo!

root@DBTSVM1:/var/log/pcp# lsof /var/log/audit
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
auditd  2150189 root    7w   REG    8,6   701976  131 /var/log/audit/audit.log
root@DBTSVM1:/var/log/pcp# lsof /var/log
COMMAND       PID USER   FD   TYPE DEVICE    SIZE/OFF     NODE NAME
firewalld     892 root    4w   REG    8,5         558      160 /var/log/firewalld (deleted)
VGAuthSer     893 root    2w   REG    8,5      565248      151 /var/log/vmware-vgauthsvc.log.0 (deleted)
VGAuthSer     893 root    4w   REG    8,5      565248      151 /var/log/vmware-vgauthsvc.log.0 (deleted)
vmtoolsd      894 root    4w   REG    8,5      184320      150 /var/log/vmware-vmsvc-root.log (deleted)
rsyslogd     1070 root    9w   REG    8,5   390782976      171 /var/log/messages (deleted)
rsyslogd     1070 root   10w   REG    8,5 10220695552      172 /var/log/secure (deleted)
rsyslogd     1070 root   11w   REG    8,5 10220695552      172 /var/log/secure (deleted)
rsyslogd     1070 root   12w   REG    8,5        8192      170 /var/log/maillog (deleted)
rsyslogd     1070 root   13w   REG    8,5     4452352    15565 /var/log/auth.log (deleted)
rsyslogd     1070 root   22w   REG    8,5      131072      169 /var/log/cron (deleted)
pmcd         1185  pcp  cwd    DIR    8,5           6 25165953 /var/log/pcp/pmcd (deleted)
pmcd         1185  pcp    1w   REG    8,5        2552 25165982 /var/log/pcp/pmcd/pmcd.log (deleted)
pmcd         1185  pcp    2w   REG    8,5        2552 25165982 /var/log/pcp/pmcd/pmcd.log (deleted)
pmdaroot     1189 root  cwd    DIR    8,5           6 25165953 /var/log/pcp/pmcd (deleted)
pmdaroot     1189 root    2w   REG    8,5         837 25165983 /var/log/pcp/pmcd/root.log (deleted)
pmdaproc     1191 root  cwd    DIR    8,5           6 25165953 /var/log/pcp/pmcd (deleted)
pmdaproc     1191 root    2w   REG    8,5        1838 25165984 /var/log/pcp/pmcd/proc.log (deleted)
pmdaxfs      1192 root  cwd    DIR    8,5           6 25165953 /var/log/pcp/pmcd (deleted)
pmdaxfs      1192 root    2w   REG    8,5          76 25165985 /var/log/pcp/pmcd/xfs.log (deleted)
pmdalinux    1193 root  cwd    DIR    8,5           6 25165953 /var/log/pcp/pmcd (deleted)
pmdalinux    1193 root    2w   REG    8,5          78 25165986 /var/log/pcp/pmcd/linux.log (deleted)
python3      1195 root  cwd    DIR    8,5           6 25165953 /var/log/pcp/pmcd (deleted)
python3      1195 root    2w   REG    8,5          82 25165987 /var/log/pcp/pmcd/nfsclient.log (deleted)
pmdakvm      1206 root  cwd    DIR    8,5           6 25165953 /var/log/pcp/pmcd (deleted)
pmdakvm      1206 root    2w   REG    8,5         174 25165988 /var/log/pcp/pmcd/kvm.log (deleted)
pmdadm       1215 root  cwd    DIR    8,5           6 25165953 /var/log/pcp/pmcd (deleted)
pmdadm       1215 root    2w   REG    8,5        1163 25165989 /var/log/pcp/pmcd/dm.log (deleted)
python3      1216 root  cwd    DIR    8,5           6 25165953 /var/log/pcp/pmcd (deleted)
python3      1216 root    2w   REG    8,5         465 25165990 /var/log/pcp/pmcd/openmetrics.log (deleted)
pmie         1789  pcp  cwd    DIR    8,5           6    15566 /var/log/pcp/pmie/DBTSVM1T.RHA-RRS.CA (deleted)
pmie         1789  pcp    1w   REG    8,5         113    15562 /var/log/pcp/pmie/DBTSVM1T.RHA-RRS.CA/pmie.log (deleted)
pmie         1789  pcp    2w   REG    8,5         113    15562 /var/log/pcp/pmie/DBTSVM1T.RHA-RRS.CA/pmie.log (deleted)
pmie         6346  pcp  cwd    DIR    8,5           6      153 /var/log/pcp/pmie/DBTSVM1.RHA-RRS.CA (deleted)
pmie         6346  pcp    1w   REG    8,5         102     1929 /var/log/pcp/pmie/DBTSVM1.RHA-RRS.CA/pmie.log (deleted)
pmie         6346  pcp    2w   REG    8,5         102     1929 /var/log/pcp/pmie/DBTSVM1.RHA-RRS.CA/pmie.log (deleted)
bash      1951094 root  cwd    DIR    8,5          68 16797824 /var/log/pcp
lsof      2151527 root  cwd    DIR    8,5          68 16797824 /var/log/pcp
lsof      2151528 root  cwd    DIR    8,5          68 16797824 /var/log/pcp

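As you can see, many of the files are marked (deleted). Somebody must have deleted the logs manually after seeing that the filesystems were full, which unfortunately is not the right way to free space. Deleting a file only removes its directory entry: as long as a daemon still holds the file open, its data blocks stay allocated, so df keeps counting the space even though du can no longer see the file. The space is released only when the last process closes it. Here /var/log/secure alone still held 10,220,695,552 bytes (about 9.5 GiB) of the 10G volume.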
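With a long lsof listing it helps to total the pinned space per process. Below is a minimal sketch that reads lsof output on stdin; the function name is my own hypothetical choice. It assumes the default lsof layout shown above with every column populated, and it finds columns by their header names, so the extra NLINK column added by `lsof +L1` (which selects only unlinked open files) does not break it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read lsof output on stdin and report, per process,
# how many bytes are pinned by deleted regular files.
# Output: COMMAND PID BYTES, largest first.
deleted_bytes_by_process() {
    awk '
        # Map header names (TYPE, SIZE/OFF, NODE, ...) to field numbers
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $(col["TYPE"]) == "REG" && $NF == "(deleted)" {
            key = $1 " " $2
            # A file open on several descriptors (e.g. 10w and 11w) is one
            # inode, so count it once per process
            id = key " " $(col["NODE"])
            if (!(id in seen)) {
                seen[id] = 1
                bytes[key] += $(col["SIZE/OFF"])
            }
        }
        # %.0f avoids integer overflow in awks with 32-bit %d
        END { for (k in bytes) printf "%s %.0f\n", k, bytes[k] }
    ' | sort -k3,3nr
}
```

Usage: `lsof /var/log | deleted_bytes_by_process`. Fed the listing above, rsyslogd (PID 1070) would come out on top with 10,616,070,144 bytes, nearly all of it /var/log/secure.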

So all I needed to do was restart those services, so that they would close the deleted files and release the space.

root@DBTSVM1:/var/log/pcp# systemctl restart rsyslog
root@DBTSVM1:/var/log/pcp# systemctl restart firewalld
root@DBTSVM1:/var/log/pcp# systemctl restart vmtoolsd
root@DBTSVM1:/var/log/pcp# systemctl restart pmcd
root@DBTSVM1:/var/log/pcp# systemctl restart pmie
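If lsof is not installed, the same information is available directly from /proc: each /proc/PID/fd entry is a symlink, and the kernel appends " (deleted)" to its target once the file is unlinked. This is a minimal sketch under that assumption (Linux only, run as root to see every process); the function name is hypothetical. Running it again after the restarts should print nothing for the affected path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list open descriptors whose file was deleted and whose
# old path starts with PREFIX, by reading /proc/PID/fd symlinks.
# Output: PID COMM FD PATH
deleted_open_files() {
    local prefix="$1" fd target pid
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that vanish mid-scan or that we may not read
        target=$(readlink "$fd" 2>/dev/null) || continue
        # Match "<prefix>... (deleted)"; quoted parts are literal
        case "$target" in
            "$prefix"*" (deleted)")
                pid=${fd#/proc/}
                pid=${pid%%/*}
                printf '%s %s %s %s\n' "$pid" \
                    "$(cat "/proc/$pid/comm" 2>/dev/null)" \
                    "${fd##*/}" "${target% (deleted)}"
                ;;
        esac
    done
}
```

Usage: `deleted_open_files /var/log/` (the trailing slash avoids matching unrelated paths such as /var/log2). To see which service a PID belongs to, `systemctl status <PID>` shows the owning unit.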

auditd is a bit different: systemctl refuses to restart it, because its unit is configured to refuse manual start/stop, as the error message says:

root@DBTSVM1:/var/log/pcp# systemctl restart auditd
Failed to restart auditd.service: Operation refused, unit auditd.service may be requested by dependency only (it is configured to refuse manual start/stop).
See system logs and 'systemctl status auditd.service' for details.

Instead, I restarted it through the legacy service command:

root@DBTSVM1:/var/log/pcp# /sbin/service auditd restart
Stopping logging:                                          [  OK  ]
Redirecting start to /bin/systemctl start auditd.service
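When restarting several log writers in one go, a small wrapper can check the unit's RefuseManualStop property (the setting behind the error above) and fall back to `service` only when needed. This is a minimal sketch; restart_unit is a hypothetical name, and `systemctl show --value` is not available in very old systemd releases.

```bash
#!/usr/bin/env bash
# Hypothetical helper: restart a unit with systemctl unless the unit refuses
# manual stop (RefuseManualStop=yes), in which case use the legacy
# "service" wrapper, as was done for auditd above.
restart_unit() {
    local unit="$1"
    if [ "$(systemctl show -p RefuseManualStop --value "$unit")" = "yes" ]; then
        service "$unit" restart
    else
        systemctl restart "$unit"
    fi
}
```

Usage: `for u in rsyslog firewalld vmtoolsd pmcd pmie auditd; do restart_unit "$u"; done`.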

After the services were restarted, both filesystems were back to normal and the missing log files reappeared.

root@DBTSVM1:/var/log/pcp# df -h |grep log
/dev/sda5        10G  107M  9.9G   2% /var/log
/dev/sda6        10G  108M  9.9G   2% /var/log/audit

Note that the proper way to manage logs and free space is logrotate, which I showed in a previous blog post.
