I ran into an interesting problem with two full XFS volumes on an Oracle Linux 8 system. I was helping with a database upgrade issue when on-screen messages popped up saying “/var/log/sssd/” was missing. Apparently sssd was trying to write its logs and could not access the directory.
I started looking around and saw that two XFS partitions were full:
root@DBTSVM1:/var/log# df -h -T
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda5 xfs 10G 10G 20K 100% /var/log
/dev/sda6 xfs 10G 10G 51M 100% /var/log/audit
As usual, I used “du -sh *” to see if there were any large files:
root@DBTSVM1:/var/log# du -sh *
0 audit
292K dnf.librepo.log
740K dnf.log
24K dnf.rpm.log
4.0K hawkey.log-20220213
56K pcp
0 sa
Strangely, I saw only a handful of files there: no /var/log/messages, no /var/log/secure, and nothing under /var/log/audit. Together they added up to barely over 1 MB, while df reported both 10G filesystems as full.
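That mismatch is the key clue: df counts blocks allocated on the filesystem, while du only adds up files it can reach through directory entries. A quick way to put a number on the gap is sketched below. The function name is hypothetical; `-x` keeps du on one filesystem (so /var/log/audit is not counted against /var/log), and a gap roughly the size of the filesystem's own metadata is normal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (df "Used" - du total) in KiB for a mount point.
# A large positive gap means space is allocated but unreachable by path,
# e.g. deleted-but-open files or files hidden underneath a mount point.
df_du_gap_kb() {
    local mnt=$1 used walked
    # -P gives one line per filesystem; column 3 is Used, in 1K blocks (-k)
    used=$(df -Pk "$mnt" | awk 'NR==2 {print $3}')
    # -s: total only, -x: stay on this filesystem, -k: 1K units
    walked=$(du -sxk "$mnt" 2>/dev/null | awk '{print $1}')
    echo $(( used - walked ))
}
```

Running `df_du_gap_kb /var/log` on a system in the state above should show a gap of several GiB, since the files du listed barely reach 1 MB.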
xfs_info didn’t show anything unusual in the geometry of either filesystem:
root@DBTSVM1:/var/log/pcp# xfs_info /dev/sda6
meta-data=/dev/sda6 isize=512 agcount=4, agsize=655360 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=2621440, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
root@DBTSVM1:/var/log/pcp# xfs_info /dev/sda5
meta-data=/dev/sda5 isize=512 agcount=4, agsize=655360 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=2621440, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
I could not reboot the server because it was in use. I tried to remount the two filesystems but got “target is busy”, because various services were using them for logging. But where were the logs?
Then I used lsof to see which processes were using the two filesystems, and it revealed the problem. Bingo!
root@DBTSVM1:/var/log/pcp# lsof /var/log/audit
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
auditd 2150189 root 7w REG 8,6 701976 131 /var/log/audit/audit.log
root@DBTSVM1:/var/log/pcp# lsof /var/log
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
firewalld 892 root 4w REG 8,5 558 160 /var/log/firewalld (deleted)
VGAuthSer 893 root 2w REG 8,5 565248 151 /var/log/vmware-vgauthsvc.log.0 (deleted)
VGAuthSer 893 root 4w REG 8,5 565248 151 /var/log/vmware-vgauthsvc.log.0 (deleted)
vmtoolsd 894 root 4w REG 8,5 184320 150 /var/log/vmware-vmsvc-root.log (deleted)
rsyslogd 1070 root 9w REG 8,5 390782976 171 /var/log/messages (deleted)
rsyslogd 1070 root 10w REG 8,5 10220695552 172 /var/log/secure (deleted)
rsyslogd 1070 root 11w REG 8,5 10220695552 172 /var/log/secure (deleted)
rsyslogd 1070 root 12w REG 8,5 8192 170 /var/log/maillog (deleted)
rsyslogd 1070 root 13w REG 8,5 4452352 15565 /var/log/auth.log (deleted)
rsyslogd 1070 root 22w REG 8,5 131072 169 /var/log/cron (deleted)
pmcd 1185 pcp cwd DIR 8,5 6 25165953 /var/log/pcp/pmcd (deleted)
pmcd 1185 pcp 1w REG 8,5 2552 25165982 /var/log/pcp/pmcd/pmcd.log (deleted)
pmcd 1185 pcp 2w REG 8,5 2552 25165982 /var/log/pcp/pmcd/pmcd.log (deleted)
pmdaroot 1189 root cwd DIR 8,5 6 25165953 /var/log/pcp/pmcd (deleted)
pmdaroot 1189 root 2w REG 8,5 837 25165983 /var/log/pcp/pmcd/root.log (deleted)
pmdaproc 1191 root cwd DIR 8,5 6 25165953 /var/log/pcp/pmcd (deleted)
pmdaproc 1191 root 2w REG 8,5 1838 25165984 /var/log/pcp/pmcd/proc.log (deleted)
pmdaxfs 1192 root cwd DIR 8,5 6 25165953 /var/log/pcp/pmcd (deleted)
pmdaxfs 1192 root 2w REG 8,5 76 25165985 /var/log/pcp/pmcd/xfs.log (deleted)
pmdalinux 1193 root cwd DIR 8,5 6 25165953 /var/log/pcp/pmcd (deleted)
pmdalinux 1193 root 2w REG 8,5 78 25165986 /var/log/pcp/pmcd/linux.log (deleted)
python3 1195 root cwd DIR 8,5 6 25165953 /var/log/pcp/pmcd (deleted)
python3 1195 root 2w REG 8,5 82 25165987 /var/log/pcp/pmcd/nfsclient.log (deleted)
pmdakvm 1206 root cwd DIR 8,5 6 25165953 /var/log/pcp/pmcd (deleted)
pmdakvm 1206 root 2w REG 8,5 174 25165988 /var/log/pcp/pmcd/kvm.log (deleted)
pmdadm 1215 root cwd DIR 8,5 6 25165953 /var/log/pcp/pmcd (deleted)
pmdadm 1215 root 2w REG 8,5 1163 25165989 /var/log/pcp/pmcd/dm.log (deleted)
python3 1216 root cwd DIR 8,5 6 25165953 /var/log/pcp/pmcd (deleted)
python3 1216 root 2w REG 8,5 465 25165990 /var/log/pcp/pmcd/openmetrics.log (deleted)
pmie 1789 pcp cwd DIR 8,5 6 15566 /var/log/pcp/pmie/DBTSVM1T.RHA-RRS.CA (deleted)
pmie 1789 pcp 1w REG 8,5 113 15562 /var/log/pcp/pmie/DBTSVM1T.RHA-RRS.CA/pmie.log (deleted)
pmie 1789 pcp 2w REG 8,5 113 15562 /var/log/pcp/pmie/DBTSVM1T.RHA-RRS.CA/pmie.log (deleted)
pmie 6346 pcp cwd DIR 8,5 6 153 /var/log/pcp/pmie/DBTSVM1.RHA-RRS.CA (deleted)
pmie 6346 pcp 1w REG 8,5 102 1929 /var/log/pcp/pmie/DBTSVM1.RHA-RRS.CA/pmie.log (deleted)
pmie 6346 pcp 2w REG 8,5 102 1929 /var/log/pcp/pmie/DBTSVM1.RHA-RRS.CA/pmie.log (deleted)
bash 1951094 root cwd DIR 8,5 68 16797824 /var/log/pcp
lsof 2151527 root cwd DIR 8,5 68 16797824 /var/log/pcp
lsof 2151528 root cwd DIR 8,5 68 16797824 /var/log/pcp
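As you can see, many files were marked (deleted). Somebody must have removed the logs manually after noticing the filesystems were full, which unfortunately is not the right way to free space. Deleting a file only removes its directory entry; as long as a daemon still holds the file open, its inode and data blocks stay allocated. df keeps counting that space, but du, which walks directory entries, can no longer see it, and the space is only released when the last process closes the file. rsyslogd alone was still holding a deleted /var/log/secure of 10220695552 bytes (about 9.5 GiB).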
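To spot this situation directly, `lsof +L1` lists open files whose link count is below 1, i.e. files that have been unlinked. Where lsof is not available, the same information can be read from /proc. The function below is a minimal sketch, not a replacement for lsof: the name `list_deleted_open` is my own, it needs root to see other users' processes, and it prints PID, FD number, apparent size in bytes, and path for each deleted-but-open file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files that are deleted but still held open,
# by walking /proc/PID/fd (roughly what `lsof +L1` reports).
# Run as root; other users' fd directories are unreadable otherwise.
list_deleted_open() {
    local fd target pid size
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink; unlinked files read back as "PATH (deleted)".
        # Processes that exit mid-scan simply fail readlink and are skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)')
                pid=${fd#/proc/}
                pid=${pid%%/*}
                # stat -L follows the link to the still-open inode.
                size=$(stat -L -c %s "$fd" 2>/dev/null) || continue
                printf '%s\t%s\t%s\t%s\n' "$pid" "${fd##*/}" "$size" "$target"
                ;;
        esac
    done
}
```

Piping the output through `sort -k3,3n` puts the biggest space holders last.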
So I just needed to restart those services so they would close the deleted files and release the space.
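When a daemon cannot be restarted right away, a common workaround is to truncate the deleted file through its owner's /proc/PID/fd entry, which frees the blocks while the process keeps its file descriptor. This is a sketch under assumptions: PID and FD come from lsof's PID and FD columns, the helper name is hypothetical, and if the daemon did not open the file in append mode its next write lands at the old offset, leaving a sparse file. Restarting the service, as done here, is the cleaner fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: free the space held by a deleted-but-open file
# without restarting its owner, by truncating it through /proc/PID/fd/FD.
# Usage: truncate_deleted_fd PID FD
truncate_deleted_fd() {
    local link="/proc/$1/fd/$2"
    case "$(readlink "$link" 2>/dev/null)" in
        *' (deleted)')
            # Opening the fd link reaches the same unlinked inode;
            # the '>' redirection opens it with O_TRUNC, releasing its blocks.
            : > "$link"
            ;;
        *)
            echo "refusing: $link is not a deleted file" >&2
            return 1
            ;;
    esac
}
```

For the rsyslogd entry above (PID 1070, FD `10w`), the call would be `truncate_deleted_fd 1070 10`, with the mode letter dropped from the FD column.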
root@DBTSVM1:/var/log/pcp# systemctl restart rsyslog
root@DBTSVM1:/var/log/pcp# systemctl restart firewalld
root@DBTSVM1:/var/log/pcp# systemctl restart vmtoolsd
root@DBTSVM1:/var/log/pcp# systemctl restart pmcd
root@DBTSVM1:/var/log/pcp# systemctl restart pmie
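To work out which systemd unit to restart for each PID in the lsof output, `systemctl status PID` shows the owning unit. For scripting, the unit can also be read from /proc/PID/cgroup, because systemd places each service in a cgroup named after it. A sketch, assuming services sit under a cgroup path ending in `NAME.service` (the usual system.slice layout); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: map a PID to the systemd service that owns it.
# Input lines look like
#   0::/system.slice/rsyslog.service              (cgroup v2)
#   1:name=systemd:/system.slice/rsyslog.service  (cgroup v1)
# Prints the last path component when it ends in ".service" (first match only).
unit_from_cgroup() {
    sed -n 's|.*/\([^/]*\.service\)$|\1|p' | head -n 1
}

unit_of_pid() {
    unit_from_cgroup < "/proc/$1/cgroup"
}
```

Feeding it the PIDs from lsof, for example `for pid in 892 1070 1185; do unit_of_pid "$pid"; done | sort -u`, gives the list of units to restart.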
auditd is a bit different: systemctl refuses to restart it, because its unit is configured to refuse manual start/stop and can only be pulled in as a dependency.
root@DBTSVM1:/var/log/pcp# systemctl restart auditd
Failed to restart auditd.service: Operation refused, unit auditd.service may be requested by dependency only (it is configured to refuse manual start/stop).
See system logs and 'systemctl status auditd.service' for details.
I had to restart it through the legacy service command instead:
root@DBTSVM1:/var/log/pcp# /sbin/service auditd restart
Stopping logging: [ OK ]
Redirecting start to /bin/systemctl start auditd.service
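The refusal comes from `RefuseManualStop=yes` in auditd's unit file, which `systemctl show` can report. Below is a hedged sketch of a helper that checks that property before choosing how to restart. It assumes, as on this system, that the unit also has a working `service NAME restart` path; that holds for auditd, whose package ships its own legacy restart action, but not for units in general. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart a unit, falling back to the legacy `service`
# command when systemd is configured to refuse manual stop for it.
restart_unit() {
    local unit=$1 refuse
    # Prints just "yes" or "no" (--value needs systemd >= 230; EL8 ships 239).
    refuse=$(systemctl show -p RefuseManualStop --value "$unit") || return
    if [ "$refuse" = yes ]; then
        # Assumption: the unit provides a legacy restart action (auditd does).
        service "${unit%.service}" restart
    else
        systemctl restart "$unit"
    fi
}
```

With it, `restart_unit auditd.service` and `restart_unit rsyslog.service` each take the path that works for that unit.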
After the services were restarted, both filesystems were back to normal and all the missing log files had reappeared.
root@DBTSVM1:/var/log/pcp# df -h |grep log
/dev/sda5 10G 107M 9.9G 2% /var/log
/dev/sda6 10G 108M 9.9G 2% /var/log/audit
Note that the proper way to maintain logs and free space is to use logrotate, which I showed in a previous blog post.