VM (VMware) hanging after deleting snapshots

It’s a best practice to take a snapshot before upgrading either the OS or an application running on a virtual machine. After the upgrade is done and the system or application has been verified to run fine for a few days, the snapshot needs to be deleted to release space.

We had a VM running on ESXi 7. A snapshot was taken and then deleted after 3 days. However, the VM was hanging, and it would not start after a reboot.

We could see that one of the datastores was full. That was because we were copying a large amount of data from another system to one of the disks on this VM. That disk is the only hard drive allocated from this datastore.

When we looked around in the vSphere web GUI, we didn’t see any snapshots listed. However, in the settings of the VM we did see that the hard drive had a disk file name like “xxxxx-000002.vmdk”. And when checking the datastore, we saw two vmdk files listed: the big one had a size of 2 TB but was not in use, and the small one, named “xxxxx-000002.vmdk”, had a size of 300 GB.

We logged into the ESXi console, used “df -h” to check the datastores, and then went into the volume path for that datastore. We saw 4 vmdk files listed: two small ones, which were just descriptor files pointing to the other two vmdk files, and two large ones, one of which had a name like “xxxxx-000002-sesparse.vmdk”.

Basically, a snapshot consists of files that are stored on a supported storage device. A Take Snapshot operation creates .vmdk, -delta.vmdk, .vmsd, and .vmsn files. By default, the first and all delta disks are stored with the base .vmdk file. The .vmsd and .vmsn files are stored in the virtual machine directory.

Delta disk files
A .vmdk file to which the guest operating system can write. The delta disk represents the difference between the current state of the virtual disk and the state that existed at the time that the previous snapshot was taken. When you take a snapshot, the state of the virtual disk is preserved, the guest operating system stops writing to it, and a delta or child disk is created.
A delta disk has two files. One is a small descriptor file that contains information about the virtual disk, such as geometry and child-parent relationship information. The other one is a corresponding file that contains the raw data.

The files that make up the delta disk are called child disks or redo logs. The number within the vmdk file name is actually the snapshot number.
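To see which parent a delta disk points to, you can read its small descriptor file. Below is a minimal Python sketch that pulls out the child-parent information. The sample descriptor text is illustrative only (we did not save ours), and the function names are made up for this post; the field names parentFileNameHint, CID and parentCID are the ones we would expect to find in a snapshot descriptor.

```python
# Illustrative descriptor content, not copied from a real datastore.
SAMPLE_DESCRIPTOR = '''
# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=fffffffe
createType="seSparse"
parentFileNameHint="xxxxx.vmdk"
'''


def parse_descriptor(text):
    """Return the key=value fields of a vmdk descriptor as a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and extent lines (which have no "=").
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def parent_of(text):
    """Return the parent disk named by the descriptor, or None for a base disk."""
    return parse_descriptor(text).get("parentFileNameHint")


if __name__ == "__main__":
    print(parent_of(SAMPLE_DESCRIPTOR))
```

A descriptor with no parentFileNameHint belongs to a base disk, so parent_of returns None for it.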

SEsparse is a snapshot format introduced in vSphere 5.5 for large disks (virtual disks >2TB), and is the preferred format for all snapshots in vSphere 6.5 and above with VMFS-6.

Flat file
A -flat.vmdk file that is one of the two files that comprise the base disk. The flat disk contains the raw data for the base disk. This file does not appear as a separate file in the Datastore Browser.

Database file
A .vmsd file that contains the virtual machine’s snapshot information and is the primary source of information for the Snapshot Manager. This file contains line entries, which define the relationships between snapshots and between child disks for each snapshot.

Memory file
A .vmsn file that includes the active state of the virtual machine.
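The file types above can be told apart by name alone. Here is a rough Python sketch that labels the files in a VM directory listing. The labels and the classify function are our own naming, and the six-digit pattern is an assumption based on names like “xxxxx-000002.vmdk”; a VM whose own name ends in six digits would be mislabeled.

```python
import re

# Patterns are checked in order, so the most specific ones come first.
PATTERNS = [
    ("snapshot data (SEsparse)", re.compile(r"-\d{6}-sesparse\.vmdk$")),
    ("snapshot data (delta)", re.compile(r"-\d{6}-delta\.vmdk$")),
    ("snapshot descriptor", re.compile(r"-\d{6}\.vmdk$")),
    ("base data (flat)", re.compile(r"-flat\.vmdk$")),
    ("base descriptor", re.compile(r"\.vmdk$")),
    ("snapshot database", re.compile(r"\.vmsd$")),
    ("snapshot memory file", re.compile(r"\.vmsn$")),
]


def classify(filename):
    """Return a label for a file found in the VM directory."""
    for label, pattern in PATTERNS:
        if pattern.search(filename):
            return label
    return "other"


if __name__ == "__main__":
    listing = [
        "xxxxx.vmdk",
        "xxxxx-flat.vmdk",
        "xxxxx-000002.vmdk",
        "xxxxx-000002-sesparse.vmdk",
    ]
    for name in listing:
        print(name, "->", classify(name))
```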

Deleting a snapshot permanently removes the snapshot from the snapshot tree. The snapshot files are consolidated and written to the parent snapshot disk and merged with the virtual machine base disk.

Deleting a snapshot does not change the virtual machine or other snapshots. Deleting a snapshot consolidates the changes between snapshots and previous disk states. Then it writes all the data from the delta disk that contains the information about the deleted snapshot to the parent disk. When you delete the base parent snapshot, all changes merge with the base virtual machine disk.

To delete a snapshot, a large amount of information must be read and written to a disk. This process can reduce the virtual machine performance until the consolidation is complete. Consolidating snapshots removes redundant disks, which improves the virtual machine performance and saves storage space. The time to delete snapshots and consolidate the snapshot files depends on the amount of data that the guest operating system writes to the virtual disks after you take the last snapshot. If the virtual machine is powered on, the required time is proportional to the amount of data the virtual machine is writing during consolidation.

Failure of disk consolidation can reduce the performance of virtual machines. You can check whether any virtual machines require separate consolidation operations by viewing “Monitor” -> “All issues”.

Back to our issue: for some reason, consolidation didn’t happen after the snapshot was deleted. That’s why we didn’t see the snapshot listed in the vSphere web client while the snapshot disks were still in use. We ran a consolidate operation in the vSphere web client, and all snapshot delta disks were removed; the SEsparse disks were gone. All disks of the VM are using the base vmdk disks now, and the problem was solved.
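The symptom we hit (no snapshot listed, but a disk still backed by a “-00000N.vmdk” file) can be checked with a script. The Python sketch below is only a rough idea: it works on any object shaped like a VM, and the attribute names (snapshot, config.hardware.device, backing.fileName) are assumed to match what the vSphere API exposes through pyVmomi. The fake VM, the datastore name and the function names are made up for the example.

```python
import re
from types import SimpleNamespace

# A disk whose file name ends in -NNNNNN.vmdk is assumed to be a snapshot disk.
DELTA_NAME = re.compile(r"-\d{6}\.vmdk$")


def disk_file_names(vm):
    """Collect the backing file names of all devices that have one."""
    names = []
    for device in vm.config.hardware.device:
        backing = getattr(device, "backing", None)
        file_name = getattr(backing, "fileName", None)
        if file_name:
            names.append(file_name)
    return names


def orphaned_delta_disks(vm):
    """Return snapshot disks in use by a VM that lists no snapshots."""
    if vm.snapshot is not None:
        return []
    return [name for name in disk_file_names(vm) if DELTA_NAME.search(name)]


# Stand-in for a real VM object, shaped like our problem VM.
fake_vm = SimpleNamespace(
    snapshot=None,
    config=SimpleNamespace(hardware=SimpleNamespace(device=[
        SimpleNamespace(backing=SimpleNamespace(fileName="[datastore1] xxxxx/xxxxx.vmdk")),
        SimpleNamespace(backing=SimpleNamespace(fileName="[datastore2] xxxxx/xxxxx-000002.vmdk")),
        SimpleNamespace(),  # a device without a file backing
    ])),
)

if __name__ == "__main__":
    print(orphaned_delta_disks(fake_vm))
```

Against a live vCenter, the vSphere API also has a consolidationNeeded flag in the VM runtime info and a ConsolidateVMDisks_Task method on the VM, which should be the scripted equivalent of the consolidate we ran from the web client. We have not tested that path ourselves.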

A side note:

You can set a virtual disk to independent mode to exclude the disk from any snapshots taken of its virtual machine. An independent disk does not participate in virtual machine snapshots. That is, the disk state will be independent of the snapshot state, and creating, consolidating, or reverting to snapshots will have no effect on the disk. It has two sub-modes: Persistent and Nonpersistent. The “Independent – Nonpersistent” mode will make a disk behave like a read-only disk. Changes to disks in nonpersistent mode are discarded when you power off or reset the virtual machine.
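As a quick summary of the modes above, here is a small Python lookup. The mode strings are assumed to be the disk mode values used by the vSphere API ("persistent" being the default dependent mode); the table and helper names are our own.

```python
# (included in snapshots, changes kept after power off) per disk mode.
DISK_MODES = {
    "persistent": (True, True),
    "independent_persistent": (False, True),
    "independent_nonpersistent": (False, False),
}


def included_in_snapshots(disk_mode):
    """True if a disk in this mode takes part in VM snapshots."""
    return DISK_MODES[disk_mode][0]


def changes_kept_after_power_off(disk_mode):
    """True if writes to a disk in this mode survive a power off or reset."""
    return DISK_MODES[disk_mode][1]


if __name__ == "__main__":
    for mode in DISK_MODES:
        print(mode, included_in_snapshots(mode), changes_kept_after_power_off(mode))
```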

References:

  1. Determining if a virtual machine running on a snapshot (1004343)
  2. Consolidating/Committing snapshots in VMware ESXi (1002310)
  3. How to monitor snapshot deletion using the vim-cmd command (2146185)
  4. Identifying disks when working with VMware ESXi (1014953)
  5. Virtual Machines running on an SEsparse snapshot may report guest data inconsistencies (59216)
  6. Snapshot Manager fails to detect snapshots of VMs (1026380)
  7. Snapshot removal task stops at 99% (1007566)
