
Inconsistencies between VOMA versions on vSphere 6.5 and vSphere 6.7 – What version to trust?

Metadata corruption on VMFS volumes is a troubleshooting scenario that consumes a lot of time, especially if you can no longer access some VM files and your applications are experiencing an outage. Hopefully your backups are up to date and tested so you can restore your machines, but in some cases you may need to recover those machines in place to keep the business running.

vSphere 6.5 ships vSphere On-disk Metadata Analyzer (VOMA) version 0.7. This version does not fully support VMFS6: it can check VMFS6 volumes for errors but is unable to fix them.
vSphere 6.7 ships VOMA version 0.8, which supports VMFS6 for both checking and fixing errors.

Now imagine you have VMFS6 volumes shared between both vSphere versions, and when you check a VMFS6 volume your ESXi 6.5 host reports a high count of disk errors while your ESXi 6.7 host reports only one.
That difference in error counts between the two VOMA versions is suspicious and alarming.
The good news is that this happens once in a lifetime, or never.
Some examples of when a file system metadata check is necessary:

  • SAN outage
  • Rebuilt RAID
  • Disk replacement
  • Partition table update
  • Reports of metadata errors in the vmkernel.log file
  • Unable to access files on the VMFS volume that are not in use by any other host

You could be experiencing some of these symptoms after a hardware failure or outage:

  • You have problems accessing certain files on a VMFS datastore.
  • You cannot modify or erase files on a VMFS datastore.
  • Attempting to read files on a VMFS datastore may fail with the error:
    invalid argument 
  • In the /var/log/vmkernel.log file, you see entries similar to:

    Failed to open swap file ‘/volumes/4730e995-faa64138-6e6f-001a640a8998/mule/mule-560e1410.vswp’: Invalid metadata

    Volume 50fd60a3-3aae1ae2-3347-0017a4770402 (“<Datastore_name>”) may be damaged on disk. Corrupt heartbeat detected at offset 3305472: [HB state 0 offset 6052837899185946624 gen 15439450 stampUS 5 $
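A quick way to spot these signatures is to grep the vmkernel log for the corruption messages. This is an illustrative sketch: the sample file below stands in for /var/log/vmkernel.log (which requires an ESXi shell), and the patterns are taken from the log entries above.

```shell
# Sample entries modeled on the vmkernel.log messages shown above
cat > /tmp/vmkernel.sample <<'EOF'
Failed to open swap file: Invalid metadata
Volume 50fd60a3 may be damaged on disk. Corrupt heartbeat detected at offset 3305472
EOF
# On a live ESXi host, point this at /var/log/vmkernel.log instead
hits=$(grep -ciE "invalid metadata|corrupt heartbeat|damaged on disk" /tmp/vmkernel.sample)
echo "matching lines: $hits"
```

A non-zero count is a cue to schedule a VOMA metadata check on the affected volume.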

So the question is… which version of VOMA is accurate? Can you trust the results?
Here is what we found, and the actions we took after running both versions of VOMA, to reduce the risk of losing more VMs to metadata corruption.

Checking ESXi version, VOMA version and help

VMware ESXi 6.7.0

[root@bakingcloudshost01:~] vmware -v
VMware ESXi 6.7.0 build-13981272
[root@bakingcloudshost01:~] voma -v
voma version 0.8

[root@bakingcloudshost01:~] voma -h
Usage:
voma [OPTIONS] -m module -d device
 -m, --module      Name of the module to run.
                    Available Modules are
                      1. lvm
                      2. vmfs
                      3. ptbl
 -f, --func        Function(s) to be done by the module.
                     Options are
                       query   - list functions supported by module
                       check   - check for Errors
                       fix     - check & fix
                       dump    - collect metadata dump
 -a, --affinityChk Include affinity related check/fix for VMFS6
 -d, --device      Device/Disk to be used
 -s, --logfile     Path to file, redirects the output to given file
 -x, --extractDump Extract the dump collected using VOMA
 -D, --dumpfile    Dump file to save the metadata dump collected
 -v, --version     Prints voma version and exit.
 -h, --help        Print this help message.
Example:
voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxx:x
voma -m vmfs -f dump -d /vmfs/devices/disks/naa.xxxx:x -D dumpfilename

VMware ESXi 6.5.0

[root@bakingcloudshost02:~] vmware -v
VMware ESXi 6.5.0 build-13932383
[root@bakingcloudshost02:~] voma -v
voma version 0.7

[root@bakingcloudshost02:~] voma -h
Usage:
voma [OPTIONS] -m module -d device
 -m, --module      Name of the module to run.
                    Available Modules are
                      1. lvm
                      2. vmfs
                      3. ptbl
 -f, --func        Function(s) to be done by the module.
                     Options are
                       query   - list functions supported by module
                       check   - check for Errors
                       fix     - check & fix
                       dump    - collect metadata dump
 -d, --device      Device/Disk to be used
 -s, --logfile     Path to file, redirects the output to given file
 -x, --extractDump Extract the dump collected using VOMA
 -D, --dumpfile    Dump file to save the metadata dump collected
 -v, --version     Prints voma version and exit.
 -h, --help        Print this help message.
Example:
voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxx:x
voma -m vmfs -f dump -d /vmfs/devices/disks/naa.xxxx:x -D dumpfilename

Checking VMFS6 for errors with each VOMA version

[root@bakingcloudshost02:~] voma -v
voma version 0.7
[root@bakingcloudshost02:/var/log] voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxxxx0000b8

ON-DISK ERROR: LFB inconsistency found: (601,9) allocated in bitmap, but never used

 ON-DISK ERROR: LFB inconsistency found: (601,10) allocated in bitmap, but never used

 ON-DISK ERROR: LFB inconsistency found: (601,11) allocated in bitmap, but never used

 ON-DISK ERROR: LFB inconsistency found: (601,12) allocated in bitmap, but never used

 ON-DISK ERROR: LFB inconsistency found: (601,13) allocated in bitmap, but never used

 ON-DISK ERROR: LFB inconsistency found: (601,14) allocated in bitmap, but never used

 ON-DISK ERROR: LFB inconsistency found: (601,15) allocated in bitmap, but never used

 ON-DISK ERROR: FB inconsistency found: (165, 0) free'ed in bitmap, but used


Total Errors Found:      4737
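When comparing runs like this across hosts, it helps to capture VOMA's output to a file with the -s option and tally it afterwards. A minimal sketch, parsing a saved sample instead of live voma output (the message format and total are abbreviated from the 6.5 run above):

```shell
# Stand-in for a log captured with: voma -m vmfs -f check -d <device> -s /tmp/voma.log
cat > /tmp/voma65.sample <<'EOF'
ON-DISK ERROR: LFB inconsistency found: (601,9) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (165, 0) free'ed in bitmap, but used
Total Errors Found:      4737
EOF
# Count the per-resource error lines and pull the summary total
detail=$(grep -c "ON-DISK ERROR" /tmp/voma65.sample)
total=$(awk '/Total Errors Found/ {print $NF}' /tmp/voma65.sample)
echo "$detail detail lines listed, $total errors reported in total"
```

Saving each run this way makes it easy to diff the 0.7 and 0.8 reports for the same device.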
[root@bakingcloudshost01:~] voma -v
voma version 0.8

[root@bakingcloudshost01:/var/log] voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxxxx0000b8

Running VMFS Checker version 2.1 in check mode               
Initializing LVM metadata, Basic Checks will be done         
                                                             
Checking for filesystem activity
Performing filesystem liveness check..-Scanning for VMFS-6 host activity (4096 bytes/HB, 1024 HBs).
Phase 1: Checking VMFS header and resource files             
   Detected VMFS-6 file system (labeled:'xxxxx') with UUID:xxxxxxx, Version 6:81
Phase 2: Checking VMFS heartbeat region
Marking Journal addr (3, 1) in use                           
Phase 3: Checking all file descriptors.                      
Found stale lock [type 10c00001 offset 227270656 v 53, hb offset 4030464
	 gen 5, mode 1, owner 5c47a588-871e914c-af0e-848f6980350e mtime 7484727
	 num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 229179392 v 14, hb offset 3440640
	 gen 5, mode 1, owner 5d09a1a3-3f87e418-5976-c81f6664941b mtime 5779258
	 num 0 gblnum 0 gblgen 0 gblbrk 0]
Phase 4: Checking pathname and connectivity.                 
Phase 5: Checking resource reference counts.                 
                                                             
Total Errors Found:           1
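The stale-lock lines in the 0.8 output are worth mining: the last segment of the owner UUID normally embeds the MAC address of the host that took the lock, which helps identify the culprit. A hedged sketch, parsing the two stale-lock lines copied from the output above (on a real run, feed a logfile saved with -s):

```shell
# Sample stale-lock lines copied from the VOMA 0.8 check output
cat > /tmp/voma08.sample <<'EOF'
Found stale lock [type 10c00001 offset 227270656 v 53, hb offset 4030464
 gen 5, mode 1, owner 5c47a588-871e914c-af0e-848f6980350e mtime 7484727
Found stale lock [type 10c00001 offset 229179392 v 14, hb offset 3440640
 gen 5, mode 1, owner 5d09a1a3-3f87e418-5976-c81f6664941b mtime 5779258
EOF
# Extract each owner UUID, then format its last 12 hex digits as a MAC
owners=$(awk '/owner/ {for (i=1;i<=NF;i++) if ($i=="owner") print $(i+1)}' /tmp/voma08.sample)
for o in $owners; do
  mac=$(echo "$o" | awk -F- '{print $4}' | sed 's/../&:/g; s/:$//')
  echo "stale lock held by host with MAC $mac"
done
```

Matching those MACs against your hosts' management NICs tells you which ESXi host left the lock behind.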

Fixing VMFS6 errors

[root@bakingcloudshost01:/var/log] voma -m vmfs -f advfix -d /vmfs/devices/disks/naa.xxxxxx0000b8:1 -p /temp


########################################################################
#   Warning !!!                                                        #
#                                                                      #
#   You are about to execute VOMA in Advanced Fix mode. This mode can  #
#   potentially end up in deleting some files/vmdks from datastore     #
#   if it is heavily corrupted.                                        #
#                                                                      #
#   This mode is supported only under VMware Support supervision.       #
########################################################################

VMware ESX Question:
Do you want to continue (Y/N)?

0) _Yes
1) _No

Select a number from 0-1: 0

   Successfully created patch file /tmp/naa.xxxxxx0000b8:1_Thu_Sep__5_23:49:40_2019
Running VMFS Checker version 2.1 in advfix mode
Initializing LVM metadata, Basic Checks will be done

Total Errors Found:           1
Total Errors Fixed:           1
Total Partially Fixed errors: 0

The outcome shows one error found and one error fixed.
After fixing the error, VOMA was executed again on both hosts: ESXi 6.7 reported no errors, while ESXi 6.5 still reported the same 4737 errors.

You may still have problems accessing certain files on the VMFS datastore even after fixing the VMFS error with VOMA.

The results are inconsistent between ESXi versions, with a high error count and no way to recover access to the files.

If you experience this, the recommendation is to create new VMFS volumes and migrate all your machines and files to them, which requires no outage.

After completing the migration, try to recover the needed files by unmapping the volume from all hosts, then mapping it to a single ESXi 6.7 host and running the VOMA check and fix again.
I have seen that after the volume has been unmapped from the hosts for 24 hours, once it is mounted back the error count on both ESXi versions is 0, although that does not guarantee the files will be accessible.

Related articles to read about VOMA

Checking Metadata Consistency with VOMA
VMFS Lock Volume is Corrupted
Using vSphere On-disk Metadata Analyzer (VOMA) to check VMFS metadata consistency
Data recovery services for data not recoverable by VMware Technical Support
