Yesterday I reported 2 further "Confirmed Kills by unresolvable VMFS 6 locks" on Twitter and some folks asked for details.
The first cases I noticed must have been about 2 years ago, if I remember correctly.
Once I saw the first ones I started to see them more regularly.
For about a year I have had a definition for these cases and have been counting them for my own use.
Some weeks ago I noticed that VMFS 6 as used by ESXi 6.7 has at least one new lock type that I have never seen before.
VMFS 5 and earlier versions have 2 magic values for locks in the heartbeat section (part of the hidden system file .vh.sf).
When I noticed that new type I tried to get in contact with VMware Inc. via Twitter to ask whether this is a new feature, a bug or just a form of corruption.
One suggestion was to ask in VMTN - funny idea ...
Looks like VMware Inc. is satisfied with the quality level of VMFS-related support in VMTN - which is a bit strange as I have seen no VMware employee with VMFS know-how visiting VMTN in recent years.
Anyway - you all know how a locked file behaves. You can't copy it, you can't launch a VM that uses it and most important: you can't query the mapping of the file with vmkfstools -p 0.
To make it into the "Confirmed Kills" list the locked file must fail some more tests:
- hexdump -C file fails
- mounting the VMFS datastore readonly via sshfs and running ddrescue against the locked file must fail
- vmfs-fuse must fail
- then I run some more advanced manual tests that try to get the mapping of the file - no use listing them here - they must fail as well.
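The first test in the list boils down to one question: can any byte of the file be read at all? Here is a minimal Python sketch of that check, assuming the datastore is already mounted somewhere readable (for example readonly via sshfs). The function name first_bytes_readable is made up for illustration, and this only mirrors the hexdump test, not the whole series.

```python
def first_bytes_readable(path, nbytes=512):
    """Try to read the first nbytes of a file.

    Returns (True, data) on success or (False, error text) on failure.
    Hypothetical helper, roughly what a failing 'hexdump -C file' tells you.
    """
    try:
        # Unbuffered binary open so the read request goes straight to the OS.
        with open(path, "rb", buffering=0) as f:
            data = f.read(nbytes)
        return True, data
    except OSError as exc:
        # A locked file is expected to end up here with some OSError.
        return False, f"{type(exc).__name__}: {exc}"
```

A call would look like first_bytes_readable("/mnt/esxi/myvm/myvm-flat.vmdk") - that path is only an example. A False result alone does not make a "Confirmed Kill"; the other tests have to fail too.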
For a layman there is one test that is easy to use and that I also consider conclusive:
mount the datastore readonly via sshfs from Linux, then use the best VMFS reading tool on the market and try to open the file or just list its details - if this fails as well I consider the vmdk a "Confirmed Kill".
FAQ:
What is the best VMFS reading tool available on the market (as of October 2019)?
UFSexplorer - Linux LiveCD - get the latest version.
Is it possible to extract data from a "Confirmed Kill" ?
- as always with vmdks without usable VMFS-metadata:
thin provisioned = no way,
lazy zeroed = why would someone use lazy zeroed ???,
eager-zeroed = if allocated in one piece data extraction is possible - if the vmdk uses several pieces extraction is still possible but you need highly skilled personnel to even have a chance.
vmfs-sparse = advanced but possible (I managed it in a few rare cases),
sesparse = oh dear - from a recovery point of view sesparse is a nightmare - I have had no success so far, and as far as I know UFSexplorer can't read the latest sesparse version either
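To illustrate why the eager-zeroed case "in one piece" is the easy one: if the whole vmdk sits in a single contiguous range, extraction is nothing more than copying that range out of the raw volume (or an image of it). A minimal Python sketch under that assumption follows. The function name carve_extent and its parameters are hypothetical, and finding the correct start offset and length is the real work and is not shown here.

```python
def carve_extent(source_path, start_offset, length, out_path, chunk_size=1024 * 1024):
    """Copy `length` bytes starting at `start_offset` from source_path to out_path.

    Assumes the data is ONE contiguous piece. Returns the number of bytes copied,
    which is less than `length` if the source ends early.
    """
    copied = 0
    with open(source_path, "rb") as src, open(out_path, "wb") as dst:
        src.seek(start_offset)
        while copied < length:
            # Never read past the requested range.
            chunk = src.read(min(chunk_size, length - copied))
            if not chunk:
                break  # source ended before the full range was copied
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

As soon as the vmdk is split into several pieces you need one such range per piece, in the right order, and without usable metadata that order has to be reconstructed by hand.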
Do you have a "Confirmed Kills" list for VMFS 5 or 3 ?
Nope - with those versions I always find workarounds - so no problem here.
Do you have a therapy yet ?
Yes - I had success in a few cases. I did a complete reset of the heartbeat section and could fix a handful of cases. BUT those were standalone hosts only. I have not seen enough cases with clustered VMFS datastores to suggest this "brute force" approach at the moment.
Do you work on a minimally invasive therapy ?
Of course I do. Progress is slow as the research requires a lab environment with at least 2 ESXi hosts, shared storage and at least one working VM. Currently I can only run one or two nested ESXi hosts on top of Workstation 12. I can't use running test VMs at all, so my lab helps a little bit with basic research - but it is not good enough to develop a safe therapy.
Can VMware-support deal with this problem ?
I don't know.
Can Ontrack help ?
I don't know but I assume they can help.
Why is this an annoying problem ?
I expect that I will find a therapy sooner or later that just requires one or two very small hexedits.
It is possible that VMware Inc. already is aware of the problem and maybe they already have a working and safe therapy.
It is possible that a VMFS-developer already has a working therapy and that it is flagged as "unsupported" and so does not need to be documented at all.
Did you open a support ticket with VMware-support ?
Good joke - I am not a paying customer.
Did you report the issue in the moderator-forum ?
That forum appears to be abandoned at the moment.
What else did you try ?
I asked for detailed documentation of .vh.sf in VMTN posts and on Twitter.
Did you try emails to a VMFS-engineer ?
No - I requested an email list with engineers, product managers and so on in the moderator-forum but I don't think that the moderators' office has been used in the last couple of months.
Where can I report future "Confirmed Kills" ?
Just add a note here or call me via Skype.
Nothing about the voma-tool ?
Correct - I also did not mention snake-oil ...
What about reward points ?
As always: 10 points for 4 random links - make it double if they rhyme
Ulli