Two months ago we bought three identical "big blue" servers, and we started having issues with two of them. The servers shut down at random: they run fine for a week or two and then power off. When we dug into the logs we saw that the servers were detecting an excessively high temperature and the IMM would initiate an immediate shutdown.
There is no alert that the temperature is rising or anything else; the server just turns off as if somebody had unplugged the cable.
Of course, when you pull the power on a server, that can cause various kinds of corruption, in the operating system, the hypervisor, the disks and so on. In our case the corruption showed up in the virtual machine config file, so the machine would not start, reporting either a saved-critical state or Incomplete VM Configuration in System Center Virtual Machine Manager (SCVMM). That state simply means the config file is corrupted and neither Hyper-V Manager nor SCVMM is able to read it.
The solution I found online is to recreate the machine: delete the machine in Hyper-V Manager, create a new machine and attach the existing vhd or vhdx file. While this works, I think it is very time consuming. It is fine for a few machines, but imagine what you will be doing for the next 5 days if you have hundreds of test machines like we do :)
The problem with recreating the machine is that it creates brand new network adapters, which causes the old adapters to become hidden and the servers to acquire new IP addresses. Fixing that involves digging into HKLM\SYSTEM\ControlSet001\services\Tcpip\Parameters\Interfaces\ to find the old IP address, or looking up the old IP address elsewhere, and then assigning it to the new network adapter. Far too time consuming.
What did we do? Since VMM stated the configuration was incomplete, we investigated the config files on the Hyper-V hosts. There is nothing you can do from VMM, so abandon it and connect directly to the Hyper-V host.
In Hyper-V Manager you will see a big mess like this:
So let’s say the machine we are going to fix is WS2008R2-001.test.local. Go to the configuration folder for the virtual machines; some people keep it on the C drive, others keep it together with the virtual machines, so the location varies.
If you let VMM create the config files, the files you are looking for will look like this:
You got that right, it is a big mess, and if you have hundreds of virtual machines you need some kind of text search application. In our case we used good old Total Commander. So search all these files for the server name, WS2008R2-001.test.local in our example.
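If you do not have a search tool at hand, the same search can be sketched in a few lines of Python. This is only a rough sketch, not what we actually ran: the function name find_config_files is made up for illustration, and it assumes the config files have an .xml extension and are encoded as either UTF-16 or UTF-8.

```python
from pathlib import Path


def find_config_files(config_dir, vm_name, pattern="*.xml"):
    """Return the files under config_dir whose content mentions vm_name.

    Hypothetical helper: the '*.xml' pattern and the encodings tried
    below are assumptions about how the config files are stored.
    """
    # Search the raw bytes so the same code works for UTF-8 and UTF-16 files.
    needles = [
        vm_name.encode("utf-8"),
        vm_name.encode("utf-16-le"),
        vm_name.encode("utf-16-be"),
    ]
    matches = []
    for path in sorted(Path(config_dir).rglob(pattern)):
        raw = path.read_bytes()
        if any(needle in raw for needle in needles):
            matches.append(path)
    return matches
```

You would call it with the configuration folder and the machine name, for example find_config_files(r"C:\path\to\configs", "WS2008R2-001.test.local"), where the folder path is a placeholder for wherever your hosts keep the config files.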
After you find the file, open it in something like Notepad++ and scroll to the bottom.
This is the end of the file in our environment:
So obviously the configuration is corrupted: it is incomplete, and neither the Hyper-V console nor VMM is able to read it. What we did was open the config file of a healthy virtual machine and find the <count_per_node> tag.
In the healthy file you see something like this:
So basically we need to finish our corrupted file. We simply copied everything from the <count_per_node> tag down in the healthy file and pasted it over the truncated <count_per_node> tag in the corrupted file, and voilà, the machine config is no longer corrupted and the machine is visible again in VMM and Hyper-V Manager because the config file is complete.
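We did the copy and paste by hand, but with hundreds of machines the same step could be scripted. The sketch below is an illustration under assumptions, not a tested repair tool: graft_tail is a made-up name, it works on text you have already read from the two files, and it assumes the tag appears once in each file and that the corrupted file was cut off somewhere after the tag name.

```python
def graft_tail(corrupted_text, healthy_text, tag="<count_per_node"):
    """Replace the truncated end of a config with the tail of a healthy one.

    Hypothetical helper: keeps everything in corrupted_text before `tag`
    and appends everything in healthy_text from `tag` onward.
    """
    cut = corrupted_text.find(tag)
    start = healthy_text.find(tag)
    if cut == -1:
        raise ValueError("tag not found in the corrupted text")
    if start == -1:
        raise ValueError("tag not found in the healthy text")
    # The corrupted file's own settings before the tag are preserved;
    # only the broken tail is taken from the healthy file.
    return corrupted_text[:cut] + healthy_text[start:]
```

Run something like this against a copy of the config file rather than the original, and write the result back in the same encoding the file was read in, so you can still fall back to the manual edit if the result is not what you expected.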
What you need to look for in your case are the other tags, like count_per_node, node_per_socket, stopped_at_host_shutdown and so on, as you see in the screenshot. These are all default values; if you need different values, change them accordingly, but this is fine tuning and in most environments you don’t need to touch them.
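Before going back to Hyper-V Manager, a quick sanity check is to see whether the edited file parses as XML at all. The snippet below is a minimal sketch using Python's standard XML parser; is_well_formed is a made-up name, and passing this check only tells you the file is syntactically complete, not that Hyper-V will accept every value in it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if xml_bytes parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        # Truncated files fail here, e.g. with an unclosed tag.
        return False
```

Feed it the raw bytes of the config file, for example is_well_formed(open(path, "rb").read()), where path is the file you just repaired.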
8 Responses
It doesn’t make any sense for readers as to what the solution is and what made it work; there’s no mention of SCVMM or Hyper-V versions.
Here’s our situation: we are on SCVMM 2012 R2 and hit a similar situation where a VM shows as Incomplete VM Configuration as part of a migration from 2008 to 2012 Hyper-V hosts. It appears on the 2012 Hyper-V hosts, but in SCVMM it does not appear on the destination Hyper-V 2012 R2 hosts; it still appears only on the old hosts in the SCVMM view. At the same time it also appears on the Hyper-V 2012 R2 hosts and the related failover.
I just had this happen to me last night on HyperV 2012r2. Interesting that the file was truncated in the exact same spot as the example above. I reviewed the settings from a working VM’s XML file and copied the missing data over and the VM started right up. I did have to restart the HyperV services though.
Hi,
Thank you very much for this article. This surely saved my bacon today. As per normal, finance in the company decided that it was not critical to purchase a backup device thus no backup of the server.
After executing the procedure, the server recovered and started up without a problem
This article saved us last night. After rebooting our Hyper-V server, all our virtual machines started up except for our Domain Controller. Found that the configuration file was missing some lines at the very bottom and copy/pasted them in from another working vm’s configuration file. Our Domain Controller booted up after that. Thanks so much :)
Thank you, it resolved my critical problem.
This helped lead me to my solution so thanks to all! It helps when you see everything someone else has already gone through. My solution was to edit the same file; the very last line was followed by a bunch of [NULL][NULL][NULL]. Once I deleted all of those from the last line and saved with Notepad++ it just worked. I was able to power on the machine. Many Thanks All! ;)
I had exactly the same problem on Hyper-V Windows Server 2012R2. It most probably happened when the server was not able to shut down (perhaps one Win10 VM caused it), so I needed to reset it.
Your article solved my problem, but after the changes, you must reset the host server.