We had an issue this week where an ASP.NET application was consuming 100% of the CPU on a multi-tenant server. This hurt every application hosted on that server, so we dropped everything to investigate. In this post, we will review the steps we took to diagnose and research the problem.
Metrics
We found the issue by monitoring our metrics. In this case, the first thing we saw was a spike in response times for one of our applications.
Memory usage looked stable but CPU had a noticeable spike as well:
This also correlated with our Google Analytics metrics:
Now to find the culprit!
Diagnosing
I always try to start diagnosing remotely via PowerShell or the command line when possible, to avoid remoting into the server. In this case, I used the Get-WmiObject PowerShell cmdlet to see which processes were using the most CPU, for example:
> gwmi -computer webServer1 Win32_PerfFormattedData_PerfProc_Process | sort PercentProcessorTime -desc | select Name,PercentProcessorTime | Select -First 10 | ft -auto
Name PercentProcessorTime
---- --------------------
_Total 100
Idle 100
w3wp#19 72
w3wp#18 0
w3wp#17 0
w3wp#14 0
w3wp#15 0
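A single reading can be misleading, so it helps to take several samples in a row. The following is a rough sketch of that idea rather than the exact script used here: it repeats the query above, keeps only the w3wp instances, and also returns the IDProcess property so the busy instance can be matched to a PID later. The sample count and delay are arbitrary illustrative values, and Get-WmiObject assumes Windows PowerShell 5.1 or earlier.

```powershell
# Assumptions: 'webServer1' is the server queried above; the number of
# samples and the delay between them are arbitrary illustrative values.
$server       = 'webServer1'
$samples      = 5
$delaySeconds = 2

1..$samples | ForEach-Object {
    # Query per-process CPU counters and keep the busiest IIS worker processes
    Get-WmiObject -ComputerName $server -Class Win32_PerfFormattedData_PerfProc_Process |
        Where-Object { $_.Name -like 'w3wp*' } |
        Sort-Object PercentProcessorTime -Descending |
        Select-Object -First 3 -Property @{ Name = 'Time'; Expression = { Get-Date -Format 'HH:mm:ss' } },
            Name, IDProcess, PercentProcessorTime |
        Format-Table -AutoSize

    # Pause before taking the next sample
    Start-Sleep -Seconds $delaySeconds
}
```

If the same instance stays at the top across samples, its IDProcess value is the PID to look for in Process Explorer. On the server itself, running `appcmd list wp` from `%windir%\system32\inetsrv` lists each worker process PID next to its application pool name, which is one way to tie the PID back to a site.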
After running this command several times, it was clear that a single w3wp process was hogging the CPU. At this point I remoted into the box to try to figure out what this application was doing. I used Process Explorer to find the process and dig into its threads. Here is an example screenshot of Process Explorer running on my local machine:
Process Explorer is a super powerful tool in the awesome Sysinternals Suite. I found the offending process and opened its properties to dig into its threads:
And further into the stack for one of the high CPU threads:
I would like to say I’m l33t and can read all this, but I can’t. The only thing I could glean here is that the .NET CLR is involved, which isn’t very helpful.
Next I created a minidump of the process by right-clicking it in the main process list, hoping that other tools could tell me more from a mini memory dump:
Shortly after I took the dump, the process stopped pegging the CPU, so I barely caught it. You should probably take the dump first, while the issue is still occurring, so you don’t miss it.
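One way to avoid missing that window is to have a tool take the dump automatically when the CPU spikes. This is not what I did here; it is a hedged sketch using ProcDump, another Sysinternals tool, run on the server itself. The PID and dump folder below are hypothetical placeholders, and the threshold values are arbitrary.

```powershell
# Hypothetical values: replace with the PID of the busy w3wp process and
# an existing folder that has enough free disk space for full dumps.
$processId  = 1234
$dumpFolder = 'C:\dumps'

# -ma : write a full memory dump
# -c  : CPU usage threshold (percent) that triggers a dump
# -s  : consecutive seconds the threshold must be exceeded
# -n  : number of dumps to write before exiting
& procdump.exe -accepteula -ma -c 80 -s 10 -n 3 $processId $dumpFolder
```

Taking a few dumps some seconds apart makes it easier to tell whether the same threads are still busy in each one.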
Next I used Debug Diagnostic Tool to analyze the minidump:
I then ran a Crash/Hang Analyzers analysis to see if it could find anything. Here are a few of the results:
So threading is always fun! My first thought was this is probably some internal CLR code that doesn’t really help. After further consideration I checked to see if there were any Threading or Parallel calls in the application code and found one!
Solution?
I’m still working with the team who owns this code to see if we can reproduce and fix the issue, so I don’t know what the solution is yet. I still thought sharing this process would be incrementally valuable, and I can update the post later when we find a fix.
Was this post helpful/valuable? Please leave your comments/feedback/questions in the comments below. It’s always great to hear what readers think of these posts so I can decide what to write about in the future.
Cheers!