Ok, sorry for the (bad) pun, but naming is hard. In this post I just want to dump some notes from a recent attempt to figure out where, in a couple of dozen lines of double arithmetic, a Double.NaN occurred.
First I thought it would be kind of cool if the debugger just hit a breakpoint whenever a floating point value comes out as NaN. Unfortunately there is no such setting (at least not for managed code). However, there is another thing that looks promising:
If the FPU encounters certain kinds of results (NaN, overflow, underflow, etc.) it raises a hardware exception. Back in the day those would surface as real “exceptions” (or whatever the mechanism of choice of the respective language / platform was). However, in most modern environments this no longer happens (the exceptions are masked, i.e. disabled) and such conditions are translated into other constructs (like a Double.NaN in the CLR).
One can tinker with these settings by P/Invoking _controlfp_s() or one of its variants. However, there are some things to consider:
1.) The occurrence of NaN is converted into an exception (but a different one on the 32-bit vs. the 64-bit CLR).
2.) The behavior after an exception is thrown differs between the 32-bit (X86) and the 64-bit (X64) CLR.
3.) Double.NaN is used quite often inside the BCL itself and the value can occur in lots of places (for example in Hashtable, Dictionary, etc. when calculating the load factor, inside WPF when dealing with coordinates, etc.). This can lead to an annoying number of false positives you’d have to deal with.
Let’s look at an example. The following is just a simple .NET Core 3.1 console application (the full framework CLR behaves the same):
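As a rough idea of what such a program can look like, here is a minimal sketch. It is not the exact listing that produced the output below: the names FpuExceptionDemo, FpMask, CurrentMask, Divide and Compute are made up for illustration, the constants are copied from the MSVC float.h header, and it assumes that _controlfp_s can be resolved from msvcrt.dll (depending on the system, ucrtbase.dll may be the better choice). To keep false positives from the BCL down, it unmasks only the invalid-operation exception. Windows only.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class FpuExceptionDemo
{
    // Exception-mask bits as defined in the MSVC <float.h> header.
    [Flags]
    enum FpMask : uint
    {
        _EM_INEXACT    = 0x00000001,
        _EM_UNDERFLOW  = 0x00000002,
        _EM_OVERFLOW   = 0x00000004,
        _EM_ZERODIVIDE = 0x00000008,
        _EM_INVALID    = 0x00000010,
        _EM_DENORMAL   = 0x00080000,
    }

    const uint _MCW_EM = 0x0008001F; // selects all exception-mask bits

    // Assumption: the export is taken from msvcrt.dll.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int _controlfp_s(out uint currentControl, uint newControl, uint mask);

    // With mask == 0 nothing is changed; the call only reads the control word.
    static FpMask CurrentMask()
    {
        _controlfp_s(out uint current, 0, 0);
        return (FpMask)(current & _MCW_EM);
    }

    // Not inlined, so the JIT cannot fold 0.0 / 0.0 into a constant.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static double Divide(double a, double b) => a / b;

    static void Compute(string label)
    {
        Console.WriteLine($"{label}: {CurrentMask()}");
        try
        {
            double result = Divide(0.0, 0.0); // invalid operation, NaN while masked
            Console.WriteLine($"Double.IsNaN: {double.IsNaN(result)}");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"GOT AN {ex.GetType().Name}: {ex.Message}");
        }
    }

    static void Main()
    {
        Compute("BEFORE 1"); // default state: all exceptions masked

        // A cleared mask bit means "raise"; clear only _EM_INVALID here.
        uint unmaskInvalid = _MCW_EM & ~(uint)FpMask._EM_INVALID;
        Console.WriteLine(_controlfp_s(out _, unmaskInvalid, _MCW_EM)); // 0 on success
        Compute("BEFORE 2");

        // Mask everything again to restore the default behavior.
        Console.WriteLine(_controlfp_s(out _, _MCW_EM, _MCW_EM));
        Compute("BEFORE 3");
    }
}
```

Passing a different newControl value to the first _controlfp_s call (for example only _EM_DENORMAL) unmasks more exception types at once, at the price of more noise.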
The output for an X64 build is:
BEFORE 1:_EM_INEXACT, _EM_UNDERFLOW, _EM_OVERFLOW, _EM_ZERODIVIDE, _EM_INVALID, _EM_DENOMRAL
Double.IsNan: True
0
BEFORE 2: _EM_DENOMRAL
GOT AN ArithmeticException: Overflow or underflow in the arithmetic operation.
BEFORE 3: _EM_DENOMRAL
GOT AN ArithmeticException: Overflow or underflow in the arithmetic operation.
0
BEFORE 4: _EM_INEXACT, _EM_UNDERFLOW, _EM_OVERFLOW, _EM_ZERODIVIDE, _EM_INVALID, _EM_DENOMRAL
Double.IsNan: True
As you can see, the 64-bit CLR converts NaN into a System.ArithmeticException (albeit with a slightly confusing, generic message about “overflow or underflow”, which is not the FPU state in this case). Note also that the “dummy” throw/catch has no influence on the FPU settings we forced using _controlfp_s.
The X86 build outputs:
BEFORE 1:_EM_INEXACT, _EM_UNDERFLOW, _EM_OVERFLOW, _EM_ZERODIVIDE, _EM_INVALID, _EM_DENOMRAL
Double.IsNan: True
0
BEFORE 2: _EM_DENOMRAL
GOT AN SEHException: External component has thrown an exception.
BEFORE 3: _EM_INEXACT, _EM_UNDERFLOW, _EM_OVERFLOW, _EM_ZERODIVIDE, _EM_INVALID, _EM_DENOMRAL
GOT AN SEHException: External component has thrown an exception.
0
BEFORE 4: _EM_INEXACT, _EM_UNDERFLOW, _EM_OVERFLOW, _EM_ZERODIVIDE, _EM_INVALID, _EM_DENOMRAL
Double.IsNan: True
Note that the kind of exception thrown is a System.Runtime.InteropServices.SEHException, which gives the whole thing a much more naughty feeling, as it is typically an indicator of a much more serious issue. Also note that “BEFORE 3” outputs the default FPU settings again, that is, the dummy throw/catch actually resets those settings inside the CLR (see this comment https://stackoverflow.com/a/25206025/21567 and the respective source code https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/excep.cpp#L7857).
This alone makes the whole practice much more unreliable when looking for issues than in the 64-bit build.
All in all, the whole idea seems to be more of an academic exercise and probably not reliable enough to track down bugs. YMMV, of course.
When the FPU produces a NaN, the state of some registers changes. In general, you should watch the MXCSR register.
To do so, open the “Registers” window from “Debug / Windows” and make sure “SSE” is selected in its context menu.
After starting the program and hitting a breakpoint, the Registers window should look something like this:
XMM0 = 0000000000000000-000000008C9F9CE8
[...]
XMM15 = 0000000000000000-0000000000000000
MXCSR = 00001FA0
Note the initial value of MXCSR, 0x00001FA0. Now single-step through the region of code you suspect contains or produces the NaN value in question and watch that register.
When you step over the respective statement, the value of MXCSR changes to 0x00001FA1, that is, bit 0 (the invalid-operation flag) gets set.
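To make the difference between the two MXCSR values easier to read, the low six bits can be decoded into their sticky exception flags. The following is a small sketch; the type and method names (MxcsrDecoder, ExceptionFlags, Decode, NewlyRaised) are made up for illustration, and the bit layout follows the Intel documentation for MXCSR.

```csharp
using System;

static class MxcsrDecoder
{
    // Sticky exception flags in bits 0-5 of MXCSR.
    [Flags]
    public enum ExceptionFlags : uint
    {
        None             = 0,
        InvalidOperation = 1 << 0, // IE: e.g. 0.0 / 0.0, which produces NaN
        Denormal         = 1 << 1, // DE
        DivideByZero     = 1 << 2, // ZE
        Overflow         = 1 << 3, // OE
        Underflow        = 1 << 4, // UE
        Precision        = 1 << 5, // PE: a result had to be rounded
    }

    // Keeps only the six flag bits and drops masks, rounding mode, etc.
    public static ExceptionFlags Decode(uint mxcsr) =>
        (ExceptionFlags)(mxcsr & 0x3F);

    // Flags that are set in 'after' but were not yet set in 'before'.
    public static ExceptionFlags NewlyRaised(uint before, uint after) =>
        Decode(after) & ~Decode(before);

    static void Main()
    {
        Console.WriteLine(Decode(0x1FA0));              // prints "Precision"
        Console.WriteLine(NewlyRaised(0x1FA0, 0x1FA1)); // prints "InvalidOperation"
    }
}
```

Because the flags are sticky, the register only tells you that an invalid operation happened since the flag was last cleared, so comparing the value before and after each step is what pins down the statement.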
The MXCSR register is the place to look only for .NET Core applications (32- and 64-bit) or 64-bit full .NET Framework applications.
For 32-bit full .NET Framework applications, the JIT compiler being used is not RyuJIT, but the legacy JIT.
That one doesn’t use the SSE instructions for floating point, but still the x87 FPU stack (see https://github.com/dotnet/roslyn/issues/7333#issuecomment-560197038), so you need to look at the STAT register instead (initial value 0x4020, NaN value 0x4021; select “Floating Point” from the Registers window context menu to display the register set for the x87 FPU stack).
This really does work reliably, but requires you to be able to narrow down the location at least to some extent.