One thing that I sometimes see people wonder about is how Nullable<T> relates to null. This blog post is an attempt to explain how it works behind the scenes.
To follow this text, you should have a basic understanding of the difference between reference types and value types in .NET, and of why value types cannot be null while reference types can. If you feel unsure about this, you can read up on the topic here: Parameter passing in C# (by Jon Skeet). You should also have basic knowledge of generics (no advanced stuff; just enough to understand the syntax around them).
Declaring
In C# there are two ways to declare members using Nullable<T> (I am using int for the examples, but it could be any value type):
private Nullable<int> a = null;
private int? b = null;
The two code lines above both declare the exact same type: Nullable<int>.
The second, shorter declaration is simply syntactic sugar that the C# compiler
translates into Nullable<int>. There is no type difference
of any kind between the two styles. The ultimate proof of this statement is that
object.ReferenceEquals(typeof(Nullable<int>),
typeof(int?)) returns true. It's just a matter of... well,
style. In this blog post I will stick to the longer version of the two for the sake
of clarity.
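The identity claim above can be checked with a few lines of code. This is a minimal sketch; the class name NullableTypeIdentity and the helper SameType are made up for the illustration.

```csharp
using System;

// Hypothetical demo class; only typeof and object.ReferenceEquals come from the text above.
static class NullableTypeIdentity
{
    // Returns true when both spellings resolve to the very same System.Type object.
    public static bool SameType()
    {
        return object.ReferenceEquals(typeof(Nullable<int>), typeof(int?));
    }

    static void Main()
    {
        Nullable<int> a = 1;
        int? b = a;                     // plain assignment, since both are the same type
        Console.WriteLine(b.HasValue);  // prints True
        Console.WriteLine(SameType());  // prints True
    }
}
```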
Not all nulls are equal
The type Nullable<int> is in itself a value type. It is a struct,
and exhibits the same behavior as other value types. One of the characteristics
of a value type is its inability to be null. So, how is it that the
above declarations function at all? Should the compiler not issue a compilation
error? Why can we suddenly assign null to a variable that is of a value
type?
In short: we can't. It's a compiler trick. Let's examine this a bit closer. I will
first look at how a reference type behaves:
private static string GetNullString()
{
    return null;
}
A simple method, returning a string, in which the body returns null.
The IL code for the method looks like this:
.method private hidebysig static string GetNullString() cil managed
{
// Code size 2 (0x2)
.maxstack 8
IL_0000: ldnull
IL_0001: ret
} // end of method TestClass::GetNullString
What the above IL code does is push a null reference onto the stack (ldnull)
and then return it to the caller (ret). Now, let's look at a similar
method returning a Nullable<int>:
private static Nullable<int> GetNullInt()
{
    return null;
}
...and the IL code:
.method private hidebysig static valuetype [mscorlib]System.Nullable`1<int32>
GetNullInt() cil managed
{
// Code size 10 (0xa)
.maxstack 1
.locals init ([0] valuetype [mscorlib]System.Nullable`1<int32> CS$0$0000)
IL_0000: ldloca.s CS$0$0000
IL_0002: initobj valuetype [mscorlib]System.Nullable`1<int32>
IL_0008: ldloc.0
IL_0009: ret
} // end of method TestClass::GetNullInt
Notice how the method initializes a new instance of the type Nullable<int>
and returns it. There is no trace of null at all. The instance
is created using the initobj instruction, which initializes the fields
of the type to either a null reference (if the field is of a reference type) or
the default value of the type (if the field is of a value type).
When you assign null to a Nullable<int> (or another nullable value type)
in your code, the compiler will emit IL code that initializes
a new instance of Nullable<int> and assigns that instead. So,
we do in fact get an object instance even though we assign null. This
can be verified by running the following code:
Nullable<int> a = null;
Console.WriteLine(a.HasValue); // prints False
The code assigns null to a variable and then accesses a member through
that variable. With a reference type, this kind of code would throw a NullReferenceException,
but in the case of Nullable<T> it's perfectly legal.
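To make the point a little more concrete, here is a small sketch that compares three ways of producing an "empty" Nullable<int> and then calls members on the result. The class name NullAssignmentDemo and the helper AllEmptyFormsAgree are invented for this example; HasValue and GetValueOrDefault are members of Nullable<T>.

```csharp
using System;

// Hypothetical demo class, not part of the original examples.
static class NullAssignmentDemo
{
    // True when null, the parameterless constructor and default(...) all yield HasValue == false.
    public static bool AllEmptyFormsAgree()
    {
        Nullable<int> fromNull = null;
        Nullable<int> fromCtor = new Nullable<int>();
        Nullable<int> fromDefault = default(Nullable<int>);
        return !fromNull.HasValue && !fromCtor.HasValue && !fromDefault.HasValue;
    }

    static void Main()
    {
        Nullable<int> a = null;
        Console.WriteLine(a.HasValue);             // prints False
        Console.WriteLine(a.GetValueOrDefault());  // prints 0
        Console.WriteLine(AllEmptyFormsAgree());   // prints True
    }
}
```

Neither member call throws, because there is a real struct instance behind the variable.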
So what happens if we assign a value that is not null?
private static Nullable<int> GetNonNullInt()
{
    return 5;
}
The generated IL code:
.method private hidebysig static valuetype [mscorlib]System.Nullable`1<int32>
GetNonNullInt() cil managed
{
// Code size 7 (0x7)
.maxstack 8
IL_0000: ldc.i4.5
IL_0001: newobj instance void valuetype [mscorlib]System.Nullable`1<int32>::.ctor(!0)
IL_0006: ret
} // end of method TestClass::GetNonNullInt
In this method the Nullable<int> instance is created in a different
manner: first the method pushes an integer of the value 5 onto the stack (ldc.i4.5),
then a new object is created and its constructor is invoked. The constructor takes
one parameter of the type !0 (OK, that's a weird-looking type; it is IL notation
for the first generic type parameter of the declaring type, which for
Nullable<int> is int32).
The point here is that in the first case, when we assign null, we get
an instance of Nullable<int> that is created in a way where its
fields are initialized to their default values for the respective type,
while in the case where we assign an integer, the instance is initialized using a
constructor to which the value is passed. This difference is one part of
the magic of Nullable<T>.
Now we have established that when it comes to assigning null to Nullable<T>,
it's all in the compiler. It has knowledge about this specific type, and gives
it special treatment. We write Nullable<int> a = null;, but the
compiler changes it into Nullable<int> a = new Nullable<int>();.
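The difference between the two initialization paths is easier to see with a simplified stand-in for the type. The struct below, MyNullable<T>, is a hypothetical sketch written for this post and not the real source of System.Nullable<T>; it assumes a layout with one Boolean field and one value field.

```csharp
using System;

// Hypothetical, simplified stand-in for System.Nullable<T>; an assumption, not the real source.
struct MyNullable<T> where T : struct
{
    private bool hasValue;  // false when the struct is zero-initialized
    private T value;        // default(T) when the struct is zero-initialized

    // The path taken when a value is supplied (compare with the newobj call in the IL).
    public MyNullable(T value)
    {
        this.value = value;
        this.hasValue = true;
    }

    public bool HasValue { get { return hasValue; } }

    public T Value
    {
        get
        {
            if (!hasValue) throw new InvalidOperationException("No value present.");
            return value;
        }
    }
}

static class MyNullableDemo
{
    static void Main()
    {
        MyNullable<int> empty = new MyNullable<int>();  // zero-initialized, like the initobj path
        MyNullable<int> five = new MyNullable<int>(5);  // constructor path
        Console.WriteLine(empty.HasValue);  // prints False
        Console.WriteLine(five.HasValue);   // prints True
        Console.WriteLine(five.Value);      // prints 5
    }
}
```

Unlike the real type, this stand-in gets no help from the compiler, so null or 5 cannot be assigned to it directly.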
Comparing to null
So, what about comparisons?
private static void TestForNull()
{
    Nullable<int> a = null;
    if (a == null)
    {
        Console.WriteLine("a is null");
    }
}
Didn't we just establish that a in this case is indeed an instance
of Nullable<int>? If that is the case, how can a comparison with
null evaluate to true? Again, it's a compiler trick. The
answer is to be found in the IL code:
.method private hidebysig static void TestForNull() cil managed
{
// Code size 28 (0x1c)
.maxstack 1
.locals init ([0] valuetype [mscorlib]System.Nullable`1<int32> a)
IL_0000: ldloca.s a
IL_0002: initobj valuetype [mscorlib]System.Nullable`1<int32>
IL_0008: ldloca.s a
IL_000a: call instance bool valuetype [mscorlib]System.Nullable`1<int32>::get_HasValue()
IL_000f: brtrue.s IL_001b
IL_0011: ldstr "a is null"
IL_0016: call void [mscorlib]System.Console::WriteLine(string)
IL_001b: ret
} // end of method TestClass::TestForNull
What happens here is that a new Nullable<int> is created using
the initobj instruction (lines IL_0000 - IL_0002), then the getter
for the HasValue property is invoked, and the result is pushed to the
evaluation stack (lines IL_0008 - IL_000a). The brtrue.s instruction
will transfer control to the address specified after it (IL_001b) if the current
value on the evaluation stack is true. The question then is, what is
the current value on the evaluation stack when the brtrue.s instruction
is executed? Well, since the Nullable<int> object was created
using the initobj instruction, HasValue is false
(the default value for a Boolean field). Again, where our code makes
a comparison to null, the compiler replaces it with something else.
The code above is equivalent to the following:
private static void TestForNull()
{
    Nullable<int> a = null;
    if (!a.HasValue)
    {
        Console.WriteLine("a is null");
    }
}
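As a quick sanity check of that equivalence, the following sketch compares the two forms of the test for a few inputs. The class name NullComparisonDemo and the helper ChecksAgree are made up for the illustration.

```csharp
using System;

// Hypothetical demo class, not part of the original examples.
static class NullComparisonDemo
{
    // True when `x == null` and `!x.HasValue` give the same answer for x.
    public static bool ChecksAgree(Nullable<int> x)
    {
        return (x == null) == !x.HasValue;
    }

    static void Main()
    {
        Console.WriteLine(ChecksAgree(null));  // prints True
        Console.WriteLine(ChecksAgree(5));     // prints True
        Console.WriteLine(ChecksAgree(0));     // prints True (0 is a value, not null)
    }
}
```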
Before we wrap up, let's check one last oddity:
private static void TestForNull()
{
    Nullable<int> a = null;
    if (a.Equals(null))
    {
        Console.WriteLine("a is null");
    }
}
...and the IL version:
.method private hidebysig static void TestForNull() cil managed
{
// Code size 35 (0x23)
.maxstack 2
.locals init ([0] valuetype [mscorlib]System.Nullable`1<int32> a)
IL_0000: ldloca.s a
IL_0002: initobj valuetype [mscorlib]System.Nullable`1<int32>
IL_0008: ldloca.s a
IL_000a: ldnull
IL_000b: constrained. valuetype [mscorlib]System.Nullable`1<int32>
IL_0011: callvirt instance bool [mscorlib]System.Object::Equals(object)
IL_0016: brfalse.s IL_0022
IL_0018: ldstr "a is null"
IL_001d: call void [mscorlib]System.Console::WriteLine(string)
IL_0022: ret
} // end of method TestClass::TestForNull
This method is similar to the previous one we looked at, but instead of invoking
the HasValue property getter, the Equals method is invoked.
The Equals method is overridden by the Nullable<T>
type, a short and simple implementation, looking like this:
public override bool Equals(object other)
{
    if (!this.HasValue)
    {
        return (other == null);
    }
    if (other == null)
    {
        return false;
    }
    return this.value.Equals(other);
}
Or, in plain English: if HasValue of the current instance is false,
and other is null, let's say we are equal. Otherwise, if other
is null, let's say we are not equal. Otherwise, we call the Equals
method of the object in the current instance's value field, passing
other to it. This Equals implementation means that Nullable<T>
will pretend to be null if HasValue is false.
As a side note, this implementation of Equals actually breaks
the contract stipulated by the documentation of
Object.Equals. There it is noted that (amongst other things)
x.Equals(null) must return false.
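The three branches of the Equals implementation can be exercised with a short sketch like the one below. The class name NullableEqualsDemo and its helper methods are invented for the example; only Nullable<T>.Equals itself is the real API.

```csharp
using System;

// Hypothetical demo class, not part of the original examples.
static class NullableEqualsDemo
{
    // First branch: HasValue is false and other is null.
    public static bool EmptyEqualsNull()
    {
        Nullable<int> a = null;
        return a.Equals(null);
    }

    // Second branch: HasValue is true but other is null.
    public static bool FiveEqualsNull()
    {
        Nullable<int> a = 5;
        return a.Equals(null);
    }

    // Third branch: the call is forwarded to the wrapped value's Equals.
    public static bool FiveEqualsFive()
    {
        Nullable<int> a = 5;
        return a.Equals(5);
    }

    static void Main()
    {
        Console.WriteLine(EmptyEqualsNull());  // prints True
        Console.WriteLine(FiveEqualsNull());   // prints False
        Console.WriteLine(FiveEqualsFive());   // prints True
    }
}
```

The first result is the one that deviates from the Object.Equals contract mentioned above.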
Conclusion
Bottom line: Nullable<T> allows you to write code as if it could
be null, but a combination of overrides and compiler wizardry translates our code
into statements dealing with a regular value type instance, behaving like any other
value type instance.
