When designing a type (I use the term type rather than class since the discussion here could just as well be applied to value types) you will typically use a field to store state internally. When exposing the state to consumers of the type, you can choose to do so by either making the field itself public or by creating a property through which the field is exposed. There is a third choice as well, and that is to create methods that get or set the value, but I will leave that out of the discussion for now (after all, this is technically what happens when a field is exposed through a property).

public class SomeClass
{
    public int SomeValue;
}

The type above is about as simple as it gets. A class exposing a public field. Access to this field will be highly efficient since there is no extra wrapping around it; it is the field itself that is exposed. However, this is often not considered to be a good idea. There are several reasons for this. I will discuss some of them here.

Let's say that we release a class library with the above type in it, allowing others to use it in their code. Later we realize that we need to restrict the possible values that may be assigned to SomeValue. In fact, for some reason the value 42 is not acceptable. In order to enforce this restriction, we change our code so that SomeValue is now a property instead:

public class SomeClass
{
    private int _someValue;
    public int SomeValue
    {
        get { return _someValue; }
        set
        {
            if (value == 42)
            {
                throw new ArgumentException("value must not be 42");
            }
            _someValue = value;
        }
    }
}

Bang! We just broke all code that was compiled against our type. The calling C# source may look identical, but reading or writing a field compiles to different IL than calling a property accessor, so any code using our type must now be recompiled. This is perhaps the strongest argument against exposing fields directly: future-proofing. By wrapping field access in a property from the start, you create a platform where you can add validation later without breaking any consuming code. This is a good thing to do, because we cannot see into the future. We have no idea how business requirements may change next month.
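To make the difference between the two versions concrete, here is a minimal sketch that uses reflection to show that a public field and a property with the same name are different kinds of members. WithField, WithProperty and KindOfSomeValue are hypothetical names made up for this illustration; they stand in for the "before" and "after" versions of SomeClass.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-ins for "version 1" and "version 2" of SomeClass.
public class WithField
{
    public int SomeValue;
}

public class WithProperty
{
    public int SomeValue { get; set; }
}

public static class MemberKindDemo
{
    // Reports what kind of member the public name "SomeValue" refers to.
    public static MemberTypes KindOfSomeValue(Type type)
    {
        return type.GetMember("SomeValue")[0].MemberType;
    }

    public static void Main()
    {
        Console.WriteLine(KindOfSomeValue(typeof(WithField)));    // Field
        Console.WriteLine(KindOfSomeValue(typeof(WithProperty))); // Property

        // A property is backed by accessor methods; a field has none.
        Console.WriteLine(typeof(WithProperty).GetMethod("get_SomeValue") != null); // True
        Console.WriteLine(typeof(WithField).GetMethod("get_SomeValue") != null);    // False
    }
}
```

An assembly compiled against the field version refers to a field named SomeValue. Once that field has been replaced by a property, the reference can no longer be resolved until the caller is recompiled.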

But wait, there is more. Do you use data binding? (I try to stay away from it, but that is just me). If you do, you may want to make sure that your types are data bindable. Currently, data binding works with properties, but not fields. If you choose to expose the field directly, you also choose not to support data binding.
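As a rough illustration of why fields are invisible to data binding: as far as I know, binding frameworks such as Windows Forms discover bindable members through System.ComponentModel.TypeDescriptor, which reports properties only. The sketch below assumes that mechanism; FieldOnly, PropertyBacked and IsBindable are hypothetical names used only for this example.

```csharp
using System;
using System.ComponentModel;

// Hypothetical types: one exposes a field, the other a property.
public class FieldOnly
{
    public int SomeValue;
}

public class PropertyBacked
{
    public int SomeValue { get; set; }
}

public static class BindingDemo
{
    // True if TypeDescriptor reports a property with the given name.
    public static bool IsBindable(Type type, string name)
    {
        return TypeDescriptor.GetProperties(type).Find(name, false) != null;
    }

    public static void Main()
    {
        Console.WriteLine(IsBindable(typeof(FieldOnly), "SomeValue"));      // False
        Console.WriteLine(IsBindable(typeof(PropertyBacked), "SomeValue")); // True
    }
}
```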

Another nice feature that you get when exposing state through properties is that you can have different access levels on get and set operations:

public class SomeClass
{
    private int _someValue;
    public int SomeValue
    {
        get { return _someValue; }
        protected set
        {
            if (value == 42)
            {
                throw new ArgumentException("value must not be 42");
            }
            _someValue = value;
        }
    }
}

Now SomeValue is designed so that any code can get the value, but only code in the SomeClass class or any descendant class can set it. This kind of granularity of access control is not available with fields.
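A small sketch of how this plays out for callers. DerivedClass and its Reset method are hypothetical, added only to show a descendant using the protected setter.

```csharp
using System;

public class SomeClass
{
    private int _someValue;
    public int SomeValue
    {
        get { return _someValue; }
        protected set
        {
            if (value == 42)
            {
                throw new ArgumentException("value must not be 42");
            }
            _someValue = value;
        }
    }
}

// Hypothetical descendant: allowed to use the protected setter.
public class DerivedClass : SomeClass
{
    public void Reset(int newValue)
    {
        SomeValue = newValue;
    }
}

public static class AccessDemo
{
    public static void Main()
    {
        DerivedClass d = new DerivedClass();
        d.Reset(7);
        Console.WriteLine(d.SomeValue); // 7 (the getter is public)

        // d.SomeValue = 8;  // would not compile here: the setter is protected

        try
        {
            d.Reset(42);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("42 rejected"); // validation still applies
        }
    }
}
```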

Given the arguments above, I would say that it is a good idea to default to exposing values through properties, unless there are very strong reasons not to. Then what about the internal code in the type itself? Should that code be allowed to use the field directly?

My personal, strong opinion is no: the internal code should also use the property (again, unless there are strong reasons not to). The main reason for this is what we can call debuggability: if the field is assigned in one, and only one, place, you can easily track down which piece of code is occasionally assigning that weird value that screws up the state of your program. Just set a breakpoint (or write to the log) in the property setter, wait for the offending code to make the call, and then examine the stack trace. If you allow code to set the field directly, this operation will be a lot more complicated.
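Here is a minimal sketch of the idea, using a made-up Counter class (all names here are hypothetical). Every write to the field goes through the setter, including writes from the class's own methods, so one log statement (or one breakpoint) sees them all.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example type; not taken from any real library.
public class Counter
{
    private int _value;
    private readonly List<string> _writes = new List<string>();

    // A record of every assignment, standing in for a log file.
    public IList<string> Writes
    {
        get { return _writes; }
    }

    public int Value
    {
        get { return _value; }
        private set
        {
            // The single place where _value is assigned:
            // put the breakpoint or the log call here.
            _writes.Add(_value + " -> " + value);
            _value = value;
        }
    }

    // Internal code uses the property, never the field.
    public void Increment()
    {
        Value = Value + 1;
    }

    public void Reset()
    {
        Value = 0;
    }
}

public static class DebuggabilityDemo
{
    public static void Main()
    {
        Counter counter = new Counter();
        counter.Increment();
        counter.Increment();
        counter.Reset();

        foreach (string write in counter.Writes)
        {
            Console.WriteLine(write); // 0 -> 1, then 1 -> 2, then 2 -> 0
        }
    }
}
```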

I have stated that you should use the property unless there are strong reasons not to. What exactly do I consider strong reasons? Naturally, that depends on your code and what it does, but one example may be performance: if performance is crucial, direct field access may be preferable. However: never optimize code for performance until you have measured it. Really. I cannot recall ever having come across a real-world case where this performance difference was an issue in any project that I have worked on.

Conclusion

I like to think about the property as a contract and the field as implementation. Any piece of code that needs to get or set the value should use the contract. Relying on the implementation should be done only when it is really needed.


Say you have a string containing a file name with a relative path, such as “..\..\somedir\somefile.txt”, and you want to resolve the full path to the file given a certain reference point in the file system. How would you go about solving that? It turns out to be rather simple: System.Uri has relative path resolving powers:

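public static string ResolveRelativePath(string referencePath, string relativePath)
{
    Uri uri = new Uri(Path.Combine(referencePath, relativePath));
    // LocalPath is used rather than AbsolutePath, since AbsolutePath
    // is URL-escaped (a space in a directory name becomes %20).
    return Path.GetFullPath(uri.LocalPath);
}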

Now you can resolve the path like this:

// path will contain "c:\directory\anothersubdir\somefile.txt"
string path = ResolveRelativePath(@"c:\directory\subdir", @"..\anothersubdir\somefile.txt");

Update: As Morgan points out in the comments, using System.Uri is actually not necessary. The following code will do the same job (but in a more efficient manner):

public static string ResolveRelativePath(string referencePath, string relativePath) 
{ 
    return Path.GetFullPath(Path.Combine(referencePath, relativePath));
}
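A small usage sketch of this shorter version follows. It builds its paths under the temp directory so that it does not depend on a particular drive layout; the directory and file names are made up for the example, and nothing is created on disk. It also shows one behavior worth knowing about: when the "relative" path is actually rooted, Path.Combine discards the reference path.

```csharp
using System;
using System.IO;

public static class PathDemo
{
    public static string ResolveRelativePath(string referencePath, string relativePath)
    {
        return Path.GetFullPath(Path.Combine(referencePath, relativePath));
    }

    public static void Main()
    {
        // Hypothetical locations; only the strings are manipulated.
        string reference = Path.Combine(Path.GetTempPath(), "directory", "subdir");
        string relative = Path.Combine("..", "anothersubdir", "somefile.txt");

        string expected = Path.GetFullPath(
            Path.Combine(Path.GetTempPath(), "directory", "anothersubdir", "somefile.txt"));

        // The ".." segment is resolved against the reference path.
        Console.WriteLine(ResolveRelativePath(reference, relative) == expected); // True

        // A rooted second argument wins: the reference path is ignored.
        string rooted = Path.GetFullPath(Path.Combine(Path.GetTempPath(), "elsewhere.txt"));
        Console.WriteLine(ResolveRelativePath(reference, rooted) == rooted); // True
    }
}
```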

I shall spend a moment in the corner of shame for not doing that from the start.


Nullable<T> vs null

16 Feb 2010

One thing that I sometimes see people wonder about is how Nullable<T> relates to null. This blog post is an attempt to explain how it works behind the scenes.

In order to understand this text, you should have a basic understanding of the difference between reference types and value types in .NET, and of why value types cannot be null while reference types can. If you don't feel that you know this, you can read up a bit on the topic here: Parameter passing in C# (by Jon Skeet). You should also have basic knowledge of generics (no advanced stuff; just enough to understand the syntax around them).

Declaring

In C# there are two ways to declare members using Nullable<T> (I am using int for the examples, but it could be any value type):

    private Nullable<int> a = null;
    private int? b = null;

The two code lines above both declare the exact same type: Nullable<int>. The second, shorter declaration is simply syntactic sugar that is translated into Nullable<int> by the C# compiler. There is no type difference of any kind between the two styles (the ultimate proof of this statement is that object.ReferenceEquals(typeof(Nullable<int>), typeof(int?)) returns true); it's just a matter of... well, style. In this blog post I will stick to the longer version of the two for the sake of clarity.
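For those who want to check this for themselves, a minimal sketch (the class name is just a placeholder):

```csharp
using System;

public static class NullableSugarDemo
{
    public static void Main()
    {
        // Both spellings yield the very same Type object.
        Console.WriteLine(typeof(int?) == typeof(Nullable<int>)); // True
        Console.WriteLine(object.ReferenceEquals(typeof(Nullable<int>), typeof(int?))); // True

        // The type wrapped by int? is plain System.Int32.
        Console.WriteLine(Nullable.GetUnderlyingType(typeof(int?))); // System.Int32
    }
}
```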

Not all nulls are equal

The type Nullable<int> is in itself a value type. It is a struct, and exhibits the same behavior as other value types. One of the characteristics of a value type is its inability to be null. So, how is it that the above declarations function at all? Should the compiler not issue a compilation error? Why can we suddenly assign null to a variable that is of a value type?

In short: we can't. It's a compiler trick. Let's examine this a bit closer. I will first look at how a reference type behaves:

    private static string GetNullString()
    {
        return null;
    }

A simple method, returning a string, in which the body returns null. The IL code for the method looks like this:

    .method private hidebysig static string  GetNullString() cil managed
    {
      // Code size       2 (0x2)
      .maxstack  8
      IL_0000:  ldnull
      IL_0001:  ret
    } // end of method TestClass::GetNullString

What the above IL code does is push a null reference onto the stack (ldnull) and then return it to the caller (ret). Now, let's look at a similar function returning a Nullable<int>:

    private static Nullable<int> GetNullInt()
    {
        return null;
    }

...and the IL code:

    .method private hidebysig static valuetype [mscorlib]System.Nullable`1<int32> 
            GetNullInt() cil managed
    {
      // Code size       10 (0xa)
      .maxstack  1
      .locals init ([0] valuetype [mscorlib]System.Nullable`1<int32> CS$0$0000)
      IL_0000:  ldloca.s   CS$0$0000
      IL_0002:  initobj    valuetype [mscorlib]System.Nullable`1<int32>
      IL_0008:  ldloc.0
      IL_0009:  ret
    } // end of method TestClass::GetNullInt

Notice how the method initializes a new instance of the type Nullable<int> and returns it. There is no trace of null at all. The instance is created using the initobj instruction, which initializes the fields of the type to either a null reference (if the field is of a reference type) or the default value of the type (if the field is of a value type).

When you assign null to a Nullable<int> (or another nullable value type) in your code, the compiler will emit IL code that initializes a new instance of Nullable<int> and assigns that instead. So, we do in fact get an object instance even though we assign null. This can be verified by running the following code:

    Nullable<int> a = null;
    Console.WriteLine(a.HasValue);  // prints False

We assign null to a variable, and then try to access a member through that variable. With a reference type, this kind of code would throw a NullReferenceException, but in the case of Nullable<T>, it's perfectly legal.

Then what happens if we assign a value that is not null?

    private static Nullable<int> GetNonNullInt()
    {
        return 5;
    }

The generated IL code:

    .method private hidebysig static valuetype [mscorlib]System.Nullable`1<int32> 
            GetNonNullInt() cil managed
    {
      // Code size       7 (0x7)
      .maxstack  8
      IL_0000:  ldc.i4.5
      IL_0001:  newobj     instance void valuetype [mscorlib]System.Nullable`1<int32>::.ctor(!0)
      IL_0006:  ret
    } // end of method TestClass::GetNonNullInt

In this method the Nullable<int> instance is created in a different manner: first the method pushes an integer with the value 5 onto the stack (ldc.i4.5), then a new object is created and has a constructor invoked. The constructor takes one parameter of the type !0 (OK, that looks like a weird type; !0 is how IL refers to the first generic type argument of the declaring type, so for Nullable<int> it means int32). The point here is that in the first case, when we assign null, we get an instance of Nullable<int> whose fields are initialized to the default values for their respective types, while in the case where we assign an integer, the instance is initialized using a constructor to which the value is passed. This difference is one part of the magic of Nullable<T>.
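To illustrate the difference between the two initialization paths without any compiler support involved, here is a simplified imitation of the type. MyNullable<T> is hypothetical and is not the real implementation; it only assumes the same basic layout of a Boolean flag plus a value field.

```csharp
using System;

// Hypothetical, simplified imitation of Nullable<T>; not the real implementation.
public struct MyNullable<T> where T : struct
{
    private readonly bool _hasValue;
    private readonly T _value;

    // Corresponds to the newobj path: the flag is set and the value stored.
    public MyNullable(T value)
    {
        _hasValue = true;
        _value = value;
    }

    public bool HasValue
    {
        get { return _hasValue; }
    }

    public T Value
    {
        get
        {
            if (!_hasValue)
            {
                throw new InvalidOperationException("No value present.");
            }
            return _value;
        }
    }
}

public static class MyNullableDemo
{
    public static void Main()
    {
        // Corresponds to the initobj path: all fields get their defaults,
        // so the flag is false.
        MyNullable<int> empty = default(MyNullable<int>);
        MyNullable<int> five = new MyNullable<int>(5);

        Console.WriteLine(empty.HasValue); // False
        Console.WriteLine(five.HasValue);  // True
        Console.WriteLine(five.Value);     // 5
    }
}
```

What this imitation cannot do is accept null in an assignment or a comparison; that part requires the compiler's special treatment of Nullable<T>.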

Now we have established that when it comes to assigning null to Nullable<T>, it's all in the compiler. It has knowledge about this specific type, and gives it special treatment. We write Nullable<int> a = null;, but the compiler changes it into Nullable<int> a = new Nullable<int>();.
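A quick sketch showing that the three spellings end up in the same state (the class and variable names are placeholders):

```csharp
using System;

public static class NullAssignmentDemo
{
    public static void Main()
    {
        Nullable<int> fromNull = null;
        Nullable<int> fromCtor = new Nullable<int>();
        Nullable<int> fromDefault = default(Nullable<int>);

        Console.WriteLine(fromNull.HasValue);    // False
        Console.WriteLine(fromCtor.HasValue);    // False
        Console.WriteLine(fromDefault.HasValue); // False

        // The value field holds the default for int.
        Console.WriteLine(fromNull.GetValueOrDefault()); // 0

        // Two "empty" instances compare as equal.
        Console.WriteLine(fromNull.Equals(fromCtor)); // True
    }
}
```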

Comparing to null

So, what about comparisons?

    private static void TestForNull()
    {
        Nullable<int> a = null;
        if (a == null)
        {
            Console.WriteLine("a is null");
        }
    }

Didn't we just establish that a in this case is indeed an instance of Nullable<int>? If that is the case, how can a comparison with null evaluate to true? Again, it's a compiler trick. The answer is to be found in the IL code:

    .method private hidebysig static void  TestForNull() cil managed
    {
      // Code size       28 (0x1c)
      .maxstack  1
      .locals init ([0] valuetype [mscorlib]System.Nullable`1<int32> a)
      IL_0000:  ldloca.s   a
      IL_0002:  initobj    valuetype [mscorlib]System.Nullable`1<int32>
      IL_0008:  ldloca.s   a
      IL_000a:  call       instance bool valuetype [mscorlib]System.Nullable`1<int32>::get_HasValue()
      IL_000f:  brtrue.s   IL_001b
      IL_0011:  ldstr      "a is null"
      IL_0016:  call       void [mscorlib]System.Console::WriteLine(string)
      IL_001b:  ret
    } // end of method TestClass::TestForNull

What happens here is that a new Nullable<int> is created using the initobj instruction (lines IL_0000 - IL_0002), then the getter for the HasValue property is invoked, and the result is pushed to the evaluation stack (lines IL_0008 - IL_000a). The brtrue.s instruction will transfer control to the address specified after it (IL_001b) if the current value on the evaluation stack is true. The question then is, what is the current value on the evaluation stack when the brtrue.s instruction is executed? Well, since the Nullable<int> object was created using the initobj instruction, HasValue is false (the default value for a Boolean field). Again, where our code makes a comparison to null, the compiler replaces it with something else. The code above is equivalent to the following:

    private static void TestForNull()
    {
        Nullable<int> a = null;
        if (!a.HasValue)
        {
            Console.WriteLine("a is null");
        }
    }

Before we wrap up, let's check one last oddity:

    private static void TestForNull()
    {
        Nullable<int> a = null;
        if (a.Equals(null))
        {
            Console.WriteLine("a is null");
        }
    }

...and the IL version:

    .method private hidebysig static void  TestForNull() cil managed
    {
      // Code size       35 (0x23)
      .maxstack  2
      .locals init ([0] valuetype [mscorlib]System.Nullable`1<int32> a)
      IL_0000:  ldloca.s   a
      IL_0002:  initobj    valuetype [mscorlib]System.Nullable`1<int32>
      IL_0008:  ldloca.s   a
      IL_000a:  ldnull
      IL_000b:  constrained. valuetype [mscorlib]System.Nullable`1<int32>
      IL_0011:  callvirt   instance bool [mscorlib]System.Object::Equals(object)
      IL_0016:  brfalse.s  IL_0022
      IL_0018:  ldstr      "a is null"
      IL_001d:  call       void [mscorlib]System.Console::WriteLine(string)
      IL_0022:  ret
    } // end of method TestClass::TestForNull

This method is similar to the previous one we looked at, but instead of invoking the HasValue property getter, the Equals method is invoked. The Equals method is overridden by the Nullable<T> type, a short and simple implementation, looking like this:

    public override bool Equals(object other)
    {
        if (!this.HasValue)
        {
            return (other == null);
        }
        if (other == null)
        {
            return false;
        }
        return this.value.Equals(other);
    }

Or, in plain English: if HasValue of the current instance is false, we are equal exactly when other is null. Otherwise, if other is null, we are not equal. Otherwise, we call the Equals method of the object in the current instance's value field, passing other to it. This Equals implementation means that Nullable<T> will pretend to be null if HasValue is false.
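The different branches can be exercised like this (a small sketch; the class and variable names are placeholders):

```csharp
using System;

public static class NullableEqualsDemo
{
    public static void Main()
    {
        Nullable<int> empty = null;
        Nullable<int> five = 5;

        // HasValue is false, other is null: equal.
        Console.WriteLine(empty.Equals(null)); // True

        // HasValue is false, other is not null: not equal.
        Console.WriteLine(empty.Equals(0));    // False

        // HasValue is true, other is null: not equal.
        Console.WriteLine(five.Equals(null));  // False

        // HasValue is true: delegated to Int32.Equals(object).
        Console.WriteLine(five.Equals(5));     // True
        Console.WriteLine(five.Equals(5L));    // False (a boxed long is not an int)
    }
}
```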

As a side note, this implementation of Equals actually breaks the contract stipulated by the documentation of Object.Equals. There it is stated that (amongst other things) x.Equals(null) must return false.

Conclusion

Bottom line: Nullable<T> allows you to write code as if it could be null, but a combination of overrides and compiler wizardry translates that code into statements dealing with a regular value type instance, behaving like any other value type instance.
