After our merry diversion into Explicit Upcasting corner-cases, we now return to make a few more observations about Upcasting in general, and about how it is implemented behind the scenes.
In C#, when we create an instance of a class, its data is stored in memory, and the variable (the object) refers to this memory. This is deliberately vague: it tells us roughly what is going on without the how, which sets C# apart from some lower-level languages whose terminology and constructs give us some indication of the how.
Languages such as C or C++ make use of pointers, which are numeric values that ‘point’ to memory addresses. A pointer is much like a firearm: it has great power, but it can also cause great harm (and if it doesn’t solve your problem, you need to use some more(*)).
In C#, this is somewhat abstracted away from the developer, so we are deliberately removed from some of this peril. We talk about references and memory rather than pointers and addresses. Eric Lippert, principal developer on the C# compiler team, prefers to think of references as “Opaque handles that are meaningful only to the garbage collector” (link).
Keeping true to this sentiment, the formal specification for the .NET Common Language Infrastructure (CLI) tells us what must be done, not how to do it. Refer to page 132 of the formal CLI standard (ECMA-335) for the exact text:
Objects of instantiated types shall carry sufficient information to recover at runtime their exact type (including the types and number of their generic arguments). [Rationale: This is required to correctly implement casting and instance-of testing, as well as in reflection capabilities]
Awwwww, it even mentions casting!
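To see this requirement in action, here is a minimal sketch that assumes empty `Feline` and `Lion` classes (the post never shows their definitions, so these are hypothetical stand-ins). Even when the static type is as vague as `object` or `Feline`, `GetType()` recovers the exact runtime type, generic arguments included, and the `is` test relies on that same information.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hierarchy, assumed for illustration only.
class Feline { }
class Lion : Feline { }

static class ExactTypeDemo
{
    static void Main()
    {
        // The declared (static) types here are deliberately vague.
        object list = new List<int>();
        Feline feline = new Lion();

        // The runtime still recovers the exact type, generic argument included.
        Console.WriteLine(list.GetType());   // System.Collections.Generic.List`1[System.Int32]
        Console.WriteLine(feline.GetType()); // Lion

        // Instance-of testing uses the same runtime type information.
        Console.WriteLine(feline is Lion);        // True
        Console.WriteLine(list is List<string>);  // False
    }
}
```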
So, we can build up a picture of what’s going on. We have two pieces of information:
- The object in memory with its data and meta-data.
- A small bit of information that tells the CLI/CLR(**) where this data starts.
Here’s how Microsoft picture it (link):
So, our reference knows where it is, and our memory store knows what it is. So what happens when we do something like:
Lion l = new Lion();
Feline f = l; // Upcast Lion to a Feline
The underlying object is a Lion; however, we store a reference to it in a variable of type Feline. The reference is just a value (f and l refer to the very same object), and the underlying object data has not changed, so its meta-data still denotes an instance of type Lion.
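A minimal sketch of the upcast above, again assuming hypothetical empty `Feline` and `Lion` classes: both variables hold the same reference, and the object still reports itself as a `Lion`.

```csharp
using System;

// Hypothetical stand-ins for the post's Feline/Lion hierarchy.
class Feline { }
class Lion : Feline { }

static class UpcastDemo
{
    static void Main()
    {
        Lion l = new Lion();
        Feline f = l; // Upcast: implicit, no cast syntax needed

        // Both variables refer to the same object in memory.
        Console.WriteLine(object.ReferenceEquals(l, f));   // True

        // The object's own meta-data is untouched: it is still a Lion.
        Console.WriteLine(f.GetType().Name);               // Lion
        Console.WriteLine(f.GetType() == typeof(Feline));  // False
    }
}
```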
But where is this information about it being a Feline stored? How does the CLR know it is a Feline?
The answer is that it isn’t needed, and the CLR doesn’t have to know about it. This information is only available to the C# compiler, via the declared types in your code and the rules of the language. The compiler spots an upcast, deduces that it is legal, and generates the associated Intermediate Language (IL) code for the CLR to execute. By the time the code reaches IL, the distinction no longer matters: an upcast can never fail, so no runtime check is required. And because the overwhelming majority of IL is emitted by a compiler that has already made this check, we don’t need the information at a lower level.
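One way to see that the "Feline-ness" lives only in the compiler’s view of the code is overload resolution, which C# performs at compile time using each variable’s declared type. The sketch below assumes hypothetical empty `Feline` and `Lion` classes and a hypothetical `Describe` method with one overload per type.

```csharp
using System;

// Hypothetical hierarchy, assumed for illustration only.
class Feline { }
class Lion : Feline { }

static class StaticTypeDemo
{
    // The compiler picks between these using the argument's declared type.
    public static string Describe(Feline x) => "Describe(Feline)";
    public static string Describe(Lion x) => "Describe(Lion)";

    static void Main()
    {
        Lion l = new Lion();
        Feline f = l; // same object, different declared type

        Console.WriteLine(Describe(l));      // Describe(Lion)
        Console.WriteLine(Describe(f));      // Describe(Feline): chosen from the declared type
        Console.WriteLine(f.GetType().Name); // Lion: the runtime type is unchanged
    }
}
```

The object never changes; only the compiler’s knowledge of it differs between `l` and `f`.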
There are, however, places where the compiler doesn’t enforce type safety quite as rigorously, and leaves it to the runtime to catch the problem. Take the following example:
Lion[] la = new Lion[1];
la[0] = new Tiger(); // compile-time failure
Feline[] fa = la;
fa[0] = new Tiger(); // compile-time success!
In the first instance, the compiler recognises that we have an array of Lions, so it prevents us from putting a Tiger in it. In the second instance, however, we assign our array of Lions to a variable of type Feline[], and then we are allowed to put a Tiger in it, because Tiger derives from Feline! It’s the same array underneath, but the compiler now lets us do something illegal.
We only discover that something is wrong when we run it and get an ArrayTypeMismatchException:
Attempted to access an element as a type incompatible with the array.
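The array example can be turned into a runnable sketch, assuming hypothetical empty `Feline`, `Lion` and `Tiger` classes and a helper named `StoreTigerInLionArray` introduced purely for illustration. It compiles without complaint, and the `ArrayTypeMismatchException` only appears when the covariant store executes.

```csharp
using System;

// Hypothetical hierarchy, assumed for illustration only.
class Feline { }
class Lion : Feline { }
class Tiger : Feline { }

static class ArrayCovarianceDemo
{
    // Returns true if storing a Tiger through a Feline[] view of a Lion[] fails at run time.
    public static bool StoreTigerInLionArray()
    {
        Lion[] la = new Lion[1];
        Feline[] fa = la; // array covariance: compiles, same array underneath

        try
        {
            fa[0] = new Tiger(); // compiles; the runtime checks the actual element type
            return false;
        }
        catch (ArrayTypeMismatchException e)
        {
            Console.WriteLine(e.Message);
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(StoreTigerInLionArray()); // True
    }
}
```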
Above, when I described the assignment, I deliberately avoided using the word cast. Although this may look like casting, it’s actually (array) covariance. We’ll cover this and its counterpart, contravariance, in later posts.
So, we’ve reached the end of our Upcasting journey. Next up, we look at Downcasting and reflect upon the whole series on casting, before turning our attention to the world of Variance, Covariance and Contravariance.
(*) If you’re in any doubt: I’m being flippant.
(**) The CLR (Common Language Runtime) is Microsoft’s implementation of the formal CLI specification. The CLR belongs to Microsoft, whereas the CLI belongs to the world.