Managed vs. unmanaged code

Over the last couple of days I have done some research on destructors in C#. To properly understand those, I first needed to grasp the memory model basics of both managed and unmanaged languages. My findings are summarised in this post.

Microsoft introduced the term “managed code” somewhere around the turn of the century. It refers in particular to C# and Visual Basic .NET code that is first compiled to Intermediate Language (IL) and later executed by the Common Language Runtime (CLR). The CLR compiles the IL to actual machine code in a Just In Time (JIT) fashion. IL is comparable to bytecode in the Java world, and the CLR performs many of the same functions as the Java Virtual Machine (JVM). It manages the application in the sense that it provides support for aspects such as memory management, garbage collection, type safety, thread management, security and exception handling. In other words, Java could also be considered a managed language in Microsoft lingo. Unmanaged code, in contrast, refers to code that does not depend on such runtime infrastructure and is typically compiled directly to machine code. C and C++ are good examples.

Our goal today is to better understand the memory management aspect the CLR provides for managed languages. To accomplish this, it helps to first understand how memory is handled in unmanaged imperative languages.

Unmanaged memory model

In this section, the C and C++ languages will be used as case studies for unmanaged memory models. If the developer knows the size and type of data to store at compile time, a simple variable declaration such as int x; or char array[10]; is enough to allocate the appropriate amount of memory. If the variable is declared static, it is stored in the program's static data segment and remains available for the whole lifetime of the process. Otherwise, it is stored on the stack and automatically discarded when it goes out of scope. For a variable with other lifetime requirements, or where the type or size is unknown at compile time, dynamic memory allocation must be used. In that case, the developer must explicitly request and free memory as needed.
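
As a minimal C sketch of these storage classes (the variable names and values are arbitrary and purely illustrative):

    #include <stdio.h>

    static int counter = 0;      /* static storage: available for the whole process lifetime */

    void demo(void)
    {
        int x = 42;              /* automatic storage: allocated on the stack                */
        char array[10];          /* also on the stack; size known at compile time            */
        array[0] = 'a';
        printf("%d %d %c\n", ++counter, x, array[0]);
    }                            /* x and array are discarded here; counter survives         */

    int main(void)
    {
        demo();
        demo();                  /* counter kept its value between the two calls             */
        return 0;
    }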

There is only one idiomatic way of dynamically allocating memory in C. The two most relevant functions are void *malloc(size_t size) to allocate memory on the heap and void free(void *ptr) to free it again after use. Memory that is not freed explicitly is reclaimed by the operating system only after the program exits. For long-running programs, this can pose problems if the developer forgets to call free. The result is an ever-increasing memory footprint, better known as a memory leak. After a while, the operating system will not be able to assign any more memory blocks to the process. In that case, malloc returns a null pointer. If the process does not handle this case properly, it might crash.
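
A minimal sketch of this allocate/check/free cycle (the buffer size and its contents are arbitrary choices for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* Request 100 bytes on the heap; the size must be computed by hand. */
        char *name = malloc(100);
        if (name == NULL) {                  /* malloc signals failure with a null pointer */
            fprintf(stderr, "out of memory\n");
            return 1;
        }

        strcpy(name, "John Doe");
        printf("%s\n", name);

        free(name);                          /* forgetting this call causes a memory leak */
        return 0;
    }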

Avoiding memory leaks in C places a big cognitive burden on the developer. C++ improves upon the design of C by introducing classes, constructors and destructors. Variables of a given type can be allocated on the stack with a familiar declaration syntax, e.g. Person person("John Doe");. In this example, the Person(string) constructor is called automatically, immediately after memory allocation. When the variable goes out of scope, the destructor – if defined – is called automatically to clean up any remaining resources. To allocate memory dynamically, the new and delete operators have been introduced to replace malloc and free. To create a person on the “free store” (typically the heap), use Person *person = new Person("John Doe");. When the person data is no longer required, use delete person; to call the destructor (if any) and free the corresponding memory. The new and delete operators have some advantages over malloc and free (a short sketch follows the list below):

  • memory size is calculated automatically based on the requested type
  • a constructor is called automatically after allocation
  • a typed pointer is returned instead of a void pointer
  • if memory cannot be allocated, an exception is thrown
  • the destructor (if any) is called automatically before deallocation
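
To make these mechanics concrete, here is a minimal sketch of such a Person class; the member variable and the console output are assumptions made purely for illustration:

    #include <iostream>
    #include <string>
    #include <utility>

    class Person {
    public:
        explicit Person(std::string name) : name_(std::move(name)) {
            std::cout << "constructed " << name_ << '\n';
        }
        ~Person() {                                   // destructor: clean-up hook
            std::cout << "destructed " << name_ << '\n';
        }
    private:
        std::string name_;
    };

    int main() {
        Person stackPerson("John Doe");               // stack allocation: destructor runs at end of scope

        Person *heapPerson = new Person("Jane Doe");  // free store: size derived from the type, constructor called
        delete heapPerson;                            // destructor called, then memory freed
    }                                                 // stackPerson is destructed automatically here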

A common C++ idiom to simplify resource management is “RAII”: Resource Acquisition Is Initialisation. Any resource that needs to be acquired and later disposed of should be wrapped in an object where the constructor acquires said resource and the destructor disposes of it. Preferably, this wrapper object is allocated on the stack so that its life cycle is easily understood. Smart pointers are a useful alternative.
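
A sketch of the idiom, assuming a hypothetical File wrapper around the C standard I/O functions:

    #include <cstdio>
    #include <memory>
    #include <stdexcept>
    #include <string>

    // Illustrative RAII wrapper: the constructor acquires a file handle,
    // the destructor releases it, even if an exception is thrown in between.
    class File {
    public:
        explicit File(const char *path) : handle_(std::fopen(path, "w")) {
            if (handle_ == nullptr) throw std::runtime_error("could not open file");
        }
        ~File() { std::fclose(handle_); }
        std::FILE *get() const { return handle_; }
    private:
        std::FILE *handle_;
    };

    int main() {
        File log("log.txt");                        // resource acquired; released at end of scope
        std::fputs("hello\n", log.get());

        // Smart pointers apply the same idiom to dynamically allocated memory:
        auto person = std::make_unique<std::string>("John Doe");
    }                                               // string freed and file closed here, automatically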

Manual memory management is becoming a thing of the past for most line-of-business applications. In low-memory environments such as embedded device programming, however, it remains very popular. High-performance applications also tend to prefer manual memory management, because allocations and deallocations happen in a deterministic fashion without any additional processing overhead (in terms of both time and memory).

Managed memory model with garbage collection

The managed memory model stands in contrast with the unmanaged memory model. Here, the developer does not need to be concerned too much with memory management. Garbage collection technology is often to thank for that. It is important to note that a managed language does not necessarily have to use a garbage collector, nor are unmanaged languages prevented from using one, although exceptions to the rule are rare.

Memory allocation in C# is not too different from C++. In the previous section we demonstrated how the C++ developer can freely choose between stack and heap allocation. C# developers, in contrast, always use the new operator to instantiate objects of non-primitive types (with strings as the notable exception). The CLR uses complex rules to automatically determine which memory location (e.g. stack, managed heap, register) is most appropriate. Usually value types go on the stack and reference types go on the managed heap, but neither is guaranteed.
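
A minimal C# sketch of that difference, using a hypothetical Point struct (value type) and Person class (reference type):

    using System;

    struct Point { public int X; public int Y; }    // value type, for illustration only

    class Person
    {
        public string Name;
        public Person(string name) { Name = name; }
    }

    class AllocationSketch
    {
        static void Main()
        {
            Point p;                                 // value type: typically lives on the stack or in a register
            p.X = 1;
            p.Y = 2;

            Person person = new Person("John Doe");  // reference type: the object lives on the managed heap,
                                                     // only the reference itself is a local value
            Console.WriteLine(p.X + "," + p.Y + " " + person.Name);
        }                                            // no delete: the GC reclaims the Person object later
    }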

The main difference, however, is in the way both languages handle deallocation. C# does not provide a delete operator. Instead, the part of the CLR called the Garbage Collector (GC) analyses the working memory to find dead objects and reclaim their memory. An object is considered dead when there are no more (strong) references pointing to it. The GC typically runs in its own thread and causes both time and memory overhead. Furthermore, the developer has limited control over the precise execution time. In fact, it is not even guaranteed that the GC will ever run automatically, although this is mostly a theoretical problem. The developer can force the garbage collector to run with GC.Collect(), and some settings can also be tweaked if necessary. Be careful though: it is recommended not to interfere with the GC process too much.
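
A small diagnostics-only sketch of the knobs mentioned above; forcing collections or changing the latency mode should be the exception rather than the rule:

    using System;
    using System.Runtime;

    class GcControlSketch
    {
        static void Main()
        {
            // Report the managed memory currently in use, without forcing a collection.
            Console.WriteLine(GC.GetTotalMemory(forceFullCollection: false));

            // Force a full collection and wait for finalizers; useful in tests
            // and diagnostics, but generally discouraged in production code.
            GC.Collect();
            GC.WaitForPendingFinalizers();

            // One of the settings that can be tweaked: the latency mode.
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
        }
    }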

A common misconception is that objects in C# are cleaned up the moment they go out of scope. This is not true for objects that live on the managed heap. What happens is that the references to these objects – which live on the stack – disappear. If this causes an object to die due to lack of references, it will be collected during the next GC run.
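
The following sketch makes this behaviour observable with a WeakReference; exact results may vary slightly between runtimes and build configurations:

    using System;

    class ScopeVersusCollectionSketch
    {
        static WeakReference CreateAndDrop()
        {
            var person = new object();        // strongly referenced only inside this method
            return new WeakReference(person); // a weak reference does not keep the object alive
        }

        static void Main()
        {
            WeakReference weak = CreateAndDrop();

            // The object is already out of scope here, but it has most likely
            // not been collected yet.
            Console.WriteLine(weak.IsAlive);  // usually True

            GC.Collect();                     // force a GC run (for demonstration only)
            GC.WaitForPendingFinalizers();

            Console.WriteLine(weak.IsAlive);  // usually False: the object died in the GC run
        }
    }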

Garbage collection is a specialised topic within computer science that is beyond the scope of this text. The interested reader is encouraged to do some research around the following keywords: reference counting, generational GC, latency, memory pressure, mark-and-sweep and stop-the-world.

In conclusion, managed languages with garbage collection increase developer comfort by lowering the cognitive burden. The price we pay is additional overhead of a non-deterministic nature.
