This post is a general warning against writing code for what you might need rather than for what you actually use right now. It focuses specifically on the performance of the C# XmlSerializer, and the pitfalls of trying to handle every possible case up front.

Below is a basic Serialize method that was being used in the code.

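using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static class Util 
{
  public static string Serialize(object obj, Type[] types)
  {
    if (obj == null)
      return null;
    
    // Passing an extraTypes array (even null) selects a constructor whose generated assembly is not cached.
    XmlSerializer ser = new XmlSerializer(obj.GetType(), types);
    
    using (MemoryStream memStream = new MemoryStream())
    {
      using (XmlTextWriter xmlWriter = new XmlTextWriter(memStream, Encoding.UTF8))
      {
        xmlWriter.Namespaces = true;
        ser.Serialize(xmlWriter, obj);
      }
      // GetBuffer() returns the whole underlying array: the UTF-8 byte order mark comes first
      // and unused bytes follow the XML, so trim before the first '<' and after the last '>'.
      string xml = Encoding.UTF8.GetString(memStream.GetBuffer());
      xml = xml.Substring(xml.IndexOf('<'));
      xml = xml.Substring(0, xml.LastIndexOf('>') + 1);
      return xml;
    }
  }
}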

The caller passes in the object to serialise, plus any additional types the serializer should know about.

At one point it must have been used in many places, but after some refactoring it is now called from only two methods. One of those calls is shown below:

string xml = Util.Serialize(x, null);

In both places, null is passed in for the array of additional types, and that null is passed on to the XmlSerializer constructor itself.

This is where it gets interesting. When you create an XmlSerializer, .NET generates and loads an assembly to do the serialization, and it caches that assembly for reuse, but only when you use the XmlSerializer(Type) or XmlSerializer(Type, String) constructor. If you pass in an array of extra types (even a null!), a new assembly is generated every time the serializer is created, and it is never unloaded from the AppDomain. In a long-running web server this is a memory leak, on top of the cost of generating the code on every call.
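If the extra types genuinely were needed, the XmlSerializer documentation's advice is to cache the serializers yourself rather than constructing one per call. A minimal sketch of that approach follows; `SerializerCache`, `Order` and `SpecialOrder` are hypothetical names invented for illustration, not part of the original code or of .NET:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical example types: SpecialOrder is the kind of derived type
// that would be passed in as an "extra type".
public class Order
{
    public int Id { get; set; }
}

public class SpecialOrder : Order
{
    public string Note { get; set; }
}

// Hypothetical helper: builds each XmlSerializer once and reuses it, so the
// extra-types constructor (whose generated assembly .NET does not cache)
// runs once per combination of root type and extra types.
public static class SerializerCache
{
    private static readonly ConcurrentDictionary<string, XmlSerializer> Cache =
        new ConcurrentDictionary<string, XmlSerializer>();

    public static XmlSerializer Get(Type type, Type[] extraTypes)
    {
        if (extraTypes == null || extraTypes.Length == 0)
        {
            // Plain XmlSerializer(Type): .NET already reuses its generated assembly;
            // caching the instance as well just avoids repeating the constructor call.
            return Cache.GetOrAdd(type.AssemblyQualifiedName, _ => new XmlSerializer(type));
        }

        // Key on the root type plus the extra types in the order given
        // (a different order simply produces a separate cache entry).
        string key = type.AssemblyQualifiedName + "|" +
                     string.Join("|", extraTypes.Select(t => t.AssemblyQualifiedName));

        // If two threads race on a new key, GetOrAdd may call the factory twice;
        // only one serializer is stored, but both generated assemblies stay loaded.
        return Cache.GetOrAdd(key, _ => new XmlSerializer(type, extraTypes));
    }
}
```

With such a helper, `Util.Serialize` could call `SerializerCache.Get(obj.GetType(), types)` in place of `new XmlSerializer(obj.GetType(), types)` and keep its extra-types parameter without the leak. If nobody actually passes extra types, though, deleting the parameter is simpler still.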

Using the following unit test we can measure the relative performance.

// GetContext() is not shown; it supplies the object being serialised.
public void UsingOldUtil()
{
  // Force a full collection so the "before" reading is as clean as possible.
  GC.Collect();
  GC.WaitForPendingFinalizers();
  GC.Collect();
  
  var beforeMemory = System.Diagnostics.Process.GetCurrentProcess().VirtualMemorySize64;
  var beforeTime = DateTime.Now;
  
  // Serialise 10,000 objects through the old overload, passing null for the extra types.
  for (int i = 0; i < 10000; i++)
  {
    var x = GetContext();
    
    string data = Util.Serialize(x, null);
  }
  
  var afterMemory = System.Diagnostics.Process.GetCurrentProcess().VirtualMemorySize64;
  var afterTime = DateTime.Now;
  
  var memoryChange = afterMemory - beforeMemory;
  var duration = afterTime - beforeTime;
  
  Console.Out.WriteLine(String.Format("VirtualMemorySize64 - Before: {0}, After: {1}. Total Change: {2}",
            beforeMemory, afterMemory, memoryChange));
  Console.Out.WriteLine(String.Format("Time taken - Ticks {0}, ms: {1}, seconds: {2}",
    duration.Ticks, duration.TotalMilliseconds, duration.TotalSeconds));
}

This gives the following results (your mileage may vary):

VirtualMemorySize64 - Before: 264302592, After: 1705144320. Total Change: 1440841728

Time taken - Ticks 2485814687, ms: 248581.4687, seconds: 248.5814687

Using a Serialize method that omits the additional types array:

public static string Serialize(object obj)
{
  if (obj == null)
    return null;
  
  // XmlSerializer(Type) is one of the constructors whose generated assembly .NET caches and reuses.
  XmlSerializer ser = new XmlSerializer(obj.GetType());
  
  using (MemoryStream memStream = new MemoryStream())
  {
    using (XmlTextWriter xmlWriter = new XmlTextWriter(memStream, Encoding.UTF8))
    {
      xmlWriter.Namespaces = true;
      ser.Serialize(xmlWriter, obj);
    }
    // Trim the UTF-8 byte order mark and the unused tail of the array returned by GetBuffer().
    string xml = Encoding.UTF8.GetString(memStream.GetBuffer());
    xml = xml.Substring(xml.IndexOf('<'));
    xml = xml.Substring(0, xml.LastIndexOf('>') + 1);
    return xml;
  }
}

This gives the following results:

VirtualMemorySize64 - Before: 264302592, After: 268562432. Total Change: 4259840

Time taken - Ticks 2911944, ms: 291.1944, seconds: 0.2911944

The new Serialize uses well under 1% of the time and of the virtual-memory growth of the original method (it is roughly 850 times faster over 10,000 calls), which is a significant performance boost in our environment.
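For reference, the ratios between the two runs work out as follows (new method over old, for the same 10,000 calls; t is elapsed time and Δm is the change in VirtualMemorySize64):

```latex
\frac{t_{\text{new}}}{t_{\text{old}}} = \frac{291.1944\ \text{ms}}{248581.4687\ \text{ms}} \approx 0.0012 \approx \frac{1}{854}
\qquad
\frac{\Delta m_{\text{new}}}{\Delta m_{\text{old}}} = \frac{4\,259\,840\ \text{bytes}}{1\,440\,841\,728\ \text{bytes}} \approx 0.0030 \approx \frac{1}{338}
```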

The basic takeaway from this post: only write code for what you need now.



This post was originally published on Entelect’s internal Tech Blog, Yoda.