Static Reflection in .NET, part 2

A few weeks ago, I talked about static reflection and its advantages. You’ll remember that the main advantages, compared to the normal reflection APIs, are compile-time checking of parameters and IntelliSense support.

How does it compare at other levels, performance for example? Before we dive into that question, let me state that performance may or may not be important to you. A program that is fast enough is, well, fast enough. It’s unlikely that a (single) reflection call will have a significant impact on, say, the response time of a graphical user interface, in which case performance doesn’t matter. If your algorithm requires millions of reflection operations, I’m sure you can rewrite it somehow to reduce that number significantly, and then performance again probably doesn’t matter anymore. That being said, we still want to know, right?

First of all, let’s compare code.

Take this line (using the Example class from the last post):

PropertyInfo pi = typeof(Example).GetProperty("Description");

This line compiles to the following IL (simplified for readability):

ldtoken Example 
call class Type Type::GetTypeFromHandle(valuetype RuntimeTypeHandle) 
ldstr "Description" 
call instance class PropertyInfo Type::GetProperty(string)

Compare that to the following line:

PropertyInfo pi = StaticReflector.Create<Example>().PropertyInfo(e => e.Description);

Which compiles to:

call class IStaticReflector`1<!!0> StaticReflector::Create<class Example>()
ldtoken Example
call class Type Type::GetTypeFromHandle(valuetype RuntimeTypeHandle)
ldstr "e"
call class ParameterExpression Expression::Parameter(class Type, string)
stloc.0 
ldloc.0 
ldtoken instance string Example::get_Description()
call class MethodBase MethodBase::GetMethodFromHandle(valuetype RuntimeMethodHandle)
castclass MethodInfo
call class MemberExpression Expression::Property(class Expression, class MethodInfo)
ldc.i4.1 
newarr ParameterExpression
stloc.1 
ldloc.1 
ldc.i4.0 
ldloc.0 
stelem.ref 
ldloc.1 
call class Expression`1<!!0> Expression::Lambda<class System.Func`2<class Example, string>>(class Expression, class ParameterExpression[])
call class PropertyInfo StaticReflectorExtensions::PropertyInfo<class Example, string>(class IStaticReflector`1<!!0>, class Expression`1<class System.Func`2<!!0, !!1>>)

As you can see, this code doesn’t load the “Description” string; it uses the ldtoken instruction instead. Some bloggers have suggested that this would make it more efficient. Unfortunately, whatever the ldtoken instruction saves is more than offset by the construction of the lambda expression. I ran a little benchmark comparing the execution time (in ticks) and memory usage (in generation 0 garbage collection runs) of both approaches, executing each one a million times. This is the result (on my laptop):

Using Reflection       Time:    1089308 Collections:    45
Using StaticReflection Time:   13513777 Collections:   264

As you can see, the Static Reflection approach is about 12.4 times slower than good old dynamic reflection, and it triggers almost six times as many garbage collections. That should be no surprise either: both cases allocate a PropertyInfo object, but the static case also allocates the expression tree, which is nothing but food for the garbage collector.
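For readers who want to reproduce this kind of measurement, here is a minimal harness sketch, not necessarily the code behind the numbers above. Example is a stand-in for the class from the last post, and the hypothetical PropertyOf helper replaces StaticReflector: it simply pulls the PropertyInfo out of the lambda's MemberExpression. Time is measured in Stopwatch ticks, memory pressure as the change in GC.CollectionCount(0).

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Reflection;

// Stand-in for the Example class from the previous post (assumption).
public class Example
{
    public string Description { get; set; }
}

public static class ReflectionBenchmark
{
    // Hypothetical stand-in for StaticReflector: extracts the PropertyInfo
    // from a lambda such as e => e.Description.
    public static PropertyInfo PropertyOf<T, TResult>(Expression<Func<T, TResult>> expression)
    {
        var member = expression.Body as MemberExpression;
        var property = member?.Member as PropertyInfo;
        if (property == null)
        {
            throw new ArgumentException("Expression must be a property access.", nameof(expression));
        }
        return property;
    }

    // Runs the action 'iterations' times and returns the elapsed ticks and the
    // number of generation 0 collections that happened in the meantime.
    public static (long Ticks, int Gen0Collections) Measure(Action action, int iterations)
    {
        action(); // JIT-compile everything before measuring
        int gcBefore = GC.CollectionCount(0);
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return (stopwatch.ElapsedTicks, GC.CollectionCount(0) - gcBefore);
    }

    public static void Run(int iterations = 1000000)
    {
        // Dynamic reflection: look the property up by name.
        var dynamicResult = Measure(() => typeof(Example).GetProperty("Description"), iterations);
        // Static reflection: every call builds a new expression tree first.
        var staticResult = Measure(() => PropertyOf((Example e) => e.Description), iterations);

        Console.WriteLine("Using Reflection       Time: {0,10} Collections: {1,5}",
            dynamicResult.Ticks, dynamicResult.Gen0Collections);
        Console.WriteLine("Using StaticReflection Time: {0,10} Collections: {1,5}",
            staticResult.Ticks, staticResult.Gen0Collections);
    }
}
```

Calling ReflectionBenchmark.Run() prints two lines in the same shape as the results above; the absolute numbers will of course depend on the machine and runtime version.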

So, one approach seems good at compile time, and the other is good at run time. It seems we’re stuck between a rock and a hard place. But the situation isn’t so bad: we have two options to choose from, each with their pros and cons. Which one is best depends on your requirements, and what you value the most: compile-time checking (which may result in productivity and maintainability benefits), or performance.

And who knows, maybe there is a third option, giving the best of both worlds. But that’s for next time.


String.Trim() fixed in .NET 4.0

A long time ago, I wrote a blog post about the problems with String.Trim(). I’m happy to see that all three issues have been addressed in the .NET Framework 4.0.

To start with, Trim() will now be consistent with Char.IsWhiteSpace(). Theoretically, this is a breaking change, but I don’t expect many programs to have a problem with this change. Note that the change is very well documented in the online help.

Secondly, the code of Trim() has been cleaned up considerably. A string that consists entirely of whitespace is no longer scanned twice. I haven’t done any benchmarks, but I expect the performance to be at least as good as for the same function in .NET 2.0 – 3.5.

Last but not least, the frequent abuse of the Trim() function to simply validate strings should greatly decrease with the introduction of the static String.IsNullOrWhiteSpace(string value) method, which is much faster than calling Trim().
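In practice, the change looks like this. A small sketch, where the WhiteSpaceChecks class and its method names are made up for illustration:

```csharp
using System;

public static class WhiteSpaceChecks
{
    // The pre-4.0 idiom: creates a trimmed copy only to compare it with "".
    public static bool HasContentOld(string s)
    {
        return s != null && s.Trim() != "";
    }

    // The .NET 4.0 way: handles null, tests characters with char.IsWhiteSpace
    // and never needs to build a trimmed copy of the string.
    public static bool HasContent(string s)
    {
        return !string.IsNullOrWhiteSpace(s);
    }
}
```

On .NET 4.0 and later both methods agree, because Trim() and IsNullOrWhiteSpace() now share the same definition of white space.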

It’s a small detail, compared to all the other goodies .NET 4.0 brings, but a good addition to the toolbox nonetheless.


.NET 3.5 SP1 Beta available

I have mentioned before that a CLR update is due to be released this summer.

Scott Guthrie just announced that a beta is now available. On the CLR, he says:

.NET 3.5 SP1 includes significant performance improvements to the CLR that enable much faster application startup times - in particular with "cold start" scenarios (where no .NET application is already running).  Much of these gains were achieved by changing the layout of blocks within CLR NGEN images, and by significantly optimizing disk IO access patterns.  We also made some nice optimizations to our JIT code generator that allow much better inlining of methods that utilize structs.

We are today measuring up to 40% faster application startup improvements for large .NET client applications with SP1 installed.  These optimizations also have the nice side-effect of improving ASP.NET application request per second throughput by up to 10% in some cases.

It's not just an update to the CLR, though: it's a significant service pack to both the .NET Framework and Visual Studio 2008. Another novelty I'm definitely going to take a look at is LINQ to Entities:

.NET 3.5 SP1 includes the new ADO.NET Entity Framework, which allows developers to define a higher-level Entity Data Model over their relational data, and then program in terms of this model.  Concepts like inheritance, complex types and relationships (including M:M support) can be modeled using it.

The ADO.NET Entity Framework and the VS 2008 Entity Framework Designer both support a pluggable provider model that allows them to be used with any database (including Oracle, DB2, MySql, PostgreSQL, SQLite, VistaDB, Informix, Sybase, and others).

Developers can then use LINQ and LINQ to Entities to query, manipulate, and update these entity objects.


CLR Update this summer

On his blog, Scott Guthrie announced that an update for the .NET CLR will be released this summer:

This summer we are going to ship a servicing update to the CLR that makes some significant internal optimizations in how we optimize our data structures to cut down on disk IO and improve memory layout when loading and running applications. Among many other benefits, this work will significantly improve the working set and cold startup performance of .NET 2.0, 3.0 and 3.5 applications and will dramatically improve end-user experiences with .NET-based client applications.

Depending on the size of the application, we expect .NET applications to realize a cold startup performance improvement of between 25-40%. Applications do not need to change any code, nor be recompiled, in order to take advantage of these improvements so the benefits are automatic.

Free improvements are always good improvements. I do hope this update will include the optimizations on value types the JIT team has been blogging about:

Code generation for value types in .NET 2.0 has several inefficiencies.

1) All value type local variables live entirely on the stack.

2) No assertion propagation optimization is ever performed on value type local variables.

3) Methods with value type arguments, local variables, or return values are never inlined.

[...]

Over the past year or so, the JIT team has been working on significant improvements to value type code generation, as well as the inlining algorithm. In summary, all of the above limitations are being eliminated.


.NET Framework Library Source Code now available

See the announcement on Scott Guthrie's blog. See setup instructions on Shawn Burke's Blog.


.NET Framework Source Code to be Released

Last Wednesday, Scott Guthrie announced the release of the .NET Framework Library source code. "You'll be able to download the .NET Framework source libraries via a standalone install (allowing you to use any text editor to browse it locally). We will also provide integrated debugging support of it within VS 2008."

This is big news. It will allow those of us who care about quality to gain a deeper understanding of the frameworks, beyond what the MSDN library, Google or even Lutz Roeder's .NET Reflector can give us.

Before you get too excited though, it's worth mentioning that you're only allowed to view the source for documentation purposes and to use it for debugging your applications. You are not allowed to modify or redistribute it. But hey, that doesn't make it less useful!


Versioning .NET Assemblies

Some questions seem to come and go in waves. Recently, several people asked me about versioning .NET assemblies. There are at least four attributes in the BCL that allow you to specify version information for a .NET assembly: AssemblyVersionAttribute, AssemblyFileVersionAttribute, ComCompatibleVersionAttribute and AssemblyInformationalVersionAttribute. How should you use them?

The Assembly Version (AssemblyVersionAttribute) reflects the version of the specification of the assembly. It changes when the API changes (types or methods added, modified or removed), or when the semantics of the API change (a method now does something functionally different). When neither of these conditions is met, existing clients will be compatible with the new “version”, and the Assembly Version should not change.

The File Version (AssemblyFileVersionAttribute) reflects the distribution. It changes when the binary image of the Assembly changes, even when the Assembly Version does not. Typically, this is the result of bug fixes or internal optimizations.

While in theory the File Version allows any string to be used as a value, it is highly recommended to use a four number version string, according to the same syntax and semantics as the assembly version.

These version numbers consist of four numbers in the range 0 to 65534. The four values indicate:

  • Major Version: change when features have been modified or removed.
  • Minor Version: change when features have been added or Major version changed.
  • Build number: change when bugs have been fixed or Minor Version changed.
  • Revision number: change when non-functional improvements were made or the Build number changed.

When using a correct numbering scheme, compatibility between versions is as follows:

  • A change in major version: the new version is not compatible with the old version.
  • A change in minor version (but not in major version): the new version is backwards compatible with the old version, but not forward compatible. Applications using the new features don’t work with the old version, but old applications do work with the new version.
  • A change in build number (but not in major or minor version): the new version is binary compatible with the old version, both forward and backward. A change in behavior may be observed as a result of a bug fix.
  • A change in revision number only: the new version is binary compatible with the old version, both forward and backward. Only non-functional changes in behavior, such as changed performance characteristics, may be observed as a result of non-functional changes.

When one of these numbers is modified, typically all lower level numbers are reset to zero.

If an assembly exposes types defined in another assembly in its public API, and the other assembly’s Assembly version changes, then the Assembly version of this assembly should change as well. If the other assembly has increased its major version, increase this assembly’s major version as well. Avoid exposing types defined in third-party assemblies, in order to limit this problem.

Summary:

Changed version number   Reason                        Compatibility
Major version            Features changed or removed   None
Minor version            Features added                Backward only
Build number             Bugs fixed                    Forward and backward
Revision number          Non-functional changes        Forward and backward
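These compatibility rules fit in a small function. The sketch below is a hypothetical helper built on System.Version; it assumes the new version is never lower than the old one, and treats versions that only differ in build or revision number as interchangeable.

```csharp
using System;

public enum VersionCompatibility
{
    None,               // major version changed
    BackwardOnly,       // minor version changed: old clients work with the new version
    ForwardAndBackward, // only build and/or revision number changed
    Identical           // nothing changed
}

public static class VersionRules
{
    // Hypothetical helper applying the rules above; assumes newVersion >= oldVersion.
    public static VersionCompatibility Compare(Version oldVersion, Version newVersion)
    {
        if (oldVersion.Major != newVersion.Major)
        {
            return VersionCompatibility.None;
        }
        if (oldVersion.Minor != newVersion.Minor)
        {
            return VersionCompatibility.BackwardOnly;
        }
        if (oldVersion.Build != newVersion.Build || oldVersion.Revision != newVersion.Revision)
        {
            return VersionCompatibility.ForwardAndBackward;
        }
        return VersionCompatibility.Identical;
    }
}
```

For example, going from 1.2.3.0 to 1.2.4.0 is a bug fix and yields ForwardAndBackward, while going from 1.2.3.0 to 1.3.0.0 yields BackwardOnly.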

Typically, the Major Version and Minor Version are the same in the Assembly Version and in the File Version. The Assembly Version will have the build number and revision number equal to zero, while the File Version is updated with every bug fix or non-functional improvement.
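In C#, that typically comes down to two assembly-level attributes, usually kept in AssemblyInfo.cs. The version numbers below are made up for illustration: the third bug-fix release of version 1.2 keeps its Assembly Version at 1.2.0.0 while its File Version moves to 1.2.3.0.

```csharp
using System.Reflection;

// Illustrative values only: the third bug-fix build of release 1.2.
// Assembly Version: major and minor only, build and revision stay zero,
// so clients bound to 1.2.0.0 keep binding after the fix is deployed.
[assembly: AssemblyVersion("1.2.0.0")]

// File Version: same major and minor; the build number goes up with every
// bug fix, the revision number with every non-functional change.
[assembly: AssemblyFileVersion("1.2.3.0")]
```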


Because the Minor version is part of the Assembly Version, a new minor version changes the assembly's identity: strong-named assemblies compiled against the previous version will not pick up the new version from the GAC. You need to recompile the client assemblies against the new version, configure the client applications (with a binding redirect), or create a publisher policy. Likewise, assemblies compiled against the new (minor) version won't accidentally pick up the old version.

When you leave the build and revision numbers equal to zero in the Assembly version, you don't need to do anything when you want to deploy a bug fix or internal optimization.

The VB and C# compilers generate a Win32 version resource in the assembly file, so that several of these attributes show up in the Version tab of the file properties dialog box. Please note that the C++ compiler does not do this: in C++, the developer needs to add a version resource manually and keep its contents in sync with the assembly-level attributes.

(Screenshot: U2U.Framework.ApplicationLayer.dll Properties)

Typically, there should be no reason to include a ComCompatibleVersionAttribute or AssemblyInformationalVersionAttribute.
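To check which values actually ended up in a compiled assembly, both can be read back at run time. A minimal sketch using only standard reflection and System.Diagnostics APIs; the VersionReader name is made up:

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class VersionReader
{
    // Returns the Assembly Version (what the loader binds against) and the
    // File Version as reported by FileVersionInfo (on Windows, the version
    // resource that the file properties dialog shows).
    public static (Version AssemblyVersion, string FileVersion) Read(Assembly assembly)
    {
        Version assemblyVersion = assembly.GetName().Version;

        // Location is empty for assemblies loaded from a byte array or bundled
        // into a single-file app; there is no file to inspect in that case.
        string location = assembly.Location;
        string fileVersion = string.IsNullOrEmpty(location)
            ? null
            : FileVersionInfo.GetVersionInfo(location).FileVersion;

        return (assemblyVersion, fileVersion);
    }
}
```

With the numbering scheme described above, a bug-fix release would report something like 1.2.0.0 as its Assembly Version and 1.2.3.0 as its File Version.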


Properties with property changed event, part 3

A while ago, I talked about how to write basic events for changed properties, and about the INotifyPropertyChanged interface. There is a third way to manage events, which is especially useful when your class has many events, but you expect a very low number of them to be actually handled.

As discussed before, every event declared the usual way stores its EventHandler in a field of your object, and that field takes up space even when it's null, which is a bit of a waste if most events are never handled. But events allow you to implement the add and remove methods yourself, so you can choose where to store the EventHandler delegate.

One way to do that is through the System.ComponentModel.EventHandlerList class. System.ComponentModel.Component exposes an Events property of this type, which is used by all classes inheriting from Component, including all Windows Forms controls.

The following example inherits from System.Windows.Forms.TextBox, adding a CueBanner property (like the Internet Explorer 7 search box) and a CueBannerChanged event:

using System;
using System.ComponentModel;
using System.Windows.Forms;

namespace U2U.Framework.Windows.Forms
{
    /// <summary>
    /// Textbox that displays a CueBanner when the Text is empty.
    /// </summary>
    public class CueBannerTextBox : TextBox
    {
        private string cueBanner = string.Empty;

        /// <summary>
        /// Gets or sets the prompt text to display when there is nothing in the Text property.
        /// </summary>
        [Browsable(true)]
        [EditorBrowsable(EditorBrowsableState.Always)]
        [Category("Appearance")]
        [Description("The prompt text to display when there is nothing in the Text property.")]
        [DefaultValue("")]
        public string CueBanner
        {
            get { return cueBanner; }
            set
            {
                if (value == null)
                {
                    value = string.Empty;
                }
                if (value != cueBanner)
                {
                    cueBanner = value;
                    NativeMethods.SendMessage(Handle, EM_SETCUEBANNER, IntPtr.Zero, cueBanner);
                    OnCueBannerChanged(EventArgs.Empty);
                }
            }
        }

        private const int EM_SETCUEBANNER = 0x1501;
        private static readonly object EVENT_CUEBANNERCHANGED = new object();

        /// <summary>
        /// Occurs when the value of the <see cref="CueBanner"/> property has changed.
        /// </summary>
        [Category("Property Changed")]
        [Description("Event raised when the value of the CueBanner property changed.")]
        public event EventHandler CueBannerChanged
        {
            add { base.Events.AddHandler(EVENT_CUEBANNERCHANGED, value); }
            remove { base.Events.RemoveHandler(EVENT_CUEBANNERCHANGED, value); }
        }

        /// <summary>
        /// Raises the CueBannerChanged event.
        /// </summary>
        /// <param name="e">An <see cref="EventArgs"/> that contains the event data.</param>
        protected virtual void OnCueBannerChanged(EventArgs e)
        {
            EventHandler handler = base.Events[EVENT_CUEBANNERCHANGED] as EventHandler;
            if (handler != null)
            {
                handler(this, e);
            }
        }
    }
}


You'll also need this:

using System;
using System.Runtime.InteropServices;

namespace U2U.Framework.Windows.Forms
{
    internal static class NativeMethods
    {
        [DllImport("user32", CharSet = CharSet.Unicode)]
        internal static extern IntPtr SendMessage(IntPtr hWnd, int message, IntPtr wParam, string lParam);
    }
}


Notice how the CueBannerChanged event provides an explicit implementation of the add and remove methods. The add method adds the delegate to the inherited Events collection, using a private static object as the key, and the remove method removes it from that same collection.

The OnCueBannerChanged method retrieves the delegate from the same collection, using the same key. Notice how this collection can store any type of delegate, not just EventHandlers, so we need to cast it back to EventHandler before we can use it.
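The same technique works outside Windows Forms, since EventHandlerList can also be used on its own. A minimal sketch with a made-up Person class: both events share one EventHandlerList, which is only created when the first handler is added, so an instance without subscribers pays for a single null reference instead of one delegate field per event.

```csharp
using System;
using System.ComponentModel;

// Hypothetical example class: two rarely handled events share one
// EventHandlerList instead of each having its own delegate field.
public class Person : IDisposable
{
    // Private static keys, one per event, as in the CueBannerTextBox example.
    private static readonly object NameChangedKey = new object();
    private static readonly object AgeChangedKey = new object();

    private EventHandlerList events; // stays null until a handler is added
    private string name = string.Empty;
    private int age;

    public event EventHandler NameChanged
    {
        add { Events.AddHandler(NameChangedKey, value); }
        remove { events?.RemoveHandler(NameChangedKey, value); }
    }

    public event EventHandler AgeChanged
    {
        add { Events.AddHandler(AgeChangedKey, value); }
        remove { events?.RemoveHandler(AgeChangedKey, value); }
    }

    public string Name
    {
        get { return name; }
        set
        {
            if (value == null) value = string.Empty;
            if (value != name)
            {
                name = value;
                Raise(NameChangedKey, EventArgs.Empty);
            }
        }
    }

    public int Age
    {
        get { return age; }
        set
        {
            if (value != age)
            {
                age = value;
                Raise(AgeChangedKey, EventArgs.Empty);
            }
        }
    }

    // Creates the list on first use.
    private EventHandlerList Events
    {
        get { return events ?? (events = new EventHandlerList()); }
    }

    // Looks up the delegate stored under 'key' (if any) and invokes it.
    private void Raise(object key, EventArgs e)
    {
        EventHandler handler = events?[key] as EventHandler;
        handler?.Invoke(this, e);
    }

    public void Dispose()
    {
        events?.Dispose();
    }
}
```

Setting Age on a Person nobody listens to never allocates the list at all; subscribing to NameChanged creates it once, and later subscriptions to either event reuse it.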

Enjoy.


Orcas: February 2008 it will be

Microsoft named the date. In the latest MSDN Flash, they said: "[...] February 2008 is shaping up to be Microsoft's largest launch month - ever - with RTMs of Visual Studio 2008, Windows Server 2008, and SQL Server 2008 all on tap."

Update: In case you haven't heard, Visual Studio 2008 was released on Monday 11/19. The big launch party is still planned for February in Las Vegas.

In the meantime, Rico Mariani concludes his series on LINQ To SQL performance. It looks like it will be easy to get performance virtually on par with manually optimized code using SqlDataReader directly. That's great news, especially if you realize that most code today isn't manually optimized that way. For example, based on the figures he gives, I expect that LINQ To SQL will outperform the TableAdapters and typed DataSets you generate in Visual Studio 2005.

Unfortunately, we can't verify this yet, since the current Orcas Beta 1 doesn't have the optimizations they did to get to this result. But Beta 2 should have them, and that's due to be available "later this summer". That means it might be around when I'm back from my holidays!


String.Trim() has problems

String.Trim() has some problems:

  1. It has bugs
  2. It is slow
  3. It is too often abused

String.Trim() has bugs

Yep, I'm not joking. The documentation says: "Removes all leading and trailing white-space characters from the current String object". So what do you expect the following program to do?

for (int i = (int)char.MinValue; i <= (int)char.MaxValue; i++) 
{ 
    char c = (char)i; 
    string s = c.ToString(); 
    bool charIsWhiteSpace = char.IsWhiteSpace(c); 
    bool trimTreatsCharAsWhiteSpace = s.Trim() == ""; 
    if (charIsWhiteSpace != trimTreatsCharAsWhiteSpace) 
    { 
        Console.WriteLine("Problem with char {0:X}: charIsWhiteSpace == {1}, trimTreatsCharAsWhiteSpace == {2}.", 
             (int) c, charIsWhiteSpace, trimTreatsCharAsWhiteSpace); 
    } 
}


According to the documentation, I would expect this to write nothing to the console, but here you go:

Problem with char 180E: charIsWhiteSpace == True, trimTreatsCharAsWhiteSpace == False.
Problem with char 200B: charIsWhiteSpace == False, trimTreatsCharAsWhiteSpace == True.
Problem with char 202F: charIsWhiteSpace == True, trimTreatsCharAsWhiteSpace == False.
Problem with char 205F: charIsWhiteSpace == True, trimTreatsCharAsWhiteSpace == False.
Problem with char FEFF: charIsWhiteSpace == False, trimTreatsCharAsWhiteSpace == True.
 

I looked them up:

180E: Mongolian vowel separator
200B: Zero width space
202F: Narrow no-break space
205F: Medium mathematical space
FEFF: Zero width no-break space

I'm not sure about the Mongolian thing ;-), but the others do look like white space to me.

String.Trim() is slow

Granted, this function has quite a bit of work to do. Basically, it needs to:

  1. Find the first non-white-space character in the string
  2. If there is none, return the empty string
  3. Find the last non-white-space character in the string
  4. If you get the entire string, just return it, otherwise return an appropriate substring.

And basically, that's what the function actually does. Well, in fact it swaps steps 2 and 3, so a string consisting entirely of white space is scanned twice. Much worse is how the searching is done, or in fact, how a character is determined to be white space or not. Reflector shows a pretty weird loop construct there. Unrolling that inner loop could speed things up considerably.
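The four steps translate almost directly into code. The sketch below is not the framework's implementation: it is a hypothetical TrimSketch that uses char.IsWhiteSpace as its definition of white space, and checks step 2 before step 3, so a string that is all white space is scanned only once.

```csharp
using System;

public static class TrimSketch
{
    // Hypothetical re-implementation of the four steps, using char.IsWhiteSpace
    // as the definition of white space.
    public static string Trim(string s)
    {
        // Step 1: find the first non-white-space character.
        int start = 0;
        while (start < s.Length && char.IsWhiteSpace(s[start]))
        {
            start++;
        }

        // Step 2: if there is none, the result is the empty string.
        if (start == s.Length)
        {
            return string.Empty;
        }

        // Step 3: find the last non-white-space character. The loop stops at
        // 'start' at the latest, because s[start] is not white space.
        int end = s.Length - 1;
        while (char.IsWhiteSpace(s[end]))
        {
            end--;
        }

        // Step 4: return the original string if nothing needs to be removed,
        // otherwise the appropriate substring.
        if (start == 0 && end == s.Length - 1)
        {
            return s;
        }
        return s.Substring(start, end - start + 1);
    }
}
```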

String.Trim() is too often abused

Have you ever written code like this:

if (s != null && s.Trim() != "")
{ 
    // ... 
}


Don't! In a test like that, you're not interested in the actual result, so don't calculate it! Look at the algorithm again: of the four steps mentioned, we only need the first one executed. We don't need the actual substring!

Now take a look at this function:

static bool IsEmptyOrWhiteSpace(string s) 
{ 
    foreach (char c in s) 
    { 
        if ((c < (char)9 || c > (char)13) && 
            c != ' ' && 
            c != (char)133 && 
            c != (char)160 && 
            c != (char)5760 && 
            (c < (char)8192 || c > (char)8203) && 
            c != (char)8232 && 
            c != (char)8233 && 
            c != (char)12288 && 
            c != (char)65279) 
        { 
            return false; 
        } 
    } 
    return true;
}


It's compatible with String.Trim(), so it has the same bugs mentioned above. But that makes it a perfect replacement in s.Trim() != "" tests: it won't change the semantics of your code.

I did a small benchmark, comparing its speed and memory usage with this function:

static bool Naive(string s) 
{ 
    return s.Trim() == ""; 
}


And these are the results:

Length:   0 WhiteSpace at start:   0 WhiteSpace at end:   0 
 Naive                      Time:     1,65 GC:      0
 IsEmptyOrWhiteSpace        Time:     0,17 GC:      0 
 
Length:   1 WhiteSpace at start:   1 WhiteSpace at end:   1 
 Naive                      Time:     5,27 GC:      0
 IsEmptyOrWhiteSpace        Time:     0,27 GC:      0 
 
Length: 100 WhiteSpace at start:   0 WhiteSpace at end:   0 
 Naive                      Time:    12,44 GC:      0
 IsEmptyOrWhiteSpace        Time:     0,72 GC:      0 
 
Length: 100 WhiteSpace at start:   1 WhiteSpace at end:   1 
 Naive                      Time:    37,23 GC:  20733
 IsEmptyOrWhiteSpace        Time:     1,08 GC:      0 
 
Length: 100 WhiteSpace at start:  44 WhiteSpace at end:  55 
 Naive                      Time:   177,77 GC:   1920
 IsEmptyOrWhiteSpace        Time:    20,46 GC:      0 
 
Length: 100 WhiteSpace at start: 100 WhiteSpace at end: 100 
 Naive                      Time:   169,46 GC:      0
 IsEmptyOrWhiteSpace        Time:    38,50 GC:      0
 

I called each function millions of times for different strings. For each string, I display its length and the number of leading and trailing spaces (ASCII 32). The Time value shows the elapsed time, the GC value the number of generation 0 garbage collection runs. Both functions were called through a delegate, which allowed me to take loop and call overhead into account. Obviously, I compiled with optimizations turned on.

As you can see, the IsEmptyOrWhiteSpace function is always faster than the naive approach, often by more than an order of magnitude. A common case in many applications is a long string, e.g. an HTML document, ending with a CR/LF combination. That's actually the worst case for the naive approach, while being the best case for the IsEmptyOrWhiteSpace function.

Obviously, IsEmptyOrWhiteSpace doesn't do any memory allocations, while the naive approach in certain cases does.

Oh, and in case you wondered: I tried for loops and an unsafe version as well, both were slower than the simple foreach loop used above.
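One way to take the loop and call overhead into account is to time a delegate that does no work in exactly the same way and subtract it. The sketch below is a hypothetical harness along those lines, not necessarily the one that produced the figures above; it reports net milliseconds and generation 0 collections.

```csharp
using System;
using System.Diagnostics;

public static class DelegateBenchmark
{
    // Hypothetical harness: times 'iterations' calls of 'test' through a delegate,
    // then subtracts the time of an empty delegate called the same way, so the
    // loop and the delegate call itself are not counted.
    public static (double NetMilliseconds, int Gen0Collections) Run(
        Func<string, bool> test, string input, int iterations)
    {
        Func<string, bool> empty = s => false;

        double overhead = Time(empty, input, iterations, out _);
        double total = Time(test, input, iterations, out int collections);

        // Timer noise can make tiny differences negative; clamp at zero.
        return (Math.Max(0.0, total - overhead), collections);
    }

    private static double Time(Func<string, bool> f, string input, int iterations, out int gen0Collections)
    {
        f(input); // make sure the target is JIT-compiled before timing
        int gcBefore = GC.CollectionCount(0);
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            f(input);
        }
        stopwatch.Stop();
        gen0Collections = GC.CollectionCount(0) - gcBefore;
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}
```

For example, DelegateBenchmark.Run(IsEmptyOrWhiteSpace, new string('x', 100), 10000000) would time the function above on a 100-character string without white space, and passing Naive instead gives the comparison.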
