Stefan Rusek | LINQ Expressions as Fast Reflection Invoke

LINQ Expressions as Fast Reflection Invoke

Saturday, October 25, 2008

Lately, I've been working on a side project that generates a lot of LINQ expression trees (LETs). They provide a fun new way to dynamically generate executable code at runtime. Dynamic code generation has been around for a long time, and .NET has had dynamic assembly generation since the very beginning. The original mechanism, System.Reflection.Emit, can be a lot of work when you only want to generate something small. (Not to mention it does little IL validation, so it will happily generate broken code that crashes your program with all kinds of strange exceptions.) Yesterday, I asked myself: why not use expression trees instead of reflection for dynamic invocation? It immediately struck me as a wonderful idea.

The reason LETs are so great is that they compile down to real IL, which then gets JIT-compiled, so they run at native speed. Using reflection to dynamically invoke a method is far slower. Here is the code to create generic property accessor delegates using both LETs and reflection:

using System;
using System.Linq.Expressions;
using System.Reflection;

// Compiles t => t.<property> into a real delegate once, up front.
static Func<T, U> CreateWithLET<T, U>(string property)
{
    var t = Expression.Parameter(typeof(T), "t");
    var prop = Expression.Property(t, property);
    return Expression.Lambda<Func<T, U>>(prop, t).Compile();
}
// Looks up the getter once; every call still goes through MethodInfo.Invoke.
static Func<T, U> CreateWithReflectionFast<T, U>(string property)
{
    Type type = typeof(T);
    object[] args = new object[0];
    BindingFlags flags = BindingFlags.Instance | BindingFlags.Public;
    PropertyInfo propinfo = type.GetProperty(property, flags);
    MethodInfo getter = propinfo.GetGetMethod();
    return (T t) => (U)getter.Invoke(t, args);
}
// Resolves the property by name on every call via InvokeMember.
static Func<T, U> CreateWithReflectionSlow<T, U>(string property)
{
    return (T t) => (U)typeof(T).InvokeMember(property,
        BindingFlags.Instance | BindingFlags.GetProperty | BindingFlags.Public,
        null, t, new object[0]);
}

I have focused on making the first two methods short, readable, and as efficient as possible. Both do as much work as possible up front, in the create method. The third shows that the reflection version can be written in one line, but if you plan to call it repeatedly, it will be much slower than the multi-line version above. This is a really simple yet very powerful example. It would be trivial to change it to call a method and pass in arguments.
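As a sketch of that change, assuming a public instance method that takes exactly one argument of type A and returns exactly type R (the class and method names below are hypothetical, not from the original project), the expression tree just swaps Expression.Property for Expression.Call and gains a second parameter:

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

static class MethodInvokerSketch
{
    // Builds a delegate equivalent to (t, a) => t.<method>(a).
    // Assumes a public instance method with a single parameter of type A;
    // if its return type is not exactly R, Expression.Lambda throws ArgumentException.
    public static Func<T, A, R> CreateCallWithLET<T, A, R>(string method)
    {
        MethodInfo info = typeof(T).GetMethod(method,
            BindingFlags.Instance | BindingFlags.Public, null, new[] { typeof(A) }, null);
        if (info == null)
            throw new ArgumentException("No public instance method " + method + "(" + typeof(A).Name + ")");

        var t = Expression.Parameter(typeof(T), "t");
        var a = Expression.Parameter(typeof(A), "a");
        var call = Expression.Call(t, info, a);
        return Expression.Lambda<Func<T, A, R>>(call, t, a).Compile();
    }
}
```

For example, CreateCallWithLET<string, int, string>("Substring") yields a delegate that calls string.Substring(int) directly, with no per-call reflection.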

I wrote four benchmarks. Each benchmark runs in a loop, the cost of an empty loop is subtracted from the total, and each delegate is given 100 warm-up iterations to exclude the cost of just-in-time compilation. (I've included the benchmark driver at the bottom of this article.) The first benchmark measures the time it takes to create the delegates we are testing. The second times 100,000 iterations accessing an int property. The third times 100,000 iterations accessing a string property. The fourth times 100,000 iterations accessing two int properties and adding the results inside the generated delegate.

I added the int tests because I thought the reflection tests would be adversely affected by needing to box and unbox the value of the property. In reality, reflection is so much slower that boxing accounts for only about 2% of execution time.

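| Approach             | Create delegate | Access int (100,000 times) | Access string (100,000 times) | Access 2 ints (100,000 times) |
|----------------------|-----------------|----------------------------|-------------------------------|-------------------------------|
| With LETs            | 2.338 ms        | 0.45 ms                    | 0.46 ms                       | 0.53 ms                       |
| With fast reflection | 0.46 µs         | 474 ms                     | 462 ms                        | 976 ms                        |
| With slow reflection | 0.05 µs         | 988 ms                     | 862 ms                        | 1751 ms                       |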
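The int column gives a rough break-even point. This is only an estimate, assuming per-call cost stays constant and these single-machine measurements carry over to your workload: the LET accessor costs about 2,338 µs to create and 0.45 ms / 100,000 ≈ 0.0045 µs per call, while fast reflection costs about 0.46 µs to create and 474 ms / 100,000 = 4.74 µs per call. Setting the totals for n calls equal (all values in µs):

$$
2338 + 0.0045\,n = 0.46 + 4.74\,n
\quad\Longrightarrow\quad
n = \frac{2338 - 0.46}{4.74 - 0.0045} \approx 494
$$

So on this benchmark, the LET accessor pays back its compilation cost after roughly 500 calls; below that, fast reflection is cheaper overall.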

I was surprised at just how much faster the LETs were than reflection. I expected them to be faster, but they are about 1,000 times faster than fast reflection. The slow reflection approach takes about twice as long as fast reflection. I should point out that if you are only going to call the method once, reflection is still faster, because compiling the expression tree costs about 2 ms up front; but if you are going to use it in a loop, LETs will generally win out. Another case where it makes sense to use the LET approach is when you want to precreate the accessor delegate and cache it, so you can run it later when you need the speed.
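Caching could look something like the following minimal sketch, assuming .NET 4 or later for ConcurrentDictionary (the AccessorCache name is hypothetical). Each (T, U, property) accessor is compiled at most once per race and then reused, so the roughly 2 ms compile cost is paid only on first use:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

// Hypothetical helper: one cache per (T, U) pair, keyed by property name.
static class AccessorCache<T, U>
{
    static readonly ConcurrentDictionary<string, Func<T, U>> cache =
        new ConcurrentDictionary<string, Func<T, U>>();

    public static Func<T, U> Get(string property)
    {
        // Under contention the factory may run more than once, but only one
        // delegate is stored, and every caller gets that stored delegate back.
        return cache.GetOrAdd(property, name =>
        {
            var t = Expression.Parameter(typeof(T), "t");
            var prop = Expression.Property(t, name);
            return Expression.Lambda<Func<T, U>>(prop, t).Compile();
        });
    }
}
```

A call such as AccessorCache<string, int>.Get("Length") compiles on the first request and returns the same delegate afterwards.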

Most of the 2 ms it takes to compile the lambda expression is spent creating a dynamic assembly and various other setup. So if you need to access multiple members dynamically and then do an operation on them, LETs become even more appealing. It does not take significantly longer to generate a lambda that accesses two properties than one that accesses a single property. As the benchmark shows, the more complicated LET takes only a little extra time (0.53 ms vs. 0.45 ms), while the reflection versions take twice as long. This means that the more complicated your reflection gets, the more appealing LETs become.
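The fourth benchmark's shape can be sketched like this, assuming both properties are public int instance properties (the helper name is hypothetical). Both reads and the addition live in one compiled delegate, so there is still only a single delegate call per iteration:

```csharp
using System;
using System.Linq.Expressions;

static class SumAccessorSketch
{
    // Builds (T t) => t.<first> + t.<second> as a single compiled delegate.
    // Assumes both properties are public int instance properties.
    public static Func<T, int> CreateSumWithLET<T>(string first, string second)
    {
        var t = Expression.Parameter(typeof(T), "t");
        var sum = Expression.Add(
            Expression.Property(t, first),
            Expression.Property(t, second));
        return Expression.Lambda<Func<T, int>>(sum, t).Compile();
    }
}
```

For instance, CreateSumWithLET<DateTime>("Month", "Day") returns 35 for October 25.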

Below is the benchmark function I wrote:

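using System;
using System.Diagnostics;

static void Run<T, U>(T t, Func<T, U> a)
{
    // time an empty loop so its overhead can be subtracted
    var sw1 = Stopwatch.StartNew();
    int j;
    for (int i = 0; i < 100000; i++)
        j = i;
    sw1.Stop();

    // warm up: make sure the delegate is JIT-compiled before timing
    for (int i = 0; i < 100; i++)
        a(t);

    // run test
    var sw2 = Stopwatch.StartNew();
    for (int i = 0; i < 100000; i++)
        a(t);
    sw2.Stop();

    // time for 100,000 calls minus the empty-loop overhead
    Console.WriteLine(sw2.Elapsed - sw1.Elapsed);
}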

The 3 cast operators in C#

Tuesday, October 21, 2008

There are three cast operators in C#. The first is the traditional cast operator that uses parentheses, the second is the "is" operator, and the third is the "as" operator.

Normal Casts

This operator uses the same syntax as the original C cast operator. The key thing is that it either returns a value of the type you are requesting or, if it can't do the cast, throws an InvalidCastException. Most of the time this is the desired behavior, because you want the exception to show up early in your code rather than as a harder-to-trace failure later.

object o = "hi";
string s = (string)o; // works
double d = (double)o; // throws InvalidCastException, since o holds a string, not a double

The "is" operator

This operator doesn't actually return the result of the cast. It simply returns whether the cast would have succeeded. This is handy when you want to ask whether an object is an instance of a type, but you don't actually need an instance of that type. This operator is pretty useful, but it turns out that most of the time you really want the "as" operator; still, there are times when "is" is exactly what you need.

if (o is string) lblType.Text = "o is a string";
if (o is double) lblType.Text = "o is a double";

Some people might disagree that the "is" operator is a cast. They might say that it is a runtime type check or something like that, but in .NET a reference has the same value before and after a cast. This means that a normal cast simply does a runtime type check and, if the result is false, throws an exception.

The "as" operator

The "as" operator returns an instance of the requested type, but if it cannot do the cast, it returns null. (Because it must be able to return null, "as" only works with reference types and nullable value types.) This operator is handy for fixing the most common pattern for using the "is" operator. Since the "is" operator is a cast in disguise, people who use it usually end up doing two casts, and the "as" operator lets you avoid the double cast.

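// Two casts: "is" checks the type, then the explicit cast checks it again
if (o is string) {
   string s = (string)o; // wrong because we are doing 2 casts
   lblO.Text = s;
}

// One cast: "as" returns null instead of throwing
string s = o as string;
if (s != null) lblO.Text = s;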

In C# 2.0, the "as" operator becomes even nicer when combined with the null-coalescing operator (??), because you can easily replace a failed cast with a sensible default.

lblO.Text = o as string ?? "No value given";
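To see all three operators side by side, here is a small sketch (the CastOperatorsDemo name is hypothetical) that reports what each one does for a given object: "as" with "??" falls back to a default, "is" only answers yes or no, and a normal cast either succeeds or throws InvalidCastException:

```csharp
using System;

static class CastOperatorsDemo
{
    public static string Describe(object o)
    {
        // "as" with "??": the string if o is one, otherwise a default value
        string text = o as string ?? "No value given";

        // "is": only reports whether a cast to double would succeed
        bool isDouble = o is double;

        // normal cast: returns the string or throws InvalidCastException
        string castResult;
        try
        {
            castResult = (string)o;
        }
        catch (InvalidCastException)
        {
            castResult = "cast failed";
        }

        return text + " | " + isDouble + " | " + castResult;
    }
}
```

Describe("hi") yields "hi | False | hi", while Describe(3.5) yields "No value given | True | cast failed".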

Common Mistakes

When I first learned about the "as" operator, I used it a bunch, and I ended up with code like this:

Converter c = x as Converter;
c.DoConversion(); // NullReferenceException if x is not a Converter

The problem with the above code is that if the cast fails, the second line throws a NullReferenceException. If that cast fails, you really want an InvalidCastException rather than a NullReferenceException, since the cast exception tells you a lot more about the problem.
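The difference is easy to observe. In this sketch, Converter is a hypothetical stand-in for the type in the snippet above. Passing a string to each helper shows where each style fails: with "as", the failure surfaces one line later as a null dereference; with a normal cast, it surfaces at the cast itself, and the InvalidCastException message typically names both the actual and the requested type:

```csharp
using System;

// Hypothetical stand-in for the Converter type in the snippet above.
class Converter
{
    public void DoConversion() { }
}

static class CastFailureDemo
{
    // "as" then call: a non-Converter becomes null, so the call fails instead.
    public static string WithAs(object x)
    {
        try
        {
            Converter c = x as Converter;
            c.DoConversion();
            return "ok";
        }
        catch (Exception e)
        {
            return e.GetType().Name;
        }
    }

    // Normal cast: a non-Converter fails at the cast itself.
    public static string WithCast(object x)
    {
        try
        {
            Converter c = (Converter)x;
            c.DoConversion();
            return "ok";
        }
        catch (Exception e)
        {
            return e.GetType().Name;
        }
    }
}
```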

The other bad habit dates from before I learned about the "as" operator. I did this a lot:

if (x is Converter) ((Converter)x).DoConversion();

This doesn't throw an exception, but it does do two casts. Casts are not as expensive in .NET as they are in many other languages, but they are still non-trivial, so you want to avoid unneeded ones.


Don't use IP geolocation to lock out users

Wednesday, September 24, 2008

IP geolocation (using a user's IP address to determine where in the world they are) seems like a good idea, but I am not convinced. I spend most of my time in Poland, but not all of it. I use a lot of online services, and some of these services work in Poland and some don't. Certain media companies have made crazy license agreements with certain other, normally sane companies, and the crazy media companies want to prevent people outside of the US from consuming their content. (I understand why these companies have made these agreements, although I don't really think they are effective, but that is another blog post.) Most of the normally sane companies use IP geolocation to determine whether a user is inside the US. Until recently, most of these companies had an error message like the one below.

[Screenshot: "No Zune for You" error message]

In the last couple of months, most companies have recognized that it is silly to explain in their error messages just how bad their location detection technology is, and have changed those messages. The problem with geolocation is that it is not a reliable way to find out where someone is. Sure, from a theoretical perspective it works, but in reality there are too many edge cases.

When I worked at Fog Creek Software, we used a Canadian data center, or, I should say, a Canadian company that has data centers in the US and Canada. We used their Manhattan data center. All our servers were in Manhattan, which is several hundred miles from the Canadian border, but since Peer 1 is a Canadian company, all our servers' IP addresses looked like Canadian IPs. Joel gives more detail in an article about Peer 1. For the most part it was just funny: you type in http://www.google.com/ and you are taken to http://www.google.ca/, with all the text painstakingly translated into Canadian. However, you could forget about downloading something from VMware's website to one of our servers. Because the IP addresses were considered Canadian, many websites refused to recognize that we were actually in New York City. As the world gets smaller due to globalization, IP-based location makes less and less sense. We will run into more and more problems with people who appear to be in the "wrong" place when they aren't.

For most service providers, there is a much better solution. In fact, almost every website that uses an IP address to block users already has the capability: the billing address. Do you realize just how much work it is to get a billing address in another country? I have a billing address in the US, but I don't yet have one here in Poland. Sure, I have a mailing address, and I am even renting an apartment; however, that doesn't mean I can open a bank account here. I am waiting for my residency permit, and until then, I cannot even register with the government as actually living at my address.

I have also seen the other side of this, when Aneta and I tried to open a bank account in NYC. Aneta is not a US citizen. She has a green card now, but it had not yet been approved when we moved to NYC. She and I went to the bank to open a joint checking account. It took about three times longer than normal because Aneta wasn't a US citizen, and most people don't understand what it means to be a pending green card applicant. (BCIS certainly doesn't make it easy by not providing immigrants with any useful documentation of their status, especially when they have to wait two years by law for approval.) So after a lot of explaining and showing papers to the lady at the bank, we eventually got an account in both our names.

I was also struck by how hard it is to get an address when we went to the DMV to get Aneta's first US driver's license. This was her first ID of any kind valid in the US. We had to provide tons of documentation just to show that she was legally in the country, and then we had to show our marriage certificate for the DMV to let her use the same address as me. Typically, we don't worry about these things. When I was 15, I went to the DMV, showed my birth certificate and SSN card, and my mom showed her license. The hardest part was the written exam, which isn't really that big a deal. Yet for immigrants it is not that simple, and for people who aren't actually in the country it is next to impossible.

For all its legendary reputation for usability, Apple can't seem to make iTunes into an app I enjoy using. Yet Apple has done one thing right, maybe because the iTunes store works in Europe, where IP location just doesn't make sense for anything. I can tell iTunes that I live anywhere in the world, browse the library, and look at all the fun French music that people can buy in France. If I try to buy some of that great French music, it informs me that my US billing address cannot be used in the French iTunes store. I can, however, buy all the music, movies, and TV shows I want in the US store, no matter where in the world I am. I realize that this system happens to serve me well, since I am not in the US most of the time, but far fewer people are in my situation than are being wrongly locked out of these services. It is also worth mentioning that I have VPN and Remote Desktop access anytime I want through the company I work for, so I can have a Seattle IP address whenever I need it.

Some companies, like Apple, get it totally right, while others get it half right. I can go to zuneoriginals.com in the US and pick out which Zune I want, which color I want, and what lithography I want. I can then order that Zune and have it shipped to an address in the US. If I go there from Poland, I get the message you see at the beginning of this article. The site is pretty cool, and the lithography is something that Apple doesn't offer. (I think it might boost the global desirability of Zunes if more people knew about this site.) The thing is that even if someone outside the US could get to the site, they could never get past the checkout page. Why use IP geolocation to block something that is already blocked? Luckily for me, when I ordered a Zune last week, I just used Remote Desktop to connect to one of the icanhascheezburger.com servers and ordered it from a computer in Seattle. In a couple of days it will arrive in Seattle and I will have to get someone to ship it here, but at least in a couple of weeks I'll be listening to audiobooks on my new 16GB Zune. Where Microsoft gets it right with the Zune is that I can subscribe to Zune Pass using a US address and get subscription music anywhere. I can't buy Zune points, though. I am sure they will get it all worked out if Zunes ever get popular enough for MS to sell them outside the US.

Aneta uses Rhapsody, and we get mixed results with it. She can download music with the Rhapsody software, but not stream it live from the Rhapsody website. Amazon Unbox is just broken. I can buy stuff, and I can play stuff, but if I try to download stuff, I am blocked. Hulu has changed their error message from one like the one at the top to a petition where you can register so they can try to get better licenses. (I guess Hulu wants the crazies to see how crazy they are.)

Of the companies I've mentioned in this article, Hulu is the only one that doesn't require me to give them money to use their service. In their case, I don't propose that they drop geolocation, especially since they are still only in the US, but they should provide a way to verify a billing address before blocking people.

IP geolocation should be phased out wherever possible. Where it is not possible, it should not be the only means of making location-based decisions about users. A US billing address is much harder to get than a US proxy server, and it provides much stronger confirmation than an IP address can.


Strongly typed (def)s in Xronos

Sunday, November 16, 2008

Yesterday, I committed a changeset that greatly alters how (def)s and Vars are handled in Xronos. Previously, the Var class was modeled after Clojure's Var class. It contained a root value of type object, and for thread bindings it used a Box class (which basically wrapped a value of type object). The Var class provides the basic container for all global variables.

I was reading about Clojure's recent support for ahead-of-time compilation, and I noticed a few subtle differences between Var resolution in Clojure and in Xronos. Clojure resolves Vars at compile time, while Xronos resolved them at runtime, so I decided to change Xronos to match Clojure's behavior. The change was simple enough, and now the following code, which fails in Clojure, also fails in Xronos:

(defmacro y [] x) 
(def x 5) 
(pr (+ 6 (y)))

This fails because y references the Var x before it is (def)ed, and in order to compile y, we have to resolve the Var x.

As I was making this change, I realized that I now had access, at compile time, to the exact Var that was going to be used at runtime. This gives me more options as far as static compilation is concerned. So I broke Var into two classes: VarBase and Var<T>. VarBase has the Name and Namespace properties and a few others, while Var<T> handles the value and a few other things. I also replaced Box with Box<T>. By default, (def) creates a Var<object>, which is functionally identical to the old Var class, but if you add metadata specifying the type, it creates a Var<T> of the specified type. This code prints 11 in both Xronos and Clojure:

(def #^{ :type System.Int32 } x 5) 
(pr (+ 6 x))

The difference is that in Xronos, the + is compiled to an integer add, while in Clojure, the + is forced to do dynamic addition. This example is not so great, since it tends to defeat the dynamic nature of the language, but in cases where the extra bit of performance is needed, strongly typed Vars are great! There is also a very common case where it makes sense:

(defn add5 [x] (+ x 5)) 
(defmacro monkey [& expr] `(str "mon" ~@expr "key"))

Both of these result in a call to (def), and in Xronos it creates a Var<IFunction>. The advantage is that we now know at compile time that the Var<T> contains an IFunction. So (add5 6) now gets compiled to the equivalent of:

varAdd5.get().Invoke((object)6);

instead of

((IFunction)varAdd5.get()).Invoke((object)6);

That simple change results in one fewer cast per form, and it cuts startup time by about 30%. I thought about making (def) always infer and set the type of Vars, but the problem is that many types have compatible dynamic interfaces, so no type makes sense but object.
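To make the VarBase/Var<T> split concrete, here is a hypothetical C# sketch. It is not Xronos's actual source: member names, the single-argument IFunction, and the omission of thread bindings (Box<T>) are all simplifying assumptions. It shows why a Var<IFunction> lets the compiler skip the cast that a Var<object> requires:

```csharp
using System;

// Hypothetical one-argument stand-in for Xronos's IFunction.
interface IFunction
{
    object Invoke(object arg);
}

// Untyped part: identity of the Var (name and namespace).
abstract class VarBase
{
    public string Name { get; }
    public string Namespace { get; }
    protected VarBase(string ns, string name) { Namespace = ns; Name = name; }
}

// Typed part: holds the root value; thread bindings are omitted here.
sealed class Var<T> : VarBase
{
    readonly T root;
    public Var(string ns, string name, T value) : base(ns, name) { root = value; }
    public T Get() => root;
}

// Hypothetical compiled form of (defn add5 [x] (+ x 5)).
sealed class Add5 : IFunction
{
    public object Invoke(object arg) => (int)arg + 5;
}

static class VarCalls
{
    // Old shape: the value is only known to be an object, so a cast is needed.
    public static object CallUntyped(Var<object> v, object arg) => ((IFunction)v.Get()).Invoke(arg);

    // New shape: the static type already says IFunction, so no cast is emitted.
    public static object CallTyped(Var<IFunction> v, object arg) => v.Get().Invoke(arg);
}
```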


Facebook is Way Ahead of Everyone Else

Wednesday, November 19, 2008

The other day, I was talking with my friend Nathan about Facebook. I don't use Facebook a ton, but I do use it more often and more consistently than I ever have in the past, and I believe the much-hated new UI has a lot to do with it. The big complaint about the new UI is that it hides a lot of Facebook features. When I say a lot, I truly mean all but the 3 or 4 main features. At the same time, they did add one new feature: you can now reply to other users' statuses. So they basically traded 98% of Facebook for Twitter. The interesting thing is that even though I am not convinced the new UI itself is much better, I find I use Facebook more than before, because of the vastly simplified user experience!

Now hidden in the depths of Facebook, among the many missing features, is the Facebook API. The majority of Facebook API apps are total crap! This is where Facebook is ahead of the curve, because they have already realized this and made the apps super hard to find. MySpace and every other social network out there were left scrambling to add their own APIs. And guess what? The majority of those API apps will be crap too! In fact, the majority of API apps for anything are going to be crap. (The best you can ever hope for is that an API will make it easier for the company itself to add good features to the product.) So Facebook made their API and has realized that even though it is a great idea from a PR perspective, it totally sucks from a quality perspective. Since the apps appear to be part of Facebook to the average user, but are usually much lower quality, the user sees Facebook as having dropped in quality.

This, I think, is the most interesting "feature" of the new Facebook UI. It tells us that the Facebook people know that, as much fun as I had playing Knighthood a few months back, I eventually got tired of it sucking and quit using Facebook for a while. I didn't just quit Knighthood; I quit Facebook too. Since then, Facebook has Twitterified itself, people are being more social than ever, and Facebook has put the "social" back into social networking.


Big update to the blog

Monday, December 01, 2008

This weekend, I pushed a huge update to the blog. There are two new features: OpenID and commenting.

OpenID support means that if your OpenID provider supports profile information (I know openid.net does, and that some of the big names don't), you can authenticate seamlessly with OpenID. Sometime soonish, I'll put in some updates so users can edit their info on my blog, and then I won't need to require profile information from the OpenID provider.

If you don't already use OpenID, you can use this blog as your OpenID provider. Your OpenID is http://stefan.rusek.org/User/<username>/.

Comments are now allowed on all posts if you are logged in. Please be polite and considerate. I will delete comments that I feel are inappropriate.


ASP.NET MVC Views in Xronos

Tuesday, December 09, 2008

With the release of Xronos v0.1 comes an overhaul of how it works with ASP.NET MVC. First, I removed the ASP-like syntax. The syntax was familiar, but it turned out not to do anything helpful, because (pr) makes for cleaner code. If that were the only change, writing views in Xronos would be pretty lame. The new system is modeled after the ASP.NET master page system.

Enabling the ViewEngine is the first step toward using Xronos with ASP.NET MVC. Add a line to the Application_Start event in your Global.asax.cs file, before the call to RegisterRoutes(). The ViewEngine constructor takes any number of path names to files to compile immediately.

// In Application_Start, before RegisterRoutes(): put the Xronos engine first
ViewEngines.Engines.Insert(0,
    new System.Xronos.Web.Mvc.ViewEngine("~/Views/Shared/Library.x"));

Normal pages are simple to create; just write a (render) function that prints out your HTML:

(defn render [] (pr 
    "<html><head><title>" (get viewdata "Title")  "</title></head>" 
    "<body>Yay! it works</body></html>" 
))

This is fine and all, but most of the time you want a bunch of views that share a common layout. If you are familiar with how master pages work in ASP.NET, then Xronos master pages will be very familiar to you. In the master page file, you need to define a (render) function and a params variable.

; ~/Views/Shared/Site.Master.x 
(def params) 

(defn header [] ... ) 
(defn footer [] ... ) 

(defn render [] 
    (header) 
    ((:body params)) 
    (footer) 
)

Then in your page you simply specify that you are using a master page and the content of the params.

; ~/Views/Home/Index.x 
(def master "~/Views/Shared/Site.Master.x") 
(xronos/refer 'mvc) 

(defn renderItems [] 
    (dorun (map blog/renderItem (get viewdata "Items"))) 
) 

(def content {:body renderItems})

Since Xronos compiles each file in its own namespace, pages are isolated from one another and cannot access each other's functions unless your library puts them in an accessible namespace:

; ~/Views/Shared/Library.x 
(in-ns 'blog) 
(xronos/refer 'mvc) 

(defn renderItem [item] ... )

At the moment, Xronos only supports writing views. Sometime soonish, I hope to make it easy to write entire MVC apps in Xronos. In the meantime, it is still an awesome language for writing MVC views.


Ready to Quit ASP.NET

Wednesday, February 11, 2009

About four years ago, I wrote a little CMS with support for plugins and a number of other neat features. I continued to work on it off and on until I went to work for Fog Creek. Since then, it has languished in a Subversion repository on my hard disk. I don't have any desire to resurrect it, so the repository will probably get converted to Mercurial and then languish for another four years.

At the time, I spent a lot of time figuring out what makes ASP.NET tick. It is really a remarkable system. It was originally designed to let Visual Basic programmers transition to writing web apps without having to learn much about web programming. Yet for all its power and cleverness, it just plain fails. It failed in 2001, and it fails even harder now.

The idea that you can shield web developers from the nitty-gritty and spare them from learning HTML is just cuckoo. HTML is the minimum any web developer should know, and in today's web, they should also know JavaScript and how to use jQuery or a similar framework. While ASP.NET tries to save you from all that, its abstractions leak so badly that all the boys in Holland couldn't plug the holes. Even if they didn't leak, and despite its powerful control library, ASP.NET does not make it easy to do anything it was not specifically designed for, and it was designed pretty narrowly.

Let's look at what happens when you, a neophyte web developer, first encounter ASP.NET. If you only use the controls provided in System.Web.dll, write all of your own controls using the UserControl abstraction, and learn HTML, you can create a half-decent web app using little more than your existing VB knowledge. That is, until you create a page with a lot of cool stuff on it, at which point your 20K page turns out to weigh in at well over 500K. After a little research, you find that the culprit is ASP.NET ViewState. So you turn it off and your page is tiny again, but now none of your controls remember their values when buttons are clicked. You do some more research and learn that each control stores its value in ViewState to pass data across page loads, so you go through and decide whether to enable or disable ViewState for each control. When you're done, even if you have something that mostly works, you have a big page instead of a huge page, and the abstractions designed to save you from learning about the web have merely forced you to learn the arcana of how ASP.NET tries and fails to hide the web, instead of letting you write a good web application from the beginning.

The remaining source of hugeness turns out to be a list control that you databound to a giant list of things from the database. The database can give you paged data, but the list control doesn't really like to be given anything but the full list--even if the control only shows a subset. At this point, you can either dig into the inner workings of the list control and try to trick it into doing what you want, or you can write your own list control. Either way you have a ton of work ahead of you.

The basic problem is that ASP.NET is not built on top of a windowing framework, yet it tries its hardest to act like Windows Forms. So the abstraction breaks down whenever you run across the two big differences between web development and Windows development: state management and how controls are displayed. In other words, basically the whole thing.

In an AJAX world, ASP.NET is even worse, because the idea of rendering only part of the page just does not map well onto ASP.NET. Microsoft has an AJAX library that is built around the concept of an update panel control, which allows you to specify a chunk of the page that can be replaced via AJAX. This works for small things, but before too long, you put almost all of your page inside update panels. The end result is that each time you need to do an AJAX request, almost the entire page is sent to the client and replaced via JavaScript, so even if only one control needs to be rerendered and replaced, all of them are.

ASP.NET MVC throws away most of the ASP.NET infrastructure and builds an MVC framework on top of the IHttpHandler interface (the lowest-level part of ASP.NET). The MVC design pattern as applied to web development is inherently stateless, so even though the programmer is forced to actually think about state management, he writes cleaner, shorter code. The other thing it does right is provide a set of extension methods that encourage the programmer to generate HTML strings instead of a tree of controls.

In my experience, a good VB developer should have no problem learning to write good web apps, even ones that do neat AJAX tricks. Unfortunately, ASP.NET tends to prevent those programmers from being successful rather than help them.


Using ‘var’ to become a better programmer

Tuesday, March 03, 2009

When C# 3 first came out, a number of articles were written about using ‘var’ vs. explicitly typed variables. Some notable people said to use types, some notable people said to use ‘var’, and others took a pragmatic stance somewhere in the middle. In case you missed all this, here is an example:

List<int> l1 = new List<int>();
var l2 = new List<int>();

System.Web.HttpContext context1 = System.Web.HttpContext.Current;
HttpContext context2 = HttpContext.Current;
var context3 = System.Web.HttpContext.Current;

In the code above, l1 and l2 have the same type, but l2’s type is inferred from the declaration. For the declaration of l1 we specify the type twice, while for l2 we don’t have to repeat ourselves. In the case of context1, we might be tempted to add a ‘using’ directive so we can write a declaration like context2, but using ‘var’ for context3 gives us the brevity of context2 without importing the namespace into the whole file. After a lot of thought and coding both ways, I have fallen into the “always use ‘var’ (except when you just can’t)” camp, because writing easily readable code as a whole outweighs the cost of possibly writing a bad assignment statement.

Much of the debate between these two methods focuses on readability of the assignment statement itself. Duplicating the type name tends to make the line harder to read, but some argue that the line is harder to read when the type does not appear on the line at all.

My contention is that the person reading the code does not actually care to know the exact type of the variable. The reader really wants to know the logical type of the variable. In every language, variables have a logical type and an exact type. The logical type is the idea that the exact type tries to represent in memory. In other words, if you saw a call to a method named CountItems(), you would know that it returned an integer, but you wouldn’t really care whether it returned an int, uint, long, etc. In fact, uint or long often makes a lot more sense for the result of a method that returns the number of things. In these cases, the logical type of the variable is completely obvious, but the exact type is not. Likewise, if you had a method named GetDeletedItems(), you would know that it returned some kind of list. Often it does not make much difference what type of list it returns, as long as you can use ‘foreach’ on it.

string Message()
{
    var items = CountItems();
    if (items == 1)
        return "1 item";
    return items + " items";
}

string ItemsToJSON()
{
    var items = GetItems();
    var sb = new StringBuilder();
    foreach (var item in items)
    {
        if (sb.Length != 0) sb.Append(",");
        sb.Append(item.ToJSON());
    }
    return "[" + sb + "]";
}

GetItems() could return a List<Item>, an Item[], an IEnumerable<Item>, or some user-defined collection type. We simply don’t care. (A non-generic ArrayList or IEnumerable would also work with ‘foreach’, although ‘var item’ would then be typed object and item.ToJSON() would need a cast.) The only important fact about GetItems() is that it returns a bunch of items we can use with the ‘foreach’ statement. In both of the above methods we have a variable named items. In both contexts, we know the logical type of the variable, but not its exact type. Two things reveal this information to us: the names of the methods CountItems() and GetItems(), and the way we use the variables.
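To see that the two methods really depend only on the logical types, here is a self-contained C# sketch. Item, Inventory, and the bodies of CountItems() and GetItems() are hypothetical stand-ins; CountItems() deliberately returns uint and GetItems() returns an array, yet the bodies of Message() and ItemsToJSON() are the same as above.

```csharp
using System.Text;

// Hypothetical item type; ToJSON() just quotes the name.
class Item
{
    private readonly string name;
    public Item(string name) { this.name = name; }
    public string ToJSON() { return "\"" + name + "\""; }
}

class Inventory
{
    private readonly Item[] store;
    public Inventory(params Item[] items) { store = items; }

    // Hypothetical: returns uint, not int. Callers using 'var' never notice.
    public uint CountItems() { return (uint)store.Length; }

    // Hypothetical: returns an array. List<Item> or IEnumerable<Item> would work too.
    public Item[] GetItems() { return store; }

    public string Message()
    {
        var items = CountItems();   // exact type: uint; logical type: a count
        if (items == 1)
            return "1 item";
        return items + " items";
    }

    public string ItemsToJSON()
    {
        var items = GetItems();     // exact type: Item[]; logical type: something foreach-able
        var sb = new StringBuilder();
        foreach (var item in items)
        {
            if (sb.Length != 0) sb.Append(",");
            sb.Append(item.ToJSON());
        }
        return "[" + sb + "]";
    }
}
```

Changing GetItems() to return IEnumerable<Item> (for example, `public IEnumerable<Item> GetItems() { return store; }` with a `using System.Collections.Generic;`) would not require touching ItemsToJSON(), because the ‘var’ declarations follow the new return type.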

You may be asking yourself how this can make you a better programmer. Well, when a variable’s logical type is not clear, changing the ‘var’ to an explicit type will not fix the underlying problem with the code, because it will only make the exact type clear. The real problem with the code is one or more of the following: a bad variable name, a bad method name, or the variable is declared too far from where it is used.

A good variable name goes a long way toward making good code. It doesn’t have to make the exact type clear to the reader, but the logical type should be clear from the context. Good method names are even more important than good variable names, because the consumer of a method often does not have access to its source code, and even when the source is available, it can be a pain to read it just to figure out what the method does. As code evolves, it is not uncommon for a variable declaration to drift away from where the variable is used. This is easy to fix, and fixing it can dramatically improve readability.

When I worked at Fog Creek, we had a naming convention for variables that took a completely different approach to the problem:

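Dim ixUser = GetUserId()           ' type inferred (with Option Infer On)
Dim ixUser As Int32 = GetUserId()  ' alternative: the same declaration with the type spelled out

Dim cUsers = CountAdminUsers()
Dim rgUsers = GetAdminUsers()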

The ix prefix says that the variable is an Int32 index column from the database, so the “As Int32” in the second line is completely redundant. The c and rg prefixes mean count and array, both of which imply a type as well. The emphasis was on the logical type instead of the exact type. It was not uncommon for an ixUser to actually be a string representation of an integer.

In the majority of cases, the exact type just does not matter. In fact, when we let go of the exact type and embrace the logical type, we end up writing cleaner, more consistent code.


Stefan Rusek | LINQ – Beware of First() and Count()

LINQ – Beware of First() and Count()

Saturday, August 22, 2009

LINQ is one of my favorite things about C# 3. In fact, a few months ago I rewrote almost all of the LINQ extension methods so I could use LINQ in a .NET 2.0 application. However, LINQ can also be a huge source of programmer error. A few extension methods are particularly dangerous, but by far the most problems I’ve seen involve the First() and Count() methods, especially when they are used with database queries. The problem occurs because LINQ is built on IEnumerable<T>. Apart from shortcuts for in-memory collections (Count(), for example, returns ICollection<T>.Count directly), LINQ doesn’t do anything magical for these two methods, so we can assume the default implementations look something like the following:

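using System;
using System.Collections.Generic;

// Simplified sketches, not the actual framework source. As extension
// methods they must be declared inside a non-generic static class.
public static T First<T>(this IEnumerable<T> src)
{
    // Starts a brand-new enumeration and returns the first element.
    foreach (T item in src)
        return item;
    // The real First() throws InvalidOperationException on an empty sequence.
    throw new InvalidOperationException("Sequence contains no elements");
}

public static int Count<T>(this IEnumerable<T> src)
{
    // Also starts a brand-new enumeration, walking every element.
    int c = 0;
    foreach (T item in src)
        c++;
    return c;
}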

The problem here is that both of these methods start the iteration over from the beginning. This means that the following code inadvertently hits the database twice whenever it finds a user:

public User GetUser(int userid)
{
    var q = GetUserQuery(userid);  // builds the query; nothing runs yet
    if (q.Count() > 0)             // database hit #1
        return q.First();          // database hit #2
    else
        return User.DefaultUser;
}
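The double round trip is easy to observe without a database. In this sketch, the hypothetical FakeUserQuery class stands in for the result of GetUserQuery() and counts how many times it is enumerated, while the real System.Linq Count() and First() are used unchanged. With a real LINQ to SQL or Entity Framework query, the two calls would typically be translated into two separate SQL statements rather than two enumerations, but the number of round trips is the same.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database query: each enumeration counts as one "hit".
class FakeUserQuery : IEnumerable<string>
{
    private readonly string[] rows;
    public int Hits { get; private set; }

    public FakeUserQuery(params string[] rows) { this.rows = rows; }

    public IEnumerator<string> GetEnumerator()
    {
        Hits++;  // simulate one round trip per enumeration
        return ((IEnumerable<string>)rows).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

static class DoubleHitDemo
{
    // Mirrors GetUser() above; "default" plays the role of User.DefaultUser.
    public static string GetUser(FakeUserQuery q)
    {
        if (q.Count() > 0)      // enumeration #1
            return q.First();   // enumeration #2
        else
            return "default";
    }
}
```

When a user is found, Hits ends up at 2; when the query is empty, the First() call is skipped and Hits is 1.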

Most of the time we can fix this pretty easily by either using ToList() or FirstOrDefault(), which are implemented something like this:

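public static T FirstOrDefault<T>(this IEnumerable<T> src)
{
    // Same single pass as First(), but an empty sequence yields default(T).
    foreach (T item in src)
        return item;
    return default(T);
}

public static List<T> ToList<T>(this IEnumerable<T> src)
{
    // Enumerates the source once and buffers every element in memory.
    return new List<T>(src);
}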

And so GetUser() becomes one of the following:

public User GetUser1(int userid)
{
    var u = GetUserQuery(userid).ToList();
    if (u.Count > 0)
        return u[0];
    else
        return User.DefaultUser;
}

public User GetUser2(int userid)
{
    return GetUserQuery(userid).FirstOrDefault() ?? User.DefaultUser;
}
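A quick way to check that both rewrites touch the source only once is to count enumerations. In this sketch, CountingQuery<T> is a hypothetical stand-in for a database query, the string "default" plays the role of User.DefaultUser, and the real System.Linq ToList() and FirstOrDefault() are used.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database query: counts how many times it is enumerated.
class CountingQuery<T> : IEnumerable<T>
{
    private readonly T[] rows;
    public int Hits { get; private set; }

    public CountingQuery(params T[] rows) { this.rows = rows; }

    public IEnumerator<T> GetEnumerator()
    {
        Hits++;  // each enumeration simulates one round trip
        return ((IEnumerable<T>)rows).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

static class SingleHitDemo
{
    const string DefaultUser = "default";

    // Like GetUser1: materialize once, then inspect the list in memory.
    public static string GetUser1(CountingQuery<string> q)
    {
        var u = q.ToList();
        if (u.Count > 0)
            return u[0];
        else
            return DefaultUser;
    }

    // Like GetUser2: FirstOrDefault() stops after the first element.
    public static string GetUser2(CountingQuery<string> q)
    {
        return q.FirstOrDefault() ?? DefaultUser;
    }
}
```

One design caveat: GetUser2() relies on the ?? operator, which only compiles when the element type is a reference type or Nullable<T>. For a value type such as int, FirstOrDefault() returns 0 on an empty sequence, so the ToList() version (or an explicit comparison) is the safer choice there.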