One thing that might be surprising about the LINQ Distinct standard query operator is that it doesn’t automatically work properly on custom classes. There are reasons for this, which I’ll explain shortly. The example I’ll use in this post focuses on pulling a unique list of names to load into a drop-down list. I’ll explain the sample application, show you a typical first attempt at Distinct, explain why it won’t work as you expect, and then demonstrate a solution to make Distinct work with any custom class.
The technologies I’m using are LINQ to Twitter, LINQ to Objects, Telerik Extensions for ASP.NET MVC, ASP.NET MVC 2, and Visual Studio 2010.
The function of the example program is to show a list of people that I follow. In Twitter API vernacular, these people are called “Friends”, though I’ve never met most of them in real life. This is part of the ubiquitous language of social networking, and Twitter in particular, so you’ll see my objects named accordingly. Distinct comes into play because I want a drop-down list with the names of the friends appearing in the list. Some friends are quite verbose, which means I can’t just extract names from each tweet and populate the drop-down; otherwise, I would end up with many duplicate names. Therefore, Distinct is the appropriate operator to eliminate the extra entries from my friends who tend to be enthusiastic tweeters. The sample doesn’t do anything with the drop-down list, and I leave its practical purpose up to your imagination; perhaps a filter for the list if I only want to see a certain person’s tweets, or maybe a quick list that I plan to combine with a TextBox and Button to reply to a friend. When the program runs, you’ll need to authenticate with Twitter, because I’m using OAuth (DotNetOpenAuth) for authentication, and then you’ll see the drop-down list of names above the grid with the most recent tweets from friends. Here’s what the application looks like when it runs:
As you can see, there is a drop-down list above the grid. The drop-down list is where most of the focus of this article will be. There is some description of the code before we talk about the Distinct operator, but we’ll get there soon.
This is an ASP.NET MVC 2 application, written with VS 2010. Here’s the View that produces this screen:
<%@ Page Language="C#" MasterPageFile="~/Views/Shared/Site.Master" Inherits="System.Web.Mvc.ViewPage<TwitterFriendsViewModel>" %>
<%@ Import Namespace="DistinctSelectList.Models" %>
<asp:Content ID="Content1" ContentPlaceHolderID="TitleContent" runat="server">
    Home Page
</asp:Content>
<asp:Content ID="Content2" ContentPlaceHolderID="MainContent" runat="server">
    <fieldset>
        <legend>Twitter Friends</legend>
        <div>
            <%= Html.DropDownListFor(
                    twendVM => twendVM.FriendNames,
                    Model.FriendNames,
                    "<All Friends>") %>
        </div>
        <div>
            <% Html.Telerik().Grid<TweetViewModel>(Model.Tweets)
                   .Name("TwitterFriendsGrid")
                   .Columns(cols =>
                   {
                       cols.Template(col =>
                       { %>
                           <img src="<%= col.ImageUrl %>"
                                alt="<%= col.ScreenName %>" />
                    <% });
                       cols.Bound(col => col.ScreenName);
                       cols.Bound(col => col.Tweet);
                   })
                   .Render(); %>
        </div>
    </fieldset>
</asp:Content>
As shown above, the Grid is from Telerik’s Extensions for ASP.NET MVC. The first column is a template that renders the user’s Avatar from a URL provided by the Twitter query. Both the Grid and DropDownListFor display properties that are collections from a TwitterFriendsViewModel class, shown below:
using System.Collections.Generic;
using System.Web.Mvc;
namespace DistinctSelectList.Models
{
    /// <summary>
    /// For finding friend info on screen
    /// </summary>
    public class TwitterFriendsViewModel
    {
        /// <summary>
        /// Display names of friends in drop-down list
        /// </summary>
        public List<SelectListItem> FriendNames { get; set; }
        /// <summary>
        /// Display tweets in grid
        /// </summary>
        public List<TweetViewModel> Tweets { get; set; }
    }
}
I created the TwitterFriendsViewModel class. The two Lists are what the View consumes to populate the DropDownListFor and Grid. Notice that FriendNames is a List of SelectListItem, which is an MVC class. Another custom class I created is the TweetViewModel (the type of the Tweets List), shown below:
namespace DistinctSelectList.Models
{
    /// <summary>
    /// Info on friend tweets
    /// </summary>
    public class TweetViewModel
    {
        /// <summary>
        /// User's avatar
        /// </summary>
        public string ImageUrl { get; set; }
        /// <summary>
        /// User's Twitter name
        /// </summary>
        public string ScreenName { get; set; }
        /// <summary>
        /// Text containing user's tweet
        /// </summary>
        public string Tweet { get; set; }
    }
}
The initial Twitter query returns much more information than we need for our purposes, so this is a special class for displaying info in the View. Now you know about the View and how it’s constructed. Let’s look at the controller next.
The controller for this demo performs authentication, data retrieval, data manipulation, and view selection. I’ll skip the description of the authentication because it’s a normal part of using OAuth with LINQ to Twitter. Instead, we’ll drill down and focus on the Distinct operator. However, I’ll show you the entire controller, below, so that you can see how it all fits together:
using System.Linq;
using System.Web.Mvc;
using DistinctSelectList.Models;
using LinqToTwitter;
namespace DistinctSelectList.Controllers
{
    [HandleError]
    public class HomeController : Controller
    {
        private MvcOAuthAuthorization auth;
        private TwitterContext twitterCtx;
        /// <summary>
        /// Display a list of friends' current tweets
        /// </summary>
        /// <returns></returns>
        public ActionResult Index()
        {
            auth = new MvcOAuthAuthorization(InMemoryTokenManager.Instance, InMemoryTokenManager.AccessToken);
            string accessToken = auth.CompleteAuthorize();
            if (accessToken != null)
            {
                InMemoryTokenManager.AccessToken = accessToken;
            }
            if (auth.CachedCredentialsAvailable)
            {
                auth.SignOn();
            }
            else
            {
                return auth.BeginAuthorize();
            }
            twitterCtx = new TwitterContext(auth);
            var friendTweets =
                (from tweet in twitterCtx.Status
                 where tweet.Type == StatusType.Friends
                 select new TweetViewModel
                 {
                     ImageUrl = tweet.User.ProfileImageUrl,
                     ScreenName = tweet.User.Identifier.ScreenName,
                     Tweet = tweet.Text
                 })
                .ToList();
            var friendNames =
                (from tweet in friendTweets
                 select new SelectListItem
                 {
                     Text = tweet.ScreenName,
                     Value = tweet.ScreenName
                 })
                .Distinct()
                .ToList();
            var twendsVM = new TwitterFriendsViewModel
            {
                Tweets = friendTweets,
                FriendNames = friendNames
            };
            return View(twendsVM);
        }
        public ActionResult About()
        {
            return View();
        }
    }
}
The important parts of the listing above are the queries for friendTweets (LINQ to Twitter) and friendNames (LINQ to Objects). Both results are used to populate the twendsVM instance that is passed to the view. Let’s dissect these two statements for clarification and focus on what is happening with Distinct.
The query for friendTweets gets a list of the 20 most recent tweets (as specified by the Twitter API for friend queries) and performs a projection into the custom TweetViewModel class, repeated below for your convenience:
var friendTweets =
    (from tweet in twitterCtx.Status
     where tweet.Type == StatusType.Friends
     select new TweetViewModel
     {
         ImageUrl = tweet.User.ProfileImageUrl,
         ScreenName = tweet.User.Identifier.ScreenName,
         Tweet = tweet.Text
     })
    .ToList();
The LINQ to Twitter query above simplifies what we need to work with in the View and reduces the amount of information we have to look at in subsequent queries. Given the friendTweets above, the next query performs another projection into an MVC SelectListItem, which is required for binding to the DropDownList. This brings us to the focus of this blog post: writing a correct query that uses the Distinct operator. The query below uses LINQ to Objects, querying the friendTweets collection to get friendNames:
var friendNames =
    (from tweet in friendTweets
     select new SelectListItem
     {
         Text = tweet.ScreenName,
         Value = tweet.ScreenName
     })
    .Distinct()
    .ToList();
The above use of Distinct looks normal, but it is deceptively incorrect. When you run the application, you’ll notice that the drop-down list contains many duplicates. This will send you back to the code scratching your head, but there’s a reason why this happens.
To understand the problem, we must examine how Distinct works in LINQ to Objects. Distinct has two overloads: one without parameters, as shown above, and another that takes a parameter of type IEqualityComparer<T>. With no parameters, Distinct uses EqualityComparer<T>.Default behind the scenes to compare items (through both GetHashCode and Equals) as it iterates through the list. You don’t have problems with built-in types such as string, int, and DateTime, because they all implement IEquatable<T>. However, many framework classes, such as SelectListItem, neither implement IEquatable<T> nor override Equals. For those types, EqualityComparer<T>.Default falls back to Object.Equals, which performs reference equality on reference types unless the type overrides it. Value types don’t have this problem, because ValueType overrides Equals to compare field values. However, most projections that use Distinct are on classes, just like the SelectListItem used in this demo application. So Distinct didn’t produce the results we wanted because we used a type that doesn’t define its own equality, and Distinct fell back to reference equality. Every object ended up in the results because each is a separate instance in memory with a unique reference.
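The difference is easy to reproduce outside of MVC. The following is a minimal sketch, assuming a hypothetical NameItem class that stands in for SelectListItem so the example has no System.Web.Mvc dependency; the class and method names are illustrative only and are not part of the sample application.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SelectListItem: a plain class that
// does not override Equals or GetHashCode.
public class NameItem
{
    public string Value { get; set; }
}

public static class DefaultDistinctDemo
{
    // Distinct on strings: string implements IEquatable<string>,
    // so duplicates are detected by value.
    public static int CountDistinctStrings(IEnumerable<string> names)
    {
        return names.Distinct().Count();
    }

    // Distinct on NameItem: every projected object is a separate
    // instance, so the default comparer treats each one as unique.
    public static int CountDistinctItems(IEnumerable<string> names)
    {
        return names
            .Select(name => new NameItem { Value = name })
            .Distinct()
            .Count();
    }
}
```

Given the input new[] { "joe", "joe", "ann" }, CountDistinctStrings returns 2, while CountDistinctItems returns 3, which is the same symptom seen in the drop-down list.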
As you might have guessed, the solution to the problem is to use the second overload of Distinct that accepts an IEqualityComparer<T> instance. If you were projecting into your own custom type, you could make that type implement IEquatable<T> (and override Equals and GetHashCode) so that the parameterless Distinct works, but SelectListItem belongs to ASP.NET MVC and isn’t ours to modify. Therefore, the solution is to create a separate type that implements IEqualityComparer<T>, as in the SelectListItemComparer class, shown below:
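using System.Collections.Generic;
using System.Web.Mvc;
namespace DistinctSelectList.Models
{
    /// <summary>
    /// Compares SelectListItem instances by their Value property
    /// </summary>
    public class SelectListItemComparer : EqualityComparer<SelectListItem>
    {
        public override bool Equals(SelectListItem x, SelectListItem y)
        {
            // Same instance, or both null
            if (ReferenceEquals(x, y)) return true;
            if (x == null || y == null) return false;
            // The static string.Equals tolerates a null Value on either side
            return string.Equals(x.Value, y.Value);
        }
        public override int GetHashCode(SelectListItem obj)
        {
            // Items that compare equal must return equal hash codes
            if (obj == null || obj.Value == null) return 0;
            return obj.Value.GetHashCode();
        }
    }
}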
The SelectListItemComparer class above doesn’t implement IEqualityComparer<SelectListItem> directly, but rather derives from EqualityComparer<SelectListItem>. Microsoft recommends this approach for consistency with the behavior of generic collection classes. However, if your comparer class already derives from another base class, go ahead and implement IEqualityComparer<T> directly, which will still work.
EqualityComparer<T> is an abstract class that implements IEqualityComparer<T> with abstract Equals and GetHashCode methods. For the purposes of this application, the SelectListItem.Value property is sufficient to determine if two items are equal. Since SelectListItem.Value is of type string, the code delegates equality to the string class. The code also delegates the GetHashCode operation to the string class. You might have other criteria in your own object and would need to define what it means for your object to be equal.
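As a sketch of both points above (implementing the interface directly, and choosing different equality criteria), here is a comparer that treats values as equal regardless of case. ListEntry is a hypothetical stand-in for SelectListItem, and case-insensitive matching is an assumption made purely for illustration, not a rule from the sample application.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SelectListItem
public class ListEntry
{
    public string Text { get; set; }
    public string Value { get; set; }
}

// Implements IEqualityComparer<T> directly instead of deriving
// from EqualityComparer<T>.
public class ListEntryValueComparer : IEqualityComparer<ListEntry>
{
    // Illustrative criterion: compare Value without regard to case
    private static readonly StringComparer ValueComparer =
        StringComparer.OrdinalIgnoreCase;

    public bool Equals(ListEntry x, ListEntry y)
    {
        if (ReferenceEquals(x, y)) return true;   // same instance or both null
        if (x == null || y == null) return false;
        return ValueComparer.Equals(x.Value, y.Value);
    }

    public int GetHashCode(ListEntry obj)
    {
        // Must follow the same rule as Equals so equal items hash alike
        if (obj == null || obj.Value == null) return 0;
        return ValueComparer.GetHashCode(obj.Value);
    }
}
```

The key design constraint is that Equals and GetHashCode agree: if the hash were computed case-sensitively while Equals ignored case, Distinct could still let duplicates through.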
Now that we have an IEqualityComparer<SelectListItem>, let’s fix the problem. The code below modifies the query where we want distinct values:
var friendNames =
    (from tweet in friendTweets
     select new SelectListItem
     {
         Text = tweet.ScreenName,
         Value = tweet.ScreenName
     })
    .Distinct(new SelectListItemComparer())
    .ToList();
Notice how the code above passes a new instance of SelectListItemComparer as the parameter to the Distinct operator. Now, when you run the application, the drop-down list will behave as you expect, showing only a unique set of names.
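When the projected type is one you own rather than a framework class, another option is to let the type define its own equality so that the parameterless Distinct works. The following is a minimal sketch under that assumption; FriendName and EquatableDemo are hypothetical names that do not appear in the sample application.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical custom type that defines its own equality
public sealed class FriendName : IEquatable<FriendName>
{
    public string ScreenName { get; set; }

    // Picked up by EqualityComparer<FriendName>.Default
    public bool Equals(FriendName other)
    {
        return other != null && string.Equals(ScreenName, other.ScreenName);
    }

    // Keep Object.Equals consistent with the typed overload
    public override bool Equals(object obj)
    {
        return Equals(obj as FriendName);
    }

    // Equal objects must produce equal hash codes
    public override int GetHashCode()
    {
        return ScreenName == null ? 0 : ScreenName.GetHashCode();
    }
}

public static class EquatableDemo
{
    public static List<string> DistinctScreenNames(IEnumerable<string> names)
    {
        return names
            .Select(name => new FriendName { ScreenName = name })
            .Distinct() // no comparer needed; the type supplies equality
            .Select(friend => friend.ScreenName)
            .ToList();
    }
}
```

This keeps call sites simpler, at the cost of fixing a single definition of equality for the type; a separate comparer remains the better fit when different queries need different criteria.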
In addition to Distinct, other LINQ standard query operators, such as Union, Intersect, Except, Contains, and GroupBy, have overloads that accept an IEqualityComparer<T>. You can use the same technique shown here with SelectListItemComparer for those operators as well. Now you know how to get Distinct to work properly, and you also have a way to fix problems with other operators that require equality comparisons.
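As a rough illustration of reusing one comparer across operators, the sketch below passes the same instance to Union, Except, and Contains. DropDownEntry, DropDownEntryComparer, and the helper method names are hypothetical stand-ins chosen so the example compiles without System.Web.Mvc.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SelectListItem
public class DropDownEntry
{
    public string Value { get; set; }
}

// Same shape as SelectListItemComparer, for the stand-in type
public class DropDownEntryComparer : EqualityComparer<DropDownEntry>
{
    public override bool Equals(DropDownEntry x, DropDownEntry y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.Value, y.Value);
    }

    public override int GetHashCode(DropDownEntry obj)
    {
        if (obj == null || obj.Value == null) return 0;
        return obj.Value.GetHashCode();
    }
}

public static class OtherOperatorsDemo
{
    private static readonly IEqualityComparer<DropDownEntry> Comparer =
        new DropDownEntryComparer();

    // Union: entries from both lists, duplicates removed by Value
    public static List<DropDownEntry> Merge(
        IEnumerable<DropDownEntry> first, IEnumerable<DropDownEntry> second)
    {
        return first.Union(second, Comparer).ToList();
    }

    // Except: entries in first whose Value does not occur in second
    public static List<DropDownEntry> OnlyInFirst(
        IEnumerable<DropDownEntry> first, IEnumerable<DropDownEntry> second)
    {
        return first.Except(second, Comparer).ToList();
    }

    // Contains: is there any entry with this Value?
    public static bool HasValue(IEnumerable<DropDownEntry> source, string value)
    {
        return source.Contains(new DropDownEntry { Value = value }, Comparer);
    }
}
```

Without the comparer argument, each of these calls would fall back to reference equality and behave as if no two entries ever matched.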
@JoeMayo