I hadn’t done much (read: anything) with the C# generic HashSet until I recently needed to produce a distinct collection. As it turns out, HashSet<T> was the perfect tool.
As the following snippet demonstrates, this collection type offers a lot:
// Using HashSet<T>:
// http://www.albahari.com/nutshell/ch07.aspx
var letters = new HashSet<char>("the quick brown fox");
Console.WriteLine(letters.Contains('t')); // true
Console.WriteLine(letters.Contains('j')); // false
foreach (char c in letters) Console.Write(c); // the quickbrownfx
Console.WriteLine();
letters = new HashSet<char>("the quick brown fox");
letters.IntersectWith("aeiou");
foreach (char c in letters) Console.Write(c); // euio
Console.WriteLine();
letters = new HashSet<char>("the quick brown fox");
letters.ExceptWith("aeiou");
foreach (char c in letters) Console.Write(c); // th qckbrwnfx
Console.WriteLine();
letters = new HashSet<char>("the quick brown fox");
letters.SymmetricExceptWith("the lazy brown fox");
foreach (char c in letters) Console.Write(c); // quicklazy
Console.WriteLine();
The MSDN documentation on HashSet<T> is a bit light, but if you search hard enough you can find some interesting information and benchmarks.
But back to that distinct list I needed…
// MSDN Add
// http://msdn.microsoft.com/en-us/library/bb353005.aspx
var employeeA = new Employee {Id = 1, Name = "Employee A"};
var employeeB = new Employee {Id = 2, Name = "Employee B"};
var employeeC = new Employee {Id = 3, Name = "Employee C"};
var employeeD = new Employee {Id = 4, Name = "Employee D"};
var naughty = new List<Employee> {employeeA};
var nice = new List<Employee> {employeeB, employeeC};
var employees = new HashSet<Employee>();
naughty.ForEach(x => employees.Add(x));
nice.ForEach(x => employees.Add(x));
foreach (Employee e in employees) Console.WriteLine(e);
// Prints Employee A, Employee B and Employee C, one per line
The Add method returns true when the item was added and, you guessed it, false when an equal item is already in the collection. I’m using List<T>.ForEach (a method on List<T> itself, not part of LINQ) to add the items from both lists to the employees HashSet. It works really well.
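Since the two lists above happen not to overlap, here is a minimal sketch of what Add reports when they do. It assumes C# 9 or later (for top-level statements) and uses plain strings as stand-in data; the names naughtyNames, niceNames, names and merged are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data: the two lists share "Employee B".
var naughtyNames = new List<string> { "Employee A", "Employee B" };
var niceNames = new List<string> { "Employee B", "Employee C" };

var names = new HashSet<string>();

// Add returns true when the item is new, false when it is already in the set.
foreach (string name in naughtyNames)
    Console.WriteLine($"{name}: {names.Add(name)}");
foreach (string name in niceNames)
    Console.WriteLine($"{name}: {names.Add(name)}");
// Employee A: True
// Employee B: True
// Employee B: False
// Employee C: True

Console.WriteLine(names.Count); // 3

// The constructor and UnionWith both accept any IEnumerable<T>,
// so the same result is available without an explicit loop.
var merged = new HashSet<string>(naughtyNames);
merged.UnionWith(niceNames);
Console.WriteLine(merged.SetEquals(names)); // True
```

If you don’t care about the per-item result, the constructor plus UnionWith is probably the shorter route to the same distinct collection.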
The employee sample is rough, but you may have noticed I’m using Employee, a reference type. Most samples demonstrate the power of the HashSet with a collection of integers, which is kind of cheating. Value types such as int already compare by value, so you don’t have to worry about defining your own equality members. A class compares by reference unless you override Equals and GetHashCode, so with reference types, you do.
// Implementing IEquatable<Employee> (from the System namespace) lets HashSet<T>
// call the strongly typed Equals directly.
internal class Employee : IEquatable<Employee>
{
    public int Id { get; set; }
    public string Name { get; set; }

    public override string ToString()
    {
        return Name;
    }

    // Two employees are equal when their Ids match.
    public bool Equals(Employee other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return other.Id == Id;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != typeof (Employee)) return false;
        return Equals((Employee) obj);
    }

    // Must agree with Equals: equal employees return the same hash code.
    // Don't change Id while the object is in a HashSet.
    public override int GetHashCode()
    {
        return Id;
    }

    public static bool operator ==(Employee left, Employee right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Employee left, Employee right)
    {
        return !Equals(left, right);
    }
}
Fortunately, with ReSharper, it’s a snap. Click on the class name, press ALT+INS, and then follow the handy dialogs.
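To see what those equality members actually buy you, here is a small side-by-side sketch. It assumes C# 9 or later for top-level statements, and PlainEmployee and KeyedEmployee are hypothetical stand-ins for this comparison only, not the Employee class above.

```csharp
using System;
using System.Collections.Generic;

// Two distinct objects that carry the same Id, no equality members defined.
var plain = new HashSet<PlainEmployee>
{
    new PlainEmployee { Id = 1 },
    new PlainEmployee { Id = 1 }
};
Console.WriteLine(plain.Count); // 2: default reference equality sees two objects

// Same data, but the type compares by Id.
var keyed = new HashSet<KeyedEmployee>
{
    new KeyedEmployee { Id = 1 },
    new KeyedEmployee { Id = 1 }
};
Console.WriteLine(keyed.Count); // 1: the second Add is rejected as a duplicate

// Hypothetical type with no equality members.
class PlainEmployee
{
    public int Id { get; set; }
}

// Hypothetical type that treats objects with the same Id as equal.
class KeyedEmployee : IEquatable<KeyedEmployee>
{
    public int Id { get; set; }

    public bool Equals(KeyedEmployee other) => other != null && other.Id == Id;
    public override bool Equals(object obj) => Equals(obj as KeyedEmployee);
    public override int GetHashCode() => Id;
}
```

Without the overrides the set quietly keeps both copies, which is exactly the kind of duplicate a distinct collection is supposed to rule out.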
That’s it. Try out the HashSet<T>. It’s good stuff.