Most efficient algorithm for merging sorted IEnumerable<T>

Posted by franck on Stack Overflow See other posts from Stack Overflow or by franck
Published on 2010-05-04T16:15:33Z Indexed on 2010/05/04 16:18 UTC
Read the original article Hit count: 498

Filed under:
|
|
|
|

Hello,

I have several huge sorted enumerable sequences that I want to merge. Theses lists are manipulated as IEnumerable but are already sorted. Since input lists are sorted, it should be possible to merge them in one trip, without re-sorting anything.

I would like to keep the defered execution behavior.

I tried to write a naive algorithm which do that (see below). However, it looks pretty ugly and I'm sure it can be optimized. It may exist a more academical algorithm...

IEnumerable<T> MergeOrderedLists<T, TOrder>(IEnumerable<IEnumerable<T>> orderedlists, 
                                            Func<T, TOrder> orderBy)
{
    var enumerators = orderedlists.ToDictionary(l => l.GetEnumerator(), l => default(T));
    IEnumerator<T> tag = null;

    var firstRun = true;
    while (true)
    {
        var toRemove = new List<IEnumerator<T>>();
        var toAdd = new List<KeyValuePair<IEnumerator<T>, T>>();
        foreach (var pair in enumerators.Where(pair => firstRun || tag == pair.Key))
        {
            if (pair.Key.MoveNext())
                toAdd.Add(pair);
            else
                toRemove.Add(pair.Key);
        }

        foreach (var enumerator in toRemove)
            enumerators.Remove(enumerator);

        foreach (var pair in toAdd)
            enumerators[pair.Key] = pair.Key.Current;

        if (enumerators.Count == 0)
            yield break;

        var min = enumerators.OrderBy(t => orderBy(t.Value)).FirstOrDefault();
        tag = min.Key;
        yield return min.Value;

        firstRun = false;
    }
}

The method can be used like that:

// Person lists are already sorted by age
MergeOrderedLists(orderedList, p => p.Age);

assuming the following Person class exists somewhere:

    public class Person
    {
        public int Age { get; set; }
    }

Duplicates should be conserved, we don't care about their order in the new sequence. Do you see any obvious optimization I could use?

© Stack Overflow or respective owner

Related posts about c#

Related posts about optimization