Thinking in Sequences

When dealing with a series of objects, it is easy to think of them as lists or arrays. Those are, after all, the first collection types most people get acquainted with when learning to program. And by all means, lists are very versatile and easy to employ in most situations. However, they often provide more functionality than you strictly need. Rather than considering collections of objects as lists, I find it helpful to think of them as sequences.

Since version 1.0, the .NET Framework has provided the IEnumerable interface for iterating over a sequence of objects. These days, its generic cousin, IEnumerable<T>, introduced in version 2.0, is usually preferred. Initially, those interfaces existed mainly to support the foreach statement. However, with the advent of LINQ to Objects, IEnumerable got a morale boost. Suddenly, anyone could easily write advanced queries against any sequence of objects.

In my opinion, the main advantage of IEnumerable is its stream-based nature. As with streams, the first items of an IEnumerable sequence can be made available for processing without having to collect every single item first. A sequence does not even have to be finite. Consider, for instance, the following code, which returns an infinite sequence of all positive integers.

public static IEnumerable<int> EnumerateAllPositiveIntegers()
{
	int integer = 0;
	while(true)
	{
		integer++;
		yield return integer;
	}
}

Note the yield return statement. This is a very handy shortcut which C# provides for creating IEnumerable sequences. Simply yield return every item you wish to include in the sequence. Those who are unfamiliar with such iterator blocks in C# may suspect that the code above will lead to an infinite loop. However, every yield return statement causes the enumerator’s MoveNext method to return, handing control back to the client. The loop only advances when the client asks for the next item. Of course, if the client iterates the sequence with a regular foreach loop and expects it to terminate, the result will in fact be an infinite loop.
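To make the hand-off between the iterator block and the client more concrete, here is a minimal sketch that drives the enumerator by hand instead of using foreach. The class name MoveNextDemo is made up for illustration, and the generator is repeated only so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; only the generator itself comes from the post.
public static class MoveNextDemo
{
    public static IEnumerable<int> EnumerateAllPositiveIntegers()
    {
        int integer = 0;
        while (true)
        {
            integer++;
            yield return integer;
        }
    }

    public static void Main()
    {
        // Calling the method runs none of its body; it only creates the sequence object.
        IEnumerable<int> sequence = EnumerateAllPositiveIntegers();

        using (IEnumerator<int> enumerator = sequence.GetEnumerator())
        {
            // Each MoveNext call runs the iterator body up to the next yield return,
            // then hands control back here.
            for (int i = 0; i < 3; i++)
            {
                enumerator.MoveNext();
                Console.WriteLine(enumerator.Current);
            }
        }
    }
}
```

This prints 1, 2 and 3 and then stops, because the client simply stops calling MoveNext.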

The following code shows how the integer-generating method above can be used in a LINQ query, without risking an infinite loop.

IEnumerable<int> firstFiftyOddPositiveIntegers = EnumerateAllPositiveIntegers()
	.Where(num => num % 2 == 1)
	.Take(50);

Rather than generating an infinite number of integers, consider an operation which needs to complete some time-consuming task for every value it returns. In such cases, exposing the results through an IEnumerable sequence using yield return will allow the client to process each value without having to wait for every value to be produced, collected and returned.
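As a rough sketch of that scenario, the example below fakes a slow operation with a short sleep. ComputeSlowly, EnumerateSlowResults and the 100 ms delay are all invented for illustration; a real task would replace the sleep.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical demo class; none of these names come from a real library.
public static class SlowSequenceDemo
{
    // Stand-in for a time-consuming task. The delay is an arbitrary assumption.
    private static int ComputeSlowly(int input)
    {
        Thread.Sleep(100);
        return input * input;
    }

    // Yields each result as soon as it is ready instead of collecting them first.
    public static IEnumerable<int> EnumerateSlowResults(IEnumerable<int> inputs)
    {
        foreach (int input in inputs)
        {
            yield return ComputeSlowly(input);
        }
    }

    public static void Main()
    {
        // The client handles the first result after one delay, not after all three.
        foreach (int result in EnumerateSlowResults(new[] { 1, 2, 3 }))
        {
            Console.WriteLine(result);
        }
    }
}
```

Had the method built and returned a List<int> instead, the client would not see the first value until the last one had been computed.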

So, when are sequences preferable to lists and other collections? Obviously, if a series of items is potentially infinite, a sequence has to be used; it cannot be represented as a list or collection. Generally, sequences are intended for items that are simply iterated over, while lists are collections you can add items to and remove items from. My rule of thumb is to expose IEnumerable sequences whenever feasible, relying on LINQ's ToList and ToArray methods to convert the sequence into a list or an array, should that be necessary.
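The rule of thumb could look something like this in practice. EnumerateNames and its values are placeholders made up for the example; ToList and ToArray are the standard LINQ extension methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class illustrating the rule of thumb.
public static class MaterializeDemo
{
    // The producer only promises a sequence, nothing more.
    public static IEnumerable<string> EnumerateNames()
    {
        yield return "Ada";
        yield return "Grace";
        yield return "Linus";
    }

    public static void Main()
    {
        // A client that needs to add items asks for a list explicitly.
        List<string> names = EnumerateNames().ToList();
        names.Add("Barbara");

        // A client that needs indexed access can ask for an array instead.
        string[] snapshot = EnumerateNames().ToArray();

        Console.WriteLine(names.Count);
        Console.WriteLine(snapshot.Length);
    }
}
```

The client that needs list behaviour pays for it, while clients that only iterate never have to.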

When you think in sequences, LINQ makes even more sense than before, and it becomes easier to spot those situations where a sequence is not sufficient for the task.
