Tuesday, August 4, 2009

Thoughts about C# regions

I was playing with my toy parser over the weekend, and was deciding what the syntax to handle large blobs of arbitrary text should look like (for dealing with heredocs, wysiwyg strings, comment blocks, that sort of stuff). I had just about settled on a mix of ideas from Python and some ramblings from one of my older blog posts, but then it occurred to me that there is one type of comment that is unique, implemented in a mainstream language, and that has even created heated debates in some corners of the 'tubes: C# regions.

The debate is that regions are considered by some programmers to be evil. If you're not familiar with the debate, this article presents the point of view that regions are bad, and this other one presents the view that regions are helpful.

My thoughts

Regions do two things:

  1. They create code folding points so that the editor can hide code
  2. They allow those sections to be briefly described by a snippet of plain English
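For anyone who hasn't used them, here is a minimal sketch of both points. The `Counter` class and its members are made up for illustration; only the `#region` / `#endregion` directives are the actual C# feature.

```csharp
using System;

// Hypothetical class, only here to show where the directives go.
public class Counter
{
    #region Member Variables
    // An editor such as Visual Studio can collapse everything between
    // #region and #endregion, leaving only the "Member Variables" label.
    private int count;
    #endregion

    #region Public Methods
    // The directives are only editor hints; this method behaves the same
    // with or without the surrounding region.
    public int Increment()
    {
        count++;
        return count;
    }
    #endregion
}
```

The text after `#region` is free-form, which is what makes the plain English description possible.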

I can see why people would argue about the code folding, but at the end of the day, whether you fold your code or not is a personal preference. I'm actually a bit surprised that people would resort to proclaiming the feature evil before, you know, finding that there's a way to turn off code folding in Visual Studio. You'd think software developers would be good at using software. I digress.

The problem with regions that no one seems to talk about (in my opinion) is that the regions aren't usually very informative. They do give me some information, but not a whole lot. Let me elaborate.

Generally, I see regions being used to separate different types of class members: methods, properties, events. Or publics and privates. I suppose that is fine when the code does tend to be event-driven (and a lot of the .NET API is). Spend enough time with that convention and you'll subconsciously learn that you'll find the scope of stateful data within a "Member Variables" region, and that code entry points are likely to be in the "Constructor" or "Events" regions.

My gripe is that this isn't normally written out explicitly in idiomatic #region usage. You have to have this sort of mind map.

Another unfortunate thing is that I find inconsistent organization sometimes: I'd open a region to find this huge list of folded methods. Which ones are top-level functions? Which ones are helper/shared/refactored functions? Which ones are called within tight loops? From which functions? Are there more methods hidden in an almost-invisible region sitting just below the fold in my editor viewport? Why aren't the rest of the methods grouped inside a sibling region as well in that case?

Back to Earth

Now, obviously, a lot of that is largely a matter of style, personal preferences and knowing the conventions, but I feel there are still some aspects of regions that are worth exploring.

The debate I mentioned earlier briefly touches on the idea that methods should be refactored when they are too long. But what does that actually mean in terms of code folding?

By definition, refactoring a large block of procedural code yields a bunch of small functions. Heck, the toy parser I'm working on does exactly that.

The caveat is that refactored-out functions normally have a hierarchical relationship to the function from which they were extracted.

Imagine we have a parser with a parse function that can call parseNumber and parseString. The function parseString in turn can call parseEscapedString and parseHeredoc, and parseNumber can call parseHexadecimal and parseImaginary.
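To make that call hierarchy concrete, here is a rough sketch that models it as a tree. The `FunctionNode` and `ParserHierarchy` types are hypothetical helpers invented for this illustration; they are not part of the toy parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: one node per function, children are the functions it can call.
public class FunctionNode
{
    public string Name { get; }
    public List<FunctionNode> Callees { get; } = new List<FunctionNode>();

    public FunctionNode(string name, params FunctionNode[] callees)
    {
        Name = name;
        Callees.AddRange(callees);
    }

    // Counts this function plus everything reachable below it.
    public int Count()
    {
        int total = 1;
        foreach (FunctionNode callee in Callees)
        {
            total += callee.Count();
        }
        return total;
    }
}

public static class ParserHierarchy
{
    // Builds the hierarchy described in the paragraph above.
    public static FunctionNode Build()
    {
        return new FunctionNode("parse",
            new FunctionNode("parseString",
                new FunctionNode("parseEscapedString"),
                new FunctionNode("parseHeredoc")),
            new FunctionNode("parseNumber",
                new FunctionNode("parseHexadecimal"),
                new FunctionNode("parseImaginary")));
    }
}
```

Every function except the leaves has exactly two callees here, which is only a coincidence of the example; a real parser's tree would be bushier.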

So far we have 7 functions and the logical way to group them looks more like a binary tree than a flat array of functions. We could create a "string" region and a "number" region, but that essentially leaves the parse function floating naked by itself in a sea of meta information:

#region parsing functions
public void parse() ...
  #region string parsing
public void parseString() ...
public void parseEscapedString() ...
public void parseHeredoc() ...
  #endregion

  #region number parsing
public void parseNumber() ...
public void parseHexadecimal() ...
public void parseImaginary() ...
  #endregion
#endregion

Which would fold like this:

#region parsing functions
public void parse() ... <- this looks ugly :(
  [+ string]
  [+ number]
#endregion

We could put that parse function in a region of its own, but that doesn't make a whole lot of sense either. Hmmm.

It'd be neat if we could write this:

#region parsing functions
  #desc entry point
public void parse() ...
  #region string parsing
public void parseString() ...
public void parseEscapedString() ...
public void parseHeredoc() ...
  #endregion

  #region number parsing
public void parseNumber() ...
public void parseHexadecimal() ...
public void parseImaginary() ...
  #endregion
#endregion

Then, this made-up #desc would fold the parse function like so:

#region parsing functions
  [+ entry point]
  [+ string parsing]
  [+ number parsing]
#endregion

And while we're at it, let's go a step further. Let's add #desc's to all the functions and allow a folding mode that looks like this:

[+ parsing functions]
  [+ entry point]
  [+ string parsing]
    [+ strings w/ escape chars]
    [+ heredocs]
  [+ number parsing]
    [+ hexadecimals]
    [+ imaginary]

That would look a lot cleaner. It basically looks like a table of contents. Maybe I'll do something like that for my toy parser.
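As a rough sketch of how such a table of contents could be produced, the code below scans source lines for `#region`, `#endregion` and the made-up `#desc`, and emits one outline entry per label. Everything here is an assumption for illustration: `#desc` is not a real C# directive, `RegionOutline` is a hypothetical name, and the scanning is naive line matching rather than real preprocessing.

```csharp
using System;
using System.Collections.Generic;

public static class RegionOutline
{
    // Sample input: the parser listing from this post, with the made-up
    // #desc directive added to the entry point and to the leaf functions.
    public static readonly string[] Sample =
    {
        "#region parsing functions",
        "  #desc entry point",
        "public void parse() ...",
        "  #region string parsing",
        "public void parseString() ...",
        "  #desc strings w/ escape chars",
        "public void parseEscapedString() ...",
        "  #desc heredocs",
        "public void parseHeredoc() ...",
        "  #endregion",
        "  #region number parsing",
        "public void parseNumber() ...",
        "  #desc hexadecimals",
        "public void parseHexadecimal() ...",
        "  #desc imaginary",
        "public void parseImaginary() ...",
        "  #endregion",
        "#endregion",
    };

    // Returns one "[+ label]" line per #region or #desc directive,
    // indented two spaces per level of region nesting.
    public static List<string> Build(IEnumerable<string> sourceLines)
    {
        var outline = new List<string>();
        int depth = 0;

        foreach (string rawLine in sourceLines)
        {
            string line = rawLine.Trim();

            if (line.StartsWith("#region", StringComparison.Ordinal))
            {
                outline.Add(Entry(depth, line.Substring("#region".Length)));
                depth++; // everything until the matching #endregion nests deeper
            }
            else if (line.StartsWith("#endregion", StringComparison.Ordinal))
            {
                depth = Math.Max(0, depth - 1); // tolerate a stray #endregion
            }
            else if (line.StartsWith("#desc", StringComparison.Ordinal))
            {
                // Hypothetical directive: labels a single member, opens no scope.
                outline.Add(Entry(depth, line.Substring("#desc".Length)));
            }
        }

        return outline;
    }

    private static string Entry(int depth, string label)
    {
        return new string(' ', depth * 2) + "[+ " + label.Trim() + "]";
    }
}
```

Calling `RegionOutline.Build(RegionOutline.Sample)` and printing each line gives the same eight-entry outline as the folded view above. A region simply pushes the nesting depth and a `#desc` labels whatever comes next, so the outline falls out of a single pass over the file.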
