Thursday, June 19, 2008

Four Rules for Simple Codelines

Some of you may be aware of Kent Beck's Four Rules of Simple Code that state simple code:
  1. Correctly runs (and passes) all the tests
  2. Contains no duplication (OnceAndOnlyOnce and The DRY Principle)
  3. Clearly expresses all the ideas/intentions we needed to express (reveals all intent and intends all it reveals)
  4. Minimizes the number of classes and methods (has no superfluous parts)
(I've seen some boil this down into some of the same rules for writing clear prose: correct, consistent, clear, and concise.)

Lately I've been noticing some parallels to the above and rules for what I would call "simple codelines" and I think there may be a similar way of expressing them...

Simple codelines:
  1. Correctly build, run (and pass) all the tests
  2. Contain no duplicate work/work-products
  3. Transparently contain all the changes we needed to make (and none of the ones we didn't)
  4. Minimize the number and length of sub-branches and unsynchronized work/changes

To elaborate further...

Correctly build, run (and pass) all the tests

This is of course the most obvious and basic of necessities for any codeline. If the codeline (or the "build") is broken, then integration is basically blocked, and starting new work/changes for the codeline is hindered.

Contains no duplicate work/products

The same work and work-products should be done OnceAndOnlyOnce! Sometimes effort is spent more than once to introduce the same change/functionality. This is sometimes because of miscoordination, or simply lack of realization that what two different developers were working on required each of them doing some of the same things (and perhaps should have been accomplished in smaller chunks).

Other times, rather than modify or refactor a common file, some will simply copy-and-paste the contents of one or more files (or directories/folders) because they don't want to have to worry about reconciling what would otherwise be merges of concurrent changes to the common files.

This is akin to neglecting to refactor at the "physical" level (of files and folders) as opposed to the "logical" level of classes and methods. It adds more complexity and (over time) inconsistency to the set of artifacts and versions that make up the codeline, and also eventually adds to the time it takes to merge, build, and test any integrated changes.

If content is being added to the codeline, we want that content to have to be added only once, without any duplicate or redundant human effort.

Transparently contains all the changes we needed to make (and none of the ones we didn't)

The above is sometimes the cause of much undesirable additional effort that is imposed for the sake of attaining traceability and ensuring process compliance/enforcement. Here, I mean to focus on the ends rather than the means, and I say transparency rather than traceability for that very reason.

If people are working in a task-based and test-driven manner, it should be simple to report what changes have been made since a previous commit and that only intended tasks were worked-on and integrated.

If a codeline is truly simple, then it should be very simple and easy to reveal all the changes that went into it without adding a lot of overhead and constraints to development. It should be easy to tell which changes/tasks have been integrated and what functionality and tests they correspond to. One very simple and basic means of tying checkins (or "commits") to backlog-tasks and their tests can be found here; others are mentioned in this article.

Minimizes the number and length of sub-branches and unsynchronized work/changes

Branching can be a boon when used properly and miserly. It can also add a heck of a lot of complexity and redundancy for maintaining two or more evolving variants of the project. The additional effort to track and merge and build many of the same fixes and enhancements in multiple configurations can be staggering.

Sometimes such branches are useful or even necessary (and can help with what Lean calls nested synchronization and harmonic cadences). But they should be as few and as short-lived as possible, preferably living no longer than the time it takes to complete a fine-grained task or to integrate several fine-grained tasks.

Even when there are no sub-codelines of a branch, there can still be un-integrated (unsynchronized) work-in-progress in the form of long-lived or large-grained tasks with changes that have not yet been checked-in or synced-up with the codeline. Keeping tasks short-lived and fine-grained (e.g., on the order of minutes & hours instead of hours & days) helps ensure the codeline is continuously integrated and synchronized with all the work that is taking place.

Another (possibly less obvious form) of unsynchronized work is when there is a discrepancy between the latest version of code checked-in to the codeline, and the latest version of code that constitutes the "last good build." Developer's lives are "simpler" when the latest version of the codeline (the "tip") is the version they need to use to base new work off of, and to update their existing workspace (a.k.a. "sandbox").

When the latest "good" version of the codeline is not the same (less recent) than the latest version, it can be less obvious to developers which version to use and become less likely that they use/select it correctly. Some use "floating tags" or "floating labels" for this purpose where they "move" the LAST_GOOD_BUILD tag from its previous set of versions to the current set of versions for a newly passed/promoted build. Sometimes the developers always use this "tag" and never use the "tip" (except when they have to merge their changes to the codeline of course).

Even with floating tags however, it is still simpler and more desirable when the last good version IS the latest version. Even if the latest version is known to be "broken", the lag between "latest" and "last good" version of a codeline can be a source of waste and complexity in the effort required to build, verify and promote a version to be "good" (and can introduce more complexity when having to merge to "latest" if your work has only been synchronized with "last good").

Plus, this lag-time often leads many a development shop to separate merging (and integration & test) responsibilities between development and so called integrators/build-meisters, where the best developers can attempt is to sync-up their work with the "last good build" and then "submit" that work to a manually initiated build rather than being directly responsible for ensuring the task is "done done" by being fully integrated and passing all its tests.

Such separation often leads to territorial disputes between roles and build/merge responsibilities. This in turn often leads to adversarial (rather than cooperative and collaborative) relationships and isolated, compartmentalized (rather than shared) knowledge for the execution and success of those responsibilities.

So there we have it! Four rules of simple codelines.

Simple Codelines should:
  1. Correctly build, run (and pass) all the tests
  2. Contain no duplicate work/work-products
  3. Transparently contain all the changes we needed to make (and none of the ones we didn't)
  4. Minimize the number and length of sub-branches and unsynchronized work/changes

Sometimes there are legitimate reasons why some of the rules need to be bent, and there are important SCM patterns to know about in order to do it successfully. But any time you do that, it makes your codeline less simple. So you want those scenarios to be few and far between, and to keep striving for the goal of simplicity. (Other SCM patterns, such as Mainline, can help you refactor your codelines/branches to be more simple.)

No comments: