Thursday, June 26, 2008

Assigning Code Ownership-Policy Ownership

Jurgen Appelo has an interesting article on StickyMinds entitled "Code Ownership Re-Visited".

Jurgen prefers the term "artifact assignment" rather than "code ownership" and explains there are 4 methods of artifact assignment:
  1. Local Artifact Assignment (LAA) delegates policy to subsystems and sub-subsystems (etc.)
  2. Authoritarian Artifact Assignment (AAA) assigns change/access-control of ALL related artifacts to a single individual "benevolent dictator" who approves/authorizes all changes and who may also assign individual change-tasks to developers
  3. Collective Artifact Assignment (CAA) assigns the whole team (rather than any one person) as collectively accountable for all its artifacts
  4. Individual Artifact Assignment (IAA) assigns each artifact to a specific individual who is accountable for changes to it
He also provides a nice set of criteria to help decide which policy to use.

This seems very different from the 4 kinds of ownership I described in "Situational Code Ownership: Dynamically Balancing Individual -vs- Collective Ownership" where I define what amounts to Non-Ownership (which can sometimes be the result of Dictatorship), Individual Ownership, Stewardship, and Collective Ownership and show how each maps to a corresponding leadership-style of the Situational Leadership Model.

So what gives? What explains this difference?
  • I think Jurgen would probably consider Stewardship as a weak form of Individual ownership (many others would too, though I staunchly disagree for reasons elaborated in the aforementioned article).
  • Authoritarian Assignment would be akin to the form of Non-Ownership that results from Dictatorship (or "director-ship" to be more precise), where assignments are made per modification/task by a director (or "benevolent dictator").
  • I would argue that the first two methods Jurgen describes above aren't really artifact-assignment policies, but instead are assignment-ownership policies: They're not so much about making a decision among ownership-policies as they are about making a policy for ownership decisions. Rather than deciding "who should own which artifacts", they decide "who should own the decision" to make such artifact assignments.

In other words, the first two policies Jurgen defines are about decision-rights to assign modification-rights to owners, and not about the modification-rights (or ownership assignments) themselves. As such, it raises an important point taken for granted in my article and in so many other discussions on this topic. Most of the prior discussion probably has assumed that the decision about which ownership-policy to adopt was made either "by the team" or by the team's "leadership" (that might be a manager, a technical-lead, a project-lead, or any combination thereof).

Another common assumption is that such ownership is defined along "architectural" boundaries such as individual artifacts/files, classes/modules, or packages, components and subsystems. Other possibilities are:
  • Functional boundaries (e.g., by feature/feature-set, story/theme/epic, or use-cases)
  • Project boundaries (e.g., work-breakdown-structures, tasks, activities)
  • Role/discipline specialization boundaries (e.g., requirements, tests, models, database, UI, programming-language, user/help documentation, etc.), and even ...
  • Configuration/variation boundaries (e.g., version, variant, branch, platform). In fact some of these stretch across multiple dimensions of a project/product and might even be used in combination.
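As a rough sketch of how several of these boundary dimensions might be combined in one ownership map (all paths, patterns, and team names below are hypothetical, and the format is made up rather than borrowed from any particular tool):

```python
from fnmatch import fnmatch

# Hypothetical ownership map mixing several boundary dimensions:
# architectural (subsystem paths), role/discipline (artifact types),
# and configuration (branch/variant). All names are made up.
OWNERSHIP_RULES = [
    # (boundary kind, pattern, owner or team)
    ("architectural", "src/billing/*",  "billing-team"),     # subsystem
    ("role",          "docs/*",         "tech-writers"),     # discipline
    ("role",          "tests/*",        "whole-team"),       # collective
    ("configuration", "release-1.x/*",  "release-steward"),  # branch/variant
]

def owners_for(path: str, branch: str = "main") -> list[str]:
    """Return every owner whose boundary pattern matches this artifact."""
    matches = [owner for kind, pattern, owner in OWNERSHIP_RULES
               if fnmatch(path, pattern) or fnmatch(f"{branch}/{path}", pattern)]
    return matches or ["whole-team"]  # default: collective ownership

print(owners_for("src/billing/invoice.py"))                # ['billing-team']
print(owners_for("src/ui/main.py", branch="release-1.x"))  # ['release-steward']
```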
With Agile development, the emphasis is to break down communication boundaries and any corresponding separation related to role, phase, or physical boundaries, and to instead prefer customer-determined boundaries of "scope" and "deliverable value" (e.g., stories, features or MMFs, use-cases, etc.).

So you will see a definite (but time-constrained) assigning of things like tasks and stories to owners (though the "owners" sign up rather than "being assigned"). Those kinds of boundaries encourage closer and more frequent communication rather than separate, isolated (and less frequent) communication.

In the end, with Agile methods, it's all about maximizing learning and knowledge sharing & transfer rather than compartmentalizing knowledge into pigeon-holed roles and responsibilities. One opts for "separation of concerns" without "separation of those concerned" (work is isolated and separated, but not people).

Thursday, June 19, 2008

Four Rules for Simple Codelines

Some of you may be aware of Kent Beck's Four Rules of Simple Code, which state that simple code:
  1. Correctly runs (and passes) all the tests
  2. Contains no duplication (OnceAndOnlyOnce and The DRY Principle)
  3. Clearly expresses all the ideas/intentions we needed to express (reveals all intent and intends all it reveals)
  4. Minimizes the number of classes and methods (has no superfluous parts)
(I've seen some boil this down into some of the same rules for writing clear prose: correct, consistent, clear, and concise.)

Lately I've been noticing some parallels between the above and rules for what I would call "simple codelines", and I think there may be a similar way of expressing them...

Simple codelines:
  1. Correctly build, run (and pass) all the tests
  2. Contain no duplicate work/work-products
  3. Transparently contain all the changes we needed to make (and none of the ones we didn't)
  4. Minimize the number and length of sub-branches and unsynchronized work/changes

To elaborate further...

Correctly build, run (and pass) all the tests

This is of course the most obvious and basic of necessities for any codeline. If the codeline (or the "build") is broken, then integration is basically blocked, and starting new work/changes for the codeline is hindered.
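Here is a minimal sketch of this rule as an automated integration gate; the `make` targets are placeholder assumptions, so substitute whatever actually builds and tests your project:

```python
import subprocess
import sys

# Hypothetical commands; substitute your project's real build/test steps.
STEPS = [
    ("build", ["make", "all"]),
    ("test",  ["make", "test"]),
]

def codeline_is_healthy() -> bool:
    """Run the build and the full test suite; any failure breaks the gate."""
    for name, cmd in STEPS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Codeline gate FAILED at step '{name}'", file=sys.stderr)
            return False
    return True

if __name__ == "__main__":
    sys.exit(0 if codeline_is_healthy() else 1)
```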

Contain no duplicate work/work-products

The same work and work-products should be done OnceAndOnlyOnce! Sometimes effort is spent more than once to introduce the same change/functionality. This is sometimes because of miscoordination, or simply a failure to realize that the things two different developers were working on required each of them to do some of the same work (and perhaps should have been accomplished in smaller chunks).

Other times, rather than modify or refactor a common file, some will simply copy-and-paste the contents of one or more files (or directories/folders) because they don't want to have to worry about reconciling what would otherwise be merges of concurrent changes to the common files.

This is akin to neglecting to refactor at the "physical" level (of files and folders) as opposed to the "logical" level of classes and methods. It adds more complexity and (over time) inconsistency to the set of artifacts and versions that make up the codeline, and also eventually adds to the time it takes to merge, build, and test any integrated changes.

If content is being added to the codeline, we want that content to have to be added only once, without any duplicate or redundant human effort.
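One cheap way to catch the copy-and-paste flavor of this duplication at the file level is to hash file contents and flag byte-identical copies. A minimal sketch (the `src` root is an assumption):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicate_files(root: str) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; groups of 2+ are duplicates."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

for digest, copies in find_duplicate_files("src").items():
    print(f"{len(copies)} identical copies: {[str(p) for p in copies]}")
```

Of course this only catches exact copies; near-duplicates that have already drifted apart take more effort to detect, which is precisely why that drift becomes so costly over time.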

Transparently contain all the changes we needed to make (and none of the ones we didn't)

This rule is sometimes the cause of much undesirable additional effort, imposed for the sake of attaining traceability and ensuring process compliance/enforcement. Here, I mean to focus on the ends rather than the means, and I say transparency rather than traceability for that very reason.

If people are working in a task-based and test-driven manner, it should be simple to report what changes have been made since a previous commit, and to show that only intended tasks were worked on and integrated.

If a codeline is truly simple, then it should be very simple and easy to reveal all the changes that went into it without adding a lot of overhead and constraints to development. It should be easy to tell which changes/tasks have been integrated and what functionality and tests they correspond to. One very simple and basic means of tying checkins (or "commits") to backlog-tasks and their tests can be found here; others are mentioned in this article.
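The linked articles describe their own conventions; purely as an illustration, here is one way a pre-commit check might enforce a made-up task-ID convention (the `TASK-nnn` pattern is my invention, not theirs):

```python
import re
import sys

# Hypothetical convention: every commit message must name a backlog task,
# e.g. "TASK-123: add overdraft check". The pattern is illustrative only.
TASK_PATTERN = re.compile(r"\bTASK-\d+\b")

def check_commit_message(message: str) -> bool:
    """Accept the commit only if it references at least one backlog task."""
    return bool(TASK_PATTERN.search(message))

if __name__ == "__main__":
    message = sys.stdin.read()
    if not check_commit_message(message):
        print("Commit rejected: no TASK-nnn reference found", file=sys.stderr)
        sys.exit(1)
```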

Minimize the number and length of sub-branches and unsynchronized work/changes

Branching can be a boon when used properly and sparingly. It can also add a heck of a lot of complexity and redundancy when maintaining two or more evolving variants of the project. The additional effort to track, merge, and build many of the same fixes and enhancements in multiple configurations can be staggering.

Sometimes such branches are useful or even necessary (and can help with what Lean calls nested synchronization and harmonic cadences). But they should be as few and as short-lived as possible, preferably living no longer than the time it takes to complete a fine-grained task or to integrate several fine-grained tasks.

Even when there are no sub-codelines of a branch, there can still be un-integrated (unsynchronized) work-in-progress in the form of long-lived or large-grained tasks with changes that have not yet been checked-in or synced-up with the codeline. Keeping tasks short-lived and fine-grained (e.g., on the order of minutes & hours instead of hours & days) helps ensure the codeline is continuously integrated and synchronized with all the work that is taking place.
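As one way to watch for work that has gone unsynchronized too long, here is a sketch that flags stale task branches (git is used as a stand-in for whatever VCS you have, and the threshold is an arbitrary assumption):

```python
import subprocess
from datetime import datetime, timezone

MAX_AGE_HOURS = 8  # hypothetical threshold: tasks should sync within a workday

def stale_branches() -> list[tuple[str, float]]:
    """List branches whose last commit is older than the threshold."""
    out = subprocess.run(
        ["git", "for-each-ref", "refs/heads",
         "--format=%(refname:short) %(committerdate:unix)"],
        capture_output=True, text=True, check=True,
    ).stdout
    now = datetime.now(timezone.utc).timestamp()
    stale = []
    for line in out.splitlines():
        name, ts = line.rsplit(" ", 1)
        age_hours = (now - int(ts)) / 3600
        if age_hours > MAX_AGE_HOURS:
            stale.append((name, age_hours))
    return stale

for name, age in stale_branches():
    print(f"branch '{name}' has been unsynchronized for {age:.1f} hours")
```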

Another (possibly less obvious) form of unsynchronized work is when there is a discrepancy between the latest version of code checked in to the codeline, and the latest version of code that constitutes the "last good build." Developers' lives are "simpler" when the latest version of the codeline (the "tip") is the version they need to use to base new work off of, and to update their existing workspace (a.k.a. "sandbox").

When the latest "good" version of the codeline is not the same as (i.e., is less recent than) the latest version, it can be less obvious to developers which version to use, and less likely that they select it correctly. Some use "floating tags" or "floating labels" for this purpose, where they "move" the LAST_GOOD_BUILD tag from its previous set of versions to the current set of versions for a newly passed/promoted build. Sometimes the developers always use this "tag" and never use the "tip" (except when they have to merge their changes to the codeline, of course).
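A minimal sketch of that floating-tag practice, shown with git commands for concreteness (any VCS with movable labels supports the same pattern; the tag name matches the example above):

```python
import subprocess

TAG = "LAST_GOOD_BUILD"

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

def promote_build(commit: str) -> None:
    """After a build passes, 'float' the tag forward to the newly good version."""
    run("git", "tag", "-f", TAG, commit)  # move (force-update) the tag
    # If the tag is shared with the team, it must also be force-pushed:
    # run("git", "push", "--force", "origin", TAG)

def start_task(branch_name: str) -> None:
    """Developers base new work off the last good build rather than the tip."""
    run("git", "checkout", "-b", branch_name, TAG)
```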

Even with floating tags, however, it is still simpler and more desirable when the last good version IS the latest version. Even if the latest version is known to be "broken", the lag between the "latest" and "last good" version of a codeline can be a source of waste and complexity in the effort required to build, verify, and promote a version to be "good" (and can introduce more complexity when having to merge to "latest" if your work has only been synchronized with "last good").

Plus, this lag-time often leads many a development shop to separate merging (and integration & test) responsibilities between development and so-called integrators/build-meisters, where the best that developers can do is sync-up their work with the "last good build" and then "submit" that work to a manually initiated build, rather than being directly responsible for ensuring the task is "done done" by being fully integrated and passing all its tests.

Such separation often leads to territorial disputes between roles and build/merge responsibilities. This in turn often leads to adversarial (rather than cooperative and collaborative) relationships and isolated, compartmentalized (rather than shared) knowledge for the execution and success of those responsibilities.

So there we have it! Four rules of simple codelines.

Simple Codelines should:
  1. Correctly build, run (and pass) all the tests
  2. Contain no duplicate work/work-products
  3. Transparently contain all the changes we needed to make (and none of the ones we didn't)
  4. Minimize the number and length of sub-branches and unsynchronized work/changes

Sometimes there are legitimate reasons why some of the rules need to be bent, and there are important SCM patterns to know about in order to do it successfully. But any time you do that, it makes your codeline less simple. So you want those scenarios to be few and far between, and to keep striving for the goal of simplicity. (Other SCM patterns, such as Mainline, can help you refactor your codelines/branches to be simpler.)

Thursday, June 12, 2008

Traceability Matrix in an Agile Project

InfoQ.com summarized an email-list discussion thread on the subject of using a Traceability Matrix in an Agile Project.

I contributed quite a lot to the thread, and InfoQ apparently included many of the key things I said along with the related URLs to articles I've written. (Thanks guys!)

Sunday, June 08, 2008

Iterative and Incremental redefined redux

The agile community has written much about this in the past year or so.
Apologies in advance for being a "stick in the mud" on this one - I'm not particularly happy with the definitions so far. I searched around some more on the WWW and came across one I like a lot that I think better meets our needs.

It is from the paper What is Iterative Development? (part 1), by Ian Spence and Kurt Bittner:
Iterative and Incremental Development:
A style of development that involves the iterative application of a set of activities to evaluate a set of assertions, resolve a set of risks, accomplish a set of development objectives, and incrementally produce and refine an effective solution:
  • It is iterative in that it involves the successive refinement of the understanding of the problem, the solution's definition, and the solution's implementation by the repetitive application of the core development activities.

  • It is incremental in that each pass through the iterative cycle grows the understanding of the problem and the capability offered by the solution.

  • Several or more applications of the iterative cycle are sequentially arranged to compose a project.
Sadly, development can be iterative without being incremental. For example, the activities can be applied over and over again in an iterative fashion without growing the understanding of the problem or the extent of the solution, in effect leaving the project where it was before the iteration started.

It can also be incremental without being truly iterative. For example, the development of a large solution can be broken up into a number of increments without the repetitive application of the core development activities.

To be truly effective the development must be both iterative and incremental. The need for iterative and incremental development arises out of the need to predictably deliver results in an uncertain world. Since we cannot wish the uncertainty away, we need a technique to master it. Iterative and incremental development provides us with a technique that enables us to master this uncertainty, or at least to systematically bring it sufficiently under control to achieve our desired results.

I like that this definition separated iterative from incremental and then defines them together. I would summarize it as follows (but I like the above better, even if it is longer):
Iterative development is the cyclical process of repeating a set of development activities to progressively elaborate and refine a complete solution. The “unit” of iterative development is an “iteration”, which represents one complete cycle through the set of activities.

Incremental development is the process of developing and integrating the parts of a system in multiple stages, where each stage implements a working, executable subset of the final system. The "unit" of incremental development is an "increment", which represents the executable subset of the system resulting from a particular stage.

Iterative and Incremental development is therefore the application of an iterative development lifecycle to successively develop and refine working, executable subsets (increments) of a solution that evolves incrementally (from iteration to iteration) into the final product.
  • Each iteration successively elaborates and refines the understanding of the problem, and of the solution's definition & implementation by learning and adapting to feedback from the previous iterations of the core development lifecycle (analysis, design, implementation & test).
  • Each increment successively elaborates and refines the capability offered by the solution in the form of tangible working results that can be demonstrated to stakeholders for evaluation.
An Agile Iteration is a planned, time-boxed interval (typically measured in weeks) whose output is a working result that can be demonstrated to stakeholders:

  • Agile Iterations focus the whole team on collaborating and communicating effectively for the purpose of rapidly delivering incremental value to stakeholders in a predictable fashion.
  • After each iteration, the resulting feedback and data can be examined, and project scope & priorities can be re-evaluated to adapt the project's overall performance and optimize its return-on-investment.
So in addition to the non-agile-specific definitions above, we see that Agile iterations are adaptive, in that they use the previous results and feedback to learn, adjust and recalibrate for the next iteration. And Agile increments are tangible, in that they can be executed and made accessible to stakeholders for demonstration and evaluation.
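To restate the distinction as a toy sketch: the loop below is iterative because the same cycle of activities repeats each pass, and incremental because each pass grows a working, demonstrable subset of the solution (all the "development activities" are trivial stand-ins):

```python
# Toy sketch: "iterative" = repeating the same cycle of core activities;
# "incremental" = each pass grows a working, demonstrable subset.
# All of the "development activities" here are trivial stand-ins.

def run_project(backlog: list[str], num_iterations: int) -> list[str]:
    solution: list[str] = []                 # the growing increment
    for i in range(num_iterations):          # the iterative cycle
        scope, backlog = backlog[:2], backlog[2:]        # (re)plan a small scope
        for item in scope:                   # "analyze/design/implement"
            solution.append(f"working: {item}")
        assert all(s.startswith("working") for s in solution)  # "run the tests"
        print(f"iteration {i + 1} demo: {solution}")  # demonstrate the increment
    return solution

run_project(["login", "search", "checkout", "reports"], num_iterations=2)
```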

That's my story and I'm sticking to it!

Monday, June 02, 2008

The Laws of Codeline (Thermo)Dynamics

Some of the discussion with my co-authors on our May 2008 CM Journal article on Agile Release Management spurred some additional thoughts by me that I hope to refine and work into a subsequent article later this year.

Release Management is about so much more than just the code/codeline (and it being "shippable") that it's not even funny. Some other articles to reference and mention some key points from are:
Kevin Lee has written some GREAT stuff on Release Management that relates to Agile. The best is from the first and last chapters of his book "The Java™ Developer's Guide to Accelerating and Automating the Build Process", but bits and pieces of it can also be found at:

ANY discussion about Release Management also needs to acknowledge that there is no single "the codeline", not just because I may have different codelines (Development-Line plus Release-Line) working toward the same product-release, but ESPECIALLY because no matter how Agile you are, the reality is that you will typically need to support MULTIPLE releases at the same time (at the very least the latest released version and the current under-development version, but often even Agile projects need to support more than one release in the field).

So, when dealing with multiple release-lines, and any "active development lines" for each of those, and the overall mainline, we really should say something overall about how to manage this "big picture" of all these codelines across multiple releases and iterations:
  • What is the relationship between development line, release-line and release-prep codeline?
  • How do the above three "lines" relate to the "mainline"?
  • What is the relationship between the different release-lines for the different supported releases?
  • What is the overall relationship between the mainline and the release-lines (and if the mainline is also a release-line, which release is it?)
The above questions and the ability to give some big-picture "advice" on relating it all together (the stuff of pattern languages) is precisely where Laura Wingerd's writing on "channeling the flow of change" and her chapter on "How Software Evolves" fits in! It tells us:
  • The overall Mainline model
  • The different types of codelines ("line" patterns), and what kinds of builds take place on each of them
  • The relationships of those to the mainline
  • When+Why to branch (and from which "types" of codelines)
  • When+Why to merge across codelines (as a general rule)

These are where Laura's rules for "the flow of change" apply. And her concept of "change flow" is very much applicable to the Lean/Agile concept of "flow of value". The Tofu scale and "change flow" rules/protocol have to do with order+flow of codeline policies across the entire branching structure when it comes to making decisions about stability -vs- speed. One codeline's policy might make a certain tradeoff, but it is the presence of multiple codelines and how they work together, and how their policies define the overall flow of change across codelines, that forms the "putting it all together" advice that is key to release management across multiple releases+codelines.

In some ways you could make an overall analogy to the Laws of Thermodynamics and the realities of codeline management. Software and codelines tend, over time, to grow more complex and, if unchecked, "entropy" (instability) quickly becomes the most dominating force to contend with in their maintenance.

The "entropy" (instability) doesnt just happen within a codeline. It can actually get far more hideous when it happens across codelines via indiscriminate branching from, or merging to, other codelines. This is what happens when you don't respect the principles and rules of "change flow" (from Wingerd) which ultimately stem from the rules of smooth and steady (value-stream) flow from Lean.

The Laws of Thermodynamics are about energy, entropy, and enthalpy. In the case of release management and codelines ...
  • energy relates to effort & productivity
  • entropy relates to stability/quality versus complexity
  • enthalpy relates to "order" (i.e., in the sense of structure and architecture as Christopher Alexander uses the term "order"). It is the "inverse" of entropy.

We could call them the "Laws of Codeline Dynamics" :-)

Energy misspent degrades flow, creates waste, and hurts productivity/velocity. In traditional development, we often see "fixed scope", with resources and schedule having to vary in order to meet the "scope" constraint. In Agile development we deliberately "flip" that triangle upside down (see the picture in the article here, under the title "The Biggest Change: Scope versus Schedule - Schedule Wins"). So we are fixing "resources" and "schedule" and allowing scope to vary.

This might be one way of viewing the law of conservation of energy. If we fix resources and time (and insist on a "sustainable pace" or "40-hour work week"), then we're basically putting in the same amount of effort over that time-box, but the key difference is how much of that effort results in "giving off energy" in the form of waste ("heat" or "friction") versus how much of that energy directly adds value. Both "Value" and "Enthalpy" degrade or depreciate over time, and adding more energy (effort) doesn't necessarily mean value is increased.

To make sure that energy goes toward adding value (and minimizing waste), we need to focus on the flow of value, and hence the flow of changes/efforts to create value (the latter is one reasonable definition of a "codeline" or a "workstream"). To ensure a smooth, steady, and regular/frequent flow, there are certain rules we need to impose to regulate stability within and across codelines and better manage all those releases.

Zeroth Law of Thermodynamics (from Wikipedia)
- If two thermodynamic systems are each in thermal equilibrium with a third, then they are in thermal equilibrium with each other.
Translation to codelines: this law of "thermal equilibrium" is a law of "codeline equilibrium" of sorts. (Does this mean that if two codelines are each "in equilibrium" with a third codeline, then they are "in sync" with each other? Here "in sync" doesn't mean they have the same frequency; it means there is some synchronization pattern regarding their relative stability and velocity. In Lean terms, this would refer to "nested synchronization" and "harmonic cadence".) This might imply the "mainline" rule/pattern, or one of Wingerd's rules of change-flow.

First Law of Thermodynamics
- In any process, the total energy of the universe remains the same.
This is the statement of conservation of energy for a thermodynamic system. It refers to the two ways that a closed system transfers energy to and from its surroundings - by the process of heating (or cooling) and the process of mechanical work.

This relates to effort & changes expended resulting in the creation of value and/or the creation of waste. We have activities that add value (which we hope is development), activities that preserve value (which is what much of SCM attempts to do, given that it doesn't directly create the changes, but tries to ensure that changes happen and are built/integrated with minimal loss of energy/productivity/quality), and then we have activities (or portions of activities) that create waste (and increase entropy rather than preserving or increasing enthalpy/order).

Second Law of Thermodynamics
- In any isolated system that is not in equilibrium, entropy will increase over time.
So this is the law of increasing instability/complexity/disorder. The "key" to preventing this from happening is achieving and then maintaining/preserving "equilibrium". How do we achieve such equilibrium? We do it with the "release enabler" patterns for codeline management, which help ensure "nested synchronization" and "harmonic cadence" in addition to achieving a balance or equilibrium between stability and velocity (to smooth out flow).

Third Law of Thermodynamics
- As temperature approaches absolute zero, the entropy of a system approaches a constant minimum.
In our case, "temperature" could be regarded as a measure of "energy" or "activity". As the energy/activity of a codeline approaches zero (such as a release in the field that you've been supporting and would LOVE to be able to retire sometime real soon), its instability approaches a constant minimum.

This is perhaps another, more polite way of saying something we already said in our article on "The Unchangeable Rules of Software Change", namely that "absolute stability" means dead (as in, "no activity"). It should serve as a reminder that our goal is not the prevention of change in order to achieve some ideal "absolute stability", for such an absolute would mean the project is not just "done" but "dead".

On the other hand, it also speaks to us as a guideline for when it is safe to retire old codelines, and when to change their policy in accordance with their "energy level".