Saturday, August 27, 2005

The Baseline Immutability Principle

Adding more baselining principles to my Principles of SCM. So far I've described the Baseline Reproducibility Principle (BLREP) and the Baseline Identification Principle (BLIDP). Now I want to describe the Baseline Immutability Principle (BLIMP).

The Baseline Immutability Principle (BLIMP) is really just a rephrasing of The Open-Closed Principle (OCP) from The Principles of Object-Oriented Design as applied to baselines (baselined configurations). The OCP (first stated by Bertrand Meyer in the classic book Object-Oriented Software Construction) states that "Software entities (classes, modules, functions, etc.) should be open for extension but closed for modification."

The OCP means I should have a way of being able to extend a thing without changing the thing itself. Instead I should be able to create some new "thing" of my own that reuses the existing thing and somehow combines that with just my additions, resulting in an operational "extension" of the original thing. The OCP is the basis for letting me reuse rather than reinvent when I need to create something that is "like" an existing thing but which still requires some additional stuff.

Applied to baselined configurations (a.k.a. baselines), the OCP would read "A baseline should be open for extension but closed for modification." That means if I want to create a "new" configuration that extends the previously baselined configuration, I should do so by creating a new configuration that is the baseline PLUS my changes. The result is not a "changed" baseline - the baselined configuration stays the same as it was before my change. We don't actually ever "change" a baseline. What we do is request/apply one or more changes against/to a baseline; and the result is a new configuration, possibly resulting in a new baseline.

According to the Baseline Immutability Principle ...
    If a baseline is to be reproducible, and if it needs to be identifiable, then the name that identifies the baseline with its corresponding configuration must always refer to the exact same configuration: the one that was released/baselined.
For example, suppose I have release 1.2 of my product and I apply a label/tag of "REL-1.2" to everything that was used to make 1.2 (not just the code, but ALL of it: requirements, designs, tests, make/ANT files, etc.). Suppose that version 1.2.3.4 of element FUBAR was one of the file revisions that was labeled. Now suppose that during the following month, "REL-1.2" is moved/reapplied to version 1.2.3.5 of FUBAR.

In this example, I have just violated the baseline immutability principle. If a customer needs me to be able to reproduce Release 1.2, and if Release 1.2 contained v1.2.3.4 of FUBAR, then if I use "REL-1.2" to recreate the state of the codebase for Release 1.2, I just got the wrong result, because the version of FUBAR in Release 1.2 is different from the version that is tagged with the "REL-1.2" label.
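
The violation above can be made concrete with a small sketch. This is not how any particular version control tool works; LabelRegistry and LabelMovedError are hypothetical names invented here to show a write-once label store that refuses to re-point "REL-1.2" at a different revision of FUBAR.

```python
class LabelMovedError(Exception):
    """Raised when a label would be re-pointed at a different revision."""


class LabelRegistry:
    """Hypothetical write-once store mapping (label, element) to a revision."""

    def __init__(self):
        self._labels = {}

    def apply(self, label, element, revision):
        key = (label, element)
        existing = self._labels.get(key)
        # Re-applying the same revision is harmless; moving the label is not.
        if existing is not None and existing != revision:
            raise LabelMovedError(
                f"{label} already identifies {element} at {existing}; "
                f"refusing to move it to {revision}"
            )
        self._labels[key] = revision

    def revision_of(self, label, element):
        return self._labels[(label, element)]


registry = LabelRegistry()
registry.apply("REL-1.2", "FUBAR", "1.2.3.4")

try:
    # The move described in the example: same label, newer revision.
    registry.apply("REL-1.2", "FUBAR", "1.2.3.5")
    moved = True
except LabelMovedError:
    moved = False
```

With a guard like this, asking for FUBAR under "REL-1.2" a month later still yields 1.2.3.4, which is what Release 1.2 actually contained.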

Notice that I am not saying that we can't make changes against a baseline. We most certainly can. And the result is a new configuration!
    When we make a change to a baseline, we aren't really changing the configuration that was baselined and then trying to use the same name for the result. Our changed result is a new configuration that took the current baseline and added our changes to it. And if we chose to name this new configuration, we give it a new name (one that is different from the name of any previously baselined configuration).
So a baseline name and the configuration it references are married: once the configuration is baselined, that name must forever after be faithfully monogamous to that configuration for better or for worse, for richer or for poorer, in sickness and in health for as long as they both shall live.
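
The "baseline PLUS my changes" idea can be sketched the same way. The functions baseline and extend are hypothetical helpers, and the Makefile revision and the "REL-1.2.1" name are made up for illustration; the point is only that extending never touches the original and always demands a new name.

```python
from types import MappingProxyType


def baseline(name, revisions):
    """Freeze a configuration (element -> revision) under a name."""
    # MappingProxyType gives a read-only view over a private copy.
    return name, MappingProxyType(dict(revisions))


def extend(base, new_name, changes):
    """Return a new named configuration: the base plus the given changes."""
    base_name, base_revisions = base
    if new_name == base_name:
        raise ValueError("a new configuration needs a new name")
    return baseline(new_name, {**base_revisions, **changes})


rel_1_2 = baseline("REL-1.2", {"FUBAR": "1.2.3.4", "Makefile": "1.7"})
rel_1_2_1 = extend(rel_1_2, "REL-1.2.1", {"FUBAR": "1.2.3.5"})
```

Here rel_1_2 still maps FUBAR to 1.2.3.4, while rel_1_2_1 carries the change under its own name: open for extension, closed for modification.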

Always and forever? What about a divorce, or an annulment?
    An "annulment" in this case is when I didn't get it right the first time. Either I "blessed" a configuration as "baselined" that didn't really meet the criteria to be called a "baseline." Or else I incorrectly identified the corresponding configuration: I might have labeled the wrong version of a file, or I forgot to label some file (e.g., people often forget to label their makefiles), or I labeled something I shouldn't have.

    Correcting a baseline's labeled-set so that it accurately identifies ("tags") the baselined configuration isn't really changing the baseline; it's merely correcting the identification of it (because it was wrong up until then).

    What about a "divorce"? We all know that a divorce can be quite expensive, and require making payments for a long time thereafter. Retiring (and trying to reuse) a baseline name can have significant business impact. Retiring the baseline often means no longer providing support for that version of the product. Trying to then reuse the same baseline name of the same product for a new configuration can create lots of costly confusion and can even be downright misleading.

Note that the term "a baseline" should not be confused with the term "the baseline":
  • The term "the baseline" really means the latest/current baseline. It is a reference!

  • This means that "the baseline" is really just shorthand for "the latest baseline." And when we "change the baseline", we are changing the designation of which baseline is considered "latest": we are changing the reference named "latest baseline" to point to a newer configuration.
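
A tiny sketch of that distinction, assuming a plain dictionary of named configurations and a single variable acting as the "latest baseline" reference ("REL-1.3" and its revision are invented for the example):

```python
# Each named baseline is a fixed configuration (element -> revision).
baselines = {
    "REL-1.2": {"FUBAR": "1.2.3.4"},
    "REL-1.3": {"FUBAR": "1.3.0.1"},  # hypothetical later release
}

# "The baseline" is only a reference to whichever baseline is latest.
latest = "REL-1.2"
snapshot_before = dict(baselines["REL-1.2"])

# "Changing the baseline" moves the reference; no configuration is edited.
latest = "REL-1.3"
snapshot_after = dict(baselines["REL-1.2"])
```

The reference moved, but "REL-1.2" names exactly what it named before.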
So The Baseline Immutability Principle states that once a configuration is baselined, the identification of the baseline name with its corresponding configuration is immutable: The set of elements (e.g., files and revisions) referenced by the baseline name must always be the same set. And that set must always correspond to the set that was used to produce the version of the product that was baselined.

I think this may be equivalent to Damon Poole's "TimeSafe Property" -- see Damon's paper The TimeSafe Property: a Formal Statement of Immutability for CM.

Let me know what you think!

Sunday, August 21, 2005

The Baseline Identification Principle

Yesterday (actually just a few hours ago) was my 40th birthday. I had a really nice celebration with my wife and kids at a picnic in the park. I really don't feel like I'm 40. My body thinks I am 50 - at least that's how it seems to be acting. My mind still isn't used to the fact that I'm now more than just a little bit older than all those leading men and leading ladies on TV and movies. (Guess I can no longer identify them as part of my historical "baseline" :-)

Back again to describing The Principles of SCM! Last time I described The Baseline Reproducibility Principle. Now we'll take the next logical step and talk about the need to identify baselines.

If the ability to reproduce a baseline is fundamental to SCM, then it stands to reason that the ability to identify a baseline that I must be able to reproduce should also be pretty fundamental. If I have to be able to "show it", then I must first be able to "know it." If I can't uniquely identify a baseline, then it's pretty hard to reproduce it if I'm not sure what I'm trying to reproduce.

So the baseline reproducibility principle gives rise to The Baseline Identification Principle: a baseline must be identified by a unique name that can be used to derive all the constituent elements of the baseline. In other words, we have to have a name, and a way of associating that name with all the objects (e.g., files) and their revisions that participate in the baseline.

How do we identify a baseline? By defining a name (or a naming system) to use, and using that name to reference the set of elements that were used to build/create the baselined version of the product.

A "label" or "tag" is one common way that a version control tool allows us to identify the sources of a baseline. This lets us associate a name with a specific set of repository elements and their corresponding revisions. Or it lets us associate a name with an existing configuration or event from which the set of elements and versions may be derived.
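
The second form, a name tied to an event from which the elements and versions are derived, might look like the sketch below. Everything here is assumed for illustration: the check-in history, the dates, and the resolve function are hypothetical and do not describe any specific tool.

```python
from datetime import datetime

# Hypothetical check-in history: (element, revision, time checked in).
HISTORY = [
    ("FUBAR", "1.2.3.3", datetime(2005, 6, 1)),
    ("FUBAR", "1.2.3.4", datetime(2005, 7, 1)),
    ("FUBAR", "1.2.3.5", datetime(2005, 8, 20)),
    ("Makefile", "1.7", datetime(2005, 5, 15)),
]

# The baseline name is tied to an event (here, a release timestamp).
BASELINE_EVENTS = {"REL-1.2": datetime(2005, 7, 15)}


def resolve(name):
    """Derive element -> revision for the named baseline from the history."""
    cutoff = BASELINE_EVENTS[name]
    config = {}
    for element, revision, when in sorted(HISTORY, key=lambda h: h[2]):
        if when <= cutoff:
            config[element] = revision  # later check-ins overwrite earlier ones
    return config


rel_1_2 = resolve("REL-1.2")
```

Either way, the requirement is the same: given only the name, I can get back every element and revision that participated in the baseline.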

Sometimes tagging all the "essential" files and revisions in the repository is sufficient. Sometimes I need more information. I can always take any files or information that weren't previously in the version control repository, and put them in the repository:
  • I can put additional information in a text file and check in the file
  • I can export a database or binary object into some appropriate format (e.g., XML, or other formatted text)
  • some tools let me directly check in a binary object (e.g., compilers, libraries, images, models) to the repository

If you currently have to label or tag more than just source-code and manually created text-files, then tell me about the other kinds of things you check in and tag, and what special things you do to ensure they are identified as part of a baseline.

Monday, August 15, 2005

The Baseline Reproducibility Principle

Getting back to my earlier topic of The Principles of SCM, I think probably the first and most fundamental principle would be the requirement to be able to reproduce any baselined/released version of the software.

I'll call this The Baseline Reproducibility Principle: a baseline must be reproducible. We must be able to reproduce the "configuration" and content of all the elements that are necessary to reproduce a "released" version of the product.

By "released" I really mean "baselined" - it doesn't have to be a release to a customer. It could be a hand-off to any other stakeholder outside of development (like a test group, or a CM group, or QA, etc.). There is some basic vocabulary we need, like the terms "baseline" and "configuration." Damon Poole has started a vocabulary/glossary for SCM. Damon defines configuration but doesn't yet define a baseline.

A baseline is really shorthand for a "baselined configuration." And a baselined configuration is basically "a configuration with an attitude!" The fact that it's been "baselined" makes it special, and more important than other configurations that aren't baselined. We baseline a configuration when we need to promote/release it to another team/organization. By "baselining" it, we are saying it has achieved some consensually agreed upon level of "blessedness" regarding what we said it would contain and do, and what it actually contains and does.

Why do we need to be able to reproduce a baselined version of the product we produce and deliver? For several reasons:

  • Sometimes we want to be able to reproduce a reported problem. It helps to be able to reproduce the exact versions of the source code that made up the version of the product that the customer is using.

  • In general, when we hand-off a version of the product to anyone that may report problems or request enhancements, it is useful to be able to reproduce the versions of the files that make-up that version of the system to verify or confirm their observations and expectations.

  • When a "fix" is needed, customers are not always ready/willing to deploy our latest version (containing new functionality plus the fix). Even if they are, sometimes our business is not - it wants to "give" them the fix, but make more money on any new functionality. So we must provide a "patch" to their existing version.

  • When a baseline is a version of the product, it includes the specs and the executable software. Configuration auditing requires us to know the differences between the current product+specs versus their actual+planned functionality at the time that the product was released to them.
Those are just a few reasons. There are many more I'm sure.

What does it mean to reproduce a baseline? At the very least it means being able to reproduce the exact set of files/objects and their corresponding versions that were used to produce/generate the delivered version of the product. (That includes the specs that may be audited against, as well as the code).

Sometimes being able to reproduce the source files for the code+docs (and build scripts) is enough. Often we need to be able to do more than that. Sometimes it may be necessary to reproduce one or more of the following as well:

  • The version of the compilers/linkers or other tools used to create that version of the product

  • The version of any third-party libraries, code/interfaces/headers used to build the product

  • Any other "significant" aspect of the computing environment/network utilized during the creation of the delivered version of the product
It can be too easy to go to more effort than necessary to ensure reproducibility of more than is absolutely essential. What is essential to reproduce may depend upon many business and technical factors (including some possible contractual factors regarding deployment/upgrade, operational usage and support).
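
One way to picture "reproducing more than the sources" is a recorded manifest that is compared against whatever I have just rebuilt. This is only a sketch: the categories, the differences function, and every version number below are assumptions made up for the example.

```python
def differences(recorded, actual):
    """List (category, item, recorded_version, actual_version) mismatches."""
    found = []
    for category, items in recorded.items():
        for item, version in items.items():
            have = actual.get(category, {}).get(item)  # None if missing
            if have != version:
                found.append((category, item, version, have))
    return found


# What was recorded when the baseline was created (hypothetical values).
recorded = {
    "sources": {"FUBAR": "1.2.3.4"},
    "tools": {"compiler": "3.4.4"},
    "libraries": {"libexample": "2.0"},
}

# What the reconstructed environment actually contains.
actual = {
    "sources": {"FUBAR": "1.2.3.4"},
    "tools": {"compiler": "4.0.1"},
    "libraries": {"libexample": "2.0"},
}

mismatches = differences(recorded, actual)
```

Here the sources match but the compiler does not. Whether that mismatch matters is exactly the business and technical judgment described above, and it determines which categories are worth recording at all.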

The ability to be able to reproduce a baseline is so basic to SCM; I can't believe it hasn't been a "named" principle before. I know others have certainly written about it as a principle, I'm just not recalling if any of them gave the principle a name.

I think names are powerful things. Part of what makes software patterns so powerful is that they give a name to an important and useful solution to a recurring problem in a particular context. The pattern name becomes an element of the vocabulary of subsequent discussion on the subject. So I can use the terms "Private Workspace" or "Task Branch" in an SCM-related conversation instead of having to describe what they are over and over again.

This is why I'd like to develop a set of named principles for SCM. I think lots of folks have documented SCM principles, but didn't give them names. And they might "stick" better if we gave them names. If you know of any examples of SCM principles that are already well known and have a name, please let me know! (Please include a reference or citation if possible)

Tuesday, August 09, 2005

SCM Design Smells

First, the news of the passing of Peter Jennings (ABC World News Tonight Anchor) became known to me early this morning. I'm very saddened by this. The world has lost a great mind and communicator, and I've lost the trusted advisor I used to let into my home every evening since I was a teen to tell me about what was going on elsewhere in the world.

Getting back to my earlier topic of The Principles of SCM, I'd like to step through each of the Object-Oriented Design Principles mentioned in Robert Martin's book Agile Software Development: Principles, Patterns, and Practices, looking for how each one applies to SCM.

Before I do that however, I'd first like to look at what "Uncle Bob" (as he is more affectionately called) refers to as design smells. These are as follows:
  • Fragility - Changes cause the system to break easily and require other changes.
  • Immobility - Difficult to disentangle entities that can be reused in other systems.
  • Viscosity - Doing things wrong/sloppy causes more friction and slows you down the next time you navigate through that code.
  • Needless Complexity - The System contains infrastructure that has no direct benefit (overengineering and/or "gold plating").
  • Needless Repetition - Repeated structures that should have a single abstraction (Redundancy).
  • Opacity - Code is hard to understand.
How might each of these apply to your SCM process and procedures? How might they apply to your branching & merging structure? Or to the organization of your source-tree?

Here's one possible "translation" of how these might apply to Software CM process and procedures:
  • Intolerant/Fragility - Changes cause the project, team, or organization to fall apart easily and require change to other parts of the project, team, or organization.
  • Rigidity/Immobility - Difficult to identify or disentangle practices and policies that can be reused by other projects, teams, or organizations.
  • Friction/Viscosity - Doing things wrong/sloppy causes more friction and slows you down the next time you navigate through that workflow or go on to the next one.
  • Wasteful/Needless Complexity - The Process contains "waste" in the form of extra steps, processing, handoff, waiting, or intermediate artifacts that do not "add value" for the customer, project, or organization.
  • Manual Tedium/Repetition - Repeated or tedious steps and activities should have a single mechanism to automate them.
  • Opacity - The project or process is hard to understand. (Lack of Transparency)

How would you translate design smells into process smells for Software CM?

Monday, August 01, 2005

The Customer Inversion Principle of Process Design

Looking back on last week's blog-entry suggesting we should CM to an Interface, not an implementation, I wonder if that was really an instance of the stated design principle, or of something else ...

Oftentimes, the process and procedures that development must follow in order to comply with CM needs were developed by the people who receive the outputs of development but who don't necessarily perform the development activities themselves. These process-area experts are the process designers and the developers are the end-users of their process.

The conclusion of CM to an interface, not an implementation was to essentially invert or "flip" the relationship between who is the process "producer" and who is its "customer." The Principles of Lean Thinking suggest that processes should be designed by the practitioners who are most intimately familiar with performing the activities and their reasons for being a necessary step in the process: Those who receive the outputs of that process are its customers, and they get to specify the requirements, but not the implementation.

If true, this could perhaps be a statement of a different principle that we might call The Customer-Inversion Principle of Process Design:
  • Upstream Development procedures should not depend on downstream CM procedures; both should depend upon the abstract interfaces represented by development's exit criteria and CM's entry criteria.
  • Procedures should not be designed for their practitioners by the downstream customer of their results; practitioners should design their own procedures to meet the requirements of their downstream customers.
Not only does this "inversion" of the process producer/customer relationship conform with the design principle to separate interface from implementation, and with principles of lean thinking, it also aligns with Agile principles of putting the customer in charge and preferring customer collaboration over contract negotiation, when "negotiating" the right balance between the process requirements and the procedural implementation.

It also somewhat "inverts" (or at least turns on its head) what might be the more stereotypical perception by many agilists of CM as "controlling opponents" into one of "collaborating customers", and hopefully helps lend a new perspective on how to successfully pair with other organizational stakeholders who make additional demands for more formal standards, documentation, and tools on an agile project. (See my earlier blog-entry on Building Organizational Trust by Trusting the Organization.)

Surely there must be some exceptions. What about when development has absolutely no CM knowledge or appreciation whatsoever? Should a knowledgeable CM person define development's CM activities for them?

To me this sounds similar to the situation of an expert needing to play the role of coach for a more junior engineer. A more directive or coaching style of leadership may be required, where CM doesn't necessarily give all the answers, but still plays a strong collaborative role, not only in specifying their requirements, but also in educating development about existing SCM patterns and their applicability and context, and helping them choose the most appropriate patterns and tradeoffs to design the CM procedures that development should use.

If development is not yet able to understand and/or is willing to be initially "told" what to do - then "telling"/directing (instead of coaching) might be the first step. But ultimately I believe practitioners of a process need to feel a sense of ownership over their own process and procedures if they are to continue being effective. By helping them understand the process requirements, and the applicable patterns and principles, we help them become better developers, and better advocates of effective CM. At least that's been my experience.

What do you think? Does it sound nice in theory but not work "in practice" in your own experience?