Monday, August 15, 2005

The Baseline Reproducibility Principle

Getting back to my earlier topic of The Principles of SCM, I think probably the first and most fundamental principle would be the requirement to be able to reproduce any baselined/released version of the software.

I'll call this The Baseline Reproducibility Principle: a baseline must be reproducible. We must be able to reproduce the "configuration" and content of all the elements that are necessary to reproduce a "released" version of the product.

By "released" I really mean "baselined" - it doesn't have to be a release to a customer. It could be a hand-off to any other stakeholder outside of development (like a test group, or a CM group, or QA, etc.). There is some basic vocabulary we need, like the terms "baseline" and "configuration." Damon Poole has started a vocabulary/glossary for SCM. Damon defines configuration but doesn't yet define a baseline.

A baseline is really shorthand for a "baselined configuration." And a baselined configuration is basically "a configuration with an attitude!" The fact that it's been "baselined" makes it special, and more important than other configurations that aren't baselined. We baseline a configuration when we need to promote/release it to another team/organization. By "baselining" it, we are saying it has achieved some consensually agreed upon level of "blessedness" regarding what we said it would contain and do, and what it actually contains and does.

Why do we need to be able to reproduce a baselined version of the product we produce and deliver? For several reasons:

  • Sometimes we want to be able to reproduce a reported problem. It helps to be able to reproduce the exact versions of the source code that made up version of the product that the customer is using.

  • In general, when we hand-off a version of the product to anyone that may report problems or request enhancements, it is useful to be able to reproduce the versions of the files that make-up that version of the system to verify or confirm their observations and expectations.

  • When a "fix" is needed, customers are not always ready/willing to deploy our latest version (containing new funcitonality plus the fix). Even if they are, sometimes our business is not - it wants to "give" them the fix, but make more money on any new functionality. So we must provide a "patch" to their existing version

  • When a baseline is a version of the product, it includes the specs and the executable software. Configuration auditing requires us to know the differences between the current product+specs versus their actual+planned functionality at the time that the product was released to them.
Those are just a few reasons. There are many more I'm sure.

What does it mean to reproduce a baseline? At the very least it means being able to reproduce the exact set of files/objects and their corresponding versions that were used to produce/generate the delivered version of the product. (That includes the specs that may be audited against, as well as the code).

Sometimes being able to reproduce the source files for the code+docs (and build scripts) is enough. Often we need to be able to do more than that. Sometimes it may be necessary to reproduce one or more of the following as well:

  • The version of the compilers/linkers or other tools used to create that version of the product

  • The version of any third-party libraries, code/interfaces/headers used to build the product

  • Any other "significant" aspect of the computing environment/network utilized during the creation of the delivered version of the product
It can be too easy to go to more effort than necessary to ensure reproducibility of more than is absolutely essential. What is essential to reproduce may depend upon many business and technical factors (including some possible contractual factors regarding deployment/upgrade, operational usage and support).

The ability to be able to reproduce a baseline is so basic to SCM; I can't believe it hasn't been a "named" principle before. I know others have certainly written about it as a principle, I'm just not recalling if any of them gave the principle a name.

I think names are powerful things. Part of what makes software patterns so powerful is that they give a name to an important and useful solution to a recurring problem in a particular context. The pattern name becomes an element of the vocabulary of subsequent discussion on the subject. So I can use the terms "Private Workspace" or "Task Branch" in an SCM-related conversation instead of having to describe what they are over and over again.

This is why I'd like to develop a set of named principles for SCM. I think lots of folks have documented SCM principles, but didn't give them names. And they might "stick" better if we gave them names. If you know of any examples of SCM principles that are already well known and have a name, please let me know! (Please include a reference or citation if possible)

No comments: