Sunday, August 21, 2005

The Baseline Identification Principle

Yesterday (actually just a few hours ago) was my 40th birthday. I had a really nice celebration with my wife and kids at a picnic in the park. I really don't feel like I'm 40. My body thinks I am 50 - at least that's how it seems to be acting. My mind still isn't used to the fact that I'm now more than just a little bit older than all those leading men and leading ladies on TV and in movies. (Guess I can no longer identify them as part of my historical "baseline" :-)

Back again to describing The Principles of SCM! Last time I described The Baseline Reproducibility Principle. Now we'll take the next logical step and talk about the need to identify baselines.

If the ability to reproduce a baseline is fundamental to SCM, then it stands to reason that the ability to identify a baseline that I must be able to reproduce should also be pretty fundamental. If I have to be able to "show it", then I must first be able to "know it." If I can't uniquely identify a baseline, then I can't be sure what I'm trying to reproduce, and that makes it pretty hard to reproduce.

So the baseline reproducibility principle gives rise to The Baseline Identification Principle: a baseline must be identified by a unique name that can be used to derive all the constituent elements of the baseline. In other words, we have to have a name, and a way of associating that name with all the objects (e.g., files) and their revisions that participate in the baseline.

How do we identify a baseline? By defining a name (or a naming system) to use, and using that name to reference the set of elements that were used to build/create the baselined version of the product.

A "label" or "tag" is one common way that a version control tool allows us to identify the sources of a baseline. This lets us associate a name with a specific set of repository elements and their corresponding revisions. Or it lets us associate a name with an existing configuration or event from which the set of elements and versions may be derived.
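To make the idea concrete, here is a minimal sketch in Python of what a label amounts to: a unique name mapped to a fixed set of elements and their revisions. It does not model any particular version control tool. The class name `BaselineRegistry`, the baseline name `REL_1.0`, and the file paths and revision numbers are all made up for illustration.

```python
from types import MappingProxyType


class BaselineRegistry:
    """Maps a unique baseline name to a frozen {element path: revision} set."""

    def __init__(self):
        self._baselines = {}

    def create(self, name, elements):
        # A baseline name must be unique; reusing it would make it ambiguous.
        if name in self._baselines:
            raise ValueError(f"baseline {name!r} already exists")
        # Copy and wrap read-only so later changes cannot alter the baseline.
        self._baselines[name] = MappingProxyType(dict(elements))

    def resolve(self, name):
        """Return every element and revision that participates in the baseline."""
        return self._baselines[name]


# Hypothetical elements and revisions, for illustration only.
registry = BaselineRegistry()
registry.create(
    "REL_1.0",
    {"src/main.c": "1.4", "src/util.c": "1.2", "Makefile": "1.7"},
)
```

Calling `registry.resolve("REL_1.0")` hands back the full set of elements and revisions, which is exactly what I need in order to "know it" before I can "show it." Refusing to reuse a name and freezing the set are the two properties that keep the identification unambiguous.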

Sometimes tagging all the "essential" files and revisions in the repository is sufficient. Sometimes I need more information. I can always take any files or information that weren't previously in the version control repository, and put them in the repository:
  • I can put additional information in a text file and check in the file
  • I can export a database or binary object into some appropriate format (e.g., XML, or other formatted text)
  • some tools let me directly check in a binary object (e.g., compilers, libraries, images, models) to the repository
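
As one possible illustration of the first option, the sketch below (in Python) builds a small text manifest that could be checked in and tagged along with the sources. The function name `build_manifest`, the JSON layout, and the sample compiler and library names are assumptions made up for this example, not a convention of any particular tool.

```python
import hashlib
import json


def build_manifest(baseline_name, notes, binary_objects):
    """Return manifest text recording extra baseline information.

    binary_objects maps a descriptive name to the object's raw bytes.
    """
    manifest = {
        "baseline": baseline_name,
        "notes": notes,
        # A SHA-256 digest pins the exact content of each binary object.
        "binaries": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(binary_objects.items())
        },
    }
    # sort_keys gives stable output, so the checked-in file diffs cleanly.
    return json.dumps(manifest, indent=2, sort_keys=True) + "\n"


# Hypothetical inputs, for illustration only.
text = build_manifest(
    "REL_1.0",
    {"compiler": "example-cc 1.0 (hypothetical)"},
    {"libexample.a": b"\x00\x01\x02"},
)
```

Writing `text` to a file and checking that file in before applying the label makes the extra information part of the same baseline as the source files. Recording a digest instead of the binary itself is a design choice for repositories that handle large binaries poorly; if the tool can store the binary directly, that is the simpler route.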

If you currently have to label or tag more than just source code and manually created text files, then tell me about the other kinds of things you check in and tag, and what special things you do to ensure they are identified as part of a baseline.

3 comments:

Ken MacLeod said...

We've been baselining release artifacts in two different situations.

When creating Linux packages (RPMs, .deb), we check in the package. In this case, however, we don't create the baseline until after we've created the package and checked it in, so both are in the same initial baseline. Here, the same developer creates both the package and the baseline.

When creating a flash ROM image release, we check in the image. This is a more formal process. The first thing we do is add a baseline "release" label (e.g., 3.14) to an existing incremental build label. When the build is complete, we check in the built image as well and then create a new baseline (e.g., 3.14-bin).

Brad Appleton said...

Thanks for the real-world example, Ken! Can you say more about the difference between a package and a flash ROM image? Are they both releases of the same thing and the same set of files/components?

Ken MacLeod said...

Each package is a separate component. The practice is adopted directly from open source, so they generally have exactly the same scope as open source projects. It also provides us the ability to intermix Linux packages from vendors with packages developed internally, as well as open source projects where we maintain local changes. Packages are generally delivered internally as they are updated, each compiled for separate architectures or targets as necessary.

A flash ROM, or in some cases a CD-image, will be a consumer of many packages. A flash ROM will typically contain a "cherry-picked" (for space) subset of files from a few dozen packages. As a consumer, the baseline of a flash ROM is effectively a baseline across the packages used to create it. Different target flash images will share many common-architecture packages and not share other packages that provide specific capability for the target device.