Thursday, May 21, 2009

Synchronous and Staged Integration

I participated in a LinkedIn CM group discussion about Building Code before -vs- after Checkin. The discussion was kicked off by Tracy Ragan, COO and Co-Founder, OpenMake Software:

Many companies implementing a distributed SCM process make the mistake of checking source code into their SCM repository before they validate the code through a compile and link process. Checking in source code that does not compile is, honestly, a waste of time. I call it the garbage in/garbage out method. The goal of SCM is to match your production source code to your production executables. This goal should be kept in mind when implementing your SCM process.

So many companies have a very complex SCM process with tightly managed approvals. But when it comes time to roll out binaries to production, they have no idea how those binaries were created. What you need is the ability to run a footprint of your production executables showing all artifacts used to create those binaries. That footprint should show the versions of the files that were found via your SCM repositories and audit all files that were used to create the binary but were not stored in your SCM repository.

Build your code as part of your SCM process. This is the only way to know if the code you are spending time and money to manage is actually executing in your production environment. The mainframe community has gotten this right for the last 20 years. It is time for the distributed developers to sort out a 100% complete SCM process.
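
As an aside, Tracy's "footprint" idea is straightforward to sketch. The little Python script below is purely my own illustration (not anything from OpenMake's tooling): given the list of files a build actually consumed, as reported by your build tool, it records the SCM version of each tracked file and flags anything that was used but never checked in. It assumes a git working tree and made-up file names.

#!/usr/bin/env python3
"""Rough sketch of a build "footprint": for every input file used to
produce a binary, record the version the SCM knows about, and flag any
file that was used in the build but never checked in. Assumes a git
working tree; the input list would come from your build tool."""

import json
import subprocess
import sys
from typing import Optional


def git_blob_hash(path: str) -> Optional[str]:
    """Return the blob hash git has recorded for `path`, or None if the
    file is not tracked by the repository."""
    result = subprocess.run(
        ["git", "ls-files", "--stage", "--", path],
        capture_output=True, text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # Output format: "<mode> <blob-hash> <stage>\t<path>"
    return result.stdout.split()[1]


def footprint(binary: str, inputs: list) -> dict:
    """Build the footprint record for `binary` from its input files."""
    tracked, untracked = {}, []
    for path in inputs:
        blob = git_blob_hash(path)
        if blob is None:
            untracked.append(path)   # used in the build, but not under SCM
        else:
            tracked[path] = blob
    return {"binary": binary, "tracked": tracked, "untracked": untracked}


if __name__ == "__main__":
    # Usage: footprint.py app.exe main.c util.c third_party/vendor.lib
    print(json.dumps(footprint(sys.argv[1], sys.argv[2:]), indent=2))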

There were several good comments, most of them taking positions for or against, and a few adding some more insight. I responded as follows ...

I wrote a paper for the CM Journal on this very issue a few years back (Nov 2003). It was entitled Codeline Merging and Locking: Continuous Update and Two-Phased Commits.

It talks about what we ideally want to have done by the time we try to commit our changes to the codeline (the shared/team integration branch), and about some of the different strategies (patterns) and trade-offs for ensuring correct, complete & consistent results while staying as practical as possible regarding complexity and overhead.

It does not, however, discuss the issue of "synchronous" versus "asynchronous" build+regression-test as part of the commit operation. It assumes "synchronous", where you must successfully build+test *before* the commit operation is considered complete (which is what Tracy is talking about here).
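
To make the "synchronous" model concrete, here is a minimal sketch of my own, with placeholder make targets and a git commit standing in for whatever build commands and SCM you actually use: the commit is only issued after a clean build and the regression tests succeed.

#!/usr/bin/env python3
"""Minimal sketch of a "synchronous" commit gate: the commit is only
attempted if a clean build and the regression tests succeed first.
The build/test commands and the git commit call are placeholders."""

import subprocess
import sys


def run(step: str, cmd: list) -> bool:
    """Run one gating step; report and return whether it succeeded."""
    print(f"[{step}] {' '.join(cmd)}")
    return subprocess.run(cmd).returncode == 0


def gated_commit(message: str) -> int:
    # 1. Full clean build -- nothing that fails to compile gets committed.
    if not run("build", ["make", "clean", "all"]):
        print("Build failed; commit refused.")
        return 1
    # 2. Regression tests -- the codeline should stay releasable.
    if not run("test", ["make", "test"]):
        print("Tests failed; commit refused.")
        return 1
    # 3. Only now is the change committed to the shared codeline.
    if not run("commit", ["git", "commit", "-am", message]):
        return 1
    print("Commit completed: it built and passed tests *before* it landed.")
    return 0


if __name__ == "__main__":
    sys.exit(gated_commit(sys.argv[1] if len(sys.argv) > 1 else "gated commit"))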

Another approach is "asynchronous", which is what many CI-server implementations do: allow the commit to complete (perhaps after doing only an incremental build), but then behind the scenes immediately "kick off" a more rigorous/complete build which then raises a visible alert upon failure (which should then be fixed *immediately*).
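
A toy sketch of that "asynchronous" model might look like the following (again my own illustration, with a naive polling loop and placeholder commands standing in for what a real CI server such as CruiseControl or Hudson does far more robustly):

#!/usr/bin/env python3
"""Toy sketch of the "asynchronous" model: commits land immediately, and a
separate watcher notices each new revision, runs the slower full
build+test behind the scenes, and raises a loud alert on failure.
Assumes a git repository; the build command and alert are stand-ins."""

import subprocess
import time


def head_revision() -> str:
    """Current tip of the shared codeline."""
    return subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout.strip()


def full_build_and_test() -> bool:
    """The rigorous build that the commit itself did not wait for."""
    return subprocess.run(["make", "clean", "all", "test"]).returncode == 0


def watch(poll_seconds: int = 60) -> None:
    last_built = None
    while True:
        rev = head_revision()
        if rev and rev != last_built:
            ok = full_build_and_test()
            last_built = rev
            if not ok:
                # The alert is the whole point: the break must be fixed *now*.
                print(f"*** BUILD BROKEN at {rev[:8]} -- fix immediately ***")
            else:
                print(f"build OK at {rev[:8]}")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    watch()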

Rather than an either/or approach (building before commit -vs- building after commit), what is becoming more common for larger projects & codebases is a "staged continuous integration" approach, such as those described in the following:
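
Whatever the specific write-ups, the "staged" idea itself is easy to sketch (this is just my own illustration, with made-up stage names and build commands): a change is promoted through progressively heavier integration stages, each one gating the next, so fast feedback is preserved while the expensive, exhaustive builds still run before anything reaches the release codeline.

#!/usr/bin/env python3
"""Sketch of staged continuous integration: a change is promoted through
progressively heavier integration stages, each gating the next.
The stage names and commands below are illustrative only."""

import subprocess

# Ordered stages: cheap and fast first, slow and exhaustive last.
STAGES = [
    ("commit stage", ["make", "incremental", "smoke-test"]),
    ("team integration", ["make", "clean", "all", "test"]),
    ("system integration", ["make", "clean", "all", "full-regression"]),
]


def promote(change_id: str) -> bool:
    """Run each stage in order; stop (and report) at the first failure."""
    for name, cmd in STAGES:
        print(f"[{change_id}] entering {name}: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print(f"[{change_id}] failed in {name}; not promoted further.")
            return False
    print(f"[{change_id}] passed all stages; ready for the release codeline.")
    return True


if __name__ == "__main__":
    promote("change-1234")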
