Saturday, September 24, 2005

Quantum Agility and Organizational Gravity

Just a random synapse firing in my brain ... I remember back in my high school days being enthralled with physics and the latest grand-unified theories (GUTS), and how gravity was always the "odd ball" in trying to unify the four fundamental forces of nature into a single, simple, consistent and coherent theory:
  • Quantum mechanics could unify all but gravity. It was great, and incredibly accurate at explaining all the rich and myriad interactions of things at the molecular, atomic and subatomic levels.

  • But throw in celestial bodies and large distances, and the thing called "gravity" rears its ugly head and makes things complicated. In theory it's nowhere near as strong as the other forces, and yet any time you had to scale up to things large enough and far enough away to need a telescope instead of a microscope, it made everything fall apart.
Sometimes I think Agile "theory" and large projects and organizations are the same dichotomy.
  • The "Agile" stuff seems great in small teams and projects that can be highly collaborative and iterative over short (collocated) distances with small "lightweight" teams and processes.

  • But throw it into a large project or organization, and "gravity" sets in, adding weight and mass and friction to processes and communication, and yet necessarily so, in order to scale to a larger living system of systems of systems.
So we are left with quantum agility and organizational gravity and trying to reconcile the two. What's an Agile SCMer to do about all that?

Saturday, September 17, 2005

Can I have just one repository please?

One of the things I spend a lot of time dealing with is integration between application lifecycle management tools and their corresponding process areas: requirement management, configuration management, test management, document management, content management, change management, defect management, etc.

So I deal with process+tool architecture integration for a user community of several thousand, and the requirements, version control, change-tracking, and test management tools almost always each have their own separate repositories. Occasionally the change-tracking and version-control are integrated, but the other two are still separate.

And then if there is a design modeling tool, it too often tries to be a "world unto itself" by being not merely a modeling environment but attempting to store each model or set of models as a "version archive" with its own checkin/checkout, which makes it that much more of a pain in the you-know-what to get it versioned and labeled/baselined together with the code, particularly if code-generation is involved and needs to be part of the build process.

And what really gets to me is that, other than the version control tool, the other tools for requirements and test management, and typically change management usually have little or no capability to deal with branching (much less merging). So heaven forbid one has to support multiple concurrent versions of more than just the code if you use one of the other tools.

The amount of additional effort for tool customization and configuration and synchronization and administration to make these other tools be able to deal with what is such a basic fundamental version-control capability is enormous (not to mention issues of architectural platforms and application server farms for a large user base). So much so that it makes me wonder sometimes if the benefit gained by using all these separate tools is worth the extra integration effort. What if I simply managed them all as text files in the version control system?

At least then I get my easy branching and merging back. Plus I can give them structure with XML (and then some), and could easily use something like Eclipse to create a nice convenient GUI for manipulating their contents in a palatable fashion.

And all the data and metadata would be in the same database (or at least one single "virtual" database). No more having to sync with logically related but physically disparate data in foreign repositories and dealing with platform integration issues, just one big (possibly virtual) repository for all my requirements, designs, code, tests, even change-requests, without all the performance overhead and data redundancy and synchronization issues.

It could all be plain structured text with XML and Eclipse letting each artifact-type retain its own "personality" without having to be a separate tool in order to do it.

Why can't someone make that tool? What is so blasted difficult about it!!!

I think the reason we dont have it is because we are use to disconnected development as "the rule" rather than as the exception. Companies that shell out the big bucks for all of those different tools usually have separate departments of people for each of requirements (systems/requirements engineers), design (software architects), source-code ("programmers"), test (testers), and change-management.

It's a waterfall-based way of organizing large projects and it seems to be the norm. So we make separate tools for each "discipline" to help each stay separate and disconnected, and those of us doing EA/EAI or full lifecycle management of software products have to deal with all the mess of wires and plumbing of integration and platforms and workflow.

Oh how I wish I could take a combination of tools:
  • a good, stream-based version control tool like Accu-Rev
  • a fully Java/XML extensible issue-tracker like Jira (or combination of the two, like SpectrumSCM)
  • a framework like Eclipse
  • and a collaborative knowledge/content management system like Confluence
and roll them together into a single integrated system with a single integrated repository.

Notice I didn't mention any specific tools for requirements-management or test-management. Not that I dont like any of the ones available, I do, but I think it's time for a change in how we do those things with such tools:
    they basically allow storing structured data, often in a hierarchical fashion with traceability linkages, and a way of viewing and manipulating the objects as a structured collection, while being able to attach all sorts of metadata, event-triggers, and queries/reports
I think a great wiki + CMS like Confluence and Jira can do all that if integrated together; Just add another "skin" or two to give a view of requirements and tests both individually and as collections (both annotated and plain).

The same database/repository could give me both an individual and hierarchical collection-based views of my requirements, designs, code, tests and all their various "linkages." Plus linking things in the same database is a whole lot easier to automate, especially thru the same basic IDE framework like Eclipse.
  • the requirements "skin" gives me a structured view of the requirements, and collaborative editing of individual requirements and structured collections of them;
  • ditto for the test "skin";
  • and almost "ditto" for the "change-management" skin (but with admittedly more workflow involved)
  • the design tool gives me a logical (e.g., UML-based) view of the architecture
  • the IDE gives me a file/code/build-based view of my architecture
  • And once MS-Office comes out with the standard XML-based versions, then maybe it will be pretty trivial to do for documents too (and to integrate XML-based Word/Office/PPT "documents" with structured requirements and tests in a database)
Oh why oh why can't I have a tool like that! Pretty please can I have it?

Sunday, September 11, 2005

Change-Packaging Principles

In my previous blog-entry I tried translating Uncle Bob's OOD Principles of package cohesion into the version-control domain by substituting "release" with "promote" or "commit", and "reuse" with "test".

I think that didn't work too well. I still think "promotion" corresponds to "release", but "reuse" corresponds to something else. I'm going to try translating "reuse" to "integration". If I integrate (e.g., merge) someone else's changes into my workspace, I am quite literally reusing their work. If I commit my own change to the codeline, then I am submitting my work for reuse by the rest of the team that is using the codeline (particularly the "tip" of the codeline) as the basis of their subsequent changes.

So if I equate "release" with "promotion", and "reuse" with "integration" I think the result is the following:
  • The Promotion-Integration Equivalency Principle -- The granule of integration is the granule of promotion. (So it's not just the change content, but also the context – the entire configuration – that we end up committing to the codeline/workstream.)

  • The Change Closure Principle -- Elements that must be changed together are promoted together (implies task-level commit).

  • The Change Promotion Principle -- Elements that must be integrated together are promoted together (implies doing workspace update prior to task-level commit)
These "work" for me much better than the previous translation attempt. Note that the "change closure principle" didn't change much from before - it was just clarified a bit to indicate the dependency between elements.

This also makes me think I've stumbled onto the proper meaning for translating the Interface Segregation Principle (ISP): ISP states "Make fine-grained interfaces that are client-specific." If "integration" is reuse, then each atom/granule of change is an interface or "container" of the smallest possible unit of reuse.

The smallest possible unit of logical change that I can "commit" that doesn't break the build/codeline would be a very specific, individually testable, piece of behavior. Granted, sometimes it might not be run-time behavior ... it could be build-time behavior, or behavior exhibited at some other binding time.

This would yield the following translation of the ISP into the version-control domain:
    The Change Separation Principle -- Make fine-grained incremental changes that are behavior-specific. (i.e., partition your task into separately verifiable/testable yet minimal increments of behavior.)
I'm not thrilled about the name (please feel free to suggest a better one -- for example ... how about "segmentation" instead of "separation"?) but I think the above translation "works" quite well, and also speaks to "right-sizing" the amount of change that is committed to the codeline as an individual "transaction" of change. The way it's worded seems like it's talking exclusively about "code", but I think it really applies to more than just code, so long as we arent constraining ourselves to execution-time "behavior."

Let me know what you think about these 4 additions to the family of SCM principles!

Saturday, September 03, 2005

The Blog ate my homework!

[NOTE: due to some comment-spam on my last entry (which I have since deleted), I haved turned on "word verification" for comments.]

When I was composing my previous blog entry, something very frustrating happened: The blog ate my homework!

I frequently save intermediate drafts of my blog entries before I publish them. I had been working on my most recent draft for a couple hours. I'd been finalizing many of the sentences and paragraphs, making sure the flowed, checking the word usage, spellchecking, adding and verifying links, and then ... when I was finally ready to publish, I hit the publish button on the blogger compose window, and it asked me to login again. When I did, my final edits were GONE! I'd just lost two hours worth of work.

My first thought was ARRRRRRRGGGGGHHHHH! My next thought was "no freakin' WAY did that just happen to ME!" Then much profanity ensued (at least in my own silent frustration) and I tried my darndest to look thru any and all temp files and saved files on my system and on blogger.com, all for naught. I had indeed fallen victim to one of the most basic things that CM is supposed to help me prevent. How infuriating! How frustrating! How embarrassing. I was most upset not about the lost text, but about the lost time!

I figure there must be a lesson in there somewhere to pass along. Ostensibly, the most obvious lesson would be to use the Private Versions pattern as outline in my book. The thing is ... I had been doing just that! It was in the very act of saving my in-progress draft (before publishing it) that my changes were lost.

What I could (and possibly should) have done instead was not use blogger's composer to compose my drafts. I could have done it locally instead, on my own machine (and my own spellchecker). And perhaps I will do that a bit more from now on. Still, it's pretty convenient to compose it with blogger because"
  • I get rapid feedback as to what it will actually look like, and ...
  • I can access it from any machine (not just the one I use late at night)
I later realized why it happened. I was trying to do two things at once:
  • In one window I was composing my blog entry.
  • In another browser window I was visit webpages I wanted to hyperlink to from my entry and verifying the link.
Okay - so there's nothing wrong with that. I mean I was doing two things at the same time, but I wasn't really trying to multi-task because I was still trying to work on my blog-entry.

The real culprit wasnt that I had two windows open at the same time, it was that one of the webpages I wanted to hyperlink to was also a blogger.com hosted blog-entry. And since I was positing a question in my entry that referred to this one, I also wanted to create a comment in the referred-to entry that asked the question and referenced back to my own blog.

Posting that comment caused me to have enter my blogger id and passwrod, and that essentially forced a new login - which made it look like my current login (where I was composing my entry) either ended, or had something unusual going on that warranted blogger wanting me to re-authenticate myself. And when it did, I lost my changes! OUCH!

Actually, I hadnt even posted the comment - I had only previewed it (saving it as a draft). Anyway - I was too upset (and it was too late at night) to try and recreate my change sthen. So I waited another day before doing it. I have to say Im not as happy with the result. I had really painstakingly satisfied myself with my wording and phrasing before I lost my changes. I wasn't as thorough the second time around because I wanted to be done with it!

So what was my big mistake? I was using private versions, and I wasn't trying to multi-task. I was in some sense trying to simultaneously perform "commits" of two different things at the same time, but they were to different "sections" of the same repository, so that really shouldn't have been such a terrible thing.

My big mistake wasn't so much a lack of good CM as it was a lack of good "agility": I let too much time lapse in between saving my drafts. I wasn't working in small enough batch-sizes (increments/iterations)!

Granted, I don't want to interrupt my flow of thought mid-sentence or mid-paragraph to do a commit. But certainly every time I was about to visit and verify another hyperlink in my other browser window, I should have at least saved my current draft before doing so. And I probably should have made sure I did so at least every 15-20 minutes. (You can be darn sure that's what I did this time around :-)

This sort of relates to how frequently someone should commit their changes in a version control system. Some of the SCM principles that I havent described yet will relate to this. Uncle Bob's Principles of Object-Oriented Design have a subset that are about "package cohesion" and granularity
  • REP: The Release Reuse Equivalency Principle -- The granule of reuse is the granule of release.

  • CCP: The Common Closure Principle -- Classes that change together are packaged together.

  • CRP: The Common Reuse Principle -- Classes that are used together are packaged together.
In the context of version control, these "packages of classes" would probably correspond to "packages of changes" that make up a single logical "change transaction" or "commit" operation. If that is a valid analogy, then I need to decide what "reuse" and "release" mean in this context:
  • I think "release" would mean to "promote" or "commit" my changes so they are visible to others using the same codeline.

  • I think "reuse" would mean ... hmmn that's a tough one! It could be many things. I think that if a change is to be reusable, it must be testable/tested. Other things come to mind too, but that's the first one that sticks.
So let's see what happens if I equate "release" with "commit", equate "reuse" with "test" and see if the result is coherent and valid. This would give me the following:
  • The Commit/Test Equivalency Principle -- The granule of test is the granule of commit.

  • The Change Closure Principle -- Files that change together are committed together.

  • The Test Closure Principle -- Files that are tested together are committed together (including the tests).
Comments? Thoughts? What do these mean to you? Does it mean anything more than using a task-level commit rather than individual file checkin? Should these always "hold true" in your experience? When shouldnt they? (and why?)

Oh - and feel free to suggest better names if you dont like the ones I used. I'm not going to supply abbreviations for these because, or name any blog-entries after them just yet because I'm not yet certain if they are even valid.