I am in need of some help with philosophy and design of a continuous integration setup.
Our current CI setup uses buildbot. When I started out designing it, I inherited (well, not strictly, as I was involved in its design a year earlier) a bespoke CI builder that was tailored to run the entire build at once, overnight. After a while, we decided that this was insufficient, and started exploring different CI frameworks, eventually choosing buildbot. One of my goals in transitioning to buildbot (besides getting to enjoy all the whiz-bang extras) was to overcome some of the inadequacies of our bespoke nightly builder.
Humor me for a moment, and let me explain what I have inherited. The codebase for my company is almost 150 unique c++ Windows applications, each of which has dependencies on one or more of a dozen internal libraries (and many on 3rd party libraries as well). Some of these libraries are interdependent, and have depending applications that (while they have nothing to do with each other) have to be built with the same build of that library. Half of these applications and libraries are considered "legacy" and unportable, and must be built with several distinct configurations of the IBM compiler (for which I have written unique subclasses of Compile), and the other half are built with visual studio. The code for each compiler is stored in two separate Visual SourceSafe repositories (which I am simply handling using a bunch of ShellCommands, as there is no support for VSS).
Our original nightly builder simply took down the source for everything, and built stuff in a certain order. There was no way to build only a single application, or pick a revision, or to group things. It would launched virtual machines to build a number of the applications. It wasn't very robust, it wasn't distributable. It wasn't terribly extensible. I wanted to be able to overcame all of these limitations in buildbot.
The way I did this originally was to create entries for each of the applications we wanted to build (all 150ish of them), then create triggered schedulers that could build various applications as groups, and then subsume those groups under an overall nightly build scheduler. These could run on dedicated slaves (no more virtual machine chicanery), and if I wanted I could simply add new slaves. Now, if we want to do a full build out of schedule, it's one click, but we can also build just one application should we so desire.
There are four weaknesses of this approach, however. One is our source tree's complex web of dependencies. In order to simplify config maintenace, all builders are generated from a large dictionary. The dependencies are retrieved and built in a not-terribly robust fashion (namely, keying off of certain things in my build-target dictionary). The second is that each build has between 15 and 21 build steps, which is hard to browse and look at in the web interface, and since there are around 150 columns, takes forever to load (think from 30 seconds to multiple minutes). Thirdly, we no longer have autodiscovery of build targets (although, as much as one of my coworkers harps on me about this, I don't see what it got us in the first place). Finally, aformentioned coworker likes to constantly bring up the fact that we can no longer perform a full build on our local machine (though I never saw what that got us, either, considering that it took three times as long as the distributed build; I think he is just paranoically phobic of ever breaking the build).
Now, moving to new development, we are starting to use g++ and subversion (not porting the old repository, mind you - just for the new stuff). Also, we are starting to do more unit testing ("more" might give the wrong picture... it's more like any), and integration testing (using python). I'm having a hard time figuring out how to fit these into my existing configuration.
So, where have I gone wrong philosophically here? How can I best proceed forward (with buildbot - it's the only piece of the puzzle I have license to work on) so that my configuration is actually maintainable? How do I address some of my design's weaknesses? What really works in terms of CI strategies for large, (possibly over-)complex codebases?