summarizing build-on-checkin feedback
Posted by rhelmer on 09 Jan 2008 at 11:08 pm | Tagged as: buildbot, mozilla, tinderbox
Lots of feedback on the build-on-checkin idea in my blog, the newsgroup, and especially joduinn’s recent post on the subject. The primary concerns seem to be:
- we need as many performance tests per checkin as possible
I’ve filed bug 410869 to track this. I think the way we do this now is wrong: we’d get more performance cycles if we separated the start time of the test from the revision that the test is for. Also, we should do a separate perf test for each checkin, not just the latest one when the perf machine becomes available, so that regressions can be tracked down to a specific changeset.
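Here is a rough sketch of the "one perf run per checkin" idea. Revisions are opaque strings and time is a plain integer clock. `PerfQueue` and `PerfRun` are hypothetical names, not anything in Talos or Tinderbox. Every checkin gets queued, and when the perf machine starts a run it records its own start time separately from the revision under test.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PerfRun:
    revision: str    # the changeset these numbers belong to
    started_at: int  # when the perf machine actually ran it (hypothetical clock)


class PerfQueue:
    """Queue every checkin for its own perf run instead of only the latest.

    Hypothetical sketch: revisions are opaque strings, time is an int clock.
    """

    def __init__(self):
        self._pending = deque()

    def checkin(self, revision):
        # Every checkin is queued, so a regression maps to exactly one changeset.
        self._pending.append(revision)

    def next_run(self, now):
        # The test's start time is kept separate from the revision it measures.
        if not self._pending:
            return None
        return PerfRun(revision=self._pending.popleft(), started_at=now)
```

Suppose checkins r1, r2 and r3 land while the perf box is busy. A "latest only" policy would test only r3. This queue hands out r1, then r2, then r3, each stamped with the time it actually ran.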
- sometimes the build breaks for non-checkin reasons, and someone needs to be hunted down to correct it if it’s build-on-checkin
I think this is mainly a fault of not having adequate monitoring, auto-recovery, and load-balancing of the server farm, and of not giving the right people access to force builds directly. bhearsum is rocking the monitoring side in bug 410019, so we’ll know as soon as anything goes wrong at the machine level. Buildbot can do the load-balancing and give developers an interface to force/clobber/stop builds as needed, without our having to give everyone in the project a shell account or wait until the next checkin to pick up a CLOBBER file.
- some people will still be stuck waiting for build cycles, this just moves the problem around
I think this is absolutely a valid concern, and the more I think about it, build-on-checkin isn’t really all that valuable until we have multiple buildslaves able to run in parallel, so no one has to wait for the current cycle to finish in order to have their checkin tested. bug 411629 has been filed to track this.
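A toy simulation shows why parallel buildslaves matter. It assumes identical slaves and a fixed build length in minutes, and it gives every checkin its own build on whichever slave frees up first. `wait_times` is a hypothetical helper and does not reflect how Buildbot itself assigns slaves.

```python
import heapq


def wait_times(checkin_times, build_minutes, slaves):
    """Return how long each checkin waits before a slave starts building it.

    Sketch: every checkin gets its own build, handed to whichever of the
    `slaves` identical buildslaves becomes idle first.
    """
    free_at = [0] * slaves  # time at which each slave next becomes idle
    heapq.heapify(free_at)
    waits = []
    for t in sorted(checkin_times):
        earliest = heapq.heappop(free_at)
        start = max(t, earliest)  # can't start before the checkin or the slave
        waits.append(start - t)
        heapq.heappush(free_at, start + build_minutes)
    return waits
```

Take checkins at minutes 0, 5 and 10 with a 30-minute build. One slave gives waits of `[0, 25, 50]`, so the third person waits almost two cycles. Three slaves give `[0, 0, 0]`.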
- CVS commits are not atomic, what if we pull a partial checkin?
Fortunately this goes away when we switch to hg for Moz2. Even for the 1.8 and 1.9 branches, though, we poll Bonsai and can use the revision it contains (aka branch+timestamp) instead of just blindly pulling CVS. I don’t *think* Bonsai is susceptible to this kind of thing, because of the way it groups checkins before reporting them, but please correct me if I’m wrong. Also, isn’t this a problem today, since the Tinderbox client just blindly picks a timestamp and pulls it?
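"Use the branch+timestamp from Bonsai" could turn into a pinned CVS checkout along these lines. `cvs checkout -r TAG -D DATE` is standard CVS. The rest is assumed: the helper name, the module argument, and the exact date string (check the format against your CVS version). The sketch also assumes the timestamp marks the end of the checkin.

```python
from datetime import datetime, timezone


def cvs_checkout_args(branch, checkin_time, module):
    """Build a CVS checkout pinned to a branch and a Bonsai checkin time.

    Hypothetical helper: `module` is whatever you'd normally check out, and
    the date string format is an assumption to verify for your CVS version.
    """
    stamp = checkin_time.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    args = ["cvs", "-q", "checkout"]
    if branch != "HEAD":
        args += ["-r", branch]  # stay on the branch Bonsai reported
    args += ["-D", stamp, module]  # take nothing newer than the checkin time
    return args
```

For example, a 1.8-branch checkin reported at 23:08 UTC on 9 Jan 2008 gives `cvs -q checkout -r MOZILLA_1_8_BRANCH -D "2008-01-09 23:08:00 UTC" mozilla`.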
If I’ve missed or misrepresented anything, please let me know, and check out the dependency tree on bug 401936 for more information.
I suspect getting a datestamp from bonsai might help (but maybe not). bonsai updating gets started by mail sent from dolog.pl, which happens individually for each file committed. The bonsai mail processor runs as a cron job, but the cron job just invokes addcheckin.pl for each mail received (again, 1 per file). So there’s no atomization by bonsai, but the mechanics might mean you’d be more likely to get lucky.
Practically speaking, if you have a big checkin (taking a long time), I don’t really see how you could avoid mid-checkin bustage for any client that was idle.
Andrew, that explanation helps, thanks. I was hoping that Bonsai provided some kind of grace period because it seems to group all the files in a commit as one email.
Buildbot can do a grace period after a checkin is detected, if we set this high enough then it should be ok… determining what “high enough” is is the trick.
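Buildbot's schedulers expose this as a `treeStableTimer` setting. The standalone simulation below only illustrates the behaviour and is not Buildbot's code. A build fires only once `grace` time units pass with no further checkin, so a burst of checkins produces a single build.

```python
def stable_build_times(checkin_times, grace):
    """Return the times at which builds would fire under a grace period.

    Sketch of a tree-stable timer: each checkin (re)arms the timer, and a
    build starts only if `grace` units pass before the next checkin.
    """
    fires = []
    times = sorted(checkin_times)
    for i, t in enumerate(times):
        next_t = times[i + 1] if i + 1 < len(times) else None
        # The timer armed at t survives only if the next checkin comes late enough.
        if next_t is None or next_t - t >= grace:
            fires.append(t + grace)
    return fires
```

Checkins at seconds 0, 30, 50 and 400 with a 120-second grace period fire only two builds, at 170 and 520. A 10-second grace period fires one build after every checkin. That is the "high enough" trade-off: a longer grace period rides out slower checkins but delays every build.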
And again, moving to a version control system that does atomic commits is the right thing to do anyway, so if we need to block on that we can, but it would be nicer if we do not have to leave the branches so far behind.
OK, I’m going to make a liar out of myself.
Looking more closely at dolog.pl: bonsai doesn’t atomize checkins, but CVS does. dolog.pl receives a list of the files that were modified as part of a commit, so that would certainly help.
But, I guess there would still be the possibility that something was committed to CVS and not-yet processed by bonsai and that checkin could get partially pulled.
- commit 1 at 3:15
- bonsai starts
- commit 2 starts at 3:15, ends at 3:16
- client checks bonsai and uses 3:15 as the datestamp ==> bustage

So it’s still possible, but not so likely (and perhaps less likely than pulling at non-bonsai times).
So what CVS does is send a mail per checked-in directory, and what bonsai does is take those mails and coalesce them into single check-ins.
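Coalescing might look like the following. The grouping rule is a hypothetical heuristic, not Bonsai's actual logic: mails with the same author and log message, each arriving within `window` seconds of the previous one in its group, count as one check-in. The `DirMail` and `Checkin` field names are also made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DirMail:
    author: str
    log: str
    time: int        # seconds; hypothetical field names throughout
    directory: str


@dataclass
class Checkin:
    author: str
    log: str
    start: int
    end: int
    directories: list = field(default_factory=list)


def coalesce(mails, window=60):
    """Group per-directory commit mails into logical check-ins.

    Hypothetical heuristic, not Bonsai's code: mails sharing an author and
    log message, each within `window` seconds of the previous one in that
    group, are merged into a single check-in.
    """
    checkins = []
    open_by_key = {}
    for m in sorted(mails, key=lambda mail: mail.time):
        key = (m.author, m.log)
        c = open_by_key.get(key)
        if c is not None and m.time - c.end <= window:
            c.end = m.time  # extend the check-in to cover this directory
            c.directories.append(m.directory)
        else:
            c = Checkin(m.author, m.log, m.time, m.time, [m.directory])
            checkins.append(c)
            open_by_key[key] = c
    return checkins
```

The `start`/`end` span kept for each check-in is exactly what you need later to tell whether a pull time lands in the middle of it.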
What the bonsai poller does is query bonsai until it gets a check-in; it doesn’t wait until that check-in yields a stable result. That’s a bug I haven’t filed.
The good news is that this only matters if you want to find a pull time between two check-ins, or if you want to use the bonsai data for some fast-update path, which would be cool. So I guess at some point I should file that bug on the bonsai poller.
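One way a poller could wait for a stable answer is to keep polling until two consecutive answers agree. In this sketch `fetch` is a hypothetical stand-in for a Bonsai query that returns the latest check-in as a hashable value (say, a tuple of files). `sleep` is injectable so the loop can be tested.

```python
import time


def poll_until_stable(fetch, interval=30.0, sleep=time.sleep):
    """Poll until two consecutive non-empty answers agree, then return it.

    `fetch` is a hypothetical callable standing in for a Bonsai query; it
    returns None when nothing new has shown up yet.
    """
    previous = fetch()
    while True:
        sleep(interval)
        current = fetch()
        if current is not None and current == previous:
            return current  # unchanged across one full interval: call it stable
        previous = current
```

This still isn't airtight. A check-in that stalls for longer than `interval` between files would look stable too early, so the interval plays the same role as the grace period above.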
Yeah, using bonsai means that you probably won’t have problems with mid-checkin of a single checkin… it would only have the normal mid-checkin problems if a second checkin were taking place at roughly the same time… but if you check out by date, even that shouldn’t be an issue, as long as the date is precise enough to slice the two checkins.
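"Precise enough to slice the two checkins" can be made concrete. A CVS pull by date takes every revision no later than the date. So a time `t` separates checkin 1 from checkin 2 when `end_first <= t < start_second`, and it has to be expressible at the datestamp's resolution. `slice_time` is a hypothetical helper; times are integer seconds.

```python
def slice_time(end_first, start_second, resolution):
    """Earliest pull time (a multiple of `resolution`) that includes all of the
    first checkin (t >= end_first) and none of the second (t < start_second).

    Returns None when the resolution is too coarse to separate them.
    """
    t = -(-end_first // resolution) * resolution  # round end_first up
    return t if t < start_second else None
```

Say commit 1's last file lands at 3:15:10 (910 s past 3:00) and commit 2's first file at 3:15:40 (940 s). With one-minute datestamps there is no clean slice, since `slice_time(910, 940, 60)` is `None`. With one-second datestamps, 910 works.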
We’ve been talking about this a bit this week and we’re starting to look at filing the performance runs on talos with the timestamp of the checkin. We’re waiting on some graphserver fixes that should allow us to include multiple runs for the same buildid at the same time, which’ll allow better correlation between source code and perf run.
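"Multiple runs for the same buildid" might look like this on the storage side. The sketch assumes the buildid is the checkin timestamp string and each run is a single number. `PerfResults` is hypothetical and says nothing about how graphserver actually stores data.

```python
from collections import defaultdict
from statistics import mean


class PerfResults:
    """Keep every perf run for a build instead of one value per buildid.

    Hypothetical sketch: buildid is a checkin-timestamp string and each run
    is one number (e.g. a page-load time in ms).
    """

    def __init__(self):
        self._runs = defaultdict(list)

    def add(self, buildid, value):
        self._runs[buildid].append(value)

    def summary(self, buildid):
        # Return (number of runs, mean), or (0, None) if the build has no runs.
        runs = self._runs.get(buildid, [])
        if not runs:
            return 0, None
        return len(runs), mean(runs)
```

Keeping every run, rather than overwriting, lets repeated runs of the same build show how noisy a given test is before anyone blames a checkin.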
As it stands now, we’re only testing what the build farm produces and I don’t see this changing any time soon. If people require patch-level analysis, I think the better route to go will be the talos try server bug 398192 which I’m going to continue looking at in just a few minutes…
> Also, isn’t this a problem today, since Tinderbox client just blindly
> picks a timestamp and pulls it?
Yes, this _is_ a problem today. Mid-checkin build bustage happens all the time, which is why the idea of starting all the tinderboxen at the same time alarms people. Today one or two boxes go red, but other builds starting slightly later prove it was just mid-checkin wonkiness, and checkins can continue. If a checkin triggered all the boxes at the same time, everyone would have to wait two full cycles (of the fastest box) before there was confirmation that it was a mid-checkin bustage.
[...] initial blog on this topic attracted a bunch of comments, and some great blogs on this, especially RobHelmer’s blog and Rob Campbell’s [...]