January 2008

Monthly Archive

volunteering

Posted by rhelmer on 21 Jan 2008 | Tagged as: mozilla

Thought-provoking post over on isabel wang’s blog:

When you ask for X hours of someone’s time to help put up pre-made signs or read off telemarketing scripts, each volunteer means no more to you than just another undifferentiated source of labor. No one is put to their highest and best use.

Regardless of what you think of the presidential candidates and the general state of American politics, I think that focusing on what the volunteer is bringing to the table just makes sense when your organization depends on volunteers.

A point that a lot of companies miss about people who volunteer their time and energy towards open-source projects like Mozilla is that volunteers are not coming to finish your to-do list for you; they are trying to make their world a better place by using you as the vehicle. They may not all do what you want or represent you the way that you’d like, but if everyone is coming from the same core set of ideals then hopefully everyone ends up richer for it.

tinderboxJsonApi 0.1

Posted by rhelmer on 17 Jan 2008 | Tagged as: mozilla, tinderbox

Many people have told me that they were excited about the JSON Tinderbox feed, but were quickly discouraged from doing anything fun due to the scary data structure that it presents; it’s a straight dump of what the server uses, and is obviously optimized towards making a waterfall display (plus, it’s just plain weird).

I set up an enhanced waterfall as an example a while back, but it’s really hard to take it further without spending a lot of time digging around inside the tinderbox_data object.

I’ve often wished that I could just sort by column in Tinderbox, so instead of doing yet-another one-off script I put together a little web app that gives you a sortable table of the latest (non-talos) perf data: Analysis paralysis

Click on the headers, and you get data sorted by your criteria. The data is real-time, but does not auto-reload.
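To give a rough idea of what happens on a header click, here is a minimal sketch of sorting rows by a chosen column. The row fields (name, startTime, pageload) and the sortByColumn helper are made up for illustration; they are not the names the app or Tinderbox actually uses.

```javascript
// Hypothetical rows: the field names and values are invented for this example.
const rows = [
  { name: "linux-a", startTime: 1200600000, pageload: 312 },
  { name: "mac-b", startTime: 1200590000, pageload: 298 },
  { name: "win-c", startTime: 1200610000, pageload: 405 },
];

// Return a new array sorted by the given column. Numbers are compared
// numerically, everything else is compared as strings.
function sortByColumn(data, column, descending) {
  const sorted = data.slice().sort(function (a, b) {
    const x = a[column];
    const y = b[column];
    if (typeof x === "number" && typeof y === "number") {
      return x - y;
    }
    return String(x).localeCompare(String(y));
  });
  return descending ? sorted.reverse() : sorted;
}

// Sort ascending by the hypothetical "pageload" column.
const byPageload = sortByColumn(rows, "pageload", false);
console.log(byPageload.map(function (r) { return r.name; }));
```

Copying the array before sorting keeps the original order around, which makes it easy to toggle between ascending and descending on repeated clicks.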

I started to hit a wall almost immediately due to the machinations required for the tinderbox_data structure, so I stepped back and took some time to write a tboxJsonApi.js instead of dealing directly with the data from Tinderbox. This lets me write code like:

<script src="http://tinderbox.mozilla.org/Firefox/json.js"></script>
<script>
// Tree comes from tboxJsonApi.js, which must be loaded as well
var tree = new Tree(tinderbox_data);
var builds = tree.getBuilds();

for (var i in builds) {
  var build = builds[i];
  build.getName();
  build.getStartTime();
  build.getStatus();
}
</script>

You can get checkins for a particular build, or test results. The scrape data is processed, but right now it only supports anchor tags with “key: value” format link text, which is why Talos isn’t yet supported.
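The “key: value” handling can be sketched roughly like this. It is only an illustration of the idea, not the code in tboxJsonApi.js; the function names and the sample link texts are assumptions.

```javascript
// Split link text of the form "key: value" into its two parts.
// Returns null when the text does not have that shape.
function parseKeyValue(linkText) {
  const idx = linkText.indexOf(":");
  if (idx === -1) return null;
  const key = linkText.slice(0, idx).trim();
  const value = linkText.slice(idx + 1).trim();
  if (key === "" || value === "") return null;
  return { key: key, value: value };
}

// Collect every parseable entry into one object, skipping the rest.
function collectResults(linkTexts) {
  const results = {};
  linkTexts.forEach(function (text) {
    const pair = parseKeyValue(text);
    if (pair !== null) results[pair.key] = pair.value;
  });
  return results;
}

// Made-up sample link texts; the middle one is skipped.
console.log(collectResults(["Tp: 312ms", "no separator here", "Ts: 1840ms"]));
```

Anything that doesn’t fit the pattern is silently dropped, which is the same kind of limitation described above for Talos.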

There’s a bunch more stuff I want to do before this will be generally useful to me (CSV export, merging all build, perf and test data for a checkin into one row, etc.), but I think it’s obvious that we could have more useful tools for tracking and analyzing the absolute mountain of data that mozilla.org produces every day.

Let me know if you find this useful, and/or have any questions or ideas for improvements. I was able to throw this all together in a few hours this evening, because I spent so much less time wrestling with data structures and more time modeling the kind of app I wanted.

summarizing build-on-checkin feedback

Posted by rhelmer on 09 Jan 2008 | Tagged as: buildbot, mozilla, tinderbox

Lots of feedback on the build-on-checkin idea in my blog, the newsgroup, and especially joduinn’s recent post on the subject. The primary concerns seem to be:

  • we need as many performance tests per checkin as possible

I’ve filed bug 410869 to track this. I think the way we do this now is wrong, and we’d get more performance cycles if we fixed this by separating the start time of the test from the revision that the test is for. Also, we should do a separate perf test for each checkin, not just the latest when the perf machine becomes available, to be able to track down regressions to a specific changeset.
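As a rough sketch of what separating the start time from the revision could look like: every checkin goes into a queue, and when the perf machine frees up it takes the oldest untested revision rather than the latest. PerfQueue and the revision strings are hypothetical, not anything that exists in Tinderbox or Buildbot.

```javascript
// Hypothetical queue of revisions waiting for a perf run.
function PerfQueue() {
  this.pending = [];   // revisions not yet tested, oldest first
  this.results = [];   // one record per tested revision
}

// Record a checkin as soon as it lands.
PerfQueue.prototype.addCheckin = function (revision) {
  this.pending.push(revision);
};

// Called when the perf machine is free: take the oldest untested
// revision, not the newest, so that no checkin is skipped. The run
// records both the revision and the (later) time the test started.
PerfQueue.prototype.startNext = function (startTime) {
  if (this.pending.length === 0) return null;
  const run = { revision: this.pending.shift(), startTime: startTime };
  this.results.push(run);
  return run;
};

// Made-up revisions in branch+timestamp form.
const queue = new PerfQueue();
queue.addCheckin("HEAD+1199900000");
queue.addCheckin("HEAD+1199900300");
console.log(queue.startNext(1199900600));
```

Because each result carries its own revision, a regression can be pinned on one checkin even if the test itself ran much later.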

  • sometimes the build breaks for non-checkin reasons, and someone needs to be hunted down to correct it if it’s build-on-checkin

I think this is mainly the fault of inadequate monitoring, auto-recovery, and load-balancing of the server farm, and of not giving the right people access to force builds directly. bhearsum is rocking the monitoring side in bug 410019, so we’ll know as soon as anything goes wrong at the machine level. Buildbot can do the load-balancing and give developers an interface to force/clobber/stop builds as needed, without having to give everyone in the project a shell account or wait until the next checkin to pick up a CLOBBER file.

  • some people will still be stuck waiting for build cycles, this just moves the problem around

I think this is absolutely a valid concern. The more I think about it, the less valuable build-on-checkin seems until we have multiple buildslaves able to run in parallel, so that no one has to wait for the current cycle to finish in order to have their checkin tested. bug 411629 has been filed to track this.
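To make the parallel case concrete, here is a toy dispatcher that hands each checkin to the first idle slave and only queues it when every slave is busy. This is not Buildbot’s API; Dispatcher, the slave names and the revisions are all invented for the example.

```javascript
// Hypothetical dispatcher: one build per slave at a time.
function Dispatcher(slaveNames) {
  this.idle = slaveNames.slice();  // slaves with nothing to do
  this.busy = {};                  // slave name -> revision being built
  this.waiting = [];               // revisions with no slave yet
}

// A checkin arrives: build it right away if a slave is idle,
// otherwise queue it. Returns the slave name, or null if queued.
Dispatcher.prototype.onCheckin = function (revision) {
  if (this.idle.length === 0) {
    this.waiting.push(revision);
    return null;
  }
  const slave = this.idle.shift();
  this.busy[slave] = revision;
  return slave;
};

// A slave finishes: give it the oldest waiting revision, if any.
Dispatcher.prototype.onBuildFinished = function (slave) {
  delete this.busy[slave];
  if (this.waiting.length > 0) {
    this.busy[slave] = this.waiting.shift();
  } else {
    this.idle.push(slave);
  }
};
```

With a single slave this degenerates into exactly the waiting problem described above; the checkin only avoids the queue when there are at least as many slaves as builds in flight.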

  • CVS commits are not atomic, what if we pull a partial checkin?

Fortunately this goes away when we switch to hg for Moz2, but even for the 1.8 and 1.9 branches we poll Bonsai (and can use the revision, aka branch+timestamp, that it contains) instead of just blindly pulling CVS. I don’t *think* that Bonsai is susceptible to this kind of thing due to the way it groups checkins before reporting them, but please correct me if this is wrong. Also, isn’t this a problem today, since the Tinderbox client just blindly picks a timestamp and pulls it?

If I’ve missed or misrepresented anything, please let me know, and check out the dependency tree on bug 401936 for more information.