releng

Archived Posts from this Category

making updates easier

Posted by rhelmer on 30 Jul 2008 | Tagged as: automation, mozilla, releng

For a few months now, I’ve been working in my spare time on a way to make configuring and serving updates to Mozilla-based applications easier.

Mozilla updates are MAR files, which are linked to by the Automatic Update Service (aka AUS2). Several tools are involved in the making of updates for production releases, chiefly Patcher, driven by the release automation framework for releases. Nightly updates use a simpler script which automatically determines where builds should be updated to; Patcher needs every update path to be explicitly specified in its config file.

Both Patcher and the nightly script call the update-packaging tools to do the work of generating MAR files, which in turn use the “mar” utility (supports tar-like arguments to manipulate MAR files, e.g. “mar -t file.mar”, “mar -x file.mar”, etc.) and the “mbsdiff” utility, which generates binary patches using a modified version of bsdiff.

The update-packaging tools are in need of a makeover too, but that is a story for another day.

Getting back to how updates are served – Patcher’s other job is to generate thousands of text files, which are used to configure AUS. Every possible update path, like this one for 3.0b3, is actually generated dynamically from two text files (partial.txt and complete.txt) which reside in a directory layout that holds much the same information as that URL, in a slightly different order (…/product/version/buildid/buildTarget/locale/channel/update.xml). These complete.txt and partial.txt files have gone through two revisions of their file format. In the first, variables for the generated XML (updateType, URL to the MAR file, etc.) each sit on a specific line number. In the second (“version=1”), key/value pairs are used.
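
To make the URL layout concrete, here is a minimal sketch that splits an update path into the six fields named above. The field order comes from the URL pattern in the post; the function name, the `UpdateQuery` type and the sample values are hypothetical, and the on-disk directory order (which differs slightly) is not modelled here.

```python
from typing import NamedTuple


class UpdateQuery(NamedTuple):
    """The six path components of an update URL, in URL order."""
    product: str
    version: str
    build_id: str
    build_target: str
    locale: str
    channel: str


def parse_update_path(path: str) -> UpdateQuery:
    """Split '.../product/version/buildid/buildTarget/locale/channel/update.xml'."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 7 or parts[-1] != "update.xml":
        raise ValueError(f"not an update path: {path!r}")
    # Take the six components immediately before 'update.xml'.
    return UpdateQuery(*parts[-7:-1])


# Sample values are made up for illustration only.
query = parse_update_path(
    "/Firefox/3.0b3/2008020514/WINNT_x86-msvc/en-US/beta/update.xml"
)
print(query.product, query.version, query.channel)
```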

AUS2 configuration files only reflect the current state of the system; for releases the history is in Patcher config files (Config::General). The release automation scripts automatically update and check this file into CVS, so it’s not too painful to deal with in most situations. There are some outstanding bugs but overall it does what it is supposed to do.

However, it took me a very long time to get a handle on the above, and I think the separation between Patcher and the AUS server is not very useful. In fact, the method of explicit updates for all is downright unhelpful; every single release (e.g. 2.0.0.15), the following happens:

  1. partial updates are generated from 2.0.0.14->2.0.0.15
  2. every previous release (2.0.0.[1,2,3,4,...]) is pointed to the same 2.0.0.15 update

That means generating and publishing two text files for each (release * platform * locale) combination, which all contain exactly the same data. Also I think that taking a hint from the way the nightly system works would be useful here; 2.x should automatically point to the latest *unless* explicitly overridden, it should not require explicit configuration to do the norm. Finally, the nightly and production system should not be so different; every nightly update is a lost opportunity to test pre-releases of the production system, and having forked systems is bad for bugfixing and feature porting (note that there are no nightly updates for locales other than en-US, for example).
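
A back-of-the-envelope sketch of the cost, plus the "latest unless overridden" rule suggested above. The release, platform and locale counts are hypothetical, as are the function names; only the "two text files per (release * platform * locale)" figure comes from the description above.

```python
def explicit_config_files(previous_releases, platforms, locales, files_per_path=2):
    """Count text files written when every old release gets an explicit path."""
    return previous_releases * platforms * locales * files_per_path


def resolve_target(current_version, latest_version, overrides=None):
    """Point at the latest release unless an override exists for this version."""
    overrides = overrides or {}
    return overrides.get(current_version, latest_version)


# Hypothetical figures: 14 earlier releases, 3 platforms, 40 locales.
print(explicit_config_files(14, 3, 40))

# With a default rule, only the exceptions need to be written down.
print(resolve_target("2.0.0.3", "2.0.0.15"))
print(resolve_target("2.0.0.3", "2.0.0.15", {"2.0.0.3": "2.0.0.4"}))
```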

So, I’ve been thinking for a long time about how to make tools that are easier to use, understand and extend. One idea is to have the AUS server configuration be a database, not a giant tree of text files, and have the data in one place (not stored in a config file which is expanded to a giant tree of text files by a separate app). Another is to provide a simple API, and a few command line tools which use this API to modify update data and export it.

The conceptual model right now is that each release contains one update, which contains two patches (one partial, one complete). Both the database schema and the API reinforce this model.
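
As a rough illustration of that model (not the prototype's actual schema or API, whose names are not shown here), the release/update/patch relationship could be written down like this. All class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    kind: str  # "partial" or "complete"
    url: str   # where the MAR file can be downloaded


@dataclass(frozen=True)
class Update:
    partial: Patch
    complete: Patch

    def __post_init__(self):
        # Enforce the "one partial, one complete" shape of the model.
        if self.partial.kind != "partial" or self.complete.kind != "complete":
            raise ValueError("an update needs one partial and one complete patch")


@dataclass(frozen=True)
class Release:
    version: str
    update: Update


# Hypothetical URLs, for illustration only.
release = Release(
    version="2.0.0.14",
    update=Update(
        partial=Patch("partial", "http://example.invalid/partial.mar"),
        complete=Patch("complete", "http://example.invalid/complete.mar"),
    ),
)
print(release.update.partial.kind, release.update.complete.kind)
```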

Here’s what I have working so far. In case it’s not obvious, this is most definitely an early “throw the first one away” prototype:

The schema is based on Lars’ fine work on the subject, although I did modify it slightly. This schema is not totally done yet either, for example foreign keys aren’t actually hooked up, but there’s enough there to see that it works. There’s a run.py command in that directory that calls the importer and exporter correctly.

This means that you can read existing AUS2 data into a database (if you have it), and create or manipulate update information using the API from Python (or directly with SQL, if you like). You can generate update.xml files and put them straight onto a webserver.
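
For a feel of what the export step involves, here is a small sketch that builds an update.xml document with Python's standard `xml.etree.ElementTree`. The element and attribute names are approximations chosen for illustration, not copied from the AUS2 format or from the prototype's exporter, and the sample values are invented.

```python
import xml.etree.ElementTree as ET


def build_update_xml(version, build_id, patches):
    """Return an update.xml string with one <update> holding one <patch> per entry."""
    root = ET.Element("updates")
    update = ET.SubElement(
        root, "update", {"type": "minor", "version": version, "buildID": build_id}
    )
    for patch in patches:
        ET.SubElement(
            update,
            "patch",
            {
                "type": patch["type"],  # "partial" or "complete"
                "URL": patch["url"],
                "size": str(patch["size"]),
            },
        )
    return ET.tostring(root, encoding="unicode")


# Invented sample data.
xml_text = build_update_xml(
    "2.0.0.15",
    "2008061004",
    [
        {"type": "partial", "url": "http://example.invalid/partial.mar", "size": 1024},
        {"type": "complete", "url": "http://example.invalid/complete.mar", "size": 8192},
    ],
)
print(xml_text)
```

The resulting string could be written to the matching path/update.xml location and served as a static file.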

What I’ve put together needs quite a lot more work, but I wanted to open it up for comment. Here’s what I think is remaining, at least:

  • database should hold the history of updates, not just the current state
  • need a web service which talks directly to the database, as an alternative to pre-generating all update.xml files.
  • should use existing libs for the DB ORM (SQLAlchemy maybe?), generating XML, etc. not the home-grown things I threw together
  • I think it would be advantageous to make the model/schema/API more sophisticated and normalized (e.g. updates could belong in multiple channels), but I don’t want to go beyond the essentials quite yet.
  • the new update-packaging tools should be able to read data from this system in order to automatically determine the appropriate “from” release to base partial MARs on. There should also be some way to register that new updates are available; that access would be internal and append-only (e.g. only needs SELECT, INSERT).

I think that to solve the first, update paths should be explicitly configured once, but there needs to be business logic in the server app (or update.xml file generator) which overrides this when a newer release is available. For instance, if a user is on version 1.0 and version 1.1 is available which has a partial for 1.0, then the partial 1.0->1.1 should be served. However, if version 1.2 is available, then the complete 1.0->1.2 update should be served.
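
The rule in that example can be sketched in a few lines. This is only one way to express it, under the assumption that releases are kept in oldest-to-newest order and that available partials are recorded as (from, to) pairs; the function name and data shapes are hypothetical.

```python
def choose_patch(current, releases, partials):
    """Pick the patch to serve for a user on `current`.

    releases: versions ordered oldest to newest.
    partials: set of (from_version, to_version) pairs with a partial MAR.
    Returns (kind, from_version, to_version), or None if already up to date.
    """
    latest = releases[-1]
    if current == latest:
        return None
    if (current, latest) in partials:
        return ("partial", current, latest)
    # No partial from this version to the latest: fall back to the complete.
    return ("complete", current, latest)


partials = {("1.0", "1.1"), ("1.1", "1.2")}

# Only 1.1 is out: the 1.0 user gets the 1.0->1.1 partial.
print(choose_patch("1.0", ["1.0", "1.1"], partials))

# 1.2 is out: the 1.0 user skips ahead with the complete 1.0->1.2 update.
print(choose_patch("1.0", ["1.0", "1.1", "1.2"], partials))
```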

The second problem has more to do with the burden inherent in handling tens of thousands of text files (e.g. backing them up or restoring them can take a very long time), although I believe that it is useful to have the option to pregenerate the path/update.xml files, especially for people with fewer updates than mozilla.org pushes each release.

Anyway, comments welcome! Certainly feel free to nudge me if it looks like I’m going off the rails here, but I think this approach could make things a little better in update-land. I’ll take patches too, but if anything serious comes of this I’ll probably clean up and move over to Mozilla’s repo, and rewrite a bunch, so don’t take the current implementation too seriously.

releases on tap

Posted by rhelmer on 10 Jul 2008 | Tagged as: automation, mozilla, releng

One of the things that was pounded into me while working at MoCo is the idea of having a bug tracker and using it. I literally can’t work without one anymore. It’s the first thing I really pushed for at my new job (they were using various ad-hoc systems for project management, but not a real bug tracker for the software dev side). I’ve realized that I just can’t keep everything in my head, various notepads and text files, etc. and expect to get anything done, or let anyone know what my priorities are.

In return, I really tried to hammer in the idea of fast, automated release cycles. We spent a lot of time (and the release engineering team does still spend a lot of time) wrapping the build system and other tools so that they can be run and the output verified automatically, chasing that ideal of the Formula One-style hand-off to QA and to the users.

The way releases work now is incredible, just night and day from when I started at MoCo a little over two years ago. However, there’s one thing that’s always bugged me, and since I just had the opportunity to set up an automated build/release environment, I thought I’d expound a little bit on it.

The one thing is that nightly builds of Firefox just aren’t the same as the release builds. The way updates work is different, branding is turned on, bits are signed (on Windows), the directory structure for files is different. Firefox releases are actually rebuilt from source for each release.

So what? None of these, even added up, are a big deal, right? Obviously releases work fine, and there are a ton of great people (and the tools they’ve made) that make sure that nothing is missed because of this. But wouldn’t it be great if we could just take the nightly updates and builds that have already been put through the wringer by thousands of people, and give those straight to QA? Or if we can’t have that, how about at least have the release builds put through the same tests and available to QA immediately after checkin?

Am I pushing some fanciful, architecture-astronaut utopian vision? I don’t think so, because this is how I’ve done releases in the past, and this is how I do releases now. Let me tell you about it.

I use Hudson, which I can’t recommend highly enough (well, if you’re not allergic to Java, I guess). It makes this kind of process easy. It’s not necessary to use it to achieve this of course, I’m just throwing this out as a data point.

On each checkin:

  • a unique build number is generated
  • a new build is generated (I also have it run unit tests, and install the software to run functional tests)
  • release files and other artifacts like build logs are archived, and checksums of the files are stored
  • if anything goes wrong, the team and the developer who checked in the latest change are notified

The software is available to QA as soon as this automated process is complete. When it’s time to release, I can tag the build via the web UI (although it’s easy enough to do outside of Hudson if you have the build number, which in turn contains the branch/datestamp/revision info needed).
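
As an illustration of a build number that carries branch/datestamp/revision, and of storing checksums for archived files, here is a small sketch. The identifier format, function names and sample values are assumptions made for this example; this is neither Hudson's numbering scheme nor the exact format I use.

```python
import hashlib
from datetime import datetime


def make_build_id(branch, revision, when):
    """Build an identifier like 'trunk-20080710120000-r1234'."""
    return f"{branch}-{when:%Y%m%d%H%M%S}-{revision}"


def parse_build_id(build_id):
    """Recover (branch, datestamp, revision); the revision must not contain '-'."""
    branch, stamp, revision = build_id.rsplit("-", 2)
    return branch, datetime.strptime(stamp, "%Y%m%d%H%M%S"), revision


def checksum(data):
    """SHA-256 hex digest of an archived artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


build_id = make_build_id("trunk", "r1234", datetime(2008, 7, 10, 12, 0, 0))
print(build_id)
print(parse_build_id(build_id))
print(checksum(b"contents of a release file"))
```

Tagging a release later then only needs the build number, since the branch, date and revision can be read back out of it.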

Having the next release always “on tap” makes it easy for me to largely ignore the build/release side of things, and focus on developing software, writing tests, and tracking down problems.

Now, Mozilla’s situation is way more complicated, which I alluded to a bit earlier. This post isn’t a “see what I can do!” rant as much as a “look what’s possible!” idea. I think that this kind of setup is totally doable for Mozilla’s products, but there are some serious issues:

  • branding is turned on at compile time. having nightly builds not called “Firefox” is a *good* thing, as otherwise end-users would be very confused.
  • “--enable-tests”, needed for unit tests, cannot be run in release builds at the moment (for technical reasons outside the scope of this post; I’m sure there are bugs on this)
  • release builds have a different filename format and directory structure (e.g. “firefox-3.0.pre.en-US.win32.installer.exe” for nightly versus “3.0/win32/en-US/Firefox Setup 3.0.exe”)
  • release builds are cryptographically signed, to assure users that these files really were created by MoCo (regardless of what mirror or download site they may have come from).
  • nightly updates are only for en-US, and use a different set of tools to generate updates, and a different mode of the update server to serve updates (some ideas for fixing these problems are in bug 410806, but again this is outside of the scope of this post)

So all of these are pretty much good things (branding, signing, etc.) or technical issues that could surely be fixed (nightly updates, unit tests). Arguably, nightly users and release users tend to be very different people, with very different needs and expectations, so all of the “intentional problems” here are really good things. This pretty much eliminates the possibility (as far as I can see) that Firefox release engineers could take a nightly build and be able to ship that as a release build.

Even if the branding issue were solved (e.g. repackaging), signing still needs to be done, partial diff files would need to be regenerated, and probably other things that I’m overlooking. The automated tests that were run on the nightlies may not be applicable (you may scoff at the paranoia, but there was a bug regarding the size of the Vista icon in official branding found late in the Fx3 beta cycle which caused a bunch of grief. This situation was improved by making a Minefield version of the same icon, which is a good fix, but I think my point still stands).

Here’s another option – why not create a real, honest-to-god Firefox release build, on each checkin (or at least alongside each nightly build)? This at least makes it available to QA as soon as humanly possible, and it could probably be opened up somehow to interested community testers (right now, human-triggered builds are just put into a special area).

Maybe I’m just spoiled working on little tiny projects, but I think even the already super-fast and extensively tested Firefox releases could be made super-faster and the tests extensivlier, at the cost of freeing up the release engineers of the need to babysit the One and Final Release Build.

openSUSE build service

Posted by rhelmer on 09 Jul 2008 | Tagged as: automation, releng

Looks like the openSUSE build service can package up your software for a bunch of different Linux distributions, cross-compile, track upstream project dependencies (e.g. rebuild your GTK app when GTK changes), and runs on their servers so you don’t have to maintain the thing.

Add in Windows and Mac support and they might have something there :P

This might be a great idea for a higher-level “cloud computing” service; setting up and maintaining this kind of infrastructure is a huge problem for many companies.

OS as platform

Posted by rhelmer on 08 Jul 2008 | Tagged as: apple, microsoft, releng, ubuntu

I’ve been thinking a lot about the role of operating systems lately. Why is there no operating system vendor that focuses on being a platform for applications, rather than trying to compete directly in the application space?

Maybe this is a naive question, but it really makes application developers’ lives a huge pain to have to compete with platform vendors all the time, and it’s surprising to me that the market puts up with it. It also brings up the whole “core competency” argument: can one company really do two fairly specialized things well?

These are who I consider to be the top-tier OS vendors:

  • Microsoft Windows
  • Apple Mac OS X
  • Ubuntu Linux

Why don’t any of them provide just the base OS + gotta-have applications (editor, email, web), and give the ISVs the ability to:

  • register new applications in a central catalog
  • deliver updates to specific applications
  • send crash data back to the vendor

This would allow the OS vendor to focus on the core OS functionality, and provide means to the users to select applications that suited their needs (shipping preinstalled with the top editor, email and web clients, of course). Having formal reviewers as well as user ratings would be a great way to promote good and trustworthy applications.

I don’t anticipate any of these top-tier OS vendors focusing on this space, although for different reasons.



rel-o-mation slideware!

Posted by rhelmer on 04 Apr 2008 | Tagged as: automation, mozilla, releng

I put this set of slides together to explain what state the release automation project is in. It probably makes more sense when I am sitting there to explain what each point means, but I figured I’d put it out there anyway :)

The current setup mimics ye olde manual release process, forged by Chase. Over the past few years we’ve worked on wrapping that process in scripts with this perl framework (aka “bootstrap”), which auto-generates configs for underlying systems like tinderbox and patcher, checks logs for errors, etc.

A lot of the current bugs come from underlying systems, especially the tinderbox client. Reducing some of the complexity here would both make the system more understandable and most likely faster as well. It’s pretty tough to make changes when you’re doing this level of wrapping, too.

Now that everything is driven by Buildbot, it probably makes the most sense to call the build system directly, instead of the buildbot->bootstrap->tinderbox->build_system chain that we have today. There are bugs on all of this already; hopefully the slides and this post will make it clearer how they tie together.

moving 1.8 nightlies to release machines March 5 2008

Posted by rhelmer on 04 Mar 2008 | Tagged as: automation, buildbot, mozilla, releng, tinderbox

As previously announced on Tinderbox and planet, we’re migrating nightly production to running on the same machines as release production.

On the moz1.8 branch, we’ve been running the new nightlies in parallel with the “traditional” nightlies since Feb 15 2008, and are going to switch over live tomorrow.

The new machines:
* production-pacifica-vm
* production-prometheus-vm
* bm-xserve05

The old machines:
* pacifica-vm
* prometheus-vm
* bm-xserve02

Starting tomorrow, the performance machines will begin following the new machines. The new machines will publish updates and nightly builds to the usual location, and the old machines will be disabled (but kept around for a while, just in case).

If there is a reason that we should not proceed, or if you see any problems after the migration, please update bug 417147 or email build@mozilla.org.

Thanks!
Rob

tinderbox to buildbot: moz18 branch

Posted by rhelmer on 19 Dec 2007 | Tagged as: automation, buildbot, mozilla, releng, tinderbox

I’ve set up the release automation staging server for the Mozilla 1.8 branch (Firefox 2.x) to also generate nightly builds and depend builds on checkin to the branch (using buildbot’s BonsaiPoller). I outlined some of the advantages to this release automation/nightly+depend integration in my previous post.

You can see the results on the Mozilla1.8-Staging Tinderbox tree

The main impediment to taking this live is the performance test machines. These machines currently only cycle when a new build is available, but ideally we’d want them to keep re-testing the same build as many times as possible, to get more stable test results. Since the Tinderbox-driven depend builds currently continuously cycle instead of waiting for checkin, we tend to get several test builds for the same source code as a side effect.

Each of these machines forges its start time to match that of the build it is testing, which allows for easily matching up checkins and build times to performance results, but this doesn’t really make sense if we’re doing multiple test runs per build.

I’ve started a thread in the mozilla.dev.builds newsgroup with the subject “moving Mozilla1.8 tinderboxes to Buildbot” for general discussion about this idea.

tinderbox to buildbot, step 1

Posted by rhelmer on 06 Dec 2007 | Tagged as: automation, buildbot, mozilla, releng

As I mentioned previously, I’ve been working on incrementally moving our Tinderbox client installs over to Buildbot.

The first step is to switch to driving Tinderbox from Buildbot and our release automation system, instead of driving it on each machine from the multi-tinderbox script. The release automation still calls Tinderbox client underneath, so features like CLOBBER support and all of your other favorites remain.

I have the staging automation builders publishing to the Tinderbox MozillaStage tree. Note that it’s using Mozilla1.8 builders but firing off builds when it sees trunk checkins; this is because I want to make sure the Bonsai polling is working and Mozilla1.8 doesn’t get very many checkins :) Also, I’m trying to stay out of the way of the ongoing trunk automation work that bhearsum is driving (AKA, letting him find and fix the trunk+release_automation bugs before I add nightly support). Expect to see trunk nightlies up there in the next few weeks.

This has several advantages right off the bat:

  • same release process we use for production (currently a very small subset)
  • only build on checkin, should help cycle times
  • same builders used for nightly and production releases (admittedly, this is how it used to be before release automation; but now we can let Buildbot handle the queuing instead of either interrupting depend/nightly builds or running multiple builds on the same machine, which is slow)

As we continue to make the nightly and final release process more alike, we can start taking advantage of things that only final releases have but that are missing on nightlies:

  • updates for l10n (only en-US gets updates currently :( )
  • update verification
  • publishing the source tarball used to build
  • using the same timestamp for all platforms

On the administration side, it should let us manage builders centrally, more quickly and easily deploy new builders, and with a little more work will let us parallelize builds (multiple build machines per column, or buildslaves per builder in Buildbot parlance), which should further help cycle time (no having to wait for the current build to finish to get a build started with your fresh checkin).

Comments/questions/concerns welcome! Feel free to email me, the build group, or post in mozilla.dev.builds newsgroup if you’d like to discuss further.

Towards human-free releases

Posted by rhelmer on 01 Sep 2007 | Tagged as: automation, buildbot, mozilla, releng

We took a big step towards truly hands-off releases by doing a (very early) Firefox 2.0.0.7 RC1 with the Buildbot-enabled release automation system. There are still some kinks to work out, but overall things are looking great.

The elapsed machine time from “code freeze” to “ready to ship” was ~15 hours, actual time was +12h or so waiting for someone to do the signing. This does not include time for QA, but a lot of that can be interleaved, and hopefully further automated for maintenance releases (as they generally include no new features).

I know that we’re already very good (10FD ftw), but I know we can do better. Imagine with me, if you will, that we had a timeline like this:

Day 1 – security exploit announced

Day 2 – RC available

Day 3 – fix available on auto-update

Are there any other software vendors that ship security fixes to a locally-installed application on such a compressed schedule? I’d really like to know; please leave me a comment or email me privately if it’s sensitive. I’d love to be able to measure how we’re doing, and that’s tough without knowing how others measure this.

On a more general note, I think that release automation software should become a commodity just as web servers, continuous integration systems, etc. have. If you want to help out or just see what we’re doing, check out the Mozilla release automation docs.

pretty pictures

Posted by rhelmer on 25 May 2007 | Tagged as: automation, mozilla, releng

preed always makes fun of my release process diagram (as seen on whiteboards everywhere):
release process

So I made a fancier one, showing the inside of each of these steps:
step

However, I still feel that preed doesn’t appreciate it, so I dedicate this diagram to morgamic.
