AutomatedServerTestingSpec
Launchpad Entry: server-lucid-automated-testing
Created: 2009-11-23
Contributors:
Packages affected:
Summary
With our next LTS coming up, this is a great time to focus on stability and QA. As such, we'll work on setting up automated testing of as much of Ubuntu Server as possible. This includes (but is not necessarily limited to) daily runs of the security team's regression test suite, enabling upstreams' test suites at build time, performance regression tests, and automated ISO testing.
Release Note
A considerable amount of time has been spent on setting up automated testing of the server product. It is our hope that this will make Ubuntu Server a more solid release than ever before.
Rationale
Regression testing can be a repetitive job. Thankfully, a lot of it can be done automatically. Many packages have test suites that we're not running (for one reason or another), we have qa-regression-testing, and there are lots of other means for testing things with minimal day-to-day effort.
User stories
- Soren uploads a new version of dovecot, but is worried he might break compatibility with some IMAP client. However, he sleeps well that evening, knowing that the automatic regression testing suite will raise an alert over night if something broke.
- Jamie notices a bug in libvirt and works on a patch. While test-building the package locally, the test suite informs him that he broke a little-used feature in an obscure corner of the lxc driver in libvirt. He promptly fixes it, the test suite is happy again, and Jamie can proceed to upload the package to Ubuntu.
- Matthias uploads an update to eglibc which triggers a bug in php5. The next morning, the server team receives a report from the automated test system telling them there's a regression in php5. Looking at the versions of the packages in the dependency chain from the last successful test run and this new one, they quickly pinpoint the culprit and start working on a fix.
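The dependency-chain triage in the last story reduces to a version diff, assuming the test system records the installed package versions of each run. A minimal sketch (the function name and data shapes are illustrative, not an existing tool):

```python
def changed_packages(last_good, failing):
    """Return {package: (old_version, new_version)} for packages whose
    version differs between the last successful test run and the failing
    one.  Each argument maps package name -> version string."""
    changes = {}
    for pkg, new_ver in failing.items():
        old_ver = last_good.get(pkg)
        if old_ver != new_ver:
            changes[pkg] = (old_ver, new_ver)
    return changes

# Example: php5 itself is unchanged between runs, so the eglibc update
# becomes the prime suspect for the php5 test regression.
good = {"php5": "5.2.10-1", "eglibc": "2.10.1-0ubuntu15"}
bad = {"php5": "5.2.10-1", "eglibc": "2.10.2-1ubuntu1"}
```

Recording the full dependency-chain snapshot alongside each test result is what makes this comparison possible after the fact.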
Assumptions
Design
The goal is to detect as many problems as possible, as early as possible.
- Many packages ship test suites.
- Some of these can be run at build time. We should make them do so.
- We should not only run these at the time of upload, but daily as well (to catch bugs introduced by something in the dependency chain).
- Other packages ship test suites that can't easily be run at build time. We should arrange for them to be run daily "somewhere" and somehow get alerted about failures (regressions).
- The security team and QA team have a series of tests they use to ensure they don't introduce regressions in stable releases.
- We should use these during development as well. This should be done daily and we should get a report back about failures.
- We want to be alerted about performance regressions as well.
- Automate ISO testing as much as possible.
MathiasGug already has a setup that automates much of the ISO testing by using preseeding, followed by a script that logs in over ssh to perform the last bits of the test cases. This should ideally be fully automated.
- KVM-autotest is a framework for testing kvm. However, assuming kvm itself is functional, it is also well suited to emulating interactivity, allowing us to do end-to-end ISO testing as if a normal user were typing and clicking.
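Enabling a build-time test suite is usually a small packaging change. A minimal sketch for a debhelper-7-style package, assuming the upstream build ships a `check` target under a `tests/` directory (the target name and directory are illustrative; many suites are already picked up by `dh_auto_test` automatically):

```make
# debian/rules (fragment, hypothetical): run the upstream test suite
# during the package build so that a test failure fails the whole build.
override_dh_auto_test:
	$(MAKE) -C tests check
```

Because a failing suite fails the build, a daily no-change rebuild of such packages doubles as a regression alarm for their dependency chains.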
Implementation
qa-regression-testing scripts
We will integrate the security team's qa-regression-testing collection into checkbox and have it run on a daily basis. Feedback will be collected by the QA team and turned into bugs for the server team to deal with.
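Checkbox jobs are declared in plain-text stanzas (cf. jobs/qa_regression.txt.in in checkbox trunk, linked later in this page). A hypothetical stanza for one qa-regression-testing script might look like the following; the job name, requires expression, and script path are illustrative, not the actual merged entries:

```
plugin: shell
name: qa_regression/dovecot
requires: package.name == 'dovecot-imapd'
command: python qa-regression-testing/scripts/test-dovecot.py -v
description: Run the qa-regression-testing dovecot script.
```

One such stanza per script is roughly the unit of work behind the integration tasks listed below.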
Performance testing
The Phoronix test suite seems reasonably comprehensive. We will run it on a daily basis and watch for performance regressions. Of course, this needs to run on the same hardware every time.
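However the numbers are produced, flagging a regression reduces to comparing each benchmark against the previous run with some noise tolerance. A minimal sketch, assuming higher-is-better scores and an illustrative 10% threshold (not a measured noise floor):

```python
def find_regressions(previous, current, threshold=0.10):
    """Return {benchmark: (old_score, new_score)} for benchmarks whose
    score dropped by more than `threshold` relative to the previous run.
    Scores are higher-is-better, e.g. Apache requests per second."""
    regressions = {}
    for name, old in previous.items():
        new = current.get(name)
        if new is not None and old > 0 and (old - new) / old > threshold:
            regressions[name] = (old, new)
    return regressions

# Example: apache throughput drops ~30% and is flagged; the other
# benchmark moves ~1%, well within the assumed noise band.
prev = {"apache-rps": 9800.0, "pgbench-tps": 410.0}
curr = {"apache-rps": 6900.0, "pgbench-tps": 405.0}
```

Running on identical hardware every day is what keeps the threshold meaningful; on varying hardware the noise band would swamp real regressions.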
Upstream test suites
A number of server packages are known to provide test suites:
- Postgresql test suite (already runs during the build)
- Puppet has a separate package, puppet-testsuite, which provides the test suite.
- php5 test suite (already runs during the build)
- Apache2
- libvirt
  - Includes a test suite (not currently run during the build, but could be)
  - TCK: http://www.mail-archive.com/libvir-list@redhat.com/msg12703.html
- MySQL test suite (already runs during the build)
- OpenLDAP test suite (already runs during the build)
- CUPS test suite (already runs during the build, but has the concept of additional test levels, e.g. smbtorture)
- samba
  - 'make test'; seems to require samba to be built with --enable-socket-wrapper
- There's an IMAP test suite we can use to test dovecot: http://imapwiki.org/ImapTest
The packages that provide a build time test suite will be rebuilt in a PPA every day to catch regressions introduced by things further down the dependency chain.
ISO testing
We should attempt to make MathiasGug's existing ISO testing setup completely automatic. Currently, the install is done using preseeding. Once the install is complete, the operator has to invoke the appropriate script on his client, which then connects to the VM and performs the tests.
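For reference, the preseeded install is driven by a debconf answers file; a tiny fragment (values are illustrative) showing how the test user and openssh-server get set up so the follow-on script can log in over ssh:

```
# d-i preseed fragment (hypothetical values): unattended answers that
# create a known test user and install openssh-server for the test driver.
d-i passwd/username string ubuntu-test
d-i passwd/user-password password insecure
d-i passwd/user-password-again password insecure
d-i pkgsel/include string openssh-server
```

Removing the remaining manual step means having the harness launch the post-install script automatically once the VM answers on port 22.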
We should embrace KVM-autotest and use it for our ISO tests. This involves packaging KVM-autotest and providing so-called step files corresponding to each of the test cases in the ServerWhole list.
BoF agenda and discussion
Automated testing is a great way to prevent/detect regressions.
Security team qa-regression-testing scripts:
- currently being integrated into checkbox - aiming at 80% coverage, starting with the easiest ones.
- cr3 could run the tests in the data centre
What we want:
- every day a report is generated covering which tests have been run and their results
Running tests in EC2.
Test results reporting:
- leverage checkbox.
- checkbox supports different submission plugins. Which one should be used to track the results and generate reports?
Inclusion in milestone reports presented during the release team meeting.
QA team: easy to run the tests and process the test results internally (black box).
- tests are run and failures are reported as bugs by the QA team.
How are test suites updated when the system changes, and by whom?
- QA team finds out about the failure and reports the bug
- QA team fixes the test and writes tests.
What needs testing?
Integration list:
- qa-regression-testing scripts
- enable selected phoronix tests
- upstream test suites
- integrate postgresql test suite
- integrate puppet-testsuite suite package
- integrate dovecot imap test suite (not packaged)
- apache tests (has a framework, usage documented in QRT)
- libvirt test suite (not run during build, but could be), also tests in QRT (but not python-unit)
- mysql test suite runs during the build
- openldap test suite runs during the build
- cups test suite runs in the build, but has concept of other levels (eg smbtorture)
- samba
- 'make test', but needs to be built with --enable-socket-wrapper
- smbtorture
- php5
- integrate iso testing tests in checkbox:
- review all the packages on the server CD
- Multi-system environments: documentation.
- pacemaker
- drbd
What sort of testing do we want to perform?
- Stress/performance testing?
  - E.g. check if Apache suddenly can handle much fewer requests per second than the previous day?
  - Leverage the phoronix test suite?
- Functional testing?
  - E.g. use different mail clients to talk to a mail server?
  - Try a suite of different configuration combinations that we know used to work?
- Upgrade testing?
  - Do a very fat hardy install (all sorts of different servers, clients, and other stuff), upgrade it to Lucid, and see how it breaks?
  - Repeat for different configurations?
  - mvo's testing infrastructure only looks at package upgrade failures. How do we test that services are working correctly after the upgrade? Marjo to figure it out.
- Enabling test suites in packages that ship one
Misc
chat with Steve Beattie on 2010-06-09
2010-06-09T15:04:35 <hggdh> sbeattie: so now it is us...
2010-06-09T15:04:48 <hggdh> brb
2010-06-09T15:05:14 <sbeattie> hggdh: no worries, I need a beverage refill.
2010-06-09T15:12:29 <hggdh> sbeattie: I am back
2010-06-09T15:13:46 <sbeattie> hggdh: moi aussi.
2010-06-09T15:15:35 <hggdh> sbeattie: ça va. So... on qa-r-t: you were saying some of the tests are potentially complex/impossible to set up
2010-06-09T15:16:02 <sbeattie> Yes, digging up my notes now.
2010-06-09T15:17:55 <sbeattie> hggdh: here's what I had, last updated around beta 1 or so in lucid: http://paste.ubuntu.com/447390/
2010-06-09T15:20:09 <hggdh> cool. Are they all under checkbox (those committed)?
2010-06-09T15:20:45 <sbeattie> hggdh: committed means I'd committed to a local bzr tree and was awaiting merger into checkbox trunk; I'm updating my checkbox checkout to see if I'd gotten the committed ones merged.
2010-06-09T15:21:20 <hggdh> sbeattie: ah, OK
2010-06-09T15:25:19 <hggdh> sbeattie: another Q -- I see coreutils there. Upstream delivers coreutils with an extensive test suite, which is run everytime we build it
2010-06-09T15:26:11 <hggdh> so, do we need it in qa-r-t? or can we just run a build (say) every day with updated packages?
2010-06-09T15:26:57 <sbeattie> hggdh: heh, our coreutils test is very weak; it's basically an example test of /bin/{true,false} I used in a presentation to demonstrate how to write qa-r-t tests.
2010-06-09T15:27:09 <hggdh> oh, OK
2010-06-09T15:27:16 <hggdh> I had not yet looked at it
2010-06-09T15:27:29 <sbeattie> hggdh: their testsuite is not included in a package?
2010-06-09T15:27:52 <sbeattie> is it run during our coreutils package build?
2010-06-09T15:27:58 <hggdh> sbeattie: no, it is not packaged as coreutils-tests, say. But it is run on every build
2010-06-09T15:28:23 <hggdh> I had a brief look at it, and it is fully immersed into their makefile environment
2010-06-09T15:29:00 <hggdh> also, I remember one of the maintainers stating that the utilities we run some few thousands of times during the tests
2010-06-09T15:29:23 <hggdh> s/we run/were run/
2010-06-09T15:30:10 <sbeattie> hggdh: I think build-time is sufficient for testing to ensure coreutils is okay; if you're hoping to catch bugs that coreutils depends on (glibc, kernel) then kicking off a frequent/daily rebuild may make sense.
2010-06-09T15:30:31 <sbeattie> (all assuming package build fails if some threshhold of tests fail)
2010-06-09T15:30:51 <hggdh> sbeattie: yes, build fails on a test error (I know, had them myself ;-)
2010-06-09T15:30:59 <sbeattie> hggdh: awesome!
2010-06-09T15:31:18 <hggdh> sbeattie: I will add them on the regression builds we currently do daily
2010-06-09T15:31:19 <sbeattie> okay, looks like cups got merged, you can cross that one off.
2010-06-09T15:33:21 <sbeattie> ( http://bazaar.launchpad.net/~checkbox-dev/checkbox/trunk/annotate/head:/jobs/qa_regression.txt.in is the reference for what's been already merged)
2010-06-09T15:38:06 <sbeattie> hggdh: okay, based on review, all the ones that are listed as COMMITTED have been merged and are in fact DONE
2010-06-09T15:40:00 <hggdh> sbeattie: OK. I am updating my local copy of your list with a :1,$s/COMMITTED/DONE/
2010-06-09T15:40:38 <sbeattie> hggdh: yep, now reviewing the list of tasks you have on the blueprint
2010-06-09T15:43:11 <sbeattie> hggdh: ao cups, cyrus-sasl2, and mysql tasks are already done.
2010-06-09T15:43:15 <sbeattie> s/ao/so/
2010-06-09T15:44:27 <sbeattie> clamav used to have a need to wait between startup and the tests running, requiring manual intervention; this may have been fixed and needs exploration.
2010-06-09T15:44:58 <sbeattie> fetchmail: don't recall the issues, needs exploration
2010-06-09T15:46:36 <sbeattie> libvirt starts virtual machines (as you might expect); I had passed on that because I was using ESX guests as a testrun environment (to have an accurate idea of the limitations of the test network)
2010-06-09T15:47:03 <sbeattie> ... and thus I wasn't going to be able to kick off kvm guests
2010-06-09T15:47:05 <hggdh> and it does not make sense to run libvirt on virt...
2010-06-09T15:47:24 <sbeattie> yeah
2010-06-09T15:47:30 <hggdh> OK. updating the ones done on the blueprint (and crediting you)
2010-06-09T15:48:06 <sbeattie> net-snmp: the test script took arguments of some kind, and thus needs reworking before it can be integrated.
2010-06-09T15:49:11 <sbeattie> apache2: IIRC, the same script was used to test the various flavors of apache (worker, threaded, etc.) and needs some thought before integration can occur.
2010-06-09T15:49:51 <sbeattie> dhcp3: sets up a dhcp server; needs re-work to bind this to a fake interface or somesuch.
2010-06-09T15:50:39 <sbeattie> dnsmasq: my note is unclear to me, needs exploration (sorry)
2010-06-09T15:50:46 <hggdh> heh
2010-06-09T15:51:26 <sbeattie> freeradius: our lucid packages appear to have some breakage.
2010-06-09T15:52:26 <sbeattie> ipsec-tools: needs a setup environment of hosts/networks to test setting up vpns.
2010-06-09T15:53:51 <sbeattie> httpd tests: qa-r-t doesn't have a script named that, not sure if it's a copy/waste error with lighttpd (which is also there)
2010-06-09T15:54:20 <sbeattie> http://bazaar.launchpad.net/~ubuntu-bugcontrol/qa-regression-testing/master/files/head:/scripts/ is the listing of the test scripts
2010-06-09T15:56:20 <sbeattie> libnet-dns-perl: my note isn't helpful, my guess is that errors may have been related to networking restrictions in the datacenter, needs exploration
2010-06-09T15:56:45 <hggdh> sbeattie: those are the ones already integrated, correct?
2010-06-09T15:58:08 <sbeattie> hggdh: the scripts in that directory? Some are, some aren't; the tree was mostly developed by the security team to test their updates and they run them manually on the packages they're working on.
2010-06-09T15:59:48 <sbeattie> our goal here is to run as many of these as we can going forward to catch regressions in the development release/milestones.
2010-06-09T16:01:30 <sbeattie> lighttpd: requires apache is not running, which is tricky if we enable the apache test script, as checkbox installs everything at once, and apache's postinstall starts it up.
2010-06-09T16:03:06 <sbeattie> nagios3: I didn't explore this much because of the existence of nagios1 and nagios2 tests; we could probably get away with just enabling the nagios3 test. Needs exploration.
2010-06-09T16:03:40 <sbeattie> nfs-utils: needs external to the host nfs clients and servers.
2010-06-09T16:04:15 <hggdh> huh... thunderstorm arriving...
2010-06-09T16:04:56 <sbeattie> ntp: needs access to external ntp servers.
2010-06-09T16:05:21 <sbeattie> hggdh: heh, good luck. :-)
2010-06-09T16:07:35 <sbeattie> we don't get many thunderstorms out west, though I heard one rumble this morning; I miss a good thunderstorm.
2010-06-09T16:09:14 <sbeattie> nut: had unknown failures, needs exploration with the test script. Though I don't recall how useful the tests are for systems without a UPS attached.
2010-06-09T16:10:17 <sbeattie> ah, nut has a dummy driver that the test script uses.
2010-06-09T16:11:27 <sbeattie> pptpd: test has some hardcoded networking assumptions that cause failures, I think.
2010-06-09T16:13:12 <sbeattie> python: script needs a little re-working as it takes an argument to specify which version of python (2.4, 2.5, 2.6) to test.
2010-06-09T16:13:47 <sbeattie> ruby: similar issues as python
2010-06-09T16:14:40 <sbeattie> samba: needs working external clients and servers in its environment
2010-06-09T16:16:16 <sbeattie> squid: test requires multiple protocol (http, https, ftp) access to various ubuntu.com hosts.
2010-06-09T16:16:42 <sbeattie> hggdh: I think that covers all the ones on your task list.
2010-06-09T16:17:24 <hggdh> sbeattie: thank you. I am updating the blueprint with your notes (so that we have a reference)
2010-06-09T16:17:54 <hggdh> sbeattie: is python 2.4 still in use?
2010-06-09T16:18:49 <sbeattie> hggdh: looks like it got purged in lucid.
2010-06-09T16:19:29 <sbeattie> (it's in main for dapper, hardy, jaunty, and karmic, which is why the security team cares)
2010-06-09T16:19:42 <hggdh> K, so it stays
2010-06-09T16:20:10 <sbeattie> well, for checkbox integration, we can possibly drop it.
2010-06-09T16:21:01 <sbeattie> and just focus on the "current" supported python.
2010-06-09T16:21:18 <sbeattie> python2.5 also got dropped in lucid, if rmadison is to be believed.
2010-06-09T16:21:24 <hggdh> so, look at 2.6 only right now
2010-06-09T16:22:01 <sbeattie> hggdh: that would be the short-term approach I'd take.
2010-06-09T16:22:38 <hggdh> sbeattie: thank you. I will probably have Qs later on, if you do not mind
2010-06-09T16:23:25 <sbeattie> hggdh: happy to answer what I can. I've been meaning to document this more, both for our internal uses and to encourage community members to contribute testcases.
2010-06-09T16:27:02 <sbeattie> I, heh, do have a work item to add; late in the lucid cycle, zul added a mysql-testsuite which contains upstreams test infrastructure (and, AFAIK, he didn't test it's packaging at all); integrating it into our mysql test script has not made it to the top of my todo list.
2010-06-09T16:27:27 <sbeattie> woo; grammer/english fail.
2010-06-09T16:27:42 <hggdh> sbeattie: heh. I will check with zul
2010-06-09T16:30:08  * sbeattie needs to step away for a bit
2010-06-09T16:30:14 -- sbeattie is now known as sbeattie-afk
AutomatedServerTestingSpec (last edited 2010-07-14 15:14:23 by pool-71-252-251-234)