AutomatedServerTestingSpec

Launchpad Entry: server-lucid-automated-testing
Created: 2009-11-23
Contributors:
Packages affected:

Summary

With our next LTS coming up, this is a great time to focus on stability and QA. As such, we'll work on setting up automated testing of as much as of Ubuntu server as possible. This includes (but is not necessarily limited to) daily runs of the security team's regression test suite, enabling upstreams' test suites at build time, performance regression tests, and automating ISO testing.

Release Note

A considerable amount of time has been spent on setting up automating testing of the server product. It is our hope that this will provide a more solid release of Ubuntu Server than ever before.

Rationale

Regression testing can be a repetitive job. Thankfully, a lot of things can be done automatically. Many packages have test suites that we're not running (for one reason or another), we have qa-regression-tests, and lots of other means for testing things without minimal day to day effort.

User stories

Soren uploads a new version of dovecot, but is worried he might break compatibility with some IMAP client. However, he sleeps well that evening, knowing that the automatic regression testing suite will raise an alert over night if something broke.
Jamie notices a bug in libvirt and works on a patch. While test-building the package locally, the test suite informs him that he broke a little-used feature in an obscure corner of the lxc driver in libvirt. He promptly fixes it, the test suite is happy again, and Jamie can proceed to upload the package to Ubuntu.
Matthias uploads an update to eglibc which triggers a bug in php5. The next morning, the server team receives a report from the automated test system telling them there's a regression in php5. Looking at versions of the packages in the dependency chain from the last succesful test run and this new one, they quickly pinpoint the culprit and start working on a fix.

Assumptions

Design

The goal is to detect as many problems as early as possible.

Many packages ship test suites.
- Some of these can be run at build time. We should make them do so.
  - We should not only run these at the time of upload, but daily as well (to catch bugs introduced by something in the dependency chain).
- Other packages ship test suites that can't easily be run at build time. We should arrange for them to be run daily "somewhere" and somehow get alerted about failures (regressions).
The security team and QA team have a series of tests they use to ensure they don't introduce regressions in stable releases.
- We should use these during development as well. This should be done daily and we should get a report back about failures.
We want to be alerted about performance regressions as well.
Automate ISO testing as much as possible.
- MathiasGug already has a setup automating much of the ISO testing by using preseeding followed by a script, that logs in over ssh to do the last bits of the test cases). This should ideally be fully automated.
- KVM-autotest is a framework for testing kvm. However, assuming kvm is functional, it's perfect for emulating interactivity, thus allowing us to do end-to-end ISO testing like a normal user typing and clicking.

Implementation

qa-regression-testing scripts

We will integrate the security team's qa-regression-test collection into checkbox, and have it run on a daily basis. Feedback will be collected by the QA team and turned into bugs for the server team to deal with.

Performance testing

The Phoronix test suite seems to be reasonably comprehensive. We will run it on a daily basis and keep an eye on performance regressions. Of course this needs to run on the same hardware every time.

Upstream test suites

A number of server packages are known to provide test suites:

Postgresql test suite (already runs during the build)
Puppet has a separate package, puppet-testsuite, which provides the test suite.
php5 test suite (already runs during the build)
Apache2
libvirt
- Includes test suite (not currently run during build, but could be)
- TCK http://www.mail-archive.com/libvir-list@redhat.com/msg12703.html
MySQL test suite (already runs during the build)
OpenLDAP test suite (already runs during the build)
CUPS test suite already runs in the build, but has concept of other levels (eg smbtorture).
samba
- 'make test', seems to requires samba to be built with --enable-socket-wrapper
There's an imap test suite we can use to test dovecot: http://imapwiki.org/ImapTest

The packages that provide a build time test suite will be rebuilt in a PPA every day to catch regressions introduced by things further down the dependency chain.

ISO testing

We should attempt to make MathiasGug's existing ISO testing setup completely automatic. Currently, the install is done using preseeding. Once the install is complete, the operator has to invoke the appropriate script on his client, which then connects to the VM and performs the tests.
We should embrace KVM-autotest and use it for our ISO tests. This involves packaging KVM-autotest and providing socalled step files corresponding to each of the test cases in the ServerWhole list.

BoF agenda and discussion

Automated testing is a great way to prevent/detect regressions.

Security team qa-regression-testing scripts:

currently integrating in checkbox - aim at 80% - the most easy ones.
cr3 could run the tests in the data centre

What we want:

every day a report is generated covering which tests have been run and their results

Running tests in EC2.

Test results reporting:

leverage checkbox.
checkbox supports different submission plugin. What to use to track the results and generate reports?

Inclusion in milestone reports presented during the release meeting team.

QA team: easy to run the tests and process the test results internally (black box).

tests are run and failures are reported as bug by the QA team.

How are test suites updated because of changes in the system? Who?

QA team finds out about the failure and reports the bug
QA team fixes the test and writes tests.

What needs testing?

Integration list:

qa-regression-testing scripts
1. enable selected phoronix tests
upstream test suites
- integrate postgresql test suite
- integrate puppet-testsuite suite package
- integrate dovecot imap test suite (not packaged)
- apache tests (has a framework, use documented in QRT)
- libvirt test suite (not run during build, but could be), also tests in QRT (but not python-unit)
- mysql test suite runs during the build
- openldap test suite runs during the build
- cups test suite runs in the build, but has concept of other levels (eg smbtorture)
- samba
  - 'make test', but needs to be built with --enable-socket-wrapper
  - smbtorture
- php5
integrate iso testing tests in checkbox:
- ServerWhole
review all the packages on the server CD
Multi-system environements: documentation.
- pacemaker
- drbd

What sort of testing do we want to perform?

Stress/performance testing?
- E.g. check if Apache suddenly can handle much fewer requests per second
  - than the previous day?
- leverage phoronix test suite?
Functional testing?
- E.g. use different mail clients to talk to a mail server?
- Try a suite of different configuration combinations that we know used to
  - work?
Upgrade testing?
- Do a very fat hardy install (all sorts of different servers, clients, and
  - other stuff), and upgrade it to Lucid and see how it breaks?
- Repeat for different configurations? mvo testing infrastructure: only looking at package upgrade failure. How to test that services are working correctly after the upgrade? Marjo to figure it out.
Enabling test suite if they have one

Misc

chat with Steve Beattie on 2010-06-09

2010-06-09T15:04:35	hggdh	sbeattie: so now it is us...
2010-06-09T15:04:48	hggdh	brb
2010-06-09T15:05:14	sbeattie	hggdh: no worries, I need a beverage refill.
2010-06-09T15:12:29	hggdh	sbeattie: I am back
2010-06-09T15:13:46	sbeattie	hggdh: moi aussi.
2010-06-09T15:15:35	hggdh	sbeattie: ça va. So... on qa-r-t: you were saying some of the tests are potentially complex/impossible to set up
2010-06-09T15:16:02	sbeattie	Yes, digging up my notes now.
2010-06-09T15:17:55	sbeattie	hggdh: here's what I had, last updated around beta 1 or so in lucid: http://paste.ubuntu.com/447390/<<BR>>
2010-06-09T15:20:09	hggdh	cool. Are they all under checkbox (those committed)?
2010-06-09T15:20:45	sbeattie	hggdh: committed means I'd committed to a local bzr tree and was awaiting merger into checkbox trunk; I'm updating my checkbox checkout to see if I'd gotten the committed ones merged.
2010-06-09T15:21:20	hggdh	sbeattie: ah, OK
2010-06-09T15:25:19	hggdh	sbeattie: another Q -- I see coreutils there. Upstream delivers coreutils with an extensive test suite, which is run everytime we build it
2010-06-09T15:26:11	hggdh	so, do we need it in qa-r-t? or can we just run a build (say) every day with updated packages?
2010-06-09T15:26:57	sbeattie	hggdh: heh, our coreutils test is very weak; it's basically an example test of /bin/{true,false} I used in a presentation to demonstrate how to write qa-r-t tests.
2010-06-09T15:27:09	hggdh	oh, OK
2010-06-09T15:27:16	hggdh	I had not yet looked at it
2010-06-09T15:27:29	sbeattie	hggdh: their testsuite is not included in a package?
2010-06-09T15:27:52	sbeattie	is it run during our coreutils package build?
2010-06-09T15:27:58	hggdh	sbeattie: no, it is not packaged as coreutils-tests, say. But it is run on every build
2010-06-09T15:28:23	hggdh	I had a brief look at it, and it is fully immersed into their makefile environment
2010-06-09T15:29:00	hggdh	also, I remember one of the maintainers stating that the utilities we run some few thousands of times during the tests
2010-06-09T15:29:23	hggdh	s/we run/were run/
2010-06-09T15:30:10	sbeattie	hggdh: I think build-time is sufficient for testing to ensure coreutils is okay; if you're hoping to catch bugs that coreutils depends on (glibc, kernel) then kicking off a frequent/daily rebuild may make sense.
2010-06-09T15:30:31	sbeattie	(all assuming package build fails if some threshhold of tests fail)
2010-06-09T15:30:51	hggdh	sbeattie: yes, build fails on a test error (I know, had them myself ;-)
2010-06-09T15:30:59	sbeattie	hggdh: awesome!
2010-06-09T15:31:18	hggdh	sbeattie: I will add them on the regression builds we currently do daily
2010-06-09T15:31:19	sbeattie	okay, looks like cups got merged, you can cross that one off.
2010-06-09T15:33:21	sbeattie	( http://bazaar.launchpad.net/~checkbox-dev/checkbox/trunk/annotate/head:/jobs/qa_regression.txt.in is the reference for what's been already merged)
2010-06-09T15:38:06	sbeattie	hggdh: okay, based on review, all the ones that are listed as COMMITTED have been merged and are in fact DONE
2010-06-09T15:40:00	hggdh	sbeattie: OK. I am updating my local copy of your list with a :1,$s/COMMITTED/DONE/
2010-06-09T15:40:38	sbeattie	hggdh: yep, now reviewing the list of tasks you have on the blueprint
2010-06-09T15:43:11	sbeattie	hggdh: ao cups, cyrus-sasl2, and mysql tasks are already done.
2010-06-09T15:43:15	sbeattie	s/ao/so/
2010-06-09T15:44:27	sbeattie	clamav used to have a need to wait between startup and the tests running, requiring manual intervention; this may have been fixed and needs exploration.
2010-06-09T15:44:58	sbeattie	fetchmail: don't recall the issues, needs exploration
2010-06-09T15:46:36	sbeattie	libvirt starts virtual machines (as you might expect); I had passed on that because I was using ESX guests as a testrun environment (to have an accurate idea of the limitations of the test network)
2010-06-09T15:47:03	sbeattie	... and thus I wasn't going to be able to kick off kvm guests
2010-06-09T15:47:05	hggdh	and it does not make sense to run libvirt on virt...
2010-06-09T15:47:24	sbeattie	yeah
2010-06-09T15:47:30	hggdh	OK. updating the ones done on the blueprint (and crediting you)
2010-06-09T15:48:06	sbeattie	net-snmp: the test script took arguments of some kind, and thus needs reworking before it can be integrated.
2010-06-09T15:49:11	sbeattie	apache2: IIRC, the same script was used to test the various flavors of apache (worker, threaded, etc.) and needs some thought before integration can occur.
2010-06-09T15:49:51	sbeattie	dhcp3: sets up a dhcp server; needs re-work to bind this to a fake interface or somesuch.
2010-06-09T15:50:39	sbeattie	dnsmasq: my note is unclear to me, needs exploration (sorry)
2010-06-09T15:50:46	hggdh	heh
2010-06-09T15:51:26	sbeattie	freeradius: our lucid packages appear to have some breakage.
2010-06-09T15:52:26	sbeattie	ipsec-tools: needs a setup environment of hosts/networks to test setting up vpns.
2010-06-09T15:53:51	sbeattie	httpd tests: qa-r-t doesn't have a script named that, not sure if it's a copy/waste error with lighttpd (which is also there)
2010-06-09T15:54:20	sbeattie	http://bazaar.launchpad.net/~ubuntu-bugcontrol/qa-regression-testing/master/files/head:/scripts/ is the listing of the test scripts
2010-06-09T15:56:20	sbeattie	libnet-dns-perl: my note isn't helpful, my guess is that errors may have been related to networking restrictions in the datacenter, needs exploration
2010-06-09T15:56:45	hggdh	sbeattie: those are the ones already integrated, correct?
2010-06-09T15:58:08	sbeattie	hggdh: the scripts in that directory? Some are, some aren't; the tree was mostly developed by the security team to test their updates and they run them manually on the packages they're working on.
2010-06-09T15:59:48	sbeattie	our goal here is to run as many of these as we can going forward to catch regressions in the development release/milestones.
2010-06-09T16:01:30	sbeattie	lighttpd: requires apache is not running, which is tricky if we enable the apache test script, as checkbox installs everything at once, and apache's postinstall starts it up.
2010-06-09T16:03:06	sbeattie	nagios3: I didn't explore this much because of the existence of nagios1 and nagios2 tests; we could probably get away with just enabling the nagios3 test. Needs exploration.
2010-06-09T16:03:40	sbeattie	nfs-utils: needs external to the host nfs clients and servers.
2010-06-09T16:04:15	hggdh	huh... thunderstorm arriving...
2010-06-09T16:04:56	sbeattie	ntp: needs access to external ntp servers.
2010-06-09T16:05:21	sbeattie	hggdh: heh, good luck. :-)
2010-06-09T16:07:35	sbeattie	we don't get many thunderstorms out west, though I heard one rumble this morning; I miss a good thunderstorm.
2010-06-09T16:09:14	sbeattie	nut: had unknown failures, needs exploration with the test script. Though I don't recall how useful the tests are for systems without a UPS attached.
2010-06-09T16:10:17	sbeattie	ah, nut has a dummy driver that the test script uses.
2010-06-09T16:11:27	sbeattie	pptpd: test has some hardcoded networking assumptions that cause failures, I think.
2010-06-09T16:13:12	sbeattie	python: script needs a little re-working as it takes an argument to specify which version of python (2.4, 2.5, 2.6) to test.
2010-06-09T16:13:47	sbeattie	ruby: similar issues as python
2010-06-09T16:14:40	sbeattie	samba: needs working external clients and servers in its environment
2010-06-09T16:16:16	sbeattie	squid: test requires multiple protocol (http, https, ftp) access to various ubuntu.com hosts.
2010-06-09T16:16:42	sbeattie	hggdh: I think that covers all the ones on your task list.
2010-06-09T16:17:24	hggdh	sbeattie: thank you. I am updating the blueprint with your notes (so that we have a reference)
2010-06-09T16:17:54	hggdh	sbeattie: is python 2.4 still in use?
2010-06-09T16:18:49	sbeattie	hggdh: looks like it got purged in lucid.
2010-06-09T16:19:29	sbeattie	(it's in main for dapper, hardy, jaunty, and karmic, which is why the security team cares)
2010-06-09T16:19:42	hggdh	K, so it stays
2010-06-09T16:20:10	sbeattie	well, for checkbox integration, we can possibly drop it.
2010-06-09T16:21:01	sbeattie	and just focus on the "current" supported python.
2010-06-09T16:21:18	sbeattie	python2.5 also got dropped in lucid, if rmadison is to be believed.
2010-06-09T16:21:24	hggdh	so, look at 2.6 only right now
2010-06-09T16:22:01	sbeattie	hggdh: that would be the short-term approach I'd take.
2010-06-09T16:22:38	hggdh	sbeattie: thank you. I will probably have Qs later on, if you do not mind
2010-06-09T16:23:25	sbeattie	hggdh: happy to answer what I can. I've been meaning to document this more, both for our internal uses and to encourage community members to contribute testcases.
2010-06-09T16:27:02	sbeattie	I, heh, do have a work item to add; late in the lucid cycle, zul added a mysql-testsuite which contains upstreams test infrastructure (and, AFAIK, he didn't test it's packaging at all); integrating it into our mysql test script has not made it to the top of my todo list.
2010-06-09T16:27:27	sbeattie	woo; grammer/english fail.
2010-06-09T16:27:42	hggdh	sbeattie: heh. I will check with zul
2010-06-09T16:30:08	*	sbeattie needs to step away for a bit
2010-06-09T16:30:14	--	sbeattie is now known as sbeattie-afk

CategorySpec

AutomatedServerTestingSpec (last edited 2010-07-14 15:14:23 by pool-71-252-251-234)