Summary

As part of the automated testing efforts on ARM, we need a dashboard interface for visualizing the current state of the image. This interface must let the user see, at a glance, the state of functional tests and performance tests, as well as other useful data described in more detail below. Please note that each particular test is beyond the scope of this specification. This specification is only concerned with the infrastructure needed to deploy a centralized web application for test submission, processing and presentation.

This specification is part of a larger project; see the other blueprints for reference:

Release Note

No user-visible changes.

Rationale

We need to easily see how various development efforts are affecting the image over time. A dashboard interface helps us visualize, in one place, the results of running tests on multiple machines. The dashboard can also display the results of performance measurements across different image build dates, so that developers can quickly see how their efforts are affecting performance. Targets and baselines can be set for any performance metric, making it possible to detect deviations and track goals.
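The baseline idea above (and Jane's request, in the user stories, to be notified of variances beyond a threshold) can be sketched as a relative-deviation check. This is a minimal sketch, assuming a baseline is a single reference value plus a relative tolerance per metric; the `Baseline` record and the `relative_deviation` and `exceeds_threshold` helpers are hypothetical names, not part of any existing dashboard API.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Hypothetical record: a reference value and tolerance for one metric."""
    metric: str       # e.g. "browser_startup_time"
    value: float      # baseline value, in the metric's own units
    threshold: float  # allowed relative deviation, e.g. 0.05 for 5%


def relative_deviation(baseline: Baseline, sample: float) -> float:
    """Signed deviation of a sample from the baseline, as a fraction of the baseline."""
    if baseline.value == 0:
        raise ValueError("a relative deviation needs a non-zero baseline")
    return (sample - baseline.value) / baseline.value


def exceeds_threshold(baseline: Baseline, sample: float) -> bool:
    """True if the sample deviates from the baseline by more than the threshold,
    in either direction."""
    return abs(relative_deviation(baseline, sample)) > baseline.threshold


if __name__ == "__main__":
    startup = Baseline(metric="browser_startup_time", value=2.0, threshold=0.05)
    for seconds in (2.05, 2.3):
        print(f"{seconds} s -> deviation flagged: {exceeds_threshold(startup, seconds)}")
```

With a 2.0 s baseline and a 5% tolerance, a 2.05 s sample stays within the band while 2.3 s is flagged. A relative threshold is used here because metrics carry different units; the check is symmetric so that an unexpectedly large improvement is also surfaced for review.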

User stories

  1. Bob is a release manager for Ubuntu on a particular ARM device. Bob wants to check the overall status of the image produced yesterday before releasing Alpha 1. Bob visits the dashboard to check for test failures. Bob has marked some tests as expected to fail on this device, as not all components are in place yet and some things are still broken. As all other tests have good results, Bob can go forward with the release. Bob is a user who will visit the dashboard as part of his daily routine. He wants most of the data he is interested in to be displayed on a single front page. Since he is logged in, his homepage contains a summary of the image he is working on. Because Bob visits this page daily, he is mostly interested in what has changed since yesterday. The website prominently highlights package information (packages that changed, packages that failed to build, etc.), test information (which tests were run and processed by the system over the last 24 hours, and which tests failed, if any), benchmark information (highlighted samples from new measurements, regressions and other deviations from baseline) and bug information (new or modified bugs targeted for the upcoming milestone).
  2. Jane is interested in basic performance metrics of the current Ubuntu image. Jane can check some synthetic benchmarks for CPU, GPU and IO performance. Jane can also check some end-to-end benchmarks for user applications (browser startup time, time to full desktop, time to render a snapshot of key websites, etc.). Jane can set up a baseline for each metric and request to be notified of all variances that exceed a given threshold. Jane uses the dashboard rarely, definitely not on a daily basis. Jane is looking for performance regressions after key packages are changed or added. Jane also looks at the numbers and graphs more than at anything else. Jane marks milestones such as 'new kernel added' or 'gtk sync complete' to add context to the graphs. Baselines allow her to see how the current release performs in comparison to previous releases. [optionally, if it goes forward] Baselines also make it possible to see how one distribution compares to other distributions. Jane can easily set up identical devices with Ubuntu, Fedora, and SUSE (or, for some tests, even Windows) and have the data readily available and accessible online.
  3. Alice is on the Desktop QA team and wants to integrate some of the tests her team has created into the dashboard. Her QA engineers quickly bootstrap a local installation of the dashboard and check the bundled documentation and examples. Within hours the dashboard displays results from some of the tests that the local engineering team has already adapted. Alice sees the dashboard as a free tool that she can take advantage of. Alice and her team of engineers are on track to deliver measurable performance improvements to the Desktop. She is more interested in connecting the tests they have been using so far and in using the dashboard as a good user interface to all the data they can produce. Alice is also interested in migrating historical records into the dashboard database, but doing so is an investment she is not yet ready to justify. Alice hopes that other people will enhance the dashboard, either by adding more valuable tests or by improving the user interface and processing components, thereby allowing her engineers to focus on what is truly important to them rather than on the surrounding infrastructure.
  4. Yung is a product manager at Big Corp Ltd. Yung is building a product based on Ubuntu and wants to reuse our QA infrastructure. Yung instructs his engineers to deploy a local dashboard installation and run our tests on their new secret product. Yung's engineers write an adapter that takes some of the results from the dashboard and pushes them to the internal QA system. Yung is a different type of user. He is not as familiar with open source methodologies, technology and infrastructure as a regular open source developer or activist would be. Yung was briefed about this technology, and how it can be applied to his work process, during a business meeting with representatives of a third-party company. Yung is neither a big proponent nor an opponent of open source technologies and merely wants to use them if they can help him do his work. For Yung, the major factors are ease of deployment, first impression, disruptiveness, localisation (so that engineers can use languages other than English, especially Far East languages) and privacy. If the technology fails to meet his requirements, it will be discarded and not revisited. Time to market is paramount. If the technology works and is adopted, Yung is interested in learning about support options.
  5. David is an engineer at SoC Vendor Inc. David uses the QA dashboard to compare performance metrics across a whole line of SoCs manufactured by his company. David can quickly create custom graphs by adding data series from various measured properties (or probes, as the system calls them) and aggregating data sources across time or device type. David can also print the graphs or reuse them in an office document he is working on. David saves some of the most frequently used graphs and shares them with the rest of the team. David is another external user. David is similar to Yung in his desire to add value without too much investment, but unlike Yung he is primarily an engineer. David is fine with experiencing minor issues and is more interested in looking under the hood to tweak the application to his needs. David might be an internal user evaluating this technology before broader acceptance, or just doing a local installation for the purpose of his project. David might join an IRC channel or a mailing list to chat with developers or ask questions. He is not interested in formal support.
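David's custom graphs combine probe samples and aggregate them across time or device type. The following is a minimal sketch of that aggregation, assuming each sample records its probe, device type, image build date and value, and assuming a plain mean is the aggregate of interest; the `Sample` record and `aggregate_by` function are hypothetical illustrations, and the sample data is invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    """Hypothetical record: one value measured by a probe."""
    probe: str        # measured property, e.g. "memcpy_bandwidth"
    device_type: str  # e.g. a particular SoC model
    build_date: str   # image build date as "YYYY-MM-DD"
    value: float


def aggregate_by(samples, probe: str, key: str) -> dict:
    """Build one data series: the mean value of `probe`, grouped by the
    Sample attribute named by `key` (e.g. "device_type" or "build_date")."""
    groups = defaultdict(list)
    for s in samples:
        if s.probe == probe:
            groups[getattr(s, key)].append(s.value)
    # Sorting the group keys gives the series a stable order for plotting;
    # ISO dates sort chronologically as plain strings.
    return {k: mean(v) for k, v in sorted(groups.items())}


if __name__ == "__main__":
    samples = [
        Sample("memcpy_bandwidth", "soc-a", "2010-06-01", 100.0),
        Sample("memcpy_bandwidth", "soc-a", "2010-06-02", 110.0),
        Sample("memcpy_bandwidth", "soc-b", "2010-06-01", 150.0),
        Sample("boot_time", "soc-a", "2010-06-01", 30.0),
    ]
    print(aggregate_by(samples, "memcpy_bandwidth", "device_type"))
    print(aggregate_by(samples, "memcpy_bandwidth", "build_date"))
```

Grouping by `device_type` yields one point per SoC (useful for comparing a product line), while grouping by `build_date` yields a time series for the same probe.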

Design

The dashboard has the following core concepts:

The dashboard is designed around the concepts of test collections and test collection runs. Typical expected test collections include a group of unit tests for a major library, a stand-alone test suite designed to check the compliance or correctness of some APIs, or an existing (including binary-only) application scripted to perform some scenarios. Each test collection run (abbreviated to test run from now on) is performed on a specific device (computer). The result of that run is a tree of log files. Log files are uploaded to the 'gateway' component of the dashboard for storage.
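The relationship between test collections, test runs and their log trees can be sketched as a small data model. This is an illustration under stated assumptions, not the dashboard's actual schema: `TestCollection`, `TestRun`, `add_log` and `log_tree` are hypothetical names, log files are assumed to be keyed by a path relative to the run root, and the upload to the gateway is left out.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class TestCollection:
    """Hypothetical record: a named group of tests, e.g. a library's unit tests."""
    name: str


@dataclass
class TestRun:
    """Hypothetical record: one run of a test collection on one device."""
    collection: TestCollection
    device: str  # the specific computer the run was performed on
    # The run's output: a tree of log files, keyed by path relative to the run root.
    logs: dict = field(default_factory=dict)

    def add_log(self, path: str, content: str) -> None:
        """Store one log file, rejecting paths that escape the run's log tree."""
        p = PurePosixPath(path)
        if p.is_absolute() or ".." in p.parts:
            raise ValueError(f"log path must be relative to the run root: {path!r}")
        self.logs[str(p)] = content

    def log_tree(self) -> list:
        """Paths of all stored log files, in sorted order."""
        return sorted(self.logs)


if __name__ == "__main__":
    run = TestRun(TestCollection("glib-unit-tests"), device="board-01")
    run.add_log("stdout.log", "...")
    run.add_log("results/gtester.xml", "...")
    print(run.log_tree())
```

Keeping paths relative to the run root preserves the tree structure of the logs while making it safe to store each run's files under its own directory on the gateway side.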

The second major concept is log file analysis. Each test collection has a log processing script that is designed to understand the format of its log files and translate them into one of two entities:

  1. pass/fail test results, and
  2. performance measurements.

All data that is displayed by the dashboard can be traced back to a log file. The system preserves this information for credibility and assistance in manual analysis.

Pass/fail test results are simple to understand: each one indicates that some test succeeded or failed. The identity of such tests is not maintained. That is, it is not possible to automatically compare two test runs and see whether the same pass/fail test succeeded in both. This limitation is by design.

In contrast, performance measurements always need a performance metric to be meaningful. This makes it possible to define metrics in the system and to compare them across time, hardware and other factors. The metric also designates the units of each measurement. The units may be time, bytes/second, pixels/second or anything else required by a particular use case.

This decision is based on the assumption that typical qualitative (pass/fail) tests are far more numerous than quantitative tests (benchmarks), and that maintaining identity support in the log processors would be additional effort with little gain.
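The log analysis design above can be illustrated with a toy log processor for a single test collection. This sketch assumes a made-up, line-oriented log format (`PASS: ...`, `FAIL: ...`, `MEASURE: <metric> <value> <units>`); real collections would each ship a processor for their own format. `Source`, `PassFailResult`, `Measurement` and `process_log` are hypothetical names. The sketch reflects the stated design: pass/fail results carry only free-text messages (no test identity), every measurement names a metric and its units, and every entity records the log file and line it came from.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Where an entity came from, so it can be traced back to its log file."""
    log_file: str
    line: int


@dataclass(frozen=True)
class PassFailResult:
    passed: bool
    message: str  # free text only: no stable test identity is kept
    source: Source


@dataclass(frozen=True)
class Measurement:
    metric: str   # every measurement is tied to a named metric...
    value: float
    units: str    # ...which also designates its units
    source: Source


# Hypothetical line-oriented log format understood by this one processor.
RESULT_RE = re.compile(r"^(PASS|FAIL): (.*)$")
MEASURE_RE = re.compile(r"^MEASURE: (\S+) ([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?) (\S+)$")


def process_log(log_file: str, text: str):
    """Translate one log file into (pass/fail results, measurements).

    Lines in any other format are ignored."""
    results, measurements = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        source = Source(log_file, lineno)
        if m := RESULT_RE.match(line):
            results.append(PassFailResult(m.group(1) == "PASS", m.group(2), source))
        elif m := MEASURE_RE.match(line):
            measurements.append(
                Measurement(m.group(1), float(m.group(2)), m.group(3), source))
    return results, measurements


if __name__ == "__main__":
    log = "PASS: list append\nFAIL: hash resize\nMEASURE: startup_time 2.5 s\n"
    results, measurements = process_log("stdout.log", log)
    print(results)
    print(measurements)
```

Because a pass/fail result has no identifier beyond its message, two runs cannot be matched result-by-result, which is exactly the limitation the design accepts; measurements, by contrast, can be grouped by `metric` across runs.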

Implementation

Components

The dashboard is a collection of components that are maintained together. These components are:

The picture below shows how these components fit together in a full deployment scenario.

Limitations

Ubuntu Package Details

The Launch Control project is separated into the following packages:

Component Details

Dashboard Web Application

Backend Service

Data Gateway Service

Log Analyzer Service

Database model

TODO

Python APIs

TODO

Command line tools

Migration

Currently there is no direct migration plan. One thing we could consider is taking pieces of open source technology that are already up and running, either at Canonical or elsewhere in the community or at other parties, and integrating their tests into our framework. If that happens, we might want to look at migrating from qa-tool-foo to our new technology.

BoF agenda and discussion

Goal: define a visualization interface for QA control that shows a daily, snapshot or other summary of the 'health' of the image for a given platform.

Different images based on a common base:

Stuff we want to show:

Dashboard mockup:

Q: What about a UI for scheduling test runs? A: We're not targeting this for the first release, but we want to have a UI for doing that in the future.

Q: How does our project relate to other Ubuntu QA projects? A:

Stuff to check:

Action item (Zygmunt Krynicki): check out Hudson? Hudson instance for Bzr at Canonical: http://babune.ladeuil.net:24842/view/Ubuntu/job/selftest-jaunty/buildTimeTrend

We want to store the log file of each test run just in case (for unexpected successes)


Specs/M/ARMValidationDashboard (last edited 2010-06-04 12:08:03 by fdu90)