ARMValidationDashboard

Differences between revisions 11 and 12
Revision 11 as of 2010-05-31 11:20:02
Size: 17030
Editor: fdu90
Comment: Edited user stories after feedback from Scott
Revision 12 as of 2010-05-31 11:24:52
Size: 17015
Editor: fdu90
Comment: Fixed some formatting issues in numbered lists

Summary

As part of the automated testing efforts on ARM we need a dashboard interface for visualizing the current state of the image. This interface must allow the user to see, at a glance, the state of functional and performance tests, along with the other useful data described in more detail below.

This specification is part of a larger project; see the other blueprints for reference:

Release Note

No user-visible changes.

Rationale

We need to easily see how various development efforts are affecting the image over time. A dashboard interface helps us visualize, in one place, the results of running tests on multiple machines. The dashboard can also display the results of performance measurements across different image build dates to let developers quickly see how their efforts are affecting performance. Targets and baselines can be set for any performance metric so that it is possible to detect deviations and track goals.
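
The baseline-and-threshold mechanism described above could be sketched roughly like this. The function names and the 10% default threshold are illustrative assumptions, not decisions made in this spec:

```python
def exceeds_threshold(value, baseline, threshold=0.10):
    """Return True when a measurement deviates from its baseline by
    more than the given relative threshold (10% by default)."""
    if baseline == 0:
        return value != 0
    return abs(value - baseline) / abs(baseline) > threshold

def find_deviations(samples, baselines, threshold=0.10):
    """Yield (metric, value, baseline) for every metric whose latest
    sample deviates from its configured baseline."""
    for metric, value in samples.items():
        baseline = baselines.get(metric)
        if baseline is not None and exceeds_threshold(value, baseline, threshold):
            yield metric, value, baseline

# Example: browser startup regressed well past the 10% threshold,
# while boot time stayed within it.
samples = {"browser_startup_s": 3.2, "boot_time_s": 20.5}
baselines = {"browser_startup_s": 2.4, "boot_time_s": 21.0}
deviations = list(find_deviations(samples, baselines))
```

A notification hook (mail, feed) would then be driven off the non-empty deviation list.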

User stories

  1. Bob is a release manager for Ubuntu on a particular ARM device. Bob wants to check the overall status of the image produced yesterday before releasing Alpha 1. Bob visits the dashboard to check for test failures. Bob marked some tests as expected to fail on this device, as not all components are yet in place and some things are still broken. As all other tests have good results, Bob can go forward with the release. Bob is a user that will visit the dashboard as part of his daily routine. He is focused on having most of the data he is interested in displayed on a single front page. Since he is logged in, his homepage contains a summary of the image he is working on. Since Bob visits this page daily, he is mostly interested in the difference, or update, since yesterday. The website prominently highlights package information (packages that changed, that failed to build, etc.), test information (which tests were run and processed by the system over the last 24 hours, and which tests failed, if any), benchmark information (emphasized samples from new measurements, regressions and other deviations from baseline) and bug information (new or modified bugs being targeted for the upcoming milestone).
  2. Jane is interested in basic performance metrics of the current Ubuntu image. Jane can check some synthetic benchmarks for CPU, GPU and IO performance. Jane can also check some end-to-end benchmarks for user applications (browser startup time, time to full desktop, time to render snapshots of key websites, etc.). Jane can set up a baseline for each metric and request to be notified of all variances that exceed a given threshold. Jane uses the dashboard rarely, definitely not on a daily basis. Jane is looking for performance regressions after key packages are changed or added. Jane is also looking at the numbers and graphs more than at anything else. Jane marks milestones such as 'new kernel added' or 'gtk sync complete' to add context to some graphs. Baselines allow her to see how the current release performs in comparison to previous releases. [Optionally, if it goes forward] Baselines also allow her to see how one distribution compares to other distributions. Jane can easily set up identical devices with Ubuntu, Fedora, and SUSE (or, for some tests, even Windows) and have the data readily available and accessible online.
  3. Alice is on the Desktop QA team and wants to integrate some of the tests her team has created into the dashboard. QA engineers quickly bootstrap a local installation of the dashboard and check the bundled documentation and examples. Within hours the dashboard displays results from some of the tests that the local engineering team has already adapted. Alice sees the dashboard as a free tool that she can take advantage of. Alice and her team of engineers are on track to deliver measurable performance improvements to the Desktop. She is more interested in connecting the tests they have been using so far and using the dashboard as a good user interface to all the data that they can produce. Alice is also interested in migrating historical records to the dashboard database, but doing so is an investment she is not yet ready to justify. Alice hopes that additional people will enhance the dashboard, either by adding more valuable tests or by improving the user interface and processing components, thereby allowing her engineers to focus on what is truly important to them and not on the surrounding infrastructure.
  4. Yung is a product manager at Big Corp Ltd. Yung is building a product based on Ubuntu and wants to reuse our QA infrastructure. Yung instructs his engineers to deploy a local dashboard installation and run our tests on their new secret product. Yung's engineers write an adapter that takes some of the results from the dashboard and pushes them to the internal QA system. Yung is a different type of user. He is not as familiar with open source methodologies, technology and infrastructure as a regular open source developer or activist would be. Yung was briefed about this technology, and how it can be applied to his work process, during a business meeting with some third-party company representatives. Yung is not a big proponent or opponent of open source technologies and merely wants to use them if they can help him do his work. For Yung, ease of deployment, first impressions, disruptiveness, localisation (so that engineers can use languages other than English, especially Far East languages) and privacy are big factors. If the technology fails to meet his requirements it will be discarded and not revisited again. Time to market is paramount. If the technology works and is adopted, Yung is interested in knowing about support options.
  5. David is an engineer at SoC Vendor Inc. David uses the QA dashboard to compare performance metrics across a whole line of SoCs that are manufactured by his company. David can quickly create custom graphs by adding data series from various measured properties (or probes, as the system calls them) and aggregating data sources across time or device type. David can also print them or reuse them in an office document he is working on. David saves some of the most often used graphs and shares them with the rest of the team. David is another external user. David is similar to Yung in his desire to add value without requiring too much investment, but unlike Yung he is primarily an engineer. David is fine with experiencing minor issues and is more interested in looking under the hood to tweak the application towards his needs. David might be an internal user evaluating this technology before broader acceptance, or just doing a local installation for the purpose of his project. David might join an IRC channel or a mailing list to chat with developers or ask questions. He is not interested in formal support.

Assumptions

  • Easy to deploy, including outside of Canonical. All infrastructure components must be packaged and provided as a PPA, ready to install on a Lucid server. All device components must be packaged and uploaded to Maverick (TODO: which component? are we going straight to PPA or do we attempt to hit the main archive?)
  • Focused on one device.
    • FIXME - is this _really_ true? - if not, how do we avoid multiple devices and benchmarks?

      There seems to be a conflict of interest: we'd like to see this for more than one kind of device, but vendors will not enjoy having any device benchmark information displayed, especially alongside competing devices.

  • One-way connectivity. Devices participating in the tests can connect to the infrastructure services, but the reverse connection cannot be assumed. (TODO: what about IPv6?)
  • Distributed environment. Devices and infrastructure components are placed in diverse geographical and administrative zones.
  • Launchpad integration for bug management. There is no plan to support third party bug trackers in the first release.
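
Given the one-way connectivity assumption above, result reporting has to be device-initiated: the device assembles its results and pushes them out to the dashboard. A minimal sketch of such a payload follows; the field names and device identifier are hypothetical, not a defined protocol:

```python
import json

def build_result_payload(device_id, image_date, results):
    """Assemble the JSON document a device would POST outbound to the
    dashboard. Only an outgoing connection is required, matching the
    one-way connectivity assumption. Field names are illustrative."""
    return json.dumps({
        "device_id": device_id,
        "image_date": image_date,
        "results": [
            {"test": name, "outcome": outcome} for name, outcome in results
        ],
    }, sort_keys=True)

# A device reporting two results for yesterday's image.
payload = build_result_payload(
    "beagleboard-03", "2010-05-31",
    [("boot", "pass"), ("usb-storage", "fail")],
)
```

The actual transport (plain HTTP POST, authenticated upload, etc.) is left open by this spec.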

Design

TODO:

  • UI design for each core use case (TODO)
    • We want a list of tasks users have to perform to reach the goal they are after (with regard to the use-case list above).
  • UI design vs UI design of other Canonical technologies (TODO)
  • Design web pages we need to provide (DONE)

The dashboard will feature the following pages/views:

Project Timeline

A recurring component of each page, shown at the top. Key aspects:

  • Shows milestones
  • Shows number of days till next milestone
  • Shows number of days/weeks? till final milestone
  • Allows clicking on a past day to see historical records

The project timeline could also hold a global image/project properties action menu:

  • Edit (add/remove/modify) test suites and benchmarks
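
The milestone countdowns listed above amount to a small date computation. A sketch, with made-up milestone names and dates:

```python
from datetime import date

def days_until(today, milestones):
    """Given today's date and a {name: date} map of milestones, return
    (days to the next upcoming milestone, days to the final milestone)."""
    upcoming = sorted(d for d in milestones.values() if d >= today)
    final = max(milestones.values())
    next_days = (upcoming[0] - today).days if upcoming else None
    return next_days, (final - today).days

# Hypothetical schedule for the cycle.
milestones = {
    "Alpha 1": date(2010, 6, 3),
    "Beta": date(2010, 8, 5),
    "Final": date(2010, 10, 10),
}
next_days, final_days = days_until(date(2010, 5, 31), milestones)
```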

Day Overview

The main view contains a daily summary of the key aspects influencing the upcoming release. This is also the default view for the application. The page contains the following components:

  • packages
    • summary indicators (mostly numbers and links to detail pages)
      • (could be a horizontal bar like in test cases below)
      • total number and details link
      • newly added
      • modified (version change)
      • packages that failed to build
    • action links:
      • see package details
  • test cases
    • total test suites and test cases
    • progress indicator: horizontal bar with the following components
      • skipped tests
      • successful tests
      • failed tests
      • pending tests (there is no indicator of a 'running' test)
      • (never-run tests) - this is optional and will be displayed for historic entries, not the current day
    • action links:
      • see all tests (details)
      • edit skipped tests [optional]
  • benchmarks
    • selected benchmark results (value + spark line)
      • synthetic benchmarks
        • CPU
        • FPU
        • GPU (if possible)
        • IO:
          • USB thumb drive
          • USB 2.5" HDD
          • SD card
          • Network
          • NAND [optional]
      • end-user / application benchmarks
        • time to boot
        • time to website/cached
        • time to ... (etc)
    • notable changes (value, spark line, delta)
      • (things that are not included by default but have changed radically since last test)
    • action links:
      • see all benchmarks (details)
      • define selected items
  • devices
    • all devices we have at our disposal
      • percentage of time devoted to:
        • running tests
        • being 'owned' by someone
        • being idle
        • being offline
  • bugs [optional]
    • all bugs filed yesterday that affect this image
      • (could use a specific tag or project to detect them)
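
The horizontal test-progress bar described above only needs per-state percentages of the total. A sketch; the state names come from the list above, while the rounding choice is an assumption:

```python
def progress_segments(counts):
    """Turn raw test counts into percentage widths for the horizontal
    progress bar (skipped/successful/failed/pending segments)."""
    total = sum(counts.values())
    if total == 0:
        return {state: 0.0 for state in counts}
    return {state: round(100.0 * n / total, 1) for state, n in counts.items()}

# 100 test cases for the day's image, split across the four states.
segments = progress_segments(
    {"skipped": 5, "successful": 80, "failed": 10, "pending": 5}
)
```

Each percentage maps directly to a segment width in the rendered bar.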

Image Details

This page can be reached from the 'Day Overview' pages. It should contain basic information about all packages that were used to build the image. If possible, each package should be a link to a Launchpad page. For packages that are not tracked by Launchpad (PPAs, custom packages) and for packages that are part of a private PPA, no link will be provided.

[Optional] This view could also provide a diff against any other day. Such difference views could be easily accessible from the project timeline.
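
The link-or-no-link rule for packages could be as simple as the following helper. The URL pattern and the package-record keys are assumptions for illustration, not part of the spec:

```python
def launchpad_url(package):
    """Return a Launchpad source-package URL for packages Launchpad
    tracks, or None for custom packages and private-PPA packages
    (which must not be linked). Dict keys are illustrative."""
    if package.get("custom") or package.get("private_ppa"):
        return None
    return "https://launchpad.net/ubuntu/+source/%s" % package["name"]

linked = launchpad_url({"name": "gcc-4.4"})
unlinked = launchpad_url({"name": "secret-blob", "private_ppa": True})
```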

Test Suite Details

This page can be reached from the test suite list and the day overview (selected suites).

The test suite details page provides access to the following components:

  • Test suite URL: bzr branch that contains this test suite
  • Description
  • List of test cases
  • Summary of historical results (pass/fail/skipped)

Actions:

  • Disable whole suite for specific hardware
  • Disable specific tests for specific hardware
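
The two disable actions above imply a filter when selecting tests to run on a given device. A sketch with hypothetical data structures (per-hardware disabled suites, and disabled tests keyed by hardware and suite):

```python
def effective_tests(suite_tests, disabled_suites, disabled_tests, hardware):
    """Compute which tests actually run on a device, honouring both
    actions above: disabling a whole suite, or single tests, per hardware."""
    selected = []
    for suite, tests in suite_tests.items():
        if suite in disabled_suites.get(hardware, set()):
            continue  # whole suite disabled for this hardware
        skip = disabled_tests.get((hardware, suite), set())
        selected.extend(t for t in tests if t not in skip)
    return selected

tests = effective_tests(
    {"boot": ["kernel", "initrd"], "gpu": ["glmark"]},
    {"beagleboard": {"gpu"}},               # whole suite disabled here
    {("beagleboard", "boot"): {"initrd"}},  # single test disabled here
    "beagleboard",
)
```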

Test Case Details

This page can be reached from test suite details.

The test case details page provides access to the following components:

  • Historical results (pass/fail/skip)
  • Preview of the relevant part of the log file that was harvested to get this result.
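
The log preview could be a simple slice of the harvested log around the lines mentioning the test. How results are actually harvested is defined elsewhere; this helper is only an illustration:

```python
def relevant_log_lines(log_text, test_id, context=2):
    """Extract the portion of a harvested log file that mentions the
    given test id, with a few lines of surrounding context."""
    lines = log_text.splitlines()
    hits = [i for i, line in enumerate(lines) if test_id in line]
    if not hits:
        return []
    start = max(hits[0] - context, 0)
    end = min(hits[-1] + context + 1, len(lines))
    return lines[start:end]

log = "setup ok\nrunning test-usb\ntest-usb: FAIL\nteardown ok"
preview = relevant_log_lines(log, "test-usb", context=1)
```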

Benchmark Details

TODO.

Mostly similar to test suite, except for some presentation differences.

Benchmark Probe (Single Benchmark Item) Details

TODO

Mostly similar to test case, except for some presentation differences.

Implementation

  • Choose basic web technology: DONE (Django)
  • Choose database system: DONE (PostgreSQL)
  • Design data model (including data ingest requirements, data presentation requirements, data transformations and storage): TODO
  • Design web widgets/pieces/components that we will need and determine how each fits into the data model: TODO? (is this required in the spec?)
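
The web technology (Django) and database (PostgreSQL) are decided, but the data model is still TODO. The following is a rough, non-binding sketch of the core entities using plain dataclasses; the eventual Django ORM models would mirror this shape, and every field name here is an assumption:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Image:
    """One daily image build; test and benchmark data hang off this."""
    build_date: date
    packages: List[str] = field(default_factory=list)

@dataclass
class TestResult:
    """Outcome of a single test case on a single device."""
    image: Image
    device_id: str
    suite: str
    test_case: str
    outcome: str                      # pass / fail / skip
    log_excerpt: Optional[str] = None

@dataclass
class BenchmarkSample:
    """One measured value from a benchmark probe."""
    image: Image
    device_id: str
    probe: str                        # e.g. "io.sd-card"
    value: float

image = Image(date(2010, 5, 31), packages=["linux-image", "gcc-4.4"])
result = TestResult(image, "beagleboard-03", "boot", "kernel", "pass")
sample = BenchmarkSample(image, "beagleboard-03", "io.sd-card", 11.5)
```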

Architecture Overview

The dashboard is a part of a larger set of blueprints/projects that together provide the QA infrastructure. The main components are:

  • Automatic Test Framework
  • Test Cases (dealing with providing actual tests)
  • WebKit testing/benchmarking

  • Dashboard:
    • frontend - web pages, interaction with launchpad, etc
    • backend - database model, processes, interactions with other components

Database Model

Make database model image: TODO

UI Changes

Should cover changes required to the UI, or the specific UI that is required to implement this.

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Migration

Currently there is no direct migration plan. One thing we could consider is migrating bits and pieces of open source technology already up and running, either at Canonical or elsewhere in the community or at other parties, and integrating their tests into our framework. If that happens we might want to look at migrating from qa-tool-foo to our new technology.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

Unresolved issues

  • Do we roll our own technology or do we adapt existing frameworks (like Hudson)?
  • How and where do we store the user database?
    • We need at least two types of users: viewers and editors. Editors could alter per-image/per-device state, such as which tests should not be run. Editors could also add more test suites, news, milestones, etc. This could, to a certain degree, be avoided by pulling all non-volatile non-test information from external sources (feeds/RSS).
  • How do we provision devices (bind a particular instance of the dashboard to a particular device and ensure it can be identified across upgrades/reflashes/etc.)?
  • Where is the test scheduler (is it in this spec or in the automated test framework spec)?

BoF agenda and discussion

Goal: define a visualization interface for QA control that shows a daily/snapshot/other summary of the 'health' of the image for a given platform.

Different images based on a common base:

  • server
  • desktop
  • netbook

Stuff we want to show:

  • difference from yesterday
  • performance history
  • performance regressions
  • performance targets (as infrastructure to see if it works during the cycle)

Dashboard mockup:

  • Two columns:
    1. Column one:
      • Current build status
        • FTBFS count
        • New package count, number of packages
        • Latest build date/time
      • Test results
      • Build history
    2. Column two:
      • News
      • Performance targets

Q: What about some UI for scheduling test runs? A: We're not targeting this for the first release, but we want to have a UI for doing that in the future.

Q: How does our project relate to other Ubuntu QA projects? A:

Stuff to check:

  • Buildbot (Python)
  • Hudson (Java)

Action item: check Hudson out (Zygmunt Krynicki). Hudson instance for Bzr at Canonical: http://babune.ladeuil.net:24842/view/Ubuntu/job/selftest-jaunty/buildTimeTrend

We want to store the log file of each test run just in case (for unexpected successes).


CategorySpec

Specs/M/ARMValidationDashboard (last edited 2010-06-04 12:08:03 by fdu90)