KernelLucidSuspendResumeImprovements

== Rationale ==

=== Userland control of suspend/resume/hibernate ===

pm-utils.
=== Diagnosing and Fixing suspend/resume/hibernate ===
Diagnosing, and therefore fixing, broken suspend/resume/hibernate needs to be easier. A community-based wiki tutorial or troubleshooting guide (such as http://people.freedesktop.org/~hughsient/quirk/quirk-suspend-debug.html) needs to be written as the de facto reference page to help users:

 * identify known hardware issues
 * diagnose problems step by step
 * gather hardware- and kernel-specific information that can help pinpoint and fix problems
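
As a starting point for the "gather information" step, the following Python sketch collects a few standard /proc and /sys files that are often useful in a suspend/resume report. The choice of files and the `collect_pm_info` helper are illustrative assumptions, not an existing tool. The `root` parameter exists only so the function can be pointed at a copied directory tree.

```python
from pathlib import Path

# Hypothetical selection of standard /proc and /sys files that are useful
# in a suspend/resume bug report (paths are relative to `root`).
INFO_FILES = {
    "kernel": "proc/version",
    "cmdline": "proc/cmdline",
    "sleep_states": "sys/power/state",
    "vendor": "sys/class/dmi/id/sys_vendor",
    "product": "sys/class/dmi/id/product_name",
    "bios": "sys/class/dmi/id/bios_version",
}


def collect_pm_info(root: str = "/") -> dict[str, str]:
    """Read each file in INFO_FILES, noting any that cannot be read."""
    info = {}
    for key, rel_path in INFO_FILES.items():
        try:
            info[key] = (Path(root) / rel_path).read_text().strip()
        except OSError:  # missing file or insufficient permissions
            info[key] = "<unavailable>"
    return info
```

A user could paste the output of `for k, v in collect_pm_info().items(): print(f"{k}: {v}")` into a bug report.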

A program or script that tests for known quirks and suggests pm-suspend workarounds would also be helpful.

Debugging suspend/resume/hibernate issues can be notoriously difficult, so being able to dump kernel messages early to a serial console is useful. However, modern PCs often lack legacy serial port hardware, so a USB serial console driver needs to be provided in the initramfs.
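
The sketch below assumes a kernel built with USB serial console support. On such a kernel, the console is usually selected with a boot option such as `console=ttyUSB0,115200`, and the standard `no_console_suspend` option keeps the console printing while the system suspends. The hypothetical `check_serial_console` helper only inspects a kernel command line string, such as the contents of /proc/cmdline, and reports which of these options are missing.

```python
def check_serial_console(cmdline: str) -> list[str]:
    """List problems with the boot options for serial-console PM debugging."""
    args = cmdline.split()
    # Collect the values of every console= option (there may be several).
    consoles = [a.split("=", 1)[1] for a in args if a.startswith("console=")]
    problems = []
    if not any(c.startswith("ttyUSB") for c in consoles):
        problems.append("no console=ttyUSB<n>,<baud> option (e.g. console=ttyUSB0,115200)")
    if "no_console_suspend" not in args:
        problems.append("no_console_suspend not set; console output stops during suspend")
    return problems
```

For example, `check_serial_console(open("/proc/cmdline").read())` returns an empty list when both options are present.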

== Use Cases ==

 * A user cannot get their laptop to resume. They visit the troubleshooting guide, which lists known hardware issues and pm-suspend workarounds that they can try.

 * A user cannot get their laptop to resume. The wiki troubleshooting guide explains how to run a quirk-checking program that can automatically suggest pm-suspend workarounds.

 * A user cannot resume their laptop because of a specific buggy driver. The wiki tutorial explains how to turn on the /sys/power/pm_trace "resume-trace" debugging facility to gather enough information to pinpoint the broken driver. The user can then file a bug report against that driver.

 * A user cannot hibernate their laptop. The tutorial explains how to run the hibernation core in test mode and step through the five test modes: freezer, devices, platform, processors and core.

 * A user wants to attach early kernel log messages to a bug report. By plugging in a USB-serial dongle and enabling the USB serial console driver, they can capture the log on another PC, even though their own hardware has no legacy serial port. The tutorial should explain how to enable the driver with kernel boot options and how to capture the dmesg log on a second PC over serial using a tool such as minicom.

 * A user shuts the laptop lid and puts it straight into a bag, possibly a tight-fitting neoprene one. The laptop may take 20 seconds or so to complete the suspend process, but it must always complete, to avoid a flat battery and a potential fire hazard.
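
The pm_trace and hibernation-test use cases rely on two debugging files that the kernel exposes when built with its PM debugging options (CONFIG_PM_DEBUG, plus CONFIG_PM_TRACE_RTC for pm_trace): /sys/power/pm_trace and /sys/power/pm_test. The Python sketch below wraps them. The helper names are hypothetical, and writing to the files requires root. The `hash matches` message format matched by the regular expression comes from older kernel documentation and may differ on other kernel versions.

```python
import re
from pathlib import Path

# pm_test levels, from stopping earliest (freezer) to latest (core);
# "none" switches test mode off again.
PM_TEST_MODES = ("freezer", "devices", "platform", "processors", "core")

# Assumed pm_trace report format, e.g. "hash matches drivers/base/power/resume.c:28"
# or "hash matches device i8042"; newer kernels may word this differently.
HASH_RE = re.compile(r"hash matches (?:device )?(\S+)")


def set_pm_test(mode: str, path: str = "/sys/power/pm_test") -> None:
    """Select a pm_test level (needs root and CONFIG_PM_DEBUG)."""
    if mode not in PM_TEST_MODES + ("none",):
        raise ValueError(f"unknown pm_test mode: {mode!r}")
    Path(path).write_text(mode)


def enable_pm_trace(path: str = "/sys/power/pm_trace") -> None:
    """Turn on resume tracing; the trace hash is stored in the RTC."""
    Path(path).write_text("1")


def pm_trace_suspects(dmesg: str) -> list[str]:
    """Return the source locations and device names that pm_trace reports."""
    return HASH_RE.findall(dmesg)
```

For the resume case, call `enable_pm_trace()` and then run pm-suspend. If the machine hangs on resume, reboot it promptly and pass the output of dmesg to `pm_trace_suspects()`. For hibernation, call `set_pm_test("freezer")` and trigger hibernation (for example by writing `disk` to /sys/power/state). Repeat with each deeper level, then finish with `set_pm_test("none")`.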
=== pm-utils Quirk Checking Scripts ===

The quirk-checking script http://people.freedesktop.org/~hughsient/quirk/quirk-checker.sh could perhaps be included in the distro to help users pinpoint suspend/resume quirks.
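
The contents of quirk-checker.sh are not described here. The sketch below shows just one way a checker could suggest workarounds: try pm-suspend with progressively larger combinations of pm-utils video quirk options until a resume succeeds. The quirk subset, the smallest-first ordering and the `resumed_ok` callback are assumptions. The caller supplies `resumed_ok`, which should run the command and then ask the user whether the display came back.

```python
from itertools import combinations

# A subset of pm-suspend's video quirk options; which ones to try, and in
# what order, is an illustrative assumption.
QUIRKS = ("--quirk-s3-bios", "--quirk-s3-mode", "--quirk-vbe-post",
          "--quirk-vbestate-restore", "--quirk-dpms-on")


def candidate_commands(quirks=QUIRKS, max_combo=2):
    """Yield pm-suspend command lines, trying fewer quirks first."""
    yield ["pm-suspend"]
    for size in range(1, max_combo + 1):
        for combo in combinations(quirks, size):
            yield ["pm-suspend", *combo]


def find_working_quirks(resumed_ok, quirks=QUIRKS, max_combo=2):
    """Return the first command for which resumed_ok(argv) is True, else None."""
    for argv in candidate_commands(quirks, max_combo):
        if resumed_ok(argv):
            return argv
    return None
```

Trying the smallest combinations first keeps the suggested workaround as minimal as possible.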
== Unresolved issues ==

 * OEM Team
  * What does the OEM Team need from the Kernel/Userspace?
  * What numbers of suspend/resume cycles are typical and acceptable (e.g. 500 cycles without failure)?
  * Other?
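
A stress loop can make the cycle-count question measurable. It repeatedly suspends to RAM and wakes on an RTC alarm using util-linux's `rtcwake -m mem -s <seconds>`. The sketch below is hypothetical: 500 cycles is simply the example figure from the question above. Only failures where rtcwake returns a non-zero exit status are counted, and a machine that hangs on resume stops the loop altogether.

```python
import subprocess


def stress_suspend(cycles=500, wake_after=20, run=subprocess.run):
    """Suspend to RAM `cycles` times via rtcwake; return the failed cycle numbers.

    Requires root. `run` is injectable so the loop can be exercised without
    actually suspending the machine.
    """
    failed = []
    for i in range(1, cycles + 1):
        result = run(["rtcwake", "-m", "mem", "-s", str(wake_after)])
        if result.returncode != 0:
            failed.append(i)
    return failed
```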
== BoF agenda and discussion ==

=== Userland control of suspend/resume/hibernate ===

 * pm-utils vs DeviceKit-power: risks and benefits of each choice
 * Continue checkbox-based suspend/resume testing at sprints and Linux Fests, and get users to report bugs.

Summary

Suspend/resume works very well in Karmic. On our benchmark hardware, a Dell Mini10V with an SSD for storage, I am able to suspend in 0.8 s and resume in 1.6 s on average. We would like to improve the suspend/resume experience in Lucid. The major pain points in improving suspend/resume are proprietary drivers, staging drivers, etc.

The aims for Lucid are to:

  • identify drivers that take a long time to suspend/resume
  • analyze suspend/resume bugs and find common points of failure
  • improve the logging infrastructure of the pm-suspend/pm-resume utilities
  • identify and tag the frequency of suspend/resume failures in bug reports

Release Note

This effort continues the work done during the Karmic cycle. It will not improve the user experience where proprietary or staging drivers are involved.

Diagnosing and Fixing suspend/resume bugs

  • identify known hardware that causes suspend/resume issues
  • document step-by-step instructions for diagnosing problems
  • increase community and upstream involvement in fixing known suspend/resume issues
  • use existing test cases for suspend/resume testing in Lucid

Assumptions

  • Only suspend/resume problems reported against non-proprietary, non-staging drivers are addressed.

Design

  • Measure per-driver suspend/resume times
  • Report suspend/resume failure frequency in bug reports
  • Improve the pm-utils tools
  • Analyze data from existing bugs and identify common failure points
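
One way to measure per-driver suspend/resume times is to boot with the `initcall_debug` kernel option. This makes the kernel log how long each device's suspend and resume callbacks take. The sketch below sums those times per device from dmesg output. The exact message format in the regular expression is an assumption based on 2.6.3x kernels and should be checked against the running kernel.

```python
import re
from collections import defaultdict

# Assumed per-device line printed with initcall_debug during suspend/resume,
# e.g. "call 0000:00:1b.0+ returned 0 after 10253 usecs".
CALL_RE = re.compile(r"\bcall (\S+)\+ returned (-?\d+) after (\d+) usecs")


def slowest_devices(dmesg: str, top: int = 5) -> list[tuple[str, int]]:
    """Sum the reported microseconds per device and return the `top` slowest."""
    totals = defaultdict(int)
    for dev, _err, usecs in CALL_RE.findall(dmesg):
        totals[dev] += int(usecs)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

The per-device totals could then be fed to a bootchart-style renderer for the visual representation mentioned under Implementation.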

Implementation

  • Instrument the kernel to report per-driver suspend/resume times, and potentially use a variation of bootchart to visualize them.
  • The pm logs are currently truncated; this behavior needs to change so that the logs are rotated instead.
  • Report the failure frequency in bug reports by gathering data from the pm logs; this will help us prioritize bugs.
  • Review pm-utils and improve and optimize its code.
  • Several suspend/resume bugs have been filed; we need to analyze this data (programmatically) to identify pain points and the components that contribute to failures.
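
Instead of truncating the pm log on every run, earlier runs could be kept by rotating the file before each suspend, using the same numbered-suffix scheme as logrotate. The sketch assumes the log lives at /var/log/pm-suspend.log and keeps five old copies. Both are illustrative choices, and inside pm-utils this would more likely be done in shell.

```python
from pathlib import Path


def rotate_log(log: str = "/var/log/pm-suspend.log", keep: int = 5) -> None:
    """Shift log -> log.1 -> ... -> log.<keep>, discarding the oldest copy.

    Called before each run, so the new run starts a fresh file while up to
    `keep` earlier runs are preserved rather than truncated.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    base = Path(log)
    oldest = base.with_name(f"{base.name}.{keep}")
    if oldest.exists():
        oldest.unlink()
    # Move log.(keep-1) -> log.keep, ..., log.1 -> log.2, newest last.
    for i in range(keep - 1, 0, -1):
        src = base.with_name(f"{base.name}.{i}")
        if src.exists():
            src.rename(base.with_name(f"{base.name}.{i + 1}"))
    if base.exists():
        base.rename(base.with_name(f"{base.name}.1"))
```

With several runs preserved, failure frequencies can be counted across runs rather than inferred from the last one alone.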

Test/Demo Plan

  • Continue checkbox-based suspend/resume testing at sprints and Linux Fests, and get users to report bugs.

Diagnosing and Fixing suspend/resume/hibernate

References



KernelTeam/Specs/KernelLucidSuspendResumeImprovements (last edited 2009-11-20 07:18:30 by 63)