power-thermal-optimizations

Differences between revisions 12 and 14 (spanning 2 versions)
Revision 12 as of 2007-07-17 11:56:30
Size: 18037
Editor: a81-197-135-210
Comment:
Revision 14 as of 2008-08-06 16:39:18
Size: 8692
Editor: localhost
Comment: converted to 1.6 markup
Line 12: Line 12:
''' S3 Resume Optimisation'''
A patch that runs device initialisation on resume in a separate kernel thread, so that users and applications regain control sooner after an S3 resume.

''' Power Event Notification'''
The Linux kernel has no way to notify user-space applications of power events (suspend to RAM, suspend to disk and resume from suspend). This patch delivers a notification to user-space applications when a suspend to RAM, a suspend to disk or a resume from suspend happens.
Line 22: Line 17:
''' S3 Resume Optimisation'''
The current resume code in kernel/power/main.c is modified to do device initialisation in a separate kernel thread. Before dispatching a request to the device, the I/O schedulers check whether the resume thread has exited.

''' Power Event Notification'''
The system suspend and resume handling functions call the kernel function kobject_uevent() to post the message to user space, so that user-space applications can receive the event.
Line 31: Line 20:

''' S3 Resume Optimisation'''
For MID devices, better power saving requires the system to enter S3 as often as possible and to resume from S3 as quickly as possible. However, the current resume time from S3 is close to 5 sec, and the bulk of it is spent on device initialisation. This patch parallelises device initialisation on resume by running it in a separate kernel thread. To avoid race conditions, the I/O scheduler checks for device readiness before dispatching requests. With this patch the resume time improves by 70%.

''' Power Event Notification'''
User-level applications that depend on knowing when the system enters S3 (initiated by some other application) need a clean notification mechanism from the kernel whenever the system goes to S3 or resumes from it. This patch adds that capability to the kernel.
Line 51: Line 34:
attachment:thermal-power-opt.gif {{attachment:thermal-power-opt.gif}}
Line 106: Line 89:
== S3 resume Optimisation ==

Here is a simple patch for optimising the S3 resume. With this patch the resume time is about 0.85 sec.
Device initialisation accounts for almost 70% of the resume time, so the patch runs the whole
"device_resume()" function in a separate kernel thread, and the resume completes
(i.e. the user perceives it as complete) in ~0.85 sec.
To avoid any possible race condition while processing I/O requests, and to make sure all
I/O requests stay queued until the device resume thread exits, the I/O schedulers (patched cfq and as)
check the system_resuming flag, which is set when the device resume thread starts.
While the flag is set, the scheduler does not put requests in the dispatch queue. Once the flag is cleared, i.e. when
the device resume thread has completed, the I/O scheduler behaves as in the normal situation. I did some
validation of this patch locally on a NAPA board (Calistoga chipset with Dothan processor, with and without SMP)
and have not noticed any issues so far.
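
The gating described above can be modelled in user space. The following is only a sketch under stated assumptions: `struct fifo`, `fifo_push()`, `fifo_dispatch()`, `resume_begin()` and `resume_end()` are hypothetical names invented for illustration, and only the `system_resuming` flag comes from the patch. Unlike the patch, which sets the flag inside the resume thread, this model raises it before any resume work starts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* User-space stand-in for the patch's global flag. */
atomic_int system_resuming;

/* Hypothetical request and FIFO types, for illustration only. */
struct request { int id; struct request *next; };
struct fifo { struct request *head, *tail; };

/* Queue a request; queuing is always allowed, even during resume. */
void fifo_push(struct fifo *q, struct request *rq)
{
    assert(q != NULL && rq != NULL);
    rq->next = NULL;
    if (q->tail)
        q->tail->next = rq;
    else
        q->head = rq;
    q->tail = rq;
}

/* Mirror of the scheduler check: hand out nothing while the flag is set. */
struct request *fifo_dispatch(struct fifo *q)
{
    struct request *rq;

    if (atomic_load(&system_resuming) != 0)
        return NULL;            /* resume still in flight, keep it queued */
    rq = q->head;
    if (rq) {
        q->head = rq->next;
        if (!q->head)
            q->tail = NULL;
    }
    return rq;
}

/* Raise the flag before resume work starts, clear it when it is done. */
void resume_begin(void) { atomic_store(&system_resuming, 1); }
void resume_end(void)   { atomic_store(&system_resuming, 0); }
```

Requests pushed between `resume_begin()` and `resume_end()` are not lost. They stay in the FIFO and are handed out in order by the first `fifo_dispatch()` calls after the flag clears.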

== Power event Notification ==
Here is a simple patch for power event notification to user-space
applications. It notifies user-space applications when the system is
about to enter a low power state (Suspend-to-RAM or Suspend-to-Disk)
and when it resumes from that state. This lets a user-space application
take some significant action before the system enters the low power
state (like saving an unsaved file). User-space applications can open a
netlink socket and listen for these events, which are delivered through
the kobject netlink socket. The patch uses the kernel function
kobject_uevent() to post the notification to user space, with standby,
hibernate or resume as the action parameter of the kobject_uevent()
call (mapped to the KOBJ_S3, KOBJ_S4 and KOBJ_RESUME enums).
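
A user-space listener for these events could look roughly like the sketch below. It assumes a Linux system and a kernel with this patch applied, since stock kernels do not emit the standby, hibernate or resume actions. The names `open_uevent_socket()`, `classify_uevent()` and `enum power_event` are hypothetical. The only kernel facts relied on are the `NETLINK_KOBJECT_UEVENT` protocol and the `action@devpath` prefix that begins each kernel uevent message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>

enum power_event { PWR_NONE, PWR_STANDBY, PWR_HIBERNATE, PWR_RESUME };

/* Open a netlink socket subscribed to kernel uevent broadcasts.
 * Returns the file descriptor, or -1 on failure. */
int open_uevent_socket(void)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = 1;                 /* kernel uevent multicast group */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Classify one received message by the action that precedes '@' in its
 * first NUL-terminated segment. */
enum power_event classify_uevent(const char *msg, size_t len)
{
    const char *nul, *at;
    size_t seg, n;

    assert(msg != NULL);
    nul = memchr(msg, '\0', len);
    seg = nul ? (size_t)(nul - msg) : len;
    at = memchr(msg, '@', seg);
    if (!at)
        return PWR_NONE;
    n = (size_t)(at - msg);
    if (n == 7 && memcmp(msg, "standby", 7) == 0)
        return PWR_STANDBY;
    if (n == 9 && memcmp(msg, "hibernate", 9) == 0)
        return PWR_HIBERNATE;
    if (n == 6 && memcmp(msg, "resume", 6) == 0)
        return PWR_RESUME;
    return PWR_NONE;
}
```

An application would call `recv()` on the descriptor returned by `open_uevent_socket()` in a loop and pass each buffer to `classify_uevent()`, saving its state when it sees `PWR_STANDBY` or `PWR_HIBERNATE`.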
Line 145: Line 100:
== S3 Resume Patch (against 2.6.21-rc7) ==
{{{
diff -aur linux-2.6.21-rc7-vanilla/block/as-iosched.c linux-2.6.21-rc7/block/as-iosched.c
--- linux-2.6.21-rc7-vanilla/block/as-iosched.c 2007-04-16 05:20:57.000000000 +0530
+++ linux-2.6.21-rc7/block/as-iosched.c 2007-07-04 14:00:39.000000000 +0530
@@ -903,6 +903,14 @@
   return 0;

  rq = rq_entry_fifo(ad->fifo_list[adir].next);
+ /*
+ * Check here for the System resume flag to be cleared, if flag is
+ * still set the resume thread hasnt completed yet, and hence dont
+ * takeout any new request from the FIFO
+ */
+ extern int system_resuming;
+ if (system_resuming != 0)
+ return 0;

  return time_after(jiffies, rq_fifo_time(rq));
 }
diff -aur linux-2.6.21-rc7-vanilla/block/cfq-iosched.c linux-2.6.21-rc7/block/cfq-iosched.c
--- linux-2.6.21-rc7-vanilla/block/cfq-iosched.c 2007-04-16 05:20:57.000000000 +0530
+++ linux-2.6.21-rc7/block/cfq-iosched.c 2007-07-04 14:01:05.000000000 +0530
@@ -880,6 +880,7 @@
  struct cfq_data *cfqd = cfqq->cfqd;
  struct request *rq;
  int fifo;
+ extern int system_resuming;

  if (cfq_cfqq_fifo_expire(cfqq))
   return NULL;
@@ -888,7 +889,13 @@

  if (list_empty(&cfqq->fifo))
   return NULL;
-
+ /*
+ * Check here for the System resume flag to be cleared, if flag is
+ * still set the resume thread hasnt completed yet, and hence dont
+ * move any request from the read/write to dispatch queue
+ */
+ if(system_resuming != 0)
+ return NULL;
  fifo = cfq_cfqq_class_sync(cfqq);
  rq = rq_entry_fifo(cfqq->fifo.next);

diff -aur linux-2.6.21-rc7-vanilla/kernel/power/main.c linux-2.6.21-rc7/kernel/power/main.c
--- linux-2.6.21-rc7-vanilla/kernel/power/main.c 2007-07-04 13:47:02.000000000 +0530
+++ linux-2.6.21-rc7/kernel/power/main.c 2007-07-04 13:59:30.000000000 +0530
@@ -23,7 +23,7 @@
 #include <linux/vmstat.h>

 #include "power.h"
-
+int system_resuming;
 /*This is just an arbitrary number */
 #define FREE_PAGE_NUMBER (100)

@@ -129,7 +129,16 @@
  local_irq_restore(flags);
  return error;
 }
-
+static int dev_resume_proc(void * data)
+{
+ /* Set the global resume flag, this will be checked by the IO_schedular
+ * before dispatching the IO request
+ */
+ system_resuming =1;
+ device_resume();
+ system_resuming = 0;
+ return (0);
+}

 /**
  * suspend_finish - Do final work before exiting suspend sequence.
@@ -141,9 +150,15 @@

 static void suspend_finish(suspend_state_t state)
 {
+ int thread;
  enable_nonboot_cpus();
  pm_finish(state);
- device_resume();
+ system_resuming = 0;
+ thread = kernel_thread(dev_resume_proc,NULL,CLONE_KERNEL);
+ if (thread < 0){
+ printk ("Suspend resume Cannot create Kernel_thread\n");
+ device_resume();
+ }
  resume_console();
  thaw_processes();
  pm_restore_console();
}}}
== Power event notification patch (against 2.6.21-rc7) ==
{{{
diff -aruN kernel-mid/include/linux/kobject.h linux-pwr-evnt-notfn/include/linux/kobject.h
--- kernel-mid/include/linux/kobject.h 2007-04-16 05:20:57.000000000 +0530
+++ linux-pwr-evnt-notfn/include/linux/kobject.h 2007-07-05 15:17:11.000000000 +0530
@@ -48,6 +48,9 @@
  KOBJ_OFFLINE = (__force kobject_action_t) 0x06, /* device offline */
  KOBJ_ONLINE = (__force kobject_action_t) 0x07, /* device online */
  KOBJ_MOVE = (__force kobject_action_t) 0x08, /* device move */
+ KOBJ_S3 = (__force kobject_action_t) 0x09, /* system suspend to RAM */
+ KOBJ_S4 = (__force kobject_action_t) 0x0A, /* system suspend to disk */
+ KOBJ_RESUME = (__force kobject_action_t) 0x0B, /* system resume */
 };
 
 struct kobject {
diff -aruN kernel-mid/kernel/power/disk.c linux-pwr-evnt-notfn/kernel/power/disk.c
--- kernel-mid/kernel/power/disk.c 2007-04-16 05:20:57.000000000 +0530
+++ linux-pwr-evnt-notfn/kernel/power/disk.c 2007-07-05 15:17:04.000000000 +0530
@@ -184,6 +184,7 @@
  resume_console();
  Thaw:
  unprepare_processes();
+ kobject_uevent(&power_subsys.kset.kobj, KOBJ_RESUME);
  return error;
 }
 
diff -aruN kernel-mid/kernel/power/main.c linux-pwr-evnt-notfn/kernel/power/main.c
--- kernel-mid/kernel/power/main.c 2007-04-16 05:20:57.000000000 +0530
+++ linux-pwr-evnt-notfn/kernel/power/main.c 2007-07-05 15:17:04.000000000 +0530
@@ -141,6 +141,7 @@
 
 static void suspend_finish(suspend_state_t state)
 {
+ kobject_uevent(&power_subsys.kset.kobj, KOBJ_RESUME);
  enable_nonboot_cpus();
  pm_finish(state);
  device_resume();
@@ -191,6 +192,11 @@
 {
  int error;
 
+ if (state == PM_SUSPEND_MEM) /* if suspend to RAM */
+ kobject_uevent(&power_subsys.kset.kobj, KOBJ_S3);
+ if (state == PM_SUSPEND_DISK) /* if suspend to disk */
+ kobject_uevent(&power_subsys.kset.kobj, KOBJ_S4);
+
  if (!valid_state(state))
   return -ENODEV;
  if (!mutex_trylock(&pm_mutex))
@@ -335,8 +341,10 @@
 static int __init pm_init(void)
 {
  int error = subsystem_register(&power_subsys);
- if (!error)
+ if (!error) {
+ kset_set_kset_s(&power_subsys, power_subsys);
   error = sysfs_create_group(&power_subsys.kset.kobj,&attr_group);
+ }
  return error;
 }
 
diff -aruN kernel-mid/lib/kobject_uevent.c linux-pwr-evnt-notfn/lib/kobject_uevent.c
--- kernel-mid/lib/kobject_uevent.c 2007-04-16 05:20:57.000000000 +0530
+++ linux-pwr-evnt-notfn/lib/kobject_uevent.c 2007-07-05 15:15:14.000000000 +0530
@@ -52,6 +52,12 @@
   return "online";
  case KOBJ_MOVE:
   return "move";
+ case KOBJ_S3:
+ return "standby";
+ case KOBJ_S4:
+ return "hibernate";
+ case KOBJ_RESUME:
+ return "resume";
  default:
   return NULL;
  }

}}}

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

Thermal extension

The platform thermal solution depends on the kernel framework to control device performance states and to monitor the platform's thermal sensors. The kernel's thermal monitoring and control mechanism is spread across the ACPI thermal driver and non-ACPI thermal sensor drivers, and the thermal algorithms are embedded in the kernel drivers. The proposed patch extends the thermal driver and unifies the various thermal sensing and control properties through a sysfs interface, so that platform-level thermal decisions can be made in user space.

Release Note

Thermal extension

The current thermal zone driver is modified to expose the platform's thermal properties through sysfs. A new thermal sysfs driver is introduced which exports two interfaces, one for the platform-specific sensor drivers and one for the component throttle drivers. The CPU thermal driver will work as it is, but will interface with the thermal sysfs driver.

Rationale

Thermal extension

Linux notebooks today use a combination of ACPI and native-device thermal control. The system uses ACPI’s CRT/HOT trip points for critical system shutdown, since on a handheld, shutdown and hibernate to disk (if one even exists) are likely to be synonymous. Active trip points are of no use on systems which have no fans. That leaves the single PSV trip point. ACPI 2.0 can associate (only) a processor throttling device with a trip point. But on handhelds the processor is not expected to always be the dominant contributor to the thermal footprint, as it often is on notebooks. ACPI 2.0 includes the _TZD method to associate devices with thermal zones. However, ACPI doesn’t say anything about how to throttle non-processor devices, so that must be handled by native device drivers.

Use Cases

Assumptions

Design

Thermal Extension

Thermal monitoring will be done using inexpensive thermal sensors polled by a low-power EC (embedded controller).

  • Thermal management policy decisions will be made from user space, as the user has a comprehensive view of the platform.
  • The kernel provides only the mechanism to deliver thermal events to user space, and the mechanism for user space to communicate its throttling decisions to native device drivers.

Figure 1 (thermal-power-opt.gif)

Figure 1 shows the thermal control software stack. The thermal management policy control application sits on top. It receives netlink messages from the kernel thermal zone driver. It then implements device-specific thermal throttling via sysfs. Native device drivers supply the throttling controls in sysfs and implement device-specific throttling functions.

Thermal zone module

The thermal zone module has two components — a thermal zone sysfs driver and thermal zone sensor driver.

The thermal zone sysfs driver is platform-independent, and handles all the sysfs interaction. The thermal zone sensor driver is platform-dependent. It works closely with the platform BIOS and sensor driver, and has knowledge of sensor information in the platform.

Thermal zone sysfs driver

The thermal sysfs driver exports two interfaces, thermal_control_register() and thermal_control_deregister(), to component drivers, which call them to register their control capability with the thermal zone sysfs driver. The thermal sysfs driver also exports two interfaces:

  • thermal_sensor_register()
  • thermal_sensor_deregister()

to the platform-specific sensor drivers, which use them to register their sensor capability. This driver is responsible for all thermal sysfs entries. It interacts with all the platform-specific thermal sensor drivers and component drivers to populate the sysfs entries. The thermal zone driver also provides a temperature notification service to component drivers. As part of registration, the thermal zone sensor driver exposes its sensing and thermal zone capability.

Thermal Zone sensor driver

The thermal zone sensor driver provides all the platform-specific sensor information to the thermal sysfs driver. It is platform-specific in that it has prior information about the sensors present in the platform. The thermal zone driver directly maps the ACPI 2.0 thermal zone definition. The thermal zone sensor driver also handles the interrupt notification from the sensor trips and delivers it to user space through a netlink socket.

Component Throttle driver

All the component drivers participating in a given thermal zone can register with the thermal driver, each providing the set of thermal ops it can support. The thermal driver will redirect all the control requests to the appropriate component drivers when the user programs the throttling level. It is up to the component driver to implement the thermal control. For example, a component driver associated with DRAM would slow down the DRAM clock on throttling requests.
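
The registration and redirection flow can be illustrated with a small user-space model. This is a sketch, not the driver's actual interface: the specification names thermal_control_register() and thermal_control_deregister() but gives no signatures, so the argument lists, `struct thermal_control_ops`, `thermal_set_throttle()` and the DRAM example component below are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONTROLS 8

/* Hypothetical ops a component driver supplies when it registers. */
struct thermal_control_ops {
    int (*set_throttle)(void *priv, int level);
    int throttling_max;         /* deepest throttling level supported */
};

struct thermal_control {
    const char *name;
    const struct thermal_control_ops *ops;
    void *priv;
};

static struct thermal_control controls[MAX_CONTROLS];

/* Register a component; returns 0 on success, -1 if the table is full. */
int thermal_control_register(const char *name,
                             const struct thermal_control_ops *ops,
                             void *priv)
{
    size_t i;

    assert(name && ops && ops->set_throttle);
    for (i = 0; i < MAX_CONTROLS; i++) {
        if (controls[i].name == NULL) {
            controls[i].name = name;
            controls[i].ops = ops;
            controls[i].priv = priv;
            return 0;
        }
    }
    return -1;
}

/* Remove a component; returns 0 on success, -1 if it was not registered. */
int thermal_control_deregister(const char *name)
{
    size_t i;

    for (i = 0; i < MAX_CONTROLS; i++) {
        if (controls[i].name && strcmp(controls[i].name, name) == 0) {
            memset(&controls[i], 0, sizeof(controls[i]));
            return 0;
        }
    }
    return -1;
}

/* Redirect a throttling request to the named component, clamping the
 * level to the range 0..throttling_max. Returns -1 if not registered. */
int thermal_set_throttle(const char *name, int level)
{
    size_t i;

    for (i = 0; i < MAX_CONTROLS; i++) {
        if (controls[i].name && strcmp(controls[i].name, name) == 0) {
            if (level < 0)
                level = 0;
            if (level > controls[i].ops->throttling_max)
                level = controls[i].ops->throttling_max;
            return controls[i].ops->set_throttle(controls[i].priv, level);
        }
    }
    return -1;
}

/* Example component: a DRAM driver that just records the requested level. */
struct dram_state { int level; };

int dram_set_throttle(void *priv, int level)
{
    ((struct dram_state *)priv)->level = level;
    return 0;
}
```

Clamping out-of-range levels rather than rejecting them is a choice made for this sketch. A real driver might equally return an error for an invalid level.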

Thermal Zone Sysfs Property

Table 1 shows the directory structure exposing each thermal zone sysfs property to user space. The intent is that any combination of ACPI and native thermal zones may exist on a platform, but the generic sysfs interface looks the same for all of them. Thus, the syntax of the files borrows heavily from the Linux hwmon subsystem.

Each thermal zone provides its current temperature and an indicator that can be used by user-space to see if the current temperature has changed since the last read. If a critical trip point is present, its value is indicated here, as well as an alarm indicator showing whether it has fired. If a passive trip point is present, its value is indicated here, as well as an alarm indicator showing whether it has fired. There are symbolic links to the device nodes of the devices associated with the thermal zone. Those devices will export their throttling controls under their device nodes.

Throttling Sysfs Properties

Devices that support throttling will have two additional properties associated with their device nodes: throttling and throttling_max. A value of 0 means maximum performance, that is, no throttling. A value of throttling_max means maximum power savings, in the deepest throttling state available before device state is lost.

Events will be passed from the kernel to userspace using the Linux netlink facility. Interrupts from the sensor or EC are delivered to user-space through a netlink socket.
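
As an illustration of how a user-space policy application might use these properties, the sketch below maps a temperature reading to a throttling level and writes it to a device's throttling file. The linear ramp between the passive and critical trip points is an assumed policy, not one defined by this specification, and `throttle_for_temp()` and `write_throttling()` are hypothetical helper names. The path passed to `write_throttling()` would be the device's throttling attribute, whose exact location is not given here.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed policy: no throttling at or below the passive trip point, full
 * throttling at or above the critical trip point, linear in between. */
int throttle_for_temp(long temp, long passive, long crit, int throttling_max)
{
    assert(passive < crit && throttling_max >= 0);
    if (temp <= passive)
        return 0;
    if (temp >= crit)
        return throttling_max;
    return (int)(((temp - passive) * throttling_max) / (crit - passive));
}

/* Write a throttling level to the given attribute file.
 * Returns 0 on success, -1 on any I/O error. */
int write_throttling(const char *path, int level)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fprintf(f, "%d\n", level) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

With a passive trip point of 70, a critical trip point of 100 and a throttling_max of 10, a reading of 85 lies halfway along the ramp and yields level 5.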

|| sysfs || ACPI || Description || R/W ||
|| temp1_input || _TMP || Current temperature || RO ||
|| temp1_alarm || || Temperature change occurred || RW ||
|| temp1_crit || _CRT || Critical alarm temperature || RO ||
|| temp1_crit_alarm || || Critical alarm occurred || RW ||
|| temp1_passive || _PSV || Passive alarm temperature || RO ||
|| temp1_passive_alarm || || Passive alarm occurred || RW ||
|| <device_name1> || || Link to device 1 associated with zone || RO ||
|| <device_name2> || || Link to device 2 associated with zone || RO ||
|| ... || || ... || RO ||

Table 1
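
Reading one of the attributes in Table 1 from user space might look like the following sketch. The helper name `read_zone_attr()` is hypothetical, and so is the zone directory passed to it, since the specification does not give the sysfs path. The value is assumed to be a single integer in whatever units the driver reports.

```c
#include <assert.h>
#include <stdio.h>

/* Read an integer attribute such as temp1_input from a thermal zone
 * directory. Returns 0 on success, -1 if the file cannot be opened,
 * the path is too long, or the content is not an integer. */
int read_zone_attr(const char *zone_dir, const char *attr, long *value)
{
    char path[256];
    FILE *f;
    int n, ok;

    assert(zone_dir && attr && value);
    n = snprintf(path, sizeof(path), "%s/%s", zone_dir, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A policy application could call this for temp1_input, temp1_passive and temp1_crit after each netlink event and compare the readings to decide whether to change a device's throttling level.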

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

UI Changes

Should cover changes required to the UI, or specific UI that is required to implement this

Code Changes

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during CD testing, and to show off after release.

This need not be added or completed until the specification is nearing beta.

Outstanding Issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.


CategorySpec

MobileAndEmbedded/power-thermal-optimizations (last edited 2008-08-06 16:39:18 by localhost)