PythonProfilingTool

Summary

Find tools to profile memory use in Python programs, document them, and if necessary, tweak them.

Release Note

This spec has no direct impact on end users.

Rationale

Relatively large parts of the software in an Ubuntu system are written in Python. The memory requirements of Ubuntu are growing. Tools to profile memory use in Python programs are needed. Since many of the programs that use the most memory have a graphical user interface, the tools need to work with programs that use PyGTK.

Use Cases

  • Crabtree is a Python hacker, and wants to know why deskbar-applet takes up so much memory.

Design

  • Find profiling tools, for Python or generically.
  • Experiment with them.
  • Document results.
  • Suggest one or some for use by Ubuntu developers.
  • Make useful improvements to the suggested tools.

See also https://wiki.ubuntu.com/UDS-Intrepid/Report/Platform#head-36ce9220ebe4ef5486adff7e3cca20038ff85304

Overview

Memory is used in many ways on a Linux system. The kernel allocates memory by page, and pages are collected into areas. Pages can be filled with data loaded from files, and such pages may be read-only or read-write. Writeable pages can be "clean", i.e., identical to the data in the file, or "dirty", i.e., modified since being loaded. Pages may also be in RAM or in swap. Read-only pages may be shared between processes. There are other complications as well. Because of all this, it is not enough to look only at how much memory is allocated to a process to determine its memory cost.

Memory profiling tools need to look at each area to see whether it is read-only or writeable, and if writeable, whether it is clean (same as on disk) or dirty (modified). A clean read-only page can be immediately freed by the kernel, whereas a dirty writeable page cannot, so the latter has a higher memory cost. Indeed, when minimizing memory requirements it is reasonable to concentrate on minimizing the number of dirty pages in a process. Clean pages can be freed and demand-paged back in as necessary, and shared with the disk block cache. (This is highly simplistic, but good enough for a first approximation, at least.)
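
For illustration, a minimal Python sketch of tallying dirty memory per mapping by parsing /proc/<pid>/smaps (assuming a kernel that exposes the Shared_*/Private_* fields there; the helper name is made up for this example):

    import re
    import sys
    from collections import defaultdict

    ADDR = re.compile(r'^[0-9a-f]+-[0-9a-f]+\s')
    FIELD = re.compile(r'^(Shared|Private)_(Clean|Dirty):\s+(\d+) kB')

    def dirty_kb_by_mapping(pid):
        # Remember the current mapping's name from each header line,
        # then sum the Dirty counters reported under it.
        dirty = defaultdict(int)
        name = '[anon]'
        for line in open('/proc/%s/smaps' % pid):
            if ADDR.match(line):
                parts = line.split()
                name = parts[5] if len(parts) > 5 else '[anon]'
            else:
                m = FIELD.match(line)
                if m and m.group(2) == 'Dirty':
                    dirty[name] += int(m.group(3))
        return dirty

    if __name__ == '__main__':
        for name, kb in sorted(dirty_kb_by_mapping(sys.argv[1]).items(),
                               key=lambda item: -item[1]):
            print('%8d kB  %s' % (kb, name))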

Python memory use

Looking at the number of dirty pages used by a Python program (or rather, by the Python interpreter while running a Python program) does not help much when reducing memory requirements. Tools specific to Python are needed to profile how the memory is used: what objects exist, how much memory they use, how many there are, which part of the code created them, and so on.
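
For example, the interpreter's own gc module already allows a crude census of live objects (a minimal sketch; it only sees container objects tracked by the cyclic collector, so plain ints and strings are missed, and it says nothing about sizes or allocation sites):

    import gc
    from collections import defaultdict

    def object_census():
        # Count the objects the cyclic collector tracks, grouped by type name.
        counts = defaultdict(int)
        for obj in gc.get_objects():
            counts[type(obj).__name__] += 1
        return counts

    if __name__ == '__main__':
        census = sorted(object_census().items(), key=lambda item: -item[1])
        for name, n in census[:10]:
            print('%7d  %s' % (n, name))

A dedicated profiling tool has to go beyond this crude census, reporting per-object sizes and where in the code the objects were created.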

Because Python manages memory itself and has its own garbage collector, the memory profiling tool should also be able to tell how well that works: if a lot of garbage accumulates in Python's heap and the garbage collector is not run to free it, memory is being wasted.
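
The standard gc module exposes enough to check this directly; a minimal sketch:

    import gc

    # Net allocations since the last collection, per generation; persistently
    # large counts suggest the collector is not being run often enough.
    print(gc.get_count())

    # Force a full collection and see how much garbage had accumulated.
    unreachable = gc.collect()
    print('%d unreachable objects found' % unreachable)

    # Objects the collector found but could not free
    # (e.g. cycles involving objects with __del__ methods).
    print('%d uncollectable objects' % len(gc.garbage))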

System-level tools

  • top, htop: list processes according to CPU or memory usage, or other criteria
  • iostat (from sysstat package): I/O status (not directly related to memory usage)
  • pmap: report a process's memory use by area, with state of each area (clean/dirty/...)

Tools for C/C++

  • valgrind: can examine/report heap use of a process
    • there was talk of a memgrind tool, but no further information about it could be found
  • objdump: find non-static-const constant data (particularly in libraries)

Tools for Python

  • Guppy-PE: includes heapy, which can inspect the Python heap and report the live objects by type, count, and size (see the sketch below)
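
A minimal sketch of how heapy is typically used, assuming Guppy-PE is installed:

    from guppy import hpy

    h = hpy()
    h.setrelheap()   # measure relative to this point, ignoring earlier allocations

    # ... run the code under investigation ...

    # Print a partition of the reachable objects by type,
    # with a count and total size for each.
    print(h.heap())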

Implementation

Use Guppy to see how well it works. Attempt to use it to see what deskbar-applet uses all its memory for. Document results.

If this works, good. If Guppy can be improved to be more efficient, accurate, or useful, make a plan for those improvements and attempt to implement them.

Test/Demo Plan

FIXME.


CategorySpec
