DynamicAudioRateSpec

Revision 5 as of 2006-12-06 07:38:11

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

  • Launchpad entry: none yet

  • Packages affected:

Summary

Currently esound runs at a fixed sample rate (44100 samples per second). For both quality and performance reasons, esound (or whatever other sound server Ubuntu might migrate to) should dynamically change its internal sample rate based on the rates of the connected clients and the actual capabilities of the audio hardware.

Rationale

Esound runs at a fixed sample rate of 44100. However, audio hardware that supports at least up to 48000 is extremely common, and at least one common desktop use case, DVD playback, is natively at 48000. In this case, even if only one client is connected, the application (say Totem) must downsample the audio from 48000 to 44100. Downsampling is fairly computationally intensive and causes quality loss. This extra CPU time raises the minimum computing power required to play a DVD without dropping any frames, and causes quicker battery drain on portables.

High definition audio is starting to become more common. A desktop user should not be required to choose between utilizing a sample rate higher than 44100 and having software mixing available. A sound server that can provide dynamic sample rates is also a must for professional audio applications if they are to coexist with other desktop sound events (e.g., Gaim). (However, this says nothing about another critical pro-audio issue: latency.)

I also expect there will be a trend toward sending only digital audio to each speaker. Such devices will have no hardware mixing capabilities, which raises the importance of software mixing. I expect this because, for a given quality, low-pass and high-pass filters (for example) are now cheaper to make as digital processors than as analog circuits. It also allows the DSP to be tuned to the speakers and cabinets in sophisticated ways, and in general allows the quality of the D-A conversion hardware to be matched to the quality of the speakers. The M-Audio EX66 is a good example of this. I expect this technology to trickle down quickly because it will be more cost effective for a given quality. Granted, this paragraph just reveals my true nature as a closet audiophile. ;)

Use cases

  1. No client connected, new client connects. Client should connect at min(client_native_rate, max_hardware_rate), and the server should run at the same rate.

  2. At least one client connected, new client connects. Client should connect at min(client_native_rate, max_hardware_rate). Server should now run at max(connected_client_rates).

  3. Client disconnects, at least one remaining client connected. Server should now run at max(connected_client_rates).

Or in short, the client always connects at min(client_native_rate, max_hardware_rate), and the server always runs at max(connected_client_rates).
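The two rules above can be written down directly. The following is only a sketch of the rate selection, not esound code; the function names are made up for illustration, and returning None when no client is connected is an assumption, since the use cases do not say what rate an idle server should run at.

```python
from typing import Iterable, Optional


def client_connect_rate(client_native_rate: int, max_hardware_rate: int) -> int:
    """Rate a client connects at: its native rate, capped by the hardware."""
    return min(client_native_rate, max_hardware_rate)


def server_rate(connected_client_rates: Iterable[int]) -> Optional[int]:
    """Rate the server runs at: the highest rate among connected clients.

    Returns None when nothing is connected (an assumption; the use cases
    leave the idle case undefined).
    """
    rates = list(connected_client_rates)
    return max(rates) if rates else None


# Walk through the three use cases with hardware that tops out at 48000.
MAX_HARDWARE_RATE = 48000

# Use case 1: first client (a DVD player at 48000) connects.
clients = [client_connect_rate(48000, MAX_HARDWARE_RATE)]
rate_after_first = server_rate(clients)

# Use case 2: a second client at 44100 connects; the server rate is unchanged.
clients.append(client_connect_rate(44100, MAX_HARDWARE_RATE))
rate_after_second = server_rate(clients)

# Use case 3: the 48000 client disconnects; the server drops to 44100.
clients.remove(48000)
rate_after_disconnect = server_rate(clients)
```

With these inputs the server runs at 48000 after the first and second connections and at 44100 after the disconnect.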

Scope

Design

I understand that Ubuntu is considering switching to [http://pulseaudio.org/ Pulse Audio] for its sound server. I don't know if Pulse Audio already provides this capability, so depending upon whether the switch is made, this spec might be satisfied by the switch alone.

I'm uncertain whether common audio hardware can switch the sample rate midstream without creating any audible gap, so in use cases (2) and (3), the server may not be able to renegotiate the rate with the hardware if max(connected_client_rates) has changed. Another issue is whether the ALSA driver can accomplish such a renegotiation without any artifact.

I'm thinking the process of the client connecting should look something like this:

  1. Client requests a connection.
  2. Server responds with max_hardware_rate.
  3. Client decides what rate it wants, requests connection at this rate.
  4. If requested_rate <= max_hardware_rate, server accepts connection.
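The four steps above might look roughly like this in code. This is an in-process sketch only: the SoundServer class, its method names, and the negotiate helper are hypothetical, no wire protocol is implied, and rejecting a too-high request by returning None is an assumption (the steps only say when the server accepts).

```python
from typing import Dict, Optional


class SoundServer:
    """Hypothetical server model; not the esound or Pulse Audio API."""

    def __init__(self, max_hardware_rate: int) -> None:
        self.max_hardware_rate = max_hardware_rate
        self.client_rates: Dict[int, int] = {}  # client id -> connected rate
        self._next_id = 0

    def request_connection(self) -> int:
        # Steps 1 and 2: the client asks, the server reveals max_hardware_rate.
        return self.max_hardware_rate

    def connect(self, requested_rate: int) -> Optional[int]:
        # Step 4: accept only if the hardware can run at the requested rate.
        if requested_rate > self.max_hardware_rate:
            return None  # assumption: the spec does not define rejection
        client_id = self._next_id
        self._next_id += 1
        self.client_rates[client_id] = requested_rate
        return client_id

    def disconnect(self, client_id: int) -> None:
        self.client_rates.pop(client_id, None)

    def rate(self) -> Optional[int]:
        # The server runs at max(connected_client_rates); None when idle.
        return max(self.client_rates.values()) if self.client_rates else None


def negotiate(server: SoundServer, client_native_rate: int) -> Optional[int]:
    """Client side of the handshake, returning the client id on success."""
    max_hardware_rate = server.request_connection()
    # Step 3: the client picks its rate. A simple client takes the minimum;
    # a smarter one could render or fetch a stream at a different rate.
    requested_rate = min(client_native_rate, max_hardware_rate)
    return server.connect(requested_rate)
```

Because the client sees max_hardware_rate before committing, the choice in step 3 stays on the client side, which is the point of the design.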

There are specific reasons why I think the server should not hide max_hardware_rate, and why the client should decide what rate to connect at (instead of the client always connecting at client_native_rate and the server doing everything opaquely). Two cases illustrate this:

  • Wavelet-based codecs that can simply render at a different sample rate, for both improved quality and efficiency.
  • Streaming audio that can deliver different streams based on the desired sample rate.

Granted, this might not be utilized by any current codecs or streaming systems, but I believe it is an important part of future proofing the design.

It is the client's responsibility to down-sample to max_hardware_rate if needed. It is the server's responsibility to up-sample each client stream to max(connected_client_rates) if needed. If required for a given stream, resampling will be done either on the client side or on the server side, but never on both.
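The "never on both" property follows from the connection rules: a client that down-samples connects at exactly max_hardware_rate, and since no client can connect above that rate, the server is then already running at that stream's rate. A small sketch makes this concrete; resampling_site is a hypothetical helper, and it assumes every rate in other_client_rates is already at or below max_hardware_rate.

```python
from typing import Sequence


def resampling_site(
    client_native_rate: int,
    max_hardware_rate: int,
    other_client_rates: Sequence[int],
) -> str:
    """Say where a stream gets resampled: "client", "server", or "none".

    Assumes the other clients connected under the same rules, so none of
    their rates exceeds max_hardware_rate.
    """
    connect_rate = min(client_native_rate, max_hardware_rate)
    server_rate = max([connect_rate, *other_client_rates])

    if client_native_rate > connect_rate:
        # The client down-samples to max_hardware_rate. The server rate
        # cannot be higher than that, so no server-side up-sampling follows.
        return "client"
    if connect_rate < server_rate:
        # The stream arrives at its native rate and the server up-samples it
        # to max(connected_client_rates).
        return "server"
    return "none"
```

For example, with hardware limited to 48000, a 96000 client is resampled on the client side, a 44100 client sharing the server with a 48000 client is resampled on the server side, and a 48000 client is not resampled at all.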

Implementation

Code

Data preservation and migration

Unresolved issues

BoF agenda and discussion


CategorySpec