ubuntu-i686

Revision 5 as of 2006-07-29 21:37:30

Summary

Make a separate main/universe/multiverse/restricted repo for the i686 arch. This may involve dissolving the 386 repo, but doesn't have to. This shouldn't be that hard, as most of the Debian build tools are automated by now. The problem is getting another build server (or stealing time from an existing one) and then finding the storage space.

This would mean everything from the X server to GTK+ to Firefox to OpenOffice to Mono to GNOME would be optimized for your processor.

Rationale

  1. Gentoo. ArchLinux. Slackware. These distros are generally all faster than Ubuntu. Dapper is the fastest Ubuntu yet, but still isn't as quick as these distros. Gentoo's and ArchLinux's claim to this speed lies in the fact that they are compiled for the architecture of your computer. Why not make Edgy even faster?

  2. You only need a Pentium 2 processor. Can we somehow check the Ubuntu hardware database to see how many current Ubuntu users have processors that only support the 386 arch (or the 486 or 586 arch)? http://hwdb.ubuntu.com/

  3. Easy to do. Just set up a build server? I don't know exactly how this is done, but I'm sure there is an Ubuntu developer who could get a 686 build server up and running very quickly.

Use cases

Scope

Design

Implementation

Code

Data preservation and migration

Outstanding issues

BoF agenda and discussion

  • A half-way house is to optimise for 686, but generate code that runs on 386. A lot of optimisations are done on the basis of cache size, etc., not just register usage.
  • The new repo need only hold the migrated packages. Many packages, such as Python code, are arch-independent or have very little code. The key packages which would benefit could move first. E.g. libc6-686 would become libc6 and you'd always get the best one for you. Compression and decompression algorithms would benefit, as a lot of this goes on in the background.
  • Make a list of the key packages which would benefit, and generate *-686 versions for them. This already exists for libc6, mplayer, and a few others. Write a list of candidates and edit their source packages to use different compiler flags. Benchmark!
  • LunaTick: If projects that would benefit from optimisation used [http://liboil.freedesktop.org/wiki/ liboil], much greater speed increases would happen automagically. An approach like liboil also has the advantage that it optimises for your exact chip (MMX, SSE, SSE2, SSE3, AltiVec etc.) and detects what you can support. If all the "hard work" of these applications were centralised in one library, only that library would need to be optimised for each platform.
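
The detection step mentioned in the last point can be illustrated with a minimal sketch. It is not how liboil or the package manager actually works; it only shows the idea of reading the CPU's feature flags and choosing a build variant from them. The function names and the "CMOV present means the -686 build is usable" policy are assumptions made for illustration, and the flag names follow what Linux prints on x86 in /proc/cpuinfo (SSE3 appears there as "pni").

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # The first "flags" line is enough; all cores normally agree.
            return set(value.split())
    return set()


def pick_variant(flags):
    """Hypothetical policy: choose the -686 build only when CMOV is present.

    Code built for i686 may use CMOV, so its absence rules that build out.
    """
    return "686" if "cmov" in flags else "generic"


def simd_extensions(flags):
    """List the SIMD extensions named in this discussion that the CPU reports."""
    known = ["mmx", "sse", "sse2", "pni", "3dnow"]  # "pni" is SSE3
    return [name for name in known if name in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()  # not on Linux, or /proc is unavailable
    print("package variant:", pick_variant(flags))
    print("SIMD extensions:", ", ".join(simd_extensions(flags)) or "none detected")
```

A real selection mechanism would have to live in the packaging tools or the dynamic loader; the sketch only shows that the information needed for the choice is cheap to obtain at install time or run time.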

JohnMoser: This spec is full of holes. For example:

  • Ubuntu is built using i486 instructions with i686 scheduling already.
  • Gentoo is not that much faster because of the instruction set it uses; it's faster because half the crap isn't loaded. Binary distros have to build in support for every little niche thing, like krb5 or LDAP or inetd or WMF images, while source ones typically don't have things like IPv6 support or half a dozen other things enabled by default. Applications and libraries hard-code the use of these based on ./configure switches, and then load slowly because they have to load all those libraries.
  • SSE, MMX, and 3DNow! only matter in EXTREMELY specialized cases. MMX is integer-only and reuses the FPU registers, so either you can't use MMX and floating-point math at the same time, or you get slow MMX performance because you have to save and restore the FPU registers every time you touch them. SSE is good when the same operation is going to be performed repeatedly on a number of pieces of data, because it pipelines the data through a single instruction; but I've heard it takes roughly 17 clock cycles to load an SSE register with data, so if you use it for general floating-point math you'll get a massive slowdown.

JohnMoser: While I would like to move up to i586 or i686 instructions, I can find no compelling reason. Below are benchmarks from nbench, with the "CFLAGS =" line in the Makefile using "CFLAGS +=" to get extra flags from the environment. The only substantial gains seem to be in Floating Point Emulation, which is not useful for i686 (this causes a huge integer index differential to be calculated, however).

bluefox@icebox:~/programming/bench/nbench-byte-2.2.2$ CFLAGS="-march=i486 -mtune=i686" make
bluefox@icebox:~/programming/bench/nbench-byte-2.2.2$ sudo nice -n -18 time ./nbench 

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           824.6  :      21.15  :       6.95
STRING SORT         :          149.66  :      66.87  :      10.35
BITFIELD            :      3.7346e+08  :      64.06  :      13.38
FP EMULATION        :          68.997  :      33.11  :       7.64
FOURIER             :           17600  :      20.02  :      11.24
ASSIGNMENT          :          19.681  :      74.89  :      19.42
IDEA                :          2840.6  :      43.45  :      12.90
HUFFMAN             :          1295.9  :      35.93  :      11.47
NEURAL NET          :          28.177  :      45.27  :      19.04
LU DECOMPOSITION    :           955.2  :      49.48  :      35.73
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 44.594
FLOATING-POINT INDEX: 35.524
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : AuthenticAMD AMD Athlon(tm) 64 Processor 2800+ 1800MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.17-5-686
C compiler          : gcc version 4.1.2 20060715 (prerelease) (Ubuntu 4.1.1-9ubuntu1)
libc                : libc-2.4.so
MEMORY INDEX        : 13.908
INTEGER INDEX       : 9.414
FLOATING-POINT INDEX: 19.703
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
274.86user 0.03system 4:39.05elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+996minor)pagefaults 0swaps

bluefox@icebox:~/programming/bench/nbench-byte-2.2.2$ CFLAGS="-march=i686 -mtune=i686" make clean nbench
bluefox@icebox:~/programming/bench/nbench-byte-2.2.2$ sudo nice -n -18 time ./nbench 

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          821.96  :      21.08  :       6.92
STRING SORT         :           150.2  :      67.11  :      10.39
BITFIELD            :      3.7494e+08  :      64.32  :      13.43
FP EMULATION        :           86.08  :      41.31  :       9.53
FOURIER             :           17600  :      20.02  :      11.24
ASSIGNMENT          :          19.761  :      75.19  :      19.50
IDEA                :          2830.3  :      43.29  :      12.85
HUFFMAN             :          1329.9  :      36.88  :      11.78
NEURAL NET          :          28.287  :      45.44  :      19.11
LU DECOMPOSITION    :           961.6  :      49.82  :      35.97
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 46.228
FLOATING-POINT INDEX: 35.650
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : AuthenticAMD AMD Athlon(tm) 64 Processor 2800+ 1800MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.17-5-686
C compiler          : gcc version 4.1.2 20060715 (prerelease) (Ubuntu 4.1.1-9ubuntu1)
libc                : libc-2.4.so
MEMORY INDEX        : 13.962
INTEGER INDEX       : 9.997
FLOATING-POINT INDEX: 19.773
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
307.04user 0.08system 5:11.25elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+529minor)pagefaults 0swaps
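
To make the comparison between the two runs easier to read, here is a small sketch that computes the percentage change in iterations/sec for each test. The figures are copied from the two nbench tables above; the variable and function names are only illustrative, and a single run per flag set says nothing about run-to-run noise.

```python
# Iterations/sec copied from the two nbench runs above.
RUN_I486 = {  # CFLAGS="-march=i486 -mtune=i686"
    "NUMERIC SORT": 824.6,
    "STRING SORT": 149.66,
    "BITFIELD": 3.7346e+08,
    "FP EMULATION": 68.997,
    "FOURIER": 17600,
    "ASSIGNMENT": 19.681,
    "IDEA": 2840.6,
    "HUFFMAN": 1295.9,
    "NEURAL NET": 28.177,
    "LU DECOMPOSITION": 955.2,
}
RUN_I686 = {  # CFLAGS="-march=i686 -mtune=i686"
    "NUMERIC SORT": 821.96,
    "STRING SORT": 150.2,
    "BITFIELD": 3.7494e+08,
    "FP EMULATION": 86.08,
    "FOURIER": 17600,
    "ASSIGNMENT": 19.761,
    "IDEA": 2830.3,
    "HUFFMAN": 1329.9,
    "NEURAL NET": 28.287,
    "LU DECOMPOSITION": 961.6,
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def compare(base, other):
    """Map each test name to its percentage change between two runs."""
    return {name: percent_change(base[name], other[name]) for name in base}


if __name__ == "__main__":
    changes = compare(RUN_I486, RUN_I686)
    # Largest absolute change first.
    for name, pct in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name:<18} {pct:+7.2f}%")
```

On these figures FP EMULATION improves by roughly 25% and HUFFMAN by under 3%, while every other test moves by less than 1% in either direction.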


CategorySpec