ubuntu-i686

Summary

Make a separate main/universe/multiverse/restricted repository for the i686 architecture. This may involve dissolving the i386 repository, but doesn't have to. This shouldn't be that hard, as most of the Debian build tools are automated by now. The problem is getting another build server (or stealing time from an existing one) and then the storage space.

This would mean everything from the X server to GTK+ to Firefox to OpenOffice.org to Mono to GNOME would be optimized for your processor.

Rationale

  1. Gentoo. ArchLinux. Slackware. These distros are generally all faster than Ubuntu. Dapper is the fastest Ubuntu yet, but it still isn't as quick as these distros. Gentoo and ArchLinux base their claim to speed on the fact that they are compiled for the architecture that is in your computer. Why not make Edgy even faster?

  2. An i686 build only needs a Pentium Pro/Pentium II-class processor. Can we somehow check the Ubuntu hardware database to see how many current Ubuntu users still need the i386 build (or an i486 or i586 one)? http://hwdb.ubuntu.com/

  3. Easy to do. Just set up a build server? I don't know exactly how this is done, but I'm sure there is an Ubuntu developer who could get a 686 build server up and running very quickly.
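One rough way to answer the question in point 2 from raw data is sketched below. It assumes the input is plain /proc/cpuinfo text (the hwdb's actual export format is not assumed), and it uses a heuristic of its own: CPU family 6 or later with the cmov flag counts as i686-capable, since conditional moves (CMOV) are the most visible instruction an -march=i686 build relies on. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def first_cpu_fields(cpuinfo_text: str) -> Dict[str, str]:
    """Parse the 'key : value' lines of the first processor block in /proc/cpuinfo text."""
    fields: Dict[str, str] = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:  # a blank line ends the first processor's block
                break
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def baseline_arch(cpuinfo_text: str) -> str:
    """Heuristic: the highest x86 baseline (i386..i686) this machine's CPU can run."""
    fields = first_cpu_fields(cpuinfo_text)
    family = int(fields.get("cpu family", "3"))
    flags = set(fields.get("flags", "").split())
    if family >= 6 and "cmov" in flags:
        return "i686"  # Pentium Pro / Pentium II and later, Athlon, etc.
    if family >= 5:
        return "i586"  # Pentium, K6, and family-6 parts without CMOV
    if family == 4:
        return "i486"
    return "i386"


def tally(machines: Iterable[str]) -> Counter:
    """Count how many submitted machines fall into each baseline."""
    return Counter(baseline_arch(text) for text in machines)
```

Counting over many submissions then shows directly how many machines would be left behind by an i686-only build.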

Use cases

Scope

Design

Implementation

Code

Data preservation and migration

Outstanding issues

BoF agenda and discussion

  • A half-way house is to optimise for 686 but generate code that still runs on a 386 (in GCC terms, -mtune=i686 with an older -march). A lot of optimisations are done on the basis of cache size, etc., not just register usage.
  • The new repo need only contain the migrated packages. Many packages, like Python code, are architecture-independent or contain very little compiled code. The key packages which would benefit could move first; e.g. libc6-686 would become libc6, and you'd always get the best one for you. Decompression and compression algorithms would benefit, as a lot of this goes on in the background.
  • Make a list of the key packages which would benefit, and generate *-686 versions for them. This already exists for libc6, mplayer, and a few others. Write a list of candidates and edit their source packages to use different compiler flags. Benchmark!
  • LunaTick: If projects that would benefit from optimisation used liboil, much greater speed increases would happen automagically. An approach like liboil also has the advantage that it optimises for your exact chip (MMX, SSE, SSE2, SSE3, AltiVec, etc.) by detecting at runtime what your CPU supports. If all the "hard work" of these applications were centralised in one library, only that library would need to be optimised for each platform.
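The runtime-detection idea in the last point can be sketched in a few lines: read the CPU feature flags once, then pick the best implementation whose required flag is present, falling back to plain code. This is a toy model of the approach, not liboil's API; the kernel names are hypothetical, and the "SIMD" variants are plain-Python stand-ins that return the same result as the generic one.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Kernel = Callable[[Sequence[int]], int]


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical kernels. A real library would implement each one with the
# named instruction set; here they all just compute the same integer sum.
def sum_sse2(xs: Sequence[int]) -> int:
    return sum(xs)


def sum_mmx(xs: Sequence[int]) -> int:
    return sum(xs)


def sum_generic(xs: Sequence[int]) -> int:
    return sum(xs)


# Ordered best-first; None marks the always-available fallback.
CANDIDATES: List[Tuple[Optional[str], Kernel]] = [
    ("sse2", sum_sse2),
    ("mmx", sum_mmx),
    (None, sum_generic),
]


def choose_kernel(flags: Set[str]) -> Kernel:
    """Pick the first candidate whose required CPU flag is available."""
    for required, kernel in CANDIDATES:
        if required is None or required in flags:
            return kernel
    raise RuntimeError("no fallback kernel registered")
```

On Linux the flags would come from cpu_flags(open("/proc/cpuinfo").read()). The choice is made once, so each later call costs only an indirect function call, and a single binary package still uses the best code path the machine offers.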

JohnMoser: This spec is full of holes. For example:

  • Ubuntu is built using i486 instructions with i686 scheduling already.
  • Gentoo is not that much faster because of the instruction set it uses; it's faster because half the crap isn't loaded. Binary distros have to build in support for every little niche thing, like krb5 or LDAP or inetd or WMF images, while source-based ones typically don't have things like IPv6 support or half a dozen other features enabled by default. Applications and libraries hard-code the use of these features based on ./configure switches, and then load slowly because they have to load all those libraries.
  • SSE, MMX, and 3DNow! only matter in EXTREMELY specialized cases. MMX is integer-only and reuses the x87 FPU registers, so either you can't use MMX and floating-point math at the same time, or you get slow MMX performance because you have to save and restore the FPU state every time you switch between them. SSE is good when the same operation is going to be performed repeatedly on many pieces of data, because it pipelines the data through a single instruction; but I've heard it takes roughly 17 clock cycles to load an SSE register with data, so if you use it for general floating-point math you'll get a massive slowdown.
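The cost argument in the last point can be made concrete with a toy model: a vector path pays a fixed setup cost once and then handles `width` elements per operation, while the scalar path pays a fixed cost per element. Every cycle count here is a placeholder; 17 is only the secondhand figure quoted above, and the per-operation costs are invented for illustration, so only the shape of the result (a break-even length that grows with the setup cost) should be read from it.

```python
import math
from typing import Optional


def scalar_cycles(n: int, per_element: int) -> int:
    """Cost of handling n elements one at a time."""
    return n * per_element


def vector_cycles(n: int, setup: int, width: int, per_op: int) -> int:
    """One fixed setup cost, then one vector op per group of `width` elements."""
    return setup + math.ceil(n / width) * per_op


def break_even(setup: int, scalar_per_element: int, width: int, vector_per_op: int,
               limit: int = 1_000_000) -> Optional[int]:
    """Smallest element count for which the vector path is strictly cheaper, or None."""
    if vector_per_op >= scalar_per_element * width:
        return None  # a full vector op is no cheaper than `width` scalar ops
    for n in range(1, limit + 1):
        if vector_cycles(n, setup, width, vector_per_op) < scalar_cycles(n, scalar_per_element):
            return n
    return None
```

With the placeholder numbers, break_even(17, 4, 4, 4) returns 7: below seven values the setup cost is never recovered, which is the "general floating-point math" case the comment warns about.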

JohnMoser: While I would like to move up to i586 or i686 instructions, I can find no compelling reason. Below are benchmarks from nbench, with the "CFLAGS =" line in the Makefile changed to "CFLAGS +=" so that extra flags are picked up from the environment. The only substantial gain seems to be in FP EMULATION (floating-point emulation), which is not useful on an i686 (it does, however, make the integer index differ noticeably between the two runs).

bluefox@icebox:~/programming/bench/nbench-byte-2.2.2$ CFLAGS="-march=i486 -mtune=i686" make
bluefox@icebox:~/programming/bench/nbench-byte-2.2.2$ sudo nice -n -18 time ./nbench 

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           824.6  :      21.15  :       6.95
STRING SORT         :          149.66  :      66.87  :      10.35
BITFIELD            :      3.7346e+08  :      64.06  :      13.38
FP EMULATION        :          68.997  :      33.11  :       7.64
FOURIER             :           17600  :      20.02  :      11.24
ASSIGNMENT          :          19.681  :      74.89  :      19.42
IDEA                :          2840.6  :      43.45  :      12.90
HUFFMAN             :          1295.9  :      35.93  :      11.47
NEURAL NET          :          28.177  :      45.27  :      19.04
LU DECOMPOSITION    :           955.2  :      49.48  :      35.73
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 44.594
FLOATING-POINT INDEX: 35.524
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : AuthenticAMD AMD Athlon(tm) 64 Processor 2800+ 1800MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.17-5-686
C compiler          : gcc version 4.1.2 20060715 (prerelease) (Ubuntu 4.1.1-9ubuntu1)
libc                : libc-2.4.so
MEMORY INDEX        : 13.908
INTEGER INDEX       : 9.414
FLOATING-POINT INDEX: 19.703
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
274.86user 0.03system 4:39.05elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+996minor)pagefaults 0swaps

bluefox@icebox:~/programming/bench/nbench-byte-2.2.2$ CFLAGS="-march=i686 -mtune=i686" make clean nbench
bluefox@icebox:~/programming/bench/nbench-byte-2.2.2$ sudo nice -n -18 time ./nbench 

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          821.96  :      21.08  :       6.92
STRING SORT         :           150.2  :      67.11  :      10.39
BITFIELD            :      3.7494e+08  :      64.32  :      13.43
FP EMULATION        :           86.08  :      41.31  :       9.53
FOURIER             :           17600  :      20.02  :      11.24
ASSIGNMENT          :          19.761  :      75.19  :      19.50
IDEA                :          2830.3  :      43.29  :      12.85
HUFFMAN             :          1329.9  :      36.88  :      11.78
NEURAL NET          :          28.287  :      45.44  :      19.11
LU DECOMPOSITION    :           961.6  :      49.82  :      35.97
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 46.228
FLOATING-POINT INDEX: 35.650
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : AuthenticAMD AMD Athlon(tm) 64 Processor 2800+ 1800MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.17-5-686
C compiler          : gcc version 4.1.2 20060715 (prerelease) (Ubuntu 4.1.1-9ubuntu1)
libc                : libc-2.4.so
MEMORY INDEX        : 13.962
INTEGER INDEX       : 9.997
FLOATING-POINT INDEX: 19.773
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
307.04user 0.08system 5:11.25elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+529minor)pagefaults 0swaps
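To make the comparison explicit, the per-test relative change can be computed from the iterations/sec columns of the two runs above. With a single run per configuration, differences of a percent or two may well be noise.

```python
from typing import Dict

# Iterations/sec from the two nbench runs above (same machine, same compiler).
MARCH_I486 = {  # CFLAGS="-march=i486 -mtune=i686"
    "NUMERIC SORT": 824.6, "STRING SORT": 149.66, "BITFIELD": 3.7346e8,
    "FP EMULATION": 68.997, "FOURIER": 17600.0, "ASSIGNMENT": 19.681,
    "IDEA": 2840.6, "HUFFMAN": 1295.9, "NEURAL NET": 28.177,
    "LU DECOMPOSITION": 955.2,
}
MARCH_I686 = {  # CFLAGS="-march=i686 -mtune=i686"
    "NUMERIC SORT": 821.96, "STRING SORT": 150.2, "BITFIELD": 3.7494e8,
    "FP EMULATION": 86.08, "FOURIER": 17600.0, "ASSIGNMENT": 19.761,
    "IDEA": 2830.3, "HUFFMAN": 1329.9, "NEURAL NET": 28.287,
    "LU DECOMPOSITION": 961.6,
}


def percent_change(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Relative change of each test in percent (positive means faster with `after`)."""
    return {name: 100.0 * (after[name] - before[name]) / before[name] for name in before}


if __name__ == "__main__":
    changes = percent_change(MARCH_I486, MARCH_I686)
    for name, pct in sorted(changes.items(), key=lambda item: -item[1]):
        print(f"{name:<18} {pct:+6.2f}%")
```

FP EMULATION comes out at roughly +24.8%; every other test stays within ±3%, and FOURIER is unchanged.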

JohnMoser: Actually, thinking on this, what about glibc? Seriously, the dynamic linker needs to do a lot of relocations (SUB, ADD, CMP) and symbol name comparisons (CMP, JNE), as well as hashing (ADD, SUB, SHL, SHR, MOD, MUL, DIV), while the most common standard functions like strcpy() and malloc() are doing the same or using ancient 8086-era instructions like MOVS. None of this stuff really does anything complex, save for CMPXCHG, which is an i486 instruction. Is there any advantage to using the i686 instruction set?
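To illustrate the hashing part: the classic SysV ELF symbol hash used for the .hash section (DT_HASH), as defined in the System V ABI, is sketched below in Python. Its inner loop needs only shift, add, AND and XOR, all of which already exist on a 386. (The GNU-style DT_GNU_HASH table uses a different function and is not shown; bucket_for is a hypothetical helper.)

```python
def elf_hash(name: bytes) -> int:
    """SysV ELF symbol hash (the DT_HASH function), emulating 32-bit unsigned arithmetic."""
    h = 0
    for byte in name:
        h = ((h << 4) + byte) & 0xFFFFFFFF  # shift in the next character
        g = h & 0xF0000000                  # isolate the top four bits
        if g:
            h ^= g >> 24                    # fold them back into the low bits
        h &= ~g & 0xFFFFFFFF                # and clear them from the top
    return h


def bucket_for(name: bytes, nbuckets: int) -> int:
    """Hash bucket where the dynamic linker would start looking for `name`."""
    return elf_hash(name) % nbuckets
```

Because the top four bits are always cleared, the result fits in 28 bits, and the lookup itself is just this loop followed by a modulo and string comparisons along the bucket's chain.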

PeterVanderKlippe: You seem to be much more educated in this area, so you would probably know better than me. It is completely possible that Ubuntu is already compiled with all the possible optimizations; I really have no idea what currently exists. I do know that Arch Linux and Gentoo have a reputation for being very optimized and very quick. The latest GNOME desktop in Dapper is extremely streamlined and usually only consumes 100-120 MB of RAM, but I've read that Arch's GNOME only consumes 80-90 MB of RAM, and it just feels faster. This spec was just my way of trying to get Ubuntu better optimized. If the whole "compile for 686" idea is really the wrong way to do this, then this spec is deprecated or superseded, or whatever. I just know a reputation for being speedy and using system resources efficiently would be extremely good for Ubuntu. That is my intent, but my method/implementation may be misguided. I'm no expert.

JohnMoser: Less "educated", more "I have a bunch of random knowledge from somewhere," but eh. Have you ever RUN Gentoo? I had a long CFLAGS line with -Os or -O3 or whatever (I've used both) and things like -ffast-math (Gentoo disables it where it breaks things, usually) and -funroll-loops (actually makes things bigger and slower more often than not), and a USE line that had just about everything. It was pretty slow. Backing off to CFLAGS="-O2 -march=k7", I got a good, speedy distro; it still used more RAM than other things, but it wasn't slow, at least :) I don't know about optimizing runtime code (the compiler does as good a job as it's going to do safely), but start-up times can be optimized, and I know a few people using portage overlays to do this stuff on Gentoo. Perhaps you should try ArchLinux and/or Gentoo and find out what else they and Ubuntu are doing differently besides building for i686?

SimonStrandman: Are you sure about the default build settings, i.e. that Ubuntu is built with the i486 instruction set and i686 scheduling? I'm not an expert on the Debian build tools, but running debuild doesn't show any architecture flags being used. Firefox's about:buildconfig shows that it was built with "-pipe -w -O2 -g -fno-strict-aliasing". Additionally, running touch dummy.c; gcc -v -Q dummy.c shows that gcc doesn't default to any architecture flags either. So it would seem that Ubuntu is built for pure i386, both scheduling and instruction set, which is of course very suboptimal. Please prove me wrong!
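A caveat on reading build flags this way: the absence of -march/-mtune in a package's flags does not by itself mean i386, because GCC can be built with defaults via its --with-arch= and --with-tune= configure options, which gcc -v reports on its "Configured with:" line. The sketch below is a simplified model of how those pieces combine (the last command-line value wins, -march=X implies -mtune=X, and configured defaults apply only when nothing is given on the command line). It ignores -mcpu and other spec-file details, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional


def _last_value(pattern: str, text: str) -> Optional[str]:
    """Return the last captured value of `pattern` in `text`, or None."""
    matches = re.findall(pattern, text)
    return matches[-1] if matches else None


def effective_target(cflags: str, configured_with: str = "") -> Dict[str, Optional[str]]:
    """Simplified model of how GCC settles on -march/-mtune for x86.

    cflags: flags on the compiler command line (e.g. from about:buildconfig).
    configured_with: the 'Configured with:' line printed by `gcc -v`.
    A value of None means "the compiler's built-in baseline".
    """
    cmd_arch = _last_value(r"(?:^|\s)-march=(\S+)", cflags)
    cmd_tune = _last_value(r"(?:^|\s)-mtune=(\S+)", cflags)
    cfg_arch = _last_value(r"--with-arch=(\S+)", configured_with)
    cfg_tune = _last_value(r"--with-tune=(\S+)", configured_with)

    arch = cmd_arch or cfg_arch
    if cmd_tune:
        tune = cmd_tune
    elif cmd_arch:
        tune = cmd_arch  # -march=X implies -mtune=X
    else:
        tune = cfg_tune or arch
    return {"arch": arch, "tune": tune}
```

For Firefox's flags alone the model reports no explicit target, so the answer depends on how the distribution's gcc was configured; the "Configured with:" line is where to look next.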

PeterVanderKlippe: As you can see, this spec was created almost a year ago. It has little relevance to the current Ubuntu anymore (and it didn't really have much relevance at the time). As this spec was never picked up officially and there isn't much support behind it, I think it should just be marked as outdated and left alone.


CategorySpec

ubuntu-i686 (last edited 2008-08-06 16:23:35 by localhost)