Ubuntu QA:
Idea #22854: Make the debian installation packages smaller.

Written by DrG the 7 Dec 09 at 08:17. Category: Usability. Related project: Nothing/Others. Status: New
Rationale
Ubuntu installation packages (.deb) do not have good compression ratios, because their contents are gzipped.
Much better compression algorithms are already included in the OS, but they are not used for packaging.

20
votes
Solution #1: Make the debian installation packages smaller.
Written by DrG the 7 Dec 09 at 08:17.
A smaller file size can be achieved with no additional software and not much difference in time.

Example (using Nautilus, and only the weaker LZMA that is available by default in Karmic):
'gimp_2.6.1-1ubuntu3_i386.deb' is one of the deb packages of GIMP.
Size is 4.2 MB (4366488 bytes)

Extract gimp_2.6.1-1ubuntu3_i386.deb.
There will be three files in the resulting directory (3 items, totalling 4.2 MB):
1. control.tar.gz 8.1 KB (8272 bytes)
2. data.tar.gz 4.2 MB (4358024 bytes)
3. debian-binary 4 bytes
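The three files show up because a .deb is an ordinary ar archive. As a rough sketch only, here is how the member list could be read in Python. The helper names `list_ar_members` and `make_ar` are made up for this illustration, and `make_ar` builds a tiny fake archive so the parser can be tried without a real package.

```python
import io

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed-size header that precedes every member


def list_ar_members(blob: bytes) -> list[tuple[str, int]]:
    """Return (name, size) for every member of an ar archive held in memory."""
    stream = io.BytesIO(blob)
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(HEADER_LEN)
        if len(header) < HEADER_LEN:
            break
        # Name is the first 16 bytes, size is the 10 bytes at offset 48.
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Member data is padded to an even length.
        stream.seek(size + (size % 2), io.SEEK_CUR)
    return members


def make_ar(members: list[tuple[str, bytes]]) -> bytes:
    """Build a small ar archive (hypothetical helper, only for trying the parser)."""
    out = bytearray(AR_MAGIC)
    for name, data in members:
        header = f"{name:<16}{0:<12}{0:<6}{0:<6}{'100644':<8}{len(data):<10}`\n"
        out += header.encode("ascii") + data
        if len(data) % 2:
            out += b"\n"
    return bytes(out)


fake_deb = make_ar([
    ("debian-binary", b"2.0\n"),
    ("control.tar.gz", b"placeholder"),
    ("data.tar.gz", b"placeholder"),
])
for name, size in list_ar_members(fake_deb):
    print(name, size)
```

On a real package you would pass the bytes of the .deb file instead of `fake_deb`.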

Extract data.tar.gz; there will be a directory named usr.
Contents: 262 items, totalling 11.5 MB

Compress 'usr' as .tar.lzma.
Size of the resulting file 'usr.tar.lzma' is 2.7 MB (2800592 bytes)

Delete everything other than control.tar.gz, debian-binary, and usr.tar.lzma,
then compress the folder.
Resulting file: 'gimp_2.6.1-1ubuntu3_i386.tar.lzma'
Size: 2.7 MB (2847218 bytes) (difference in size is 1519270 bytes)

Although the file is no longer directly installable, there is a 34.79 % decrease in size. The contents remain the same.
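For anyone who wants to check the arithmetic, a small sketch using only the byte counts quoted in this solution. The function name `percent_reduction` is just an illustrative choice.

```python
def percent_reduction(original_bytes: int, new_bytes: int) -> float:
    """Size reduction as a percentage of the original size."""
    return (original_bytes - new_bytes) / original_bytes * 100


original = 4366488  # gimp_2.6.1-1ubuntu3_i386.deb
repacked = 2847218  # gimp_2.6.1-1ubuntu3_i386.tar.lzma

print(original - repacked)                             # 1519270
print(f"{percent_reduction(original, repacked):.2f}")  # 34.79
```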

There are other compression algorithms which can give a much better compression ratio.
53
votes
Solution #2: Solution #1 Revised. .xz and lzma2 not lzma
Written by readmanr the 7 Dec 09 at 22:32.
xz is a file format and command-line tool with syntax similar to that of gzip, but it uses LZMA2 instead of DEFLATE. Since GNU Tar v1.22 reassigned the short option -J as a shortcut for --xz, there is no need to stay in the ice age by using lzma.
RPM in Fedora 12 has switched to XZ compression. The XZ library is still in beta, but the file format has been finalized (unlike the never-finalized LZMA format).
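To get a feel for the difference without touching a real package, here is a minimal sketch that packs the same files into a .tar.gz and a .tar.xz in memory using Python's standard tarfile module. The helper `tar_size` and the sample file are invented for the illustration, and the numbers you get will depend entirely on the data you feed it.

```python
import io
import tarfile


def tar_size(files: dict[str, bytes], compression: str) -> int:
    """Pack files into an in-memory tarball; compression is 'gz' or 'xz'."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=f"w:{compression}") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return len(buffer.getvalue())


# Made-up sample content standing in for the 'usr' tree of a package.
sample = {"usr/share/doc/readme": b"GIMP is the GNU Image Manipulation Program.\n" * 500}
for kind in ("gz", "xz"):
    print(kind, tar_size(sample, kind))
```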

'gimp_2.6.1-1ubuntu3_i386.deb' 4.2M (4366488 bytes)
1.control.tar.gz (8272 bytes) 2.data.tar.gz (4358024 bytes) 3.debian-binary (4 bytes)

With the two .tar files compressed with xz (.tar.xz, .txz) instead of gzip (.tar.gz, .tgz), the files are now...

1.control.tar.xz (7588 bytes) 2.data.tar.xz (2799000 bytes) 3.debian-binary (4 bytes)

Total size with .gz (4366300 bytes) 4.16M
Total size with .lzma (2847218 bytes) 2.71M
Total size with .xz (2806592 bytes) 2.67M
Total saving (.gz/.xz) of (1559708 bytes) 1.48M
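The totals can be re-added from the member sizes listed above. This is only a checking sketch. The one thing not taken from the figures here is the ar container overhead (an 8-byte archive header plus a 60-byte header per member), which happens to account exactly for the gap between the summed members and the size of the .deb itself.

```python
gz_members = {"control.tar.gz": 8272, "data.tar.gz": 4358024, "debian-binary": 4}
xz_members = {"control.tar.xz": 7588, "data.tar.xz": 2799000, "debian-binary": 4}

gz_total = sum(gz_members.values())
xz_total = sum(xz_members.values())
saving = gz_total - xz_total

# ar container overhead: global header plus one header per member.
AR_OVERHEAD = 8 + 60 * len(gz_members)

print(gz_total, xz_total, saving)  # 4366300 2806592 1559708
print(gz_total + AR_OVERHEAD)      # 4366488, the size of the original .deb
print(f"{saving / gz_total:.1%}")  # 35.7%
```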

This is an example of just one small file. The savings available from better compression are huge: in time, bandwidth, and money, plus more space on the live CD for further apps.
-4
votes
Solution #3: There are better compression algorithms.
Written by DrG the 8 Dec 09 at 07:08.
If you need both a good compression ratio and speed, you have to install an external package, FreeArc (it combines multiple algorithms, including lzma).
http://freearc.org/download/0.60RC/FreeArc-0.60RC2-linux-i386.tar.bz2 (for i386 )

With this, first pack the 'usr' folder with tar into usr.tar.
Then, on the command line, from that directory:
arc a -m9 usr.tar.arc usr.tar
>> size: 2.4 MB (2476722 bytes) in 14.9 seconds (lzma with the -9 switch took 20 seconds to reach 2.7 MB (2807429 bytes))
or
arc a -m3 usr.tar.arc usr.tar
>> size: 2.7 MB (2807627 bytes) in 3.37 seconds
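The same kind of size-versus-time trade-off can be measured for the built-in LZMA at different presets. A rough sketch using Python's lzma module follows. It does not touch FreeArc at all, the function name `time_presets` and the sample data are made up, and timings will vary from machine to machine. Note that preset 9 needs several hundred MB of RAM to compress.

```python
import lzma
import time


def time_presets(data: bytes, presets=(1, 6, 9)) -> list[tuple[int, int, float]]:
    """Return (preset, compressed_size, seconds) for each xz preset."""
    results = []
    for preset in presets:
        start = time.perf_counter()
        packed = lzma.compress(data, preset=preset)
        elapsed = time.perf_counter() - start
        results.append((preset, len(packed), elapsed))
    return results


# Made-up sample data; replace it with the bytes of usr.tar for a real comparison.
sample = b"usr/share/gimp/2.0/help/en/index.html\n" * 20000
for preset, size, seconds in time_presets(sample, presets=(1, 6)):
    print(f"preset {preset}: {size} bytes in {seconds:.3f} s")
```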

There are algorithms which give much better compression but take considerable time.
Example (with the same file as above):

Unpack 'data.tar.gz' to the directory 'usr'.
Compress 'usr' with the -6 option in paq8l (available in the Karmic repo). For maximum compression the switch is -8, but that much memory is not available on my system.
This will give 'usr.paq8l'.
Delete 'data.tar.gz' and the directory 'usr', and compress the parent folder to 'gimp_2.6.1-1ubuntu3_i386.2.tar.lzma'.

Size: 1.7 MB (1779197 bytes) (59.25 % reduction)

Now the bad side: the compression with paq8l took more than 30 min (on an Acer Aspire 4720z).
The worse side: the decompression took around 30 min.
These times will be much shorter on higher-end PCs.

So it is not suitable for today's packages.
# !!!!!!!!
Even with this much time taken, there is a potential use for paq8l: data transfer across slow Internet connections (like 56k dial-up). Here the data received per second is very small, so paq8l can handle it efficiently at runtime, decompressing as the data arrives (so no extra time is taken).
# !!!!!!!
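The idea of decompressing while the download is still trickling in can be sketched with any streaming decompressor. The example below uses Python's built-in lzma module rather than paq8l (which has no Python binding that I know of), so it only illustrates the principle. `slow_link` is a made-up stand-in for a slow connection.

```python
import lzma


def decompress_stream(chunks):
    """Yield decompressed bytes as each compressed chunk arrives."""
    decoder = lzma.LZMADecompressor()
    for chunk in chunks:
        piece = decoder.decompress(chunk)
        if piece:
            yield piece
        if decoder.eof:
            break


def slow_link(blob: bytes, chunk_size: int = 512):
    """Stand-in for a slow download: hand out the blob in small chunks."""
    for offset in range(0, len(blob), chunk_size):
        yield blob[offset:offset + chunk_size]


payload = b"package data " * 5000
compressed = lzma.compress(payload)
received = b"".join(decompress_stream(slow_link(compressed, chunk_size=16)))
print(received == payload)  # True
```

Each chunk is decoded as soon as it is received, so by the time the last chunk arrives almost all of the work is already done.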

There are other 'paq' variants which give somewhat similar compression in much less time, but they are not available in the repo.

If interested, also look at:
1. http://www.squeezechart.com/record.html
2. http://www.maximumcompression.com/index.html
3. http://www.cs.fit.edu/~mmahoney/compression/

Comments
DaVince wrote on the 7 Dec 09 at 15:21
Not bad, but these packages should be optional/secondary in order to maintain compatibility with current/older versions of the package manager.

readmanr wrote on the 7 Dec 09 at 21:37
Going further, the problem is that the files inside the .deb are gzipped. LZMA2 compression should be used to produce .xz files instead of .gz files.
RPM in Fedora 12 has switched to XZ compression. The XZ library is still in beta, but the file format has been finalized. Supporting .xz instead of .gz on Ubuntu would be a huge step forward in saving space, time and bandwidth! Plus disc space for more apps on the live CD.

readmanr wrote on the 8 Dec 09 at 08:05
I think you were closer with #1 than #3, DrG. gz, bz2, lzma, and xz are all mainstream formats.
Gzip has fast compression and fast decompression. bz2 offers slightly better compression but is slow both ways. lzma/lzma2 (xz) is similar to bz2 at compression, but its decompression speed is closer to gz than to bz2. The whole reason bz2 isn't used as much as gz is its slow decompression. Other compression formats that take 30 minutes, like the ones you suggested, are just no good for .deb packages or updates. KGB is excellent but takes hours.
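Anyone who wants to check these speed claims on their own data can do it with the Python standard library, which ships all three formats. This is only a sketch: the `benchmark` function and the sample data are made up, and the result depends heavily on the input and the machine.

```python
import bz2
import gzip
import lzma
import time

CODECS = {
    "gz": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def benchmark(data: bytes) -> dict[str, dict[str, float]]:
    """Measure size, compression time and decompression time per format."""
    report = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        middle = time.perf_counter()
        restored = decompress(packed)
        end = time.perf_counter()
        if restored != data:
            raise ValueError(f"{name} round trip failed")
        report[name] = {
            "bytes": len(packed),
            "compress_s": middle - start,
            "decompress_s": end - middle,
        }
    return report


# Made-up sample data; use the contents of a real data.tar for meaningful numbers.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
for name, row in benchmark(sample).items():
    print(name, row)
```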

DrG (Idea reviewer) wrote on the 8 Dec 09 at 09:10
I just mentioned their existence.

DrG (Idea reviewer) wrote on the 8 Dec 09 at 09:53
Also see http://brainstorm.ubuntu.com/idea/22869

dino wrote on the 8 Dec 09 at 16:35
What about the RAM and CPU usage of these algorithms? Please include them in your benchmark.

DrG (Idea reviewer) wrote on the 8 Dec 09 at 16:55
dino - it depends on your PC configuration.
On mine (Aspire 4720z), paq8l used 90% CPU and around 500 MB RAM with the -6 switch. It may still be useful for the narrow-band job with faster switches. The built-in lzma used 80% CPU for around 20 seconds, with around 500 MB RAM.

readmanr wrote on the 8 Dec 09 at 20:47
dino, it depends on the configuration used to compress.
With XZ at all defaults, it requires 192MB to compress but only 18MB to decompress, which is in line with even the Ubuntu absolute minimum system requirement of 32MB RAM, as it is only the developer (who compresses) who requires more memory. I must also point out that LZMA/LZMA2 has been ported to the Amiga and works on those systems despite the limited CPU and RAM available. Have a read of the recent benchmarks on the internet comparing Gz, Bz2, 7z, Xz, etc.
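The decompression memory requirement can be tested directly, because the xz decoder accepts a memory cap. A minimal sketch with Python's lzma module: `decompress_within` is a made-up helper, and the 18 MB value is simply the figure quoted above, not something this code verifies.

```python
import lzma

MIB = 1024 * 1024


def decompress_within(blob: bytes, limit_bytes: int):
    """Decompress an .xz blob, or return None if decoding fails under the memory cap.

    LZMAError is also raised for corrupt data, so None means "could not decode".
    """
    decoder = lzma.LZMADecompressor(memlimit=limit_bytes)
    try:
        return decoder.decompress(blob)
    except lzma.LZMAError:
        return None


payload = b"hello deb " * 1000
blob = lzma.compress(payload)  # default preset

print(decompress_within(blob, 18 * MIB) is not None)
print(decompress_within(blob, 1024) is not None)  # False: 1 KiB is far too little
```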

fizyk wrote on the 8 Dec 09 at 22:20
Wouldn't those files become inconsistent with Debian?

DrG (Idea reviewer) wrote on the 9 Dec 09 at 05:15
@ fizyk
There will be no question of incompatibility.
There may be version controls for deb packages. Just ask the user to install another package, which includes the decompressor, on older systems that are not compatible with the new packages.
In the extreme case, name them something like '.ui' (for Ubuntu installer) instead of '.deb', along with another package, say 'libui' (containing the decompressor and such stuff), which should be installed before installing a '.ui'.


snadrus wrote on the 4 Mar 10 at 18:21
DrG, there's already a libUI.

Fizyk, your concern over breaking Debian compatibility with the past is valid, but Ubuntu is practically "Debian unstable" on a tighter release schedule.

Anything useful that Ubuntu produces will make its way back to Debian, since they're already so similar and both use the same licenses.

