Date: Wed, 29 Mar 2000 09:20:56 +0200 (CEST)
From: Martin Schoenert <martin.schoenert@math.rwth-aachen.de>
Subject: GAP distribution format

Lately there has been a lot of discussion about the GAP distribution
format. I once wrote a rationale for the choice of the Zoo archive
format, which may be relevant.

When I selected the Zoo format for the GAP distribution I had the
following goals.

A single format for all platforms. I.e., I did not want to build a
compressed-tar archive for Unix, a Zip archive for Windows, a Stuffit
archive for the Macintosh, etc.

A popular format. I.e., I did not want to develop yet another format
(actually, those who know me know that this is not true ;-).

Good compression. I.e., the archives should be small. This ruled out the
then popular compressed-tar archives, which were roughly 30% larger than
the alternatives.

The decompression should not use additional temporary hard disk space.
I.e., if the archive is <x>MB and the uncompressed contents are <y>MB,
then <x>+<y>MB hard disk space should suffice. This is e.g. not true
for Winzip extracting from a gzipped-tar archive, because Winzip will
first ungzip the tar-archive, and then extract from the archive.

A readily available unpacker for each platform. This meant at the time
mostly various UNIX flavours, VMS, Atari TOS, DOS (in order of importance).

The ability to store meta information for the files, especially whether a
file is text or binary (or MacBinary, which came later though). This is
important so that files can be converted to local format.
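As a minimal sketch of why this per-file flag matters, the conversion an
unpacker performs might look like the following. The kind names "text" and
"binary" and the function to_local are hypothetical, chosen for illustration;
they are not the tags or routines any particular unpacker uses.

```python
def to_local(data: bytes, kind: str, newline: bytes) -> bytes:
    """Convert one archive member to the local platform's conventions.

    kind    -- hypothetical meta information stored with the member
    newline -- the local line terminator, e.g. b"\r\n" under DOS
    """
    if kind == "binary":
        # Binary members must be copied byte for byte.
        return data
    if kind == "text":
        # Normalize every line ending to LF first, then emit the
        # local terminator.
        normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        return normalized.replace(b"\n", newline)
    raise ValueError(f"unknown member kind: {kind!r}")


dos_text = to_local(b"a := 1;\nb := 2;\n", "text", b"\r\n")
raw = to_local(b"\x00\n\xff", "binary", b"\r\n")
```

Without the flag the unpacker cannot tell these two cases apart, and
converting a binary file as if it were text would corrupt it.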

Let me now evaluate the formats that were available back then.

Zip archives provide a common format for all platforms. The compression is
as good as it got back then. The decompression can be done without additional
hard disk space. There were readily available unpackers for DOS and TOS;
however, back then there was no unpacker easily available for UNIX and VMS
systems (the Info-Zip project began just then). It is possible to store meta
information in the archive comment. Zip was the most popular format under
DOS, but only by a very small margin over Zoo (this changed with Winzip about
two years later). One of the big disadvantages of the Zip archives is the
central directory at the end of the archive, which means that an incomplete
archive is basically useless, and incomplete ftp transfers were a fact of
life back then.
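The effect of the trailing central directory can be reproduced with a small
sketch using Python's standard zipfile module. The member names and the
number of bytes cut off are arbitrary assumptions; the point is only that a
reader which relies on the directory at the end rejects the whole archive
once the tail is missing, even though the member data at the front arrived.

```python
import io
import zipfile


def make_zip(files):
    """Build a small Zip archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def can_open(zip_bytes):
    """Return True if zipfile can locate the directory and list members."""
    try:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


complete = make_zip({"doc/readme.txt": b"hello\n" * 100,
                     "lib/a.g": b"a := 1;\n"})
# Simulate an interrupted transfer: the last 40 bytes never arrived.
truncated = complete[:-40]

print(can_open(complete))
print(can_open(truncated))
```

The truncated copy still starts with an intact local file header, so a
recovery tool could in principle scan for members, but an ordinary unpacker
that starts from the central directory will not.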

Zoo archives provide a common format for all platforms. The compression is
almost as good as Zip (a few percent worse). The decompression can be done
without additional hard disk space. There were readily available unpackers
for DOS and TOS, and the source for the 'zoo' archiver could in theory be
compiled under UNIX and VMS, though it always required some porting (fixing
the location of include files, etc.). It is possible to store meta
information in member comments. Zoo was the most popular format under Atari
TOS (and the Atari ST was important back then, because it was what I had at
home) and the second most popular format under DOS.

Gzipped-tar archives (which were actually not available back then, but became
very popular for UNIX later) provide a common format for most of the
platforms. The compression is better than Zip. The unpacker usually consists
of two separate programs: gzip and tar. Under UNIX this is not a problem,
because those two programs communicate through a pipe. But under DOS and TOS
you first run gzip and then tar, and this requires additional hard disk
space. Unpackers were available for UNIX, TOS and DOS, but not at all for VMS
or Macintosh. There is no place for meta information in the format. It became
the most popular format under UNIX, but was never popular under DOS or TOS.
There is one additional disadvantage of this format: it compresses the whole
archive, not individual files. This means that if you want to extract a file
in the middle of the archive you must uncompress everything up to that file.
On the other hand Zip and Zoo compress individual files, so to uncompress a
single file you just seek to that file and then decompress that file. This is
important for an extension of GAP, where the unpacker is compiled into GAP
and allows one to keep individual parts of the library in compressed format
and allows GAP to treat those archives as if they are subdirectories.
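The per-member access described above can be sketched with Python's standard
zipfile module, used here only as a stand-in for an unpacker compiled into
GAP. The member names lib/part0.g etc. and the helper read_member are
hypothetical. Each member is compressed on its own and has a recorded offset,
so one member can be read without decompressing the ones before it.

```python
import io
import zipfile

# Build an archive with three independently compressed members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(3):
        zf.writestr(f"lib/part{i}.g", f"# part {i}\n" * 50)


def read_member(archive, name):
    """Read one member as if the archive were a subdirectory."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)


wanted = read_member(buf, "lib/part2.g")

# Every member records where its own header starts in the archive, so
# the reader can seek straight to it.
with zipfile.ZipFile(buf) as zf:
    offsets = {zi.filename: zi.header_offset for zi in zf.infolist()}
```

A gzipped-tar archive has no such offsets into the compressed stream, so a
reader has to decompress from the start until it reaches the wanted file.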

So the decision was to use the Zoo format. To make unpacking for UNIX easier
unzoo was written, which compiles very easily under any UNIX flavour. Unzoo
also uses the meta information in the comments to extract files as text or
binary (or MacBinary) as appropriate. The decision to use Zoo instead of Zip
was triggered by three minor facts. First, Zoo was more popular under TOS.
Second, Zoo allows comments for each individual member, while the Zip format
only allows a single comment for the entire archive; this made storing the
meta information easier. Third, the central directory at the end of a Zip
archive makes incomplete archives useless. However, had I foreseen the extreme
popularity of the Zip format (which is due to Winzip, which was not available
then), and the fact that DOS/Windows is now probably the most important
platform for GAP, I would have selected Zip instead of Zoo.

I think that today there are three realistic alternatives.

Use the Zip format, which is perfect for DOS/Windows users, and require all
UNIX users to install Info-Zip, which is not so bad but certainly more work
than to compile 'unzoo'. The major disadvantage is that the basic Zip format
does not provide for meta information, so users would be required to tell
Zip whether to extract files with or without conversion.

Use the Zip format and write an 'unzip'. This is probably best, except for
the effort to write 'unzip'.

Stick with the Zoo format, which is a minor inconvenience for the DOS/Windows
users.

I personally think that the last alternative is still the best.

Kindly, Martin.

--
Martin Schönert, Aachen, Germany, Martin.Schoenert@web.de
One must imagine Sisyphos happy. - Albert Camus
