James McCarron wrote in his e-mail message of 1994/12/14
With that help, I have studied the code, and tried a few things.
I am beginning to believe this is not as easy as I had hoped.
I think it is easier than you believe.
James continued
Rather than rewriting SyFopen, SyFclose, etc., I thought that it
might be safer to write wrapper functions for these two, and then
replace them by the wrapper functions at those places where I want
to open and close compressed files.
For example, this is the wrapper function I have tried for SyFopen:

    long GzFopen(char *name, char *mode)
    {
        long fid;   /* file identifier */

        /* uncompress the file */
        Gunzip(name);

        /* open the file */
        fid = SyFopen(name, mode);

        /* record the name of the file in syBuf[fid] */
        syBuf[fid].fname = (char *)malloc(1 + sizeof(name)/sizeof(char));
        syBuf[fid].fname = name;

        return fid;
    } /* GzFopen */

The function 'Gunzip' simply uncompresses the file "name" using SyExec.
This required changing the structure 'syBuf' in system.c, adding the
component

    char fname[MAXPATHLEN];

This seems to be necessary so that, when the fid is passed to the
corresponding function 'GzFclose', which closes and then compresses
a file, we can recover the filename to pass to SyExec.
If you really declared the component as 'char fname[MAXPATHLEN]'
(and not 'char *fname') then storage for 'fname' is statically allocated,
and there is no need to allocate storage for 'fname' dynamically.
In fact, in this case 'syBuf[fid].fname' is *not* a valid left-hand side,
and your compiler should complain about 'syBuf[fid].fname = malloc(...)'.
In any case you need to copy the *contents* of 'name' to 'fname',
not the *pointer*, i.e., you need to write (clearing 'fname' first,
since 'SyStrncat' appends to whatever is already there)

    syBuf[fid].fname[0] = '\0';
    SyStrncat( syBuf[fid].fname, name, MAXPATHLEN-1 );

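A minimal standalone version of that fix, for experimenting outside GAP. It is a sketch under assumptions: ANSI 'strncat' stands in for 'SyStrncat' (assumed to append the same way), 'syBufDemo' and 'RecordName' are hypothetical names, and MAXPATHLEN defaults to 1024 if the system headers do not define it.

```c
#include <assert.h>
#include <string.h>

#ifndef MAXPATHLEN
#define MAXPATHLEN 1024   /* assumed value; normally from <sys/param.h> */
#endif
#define NFILES     16     /* same number of slots as syBuf in system.c */

/* Hypothetical stand-in for 'syBuf', reduced to the added component. */
static struct {
    char fname[MAXPATHLEN];   /* statically allocated, no malloc needed */
} syBufDemo[NFILES];

/* Copy the contents of 'name' (not the pointer) into slot 'fid'. */
static void RecordName(long fid, const char *name)
{
    assert(fid >= 0 && fid < NFILES);
    syBufDemo[fid].fname[0] = '\0';   /* a reused slot may hold an old name */
    strncat(syBufDemo[fid].fname, name, MAXPATHLEN - 1);
}
```

After 'RecordName( fid, name )' the caller may overwrite its own 'name' buffer and 'syBufDemo[fid].fname' still holds the recorded name; storing only the pointer would not survive that.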
James continued
For the moment, I am just playing with the online help. It seems
a safer place to experiment than the library files *.g.
Working with the online help may actually be more difficult than working
with the library.
James continued
Okay, sounds good so far. However, when I replace all the calls
to SyFopen and SyFclose in the function SyHelp (in system.c), everything
seems to work just fine, until I try to use the online help. I think
there must be other calls to these functions, outside of SyHelp, which I
have not found. When I try to call the online help with, say,

    gap> ?Chapters

Gap replies that it cannot find the file 'manual.toc'. (I have compressed
all the manual files). If I uncompress manual.toc before trying this,
Gap will give me the list of Chapters as per normal. As expected, when
the command is complete, the file manual.toc ends up compressed. So maybe
GzFclose is working. However, any subsequent request to list chapters
fails as before.
The question is: Why might Gap not be able to find the manual.toc
file when it is compressed?
If 'Gunzip' really does what you say, then I see no reason why this
should not work. Note that there are many calls to 'SyFopen' in
'SyHelp'; maybe you missed one? 'SyHelp' is self-contained: nothing
outside of 'SyHelp' influences it, except for the setting of the
pathname 'SyHelpname' in 'SyInit'.
James continued
From the comments in 'system.c', I realise that malloc should not
be used. I tried this with 'SyGetmem' in place of 'malloc', but when I
use that, it gives me a bus error and core dump (and really messes up my
terminal :-). I wonder if I may not be using 'SyGetmem' correctly? I have
used it exactly where I have 'malloc' above (same args, etc.).
If you declare 'fname' as 'char fname[MAXPATHLEN]', the storage is
allocated statically (when GAP starts), and there is no need to
allocate something dynamically. So you don't need 'malloc' or
'SyGetmem' then. Using either is not a very good idea, because it
usually has the effect that GAP cannot extend its workspace anymore.
James continued
I have only tried this sort of thing before in shell scripts, where
I can set up a pipeline. I wonder whether there might be a better
solution along those lines.
Here is such a solution.
--- system.c.orig	Thu Dec 15 10:49:13 1994
+++ system.c	Thu Dec 15 10:49:19 1994
@@ -510,6 +510,10 @@
 # include      <stdio.h>
 # define SYS_STDIO_H
 #endif
+#ifndef SYS_UNISTD_H                    /* definition of 'R_OK'            */
+# include      <unistd.h>
+# define SYS_UNISTD_H
+#endif
 #ifndef SYS_HAS_STDIO_PROTO             /* ANSI/TRAD decl. from H&S 15     */
 extern FILE * fopen P(( SYS_CONST char *, SYS_CONST char * ));
 extern int    fclose P(( FILE * ));
@@ -521,6 +525,7 @@
 struct {
     FILE *      fp;                     /* file pointer for this file      */
     FILE *      echo;                   /* file pointer for the echo       */
+    long        pipe;                   /* file is really a pipe           */
     char        buf [BUFSIZ];           /* the buffer for this file        */
 }       syBuf [16];
 
@@ -552,6 +557,8 @@
     char *              mode;
 {
     long                fid;
+    char                namegz [1024];
+    char                cmd [1024];
 
     /* handle standard files                                               */
     if ( SyStrcmp( name, "*stdin*" ) == 0 ) {
@@ -578,10 +585,26 @@
     if ( fid == sizeof(syBuf)/sizeof(syBuf[0]) )
         return (long)-1;
 
+    /* set up <namegz> and <cmd> for pipe command                          */
+    namegz[0] = '\0';
+    SyStrncat( namegz, name, sizeof(namegz)-5 );
+    SyStrncat( namegz, ".gz", 4 );
+    cmd[0] = '\0';
+    SyStrncat( cmd, "gunzip <", 9 );
+    SyStrncat( cmd, namegz, sizeof(cmd)-10 );
+
     /* try to open the file                                                */
-    syBuf[fid].fp = fopen( name, mode );
-    if ( syBuf[fid].fp == (FILE*)0 )
+    if ( (syBuf[fid].fp = fopen(name,mode)) ) {
+        syBuf[fid].pipe = 0;
+    }
+    else if ( SyStrcmp(mode,"r") == 0
+           && access(namegz,R_OK) == 0
+           && (syBuf[fid].fp = popen(cmd,mode)) ) {
+        syBuf[fid].pipe = 1;
+    }
+    else {
         return (long)-1;
+    }
 
     /* allocate the buffer                                                 */
     setbuf( syBuf[fid].fp, syBuf[fid].buf );
@@ -613,7 +636,8 @@
     }
 
     /* try to close the file                                               */
-    if ( fclose( syBuf[fid].fp ) == EOF ) {
+    if ( (syBuf[fid].pipe == 0 && fclose( syBuf[fid].fp ) == EOF)
+      || (syBuf[fid].pipe == 1 && pclose( syBuf[fid].fp ) == -1) ) {
         fputs("gap: 'SyFclose' cannot close file, ",stderr);
         fputs("maybe your file system is full?\n",stderr);
     }
@@ -4194,12 +4218,12 @@
             SyHelpname[0] = '\0';
 #if SYS_BSD || SYS_MACH || SYS_USG || SYS_OS2_EMX
             SyStrncat( SyHelpname, argv[2], sizeof(SyLibname)-2 );
-            if ( SyLibname[SyStrlen(SyHelpname)-1] != '/' )
+            if ( SyHelpname[SyStrlen(SyHelpname)-1] != '/' )
                 SyStrncat( SyHelpname, "/", 1 );
 #endif
 #if SYS_MSDOS_DJGPP || SYS_TOS_GCC2
             SyStrncat( SyHelpname, argv[2], sizeof(SyLibname)-2 );
-            if ( SyLibname[SyStrlen(SyHelpname)-1] != '\\' )
+            if ( SyHelpname[SyStrlen(SyHelpname)-1] != '\\' )
                 SyStrncat( SyHelpname, "\\", 1 );
 #endif
             ++argv; --argc;

On each open, it first tests whether there is an uncompressed file
that can be used. If not and the open is for reading, then it
tests whether there is a compressed file. If so, it opens a
*pipe* from 'gunzip' and reads from that pipe. We only need to
remember (in 'syBuf[fid]') whether the file is a proper file
or a pipe, so that we can use 'fclose' or 'pclose' as appropriate.
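For trying the idea outside GAP, here is a minimal standalone sketch of the same open/close logic in POSIX C. The names 'OpenMaybeGz', 'CloseMaybeGz' and 'files' are hypothetical; it builds the command with 'snprintf' instead of 'SyStrncat', and, like the patch, it does not quote the file name for the shell, so names containing spaces or shell metacharacters will not work.

```c
#define _POSIX_C_SOURCE 200809L   /* expose popen/pclose under strict -std */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>               /* access, R_OK */

#define NFILES 16                 /* same number of slots as syBuf */

/* Hypothetical per-file table mirroring the 'fp' and 'pipe' components
   of 'syBuf' after the patch. */
static struct {
    FILE * fp;                    /* NULL if the slot is free */
    int    pipe;                  /* 1 if fp came from popen, 0 if from fopen */
} files[NFILES];

/* Open 'name'; if that fails and we only want to read, fall back to a
   pipe from 'gunzip < name.gz'. Returns a slot index, or -1 on failure. */
static long OpenMaybeGz(const char *name, const char *mode)
{
    char namegz[1024];
    char cmd[sizeof namegz + 16];
    long fid;

    /* find a free slot */
    for (fid = 0; fid < NFILES && files[fid].fp != NULL; fid++)
        continue;
    if (fid == NFILES)
        return -1;

    /* prefer an uncompressed file if there is one */
    if ((files[fid].fp = fopen(name, mode)) != NULL) {
        files[fid].pipe = 0;
        return fid;
    }

    /* build "<name>.gz" and the command; give up if the name is too long */
    if (snprintf(namegz, sizeof namegz, "%s.gz", name) >= (int)sizeof namegz)
        return -1;
    snprintf(cmd, sizeof cmd, "gunzip < %s", namegz);

    /* only reading can be served from a pipe */
    if (strcmp(mode, "r") == 0
        && access(namegz, R_OK) == 0
        && (files[fid].fp = popen(cmd, "r")) != NULL) {
        files[fid].pipe = 1;
        return fid;
    }
    return -1;                    /* fp is NULL here, so the slot stays free */
}

/* Close with the call matching how the stream was opened. Returns 0 on
   success; fclose gives EOF on error, pclose gives -1 on error or a
   nonzero wait status if gunzip failed. */
static int CloseMaybeGz(long fid)
{
    int rc;

    assert(fid >= 0 && fid < NFILES && files[fid].fp != NULL);
    rc = files[fid].pipe ? pclose(files[fid].fp) : fclose(files[fid].fp);
    files[fid].fp = NULL;
    return rc;
}
```

Writing is never redirected: opening with mode "w" behaves exactly like 'fopen', so an existing 'name.gz' is left untouched.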
You may have to tweak the code a little bit, depending on where
stuff like 'R_OK' is declared (you may also need to add
'COPTS=-DSYS_HAS_READ_PROTO' to the 'make' command).
This works very nicely for me. 'lib' and 'doc' now take less
than 1 MByte each, and the performance penalty is not dramatic.
But this only works under UNIX (or other systems that support 'popen').
On the other hand, compressing library files and documentation might
be more important on DOS or Macintosh systems. Then again, those
systems already support compression at the file system level,
so compressing the files again may not gain that much. Perhaps the
nicest solution would be to incorporate part of the 'unzoo' source in
'system.c', so that GAP can read files directly from the 'zoo' archives,
without the need to start a separate process. This should be possible,
but quite a bit more complicated than the above solution.
James continued
By the way, is there any documentation on the implementation of GAP?
It occurs to me that some of GAP's authors might have written papers on
the *implementation* (as opposed to user documentation, which is, of
course, excellent). I found none in the bibliography of the manual.
But a project of this size must surely
have been documented in the literature? Perhaps if I studied some of this
more technical material, it would help me navigate the code.
No, there is no documentation on the implementation of GAP, except for
the comments in the source files. We agree that there should be such
documentation. And we plan to write some such reports for the next
version of GAP. There already is a longish report on the new storage
manager available at 'ftp://ftp.math.rwth-aachen.de/pub/gasman/'.
Martin.
-- .- .-. - .. -. .-.. --- ...- . ... .- -. -. .. -.- .-
Martin Sch"onert,   Martin.Schoenert@Math.RWTH-Aachen.DE,   +49 241 804551
Lehrstuhl D f"ur Mathematik, Templergraben 64, RWTH, 52056 Aachen, Germany