Zauruses, Squashfs and bulk data
draft version
Summary
We're going to create a squashfs file system rather than a zip or
tar.gz archive. Like those, it compresses the files into a single file,
but we can then mount it on the Zaurus and randomly access the files
without extracting them first. Squashfs is more effective than cramfs
(which Sharp and Cacko use), but cannot be mounted read-write. Skip
to the why's and wherefores section for background information.
Let's do it!
The hardest part of this process is downloading a website so that
we end up with a set of web pages that correctly link to each other
and contain the CSS and images needed, without downloading anything else!
For this example, we are going to archive a web site called www.example.com,
generate a squashfs archive and then mount it on the Zaurus for offline use.
Firstly, let's snapshot a website, its HTML and CSS files only, which we'll do in /tmp (change to
/mnt/card if doing this on the Zaurus!):
$ cd /tmp
$ mkdir example
$ cd example
$ wget --convert-links --restrict-file-names=windows -E -r -v -np -k -c --limit-rate=20k -A ".css,.html" www.example.com
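The wget line above packs a lot of options into one command, so here is a rough sketch of the same options spelled out one per line with a comment on each. The function name snapshot_site is made up for this example and is not part of wget; note also that -k is just the short form of --convert-links, so it only needs to appear once.

```shell
#!/bin/sh
# Sketch only: the same wget options as above, one per line.
# snapshot_site is a hypothetical helper name, not part of wget.
snapshot_site() {
    url=$1
    set -- --recursive                         # -r: follow links within the site
    set -- "$@" --no-parent                    # -np: never climb above the start directory
    set -- "$@" --convert-links                # -k: rewrite links to point at the local copies
    set -- "$@" -E                             # append .html to HTML pages that lack it
    set -- "$@" --restrict-file-names=windows  # avoid characters that are illegal on Windows/FAT
    set -- "$@" --continue                     # -c: resume partially downloaded files
    set -- "$@" --limit-rate=20k               # cap the download rate to be kind to the server
    set -- "$@" --accept ".css,.html"          # -A: keep only files with these suffixes
    set -- "$@" --verbose                      # -v: show what is going on
    wget "$@" "$url"
}
```

It would be run from inside the example directory as: snapshot_site www.example.com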
Then, to make the squashfs archive, we simply do this, whilst still in the example directory:
$ mksquashfs2 . /tmp/www.example.com.squashfs -all-root -info
To test it, become root, create a mount point and mount it.
$ su -
password: *****
# mkdir /mnt/www.example.com
# mount -o loop,ro -t squashfs /tmp/www.example.com.squashfs /mnt/www.example.com
If the above works, you should be able to point your web browser at
file:///mnt/www.example.com and browse the archive of the website!
Then copy the squashfs file to your Zaurus, where you can do
the same loopback mount.
Archiving OESF forum
I've found that the OESF forum is not particularly amenable to using wget, as
the URI parameters aren't properly mangled to make unique file names, and
the only solution I've found is to use the HTTrack website copier.
It's a very powerful tool, so use it with care and be sure to rate-limit its
downloading of a site! Ensure you set it to only download from the website
you configured, not to cross hostname boundaries, and not to climb up into
parent directories. Watch it carefully as it works, checking it's not going too far!
So, once you've downloaded the website, you can then copy it to a Linux
box for creating the squashfs archive - use LeechFTP, WinSCP, Samba shares
or whatever you want for this bulk copy!
Why's and wherefores
Although memory storage is cheap, it's still not free, and some people with
older Zauruses are still stuck with the 1GB flash card limit. One
solution is to zip all the files up and then extract the ones needed.
A similar but more linux-y way of doing it is to use a tar (tape archive)
file which can then be compressed with gzip to make a .tar.gz or .tgz file.
The problem with using a zip or .tgz is that you have to extract the files
whenever you want to access them. Well, this is not quite true, as Windows
and KDE both allow you to browse a zip or tgz file just as if it were
a regular directory, but this is not possible on the Sharp/Cacko ROM,
and possibly not on Angstrom, Debian or pdaXrom either. Even so, this is
just a hack; there has to be a better way!
Well, there is!
The Zaurus's operating system is based on Linux and has a modular kernel, which
means you can use any file system supported by the Linux kernel: not
just ext2, ext3 and (v)fat but also archival/fixed types of file
system such as cramfs or squashfs.
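As a quick sanity check before trying a mount, the kernel lists the file systems it currently knows about in /proc/filesystems. The sketch below looks a name up in that list; fs_supported is a name invented for this example. Bear in mind that a file system built as a module only shows up once the module has been loaded, so a "no" here may just mean the module needs loading first.

```shell
#!/bin/sh
# Sketch only: fs_supported is a hypothetical helper name.
# Usage: fs_supported NAME [LISTFILE]
# LISTFILE defaults to /proc/filesystems; each line there is either
# "nodev<TAB>name" or "<TAB>name", so the name is always the last field.
fs_supported() {
    fs=$1
    list=${2:-/proc/filesystems}
    awk -v fs="$fs" '$NF == fs { found = 1 } END { exit !found }' "$list"
}

# Example use (commented out so that sourcing this file does nothing):
# if fs_supported squashfs; then echo "squashfs: yes"; else echo "squashfs: no"; fi
```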
Since Linux has "loopback" mount support built in, it
is possible to take a file which contains a file system
and mount it just like a disk, making the files available in place; this
is in contrast to, say, a zip or tar file, where files have to be extracted
before use.
A normal file system for Linux such as ext2, ext3, xfs or reiserfs is largely
optimised for robustness of storage and read and write performance, with
each file's usage being rounded up to a multiple of the block size, and with a lot
of ancillary information such as owner and created/modified/last-access times.
There are also algorithms in use to prevent disk fragmentation, to allow
for files to grow etc, which means files are not necessarily contiguous
on the disk; moreover, when files do get fragmented it means more overhead
in chaining them together.
When there are a large number of files, this can be quite a considerable
overhead on storage.
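To get a feel for the block rounding part of that overhead, here is a small sketch that adds up the sizes of all the files under a directory twice: once as they are, and once with each file rounded up to a whole number of blocks. The function name block_overhead and the 4096 byte default block size are assumptions made for this example; real file systems differ in block size and add metadata on top, so treat the result as a rough lower bound.

```shell
#!/bin/sh
# Sketch only: block_overhead is a hypothetical helper name.
# Usage: block_overhead DIR [BLOCKSIZE]
# Prints two numbers: total bytes as stored in the files, and total bytes
# once every file is rounded up to a multiple of BLOCKSIZE (default 4096).
block_overhead() {
    dir=$1
    bs=${2:-4096}
    # Emit one size per line, then let awk do the summing and rounding.
    find "$dir" -type f -exec sh -c 'for f; do wc -c < "$f"; done' sh {} + |
    awk -v bs="$bs" '
        { raw += $1; padded += int(($1 + bs - 1) / bs) * bs }
        END { printf "%d %d\n", raw, padded }'
}
```

For instance, a 1 byte file and a 4097 byte file hold 4098 bytes between them, but with 4096 byte blocks they occupy 4096 + 8192 = 12288 bytes. Running block_overhead /tmp/example on a downloaded site shows the same effect multiplied over thousands of small pages.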
An additional layer of overhead is created by the fact that the Zaurus uses
a flash file system which tries to spread the re-writing of data across
the disk to prevent over-using the same areas.
When the files are read-only and so do not need to be changed, fragmentation
is not an issue and writing speed is irrelevant, so data compression becomes
a useful tool; since the files are static, it is no longer necessary to
pad them out to whole blocks, as they will not grow.
Cramfs and squashfs use a variety of algorithms to take advantage of all the
space savings possible; cramfs seems to have been abandoned, and squashfs
offers much better space savings (the OESF forum snapshot had a cramfs size
of over 61MB, squashfs achieved 32MB or so).
On the Zaurus, Cacko uses both, but it uses squashfs v2 and not v3;
version 3 file systems are not compatible with v2, so be sure to download
the squashfs v2 source.