Compressed data files

From csml-wiki.northwestern.edu
Revision as of 12:31, 29 October 2014 by Administrator
Storing data files in compressed form has several advantages:
  • Save disk/tape space
  • Faster writing of data (the CPU time spent on compression is negligible compared to the time saved by writing smaller files)
  • File corruption is detected more easily
  • Easier transfer to other computer systems
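
As a hedged illustration of the corruption-detection point: a gzip file carries a CRC-32 checksum and the uncompressed length in its trailer, and "gzip -t" tests a file's integrity without writing any output. The sketch below builds a small test file, cuts a copy of the archive in half to simulate damage, and tests both. All file and variable names are placeholders chosen for this example.

```shell
# Work in a scratch directory; every name below is hypothetical.
tmp=$(mktemp -d)
seq 1 1000 > "$tmp/data.txt"
gzip -c "$tmp/data.txt" > "$tmp/data.txt.gz"

# An intact file passes the integrity test (exit status 0).
if gzip -t "$tmp/data.txt.gz"; then intact_status=ok; else intact_status=bad; fi

# Keep only the first half of the archive to simulate a damaged file.
size=$(( $(wc -c < "$tmp/data.txt.gz") ))
head -c $((size / 2)) "$tmp/data.txt.gz" > "$tmp/damaged.gz"

# The damaged copy fails the test (non-zero exit status).
if gzip -t "$tmp/damaged.gz" 2>/dev/null; then damaged_status=ok; else damaged_status=bad; fi

echo "intact: $intact_status, damaged: $damaged_status"
```

A check like this can be run after a transfer or before a long analysis job to catch damaged files early.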

A situation sometimes encountered is that a compressed file is truncated (this most frequently occurs when a program aborts before a file buffer has been flushed). Naive application of "gzip -d" (or "gunzip") then fails with an "unexpected end of file" error message. However, it is easy to recover most of the data in the file by decompressing "on the fly" and redirecting the data stream to a file. For example, if "output.dat.gz" is truncated, use

gzip -dc output.dat.gz > output.dat

This yields the same error message as "gzip -d output.dat.gz", but the decompressed file contents up to a point close to the truncation are now saved to "output.dat".
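
The recovery can be tried out safely on a test file before it is needed for real data. The following is a minimal sketch, not a definitive procedure: it creates a sample file, compresses it, cuts the archive in half to mimic an aborted write, and then checks that the recovered data is a leading portion of the original. The file names ("sample.dat", "truncated.dat.gz", "recovered.dat") and variable names are assumptions made for this example.

```shell
# Work in a scratch directory; every name below is hypothetical.
tmp=$(mktemp -d)
seq 1 100000 > "$tmp/sample.dat"
gzip -c "$tmp/sample.dat" > "$tmp/sample.dat.gz"

# Keep only the first half of the archive to mimic a truncated file.
full_size=$(( $(wc -c < "$tmp/sample.dat.gz") ))
head -c $((full_size / 2)) "$tmp/sample.dat.gz" > "$tmp/truncated.dat.gz"

# gzip exits with an error, but the data decoded before the
# truncation point has already been written to recovered.dat.
gzip -dc "$tmp/truncated.dat.gz" > "$tmp/recovered.dat" 2>/dev/null \
  || echo "gzip reported an error, as expected"

# Compare the recovered bytes with the start of the original file.
recovered_size=$(( $(wc -c < "$tmp/recovered.dat") ))
if head -c "$recovered_size" "$tmp/sample.dat" | cmp -s - "$tmp/recovered.dat"; then
  echo "recovered $recovered_size bytes matching the start of the original"
fi
```

Because the stream is cut at an arbitrary byte, the last line of the recovered file may be incomplete and should be checked before the data is used.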