Compressed data files

From csml-wiki.northwestern.edu

Many computer simulation packages offer the option to write data files in compressed form. This is strongly recommended.

Advantages

  • Save disk/tape space
  • Faster writing of data (the CPU time spent on compression is negligible compared to the time saved by writing smaller files)
  • File corruption is detected more easily
  • Easier transfer to other computer systems
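
As an illustration of the corruption check, gzip -t tests the integrity of a compressed file without writing any output. The sketch below wraps it in a small function; the name check_gz is hypothetical and chosen only for this example.

```shell
# check_gz (hypothetical helper): test each gzip file given as an argument.
# Prints one status line per file and returns non-zero if any file fails.
check_gz() {
    bad=0
    for f in "$@"; do
        # gzip -t decompresses in memory and verifies the stream and its
        # checksum; it exits non-zero for truncated or corrupted files.
        if gzip -t "$f" 2>/dev/null; then
            echo "OK       $f"
        else
            echo "DAMAGED  $f"
            bad=$((bad + 1))
        fi
    done
    [ "$bad" -eq 0 ]
}
```

A call such as check_gz *.gz then lists every compressed file in the current directory together with its status.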

Dealing with truncated compressed files

Occasionally a compressed file is truncated; this most frequently occurs when a program aborts before a file buffer has been flushed. Naive application of gzip -d (or gunzip) then fails with an "unexpected end of file" error message. However, most of the data in the file can still be recovered by decompressing "on the fly" and redirecting the data stream to a file. For example, if output.dat.gz is truncated, use

gzip -dc output.dat.gz > output.dat

This yields the same error message as gzip -d output.dat.gz, but now the decompressed file contents (up to a point close to the truncation) are saved to output.dat. Note that the last line of the recovered file may be incomplete.
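
The recovery step can be wrapped in a small function that reports whether the stream ended cleanly. This is only a sketch: the name salvage_gz is hypothetical, and the function treats any non-zero exit status of gzip as "not clean", which covers truncation as well as other errors and warnings.

```shell
# salvage_gz (hypothetical helper): decompress SRC to DEST, keeping
# whatever data could be read even if the compressed stream is damaged.
salvage_gz() {
    src=$1
    dest=$2
    # gzip -dc writes decompressed data to standard output as it goes,
    # so data written before an error remains in DEST.
    if gzip -dc "$src" > "$dest" 2>/dev/null; then
        echo "$src: decompressed cleanly to $dest"
    else
        echo "$src: did not decompress cleanly, partial data kept in $dest"
    fi
}
```

For the file from the example above, the call would be salvage_gz output.dat.gz output.dat.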

Avoiding decompression of compressed files

Ideally, compressed data files are not decompressed during analysis. Decompression would require additional disk space, as well as time to compress the data again after the analysis. Instead, analysis tools such as the generic analyzer and autocorrelation can handle compressed data directly. Even gnuplot can plot compressed data files; see the gnuplot usage notes.
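
For quick checks with standard tools, the same idea can be sketched by piping the decompressed stream straight into the analysis program, so that no uncompressed copy is written to disk. The example below is not one of the tools named above. It assumes a whitespace-separated text file in which lines starting with # are comments, and column_mean is a hypothetical helper name.

```shell
# column_mean (hypothetical helper): print the mean of column COL
# (default 1) of a gzip-compressed, whitespace-separated data file.
column_mean() {
    file=$1
    col=${2:-1}
    # gzip -dc streams the decompressed data through the pipe; awk
    # accumulates the sum and skips comment lines and short lines.
    gzip -dc "$file" | awk -v c="$col" '
        !/^#/ && NF >= c { sum += $c; n++ }
        END { if (n > 0) printf "%.6g\n", sum / n }'
}
```

For example, column_mean output.dat.gz 2 prints the average of the second column while output.dat.gz stays compressed on disk.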