Compressed data files

From csml-wiki.northwestern.edu

Revision as of 14:29, 29 October 2014

Many computer simulation packages offer the option to write data files in compressed form. This is strongly recommended.

Advantages

  • Save disk/tape space
  • Faster writing of data (the CPU time spent on compression is negligible compared to the time saved by writing smaller files)
  • File corruption is detected more easily
  • Easier transfer to other computer systems
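
As an illustration of the point that file corruption is detected more easily, the following minimal sketch uses gzip -t, which tests the integrity of a compressed file without writing any output. The file names example.dat.gz and broken.dat.gz, the helper function check, and the generated data are hypothetical and serve only as a demonstration.

```shell
# Create a small example file (hypothetical data) in compressed form.
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i, i * i }' | gzip -c > example.dat.gz

# check FILE: report whether gzip considers the file intact.
# gzip -t verifies the compressed stream without writing any output.
check() {
    if gzip -t "$1" 2>/dev/null; then
        echo "$1: OK"
    else
        echo "$1: corrupt or truncated"
    fi
}

check example.dat.gz

# Simulate a truncated file by keeping only the first 100 bytes.
head -c 100 example.dat.gz > broken.dat.gz
check broken.dat.gz
```

An uncompressed text file that had been cut short in the same way would give no comparable indication that data is missing.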

Dealing with truncated compressed files

A compressed file is sometimes truncated; this most frequently occurs when a program aborts before a file buffer has been flushed. Naive application of gzip -d (or gunzip) then leads to an "unexpected end of file" error message. However, it is easy to access most of the data in such a file by decompressing "on the fly" and redirecting the data stream to a file. For example, if output.dat.gz is truncated, use

gzip -dc output.dat.gz > output.dat

This yields the same error message as gzip -d output.dat.gz, but the decompressed file contents (up to a point close to the truncation) are now saved to output.dat.
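
The recovery can be tried out on a deliberately truncated file, as in the sketch below. It assumes a whitespace-separated text data file with one record per line; the files full.dat.gz, gzip_errors.log and output_clean.dat and the generated data are hypothetical. Because the recovered data usually ends in the middle of a line, the sketch discards the last line before any further analysis.

```shell
# Build a deliberately truncated file for demonstration (hypothetical data).
awk 'BEGIN { for (i = 1; i <= 20000; i++) print i, i * i }' | gzip -c > full.dat.gz
size=$(wc -c < full.dat.gz)
head -c $((size / 2)) full.dat.gz > output.dat.gz

# Decompress to standard output. gzip reports the error and exits with a
# non-zero status, but the data written before the error is kept.
gzip -dc output.dat.gz > output.dat 2> gzip_errors.log \
    || echo "gzip reported an error (expected for a truncated file)"

# The last recovered line is usually incomplete, so drop it.
# (If it happened to be complete, one good record is lost.)
sed '$d' output.dat > output_clean.dat

# Number of complete records recovered.
wc -l < output_clean.dat
```

How much data is recovered depends on where the file was cut off, so the number of records will differ from case to case.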

Avoiding decompression of compressed files

Ideally, compressed data files are not decompressed during analysis. Decompression requires additional disk space, as well as time to compress the data again after the analysis. Instead, analysis tools such as the generic analyzer and autocorrelation can handle compressed data directly. Even gnuplot can plot compressed data files; see the gnuplot usage notes.
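
For tools that read only uncompressed input, a similar effect can be obtained by streaming the decompressed data through a pipe, so that no uncompressed copy is written to disk. The sketch below is an illustration only and does not describe how the analysis tools mentioned above work internally; the file data.dat.gz, its two-column layout, and the awk one-liner are assumptions.

```shell
# Hypothetical two-column data file (step, value), written in compressed form.
awk 'BEGIN { for (i = 1; i <= 100; i++) print i, 2 * i }' | gzip -c > data.dat.gz

# Stream the decompressed data into the analysis; data.dat is never created.
mean=$(gzip -dc data.dat.gz | awk '{ sum += $2 } END { print sum / NR }')
echo "mean of column 2: $mean"

# Count the records, again without decompressing to disk.
gzip -dc data.dat.gz | wc -l
```

The compressed file remains unchanged, so nothing needs to be compressed again after the analysis.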