Compressed data files

From csml-wiki.northwestern.edu

Revision as of 14:00, 29 October 2014

Many computer simulation packages offer the option to write data files in compressed form. This is strongly recommended.

Advantages

  • Save disk/tape space
  • Faster writing of data (the CPU time spent on compression is negligible compared to the time saved by writing smaller files)
  • File corruption is detected more easily (gzip stores a CRC-32 checksum of the data, which is verified on decompression)
  • Easier transfer to other computer systems
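As an illustration of the corruption check, gzip's -t option tests a compressed file without writing anything to disk. The sketch below is a demonstration only: the file names good.dat.gz and bad.dat.gz are hypothetical, and the "bad" file is produced by deliberately cutting a valid file short with head -c.

```shell
# Demonstration only: make one intact file and one deliberately truncated copy.
seq 1 1000 | gzip > good.dat.gz
head -c 100 good.dat.gz > bad.dat.gz

# gzip -t decompresses each file in memory and checks it against the
# CRC-32 checksum and length stored in the gzip trailer; nothing is written to disk.
for f in good.dat.gz bad.dat.gz; do
  if gzip -t "$f" 2>/dev/null; then
    echo "$f: OK"
  else
    echo "$f: corrupted or truncated"
  fi
done
```

Because gzip -t only sets an exit status (and prints a message on failure), it is easy to run over a whole directory of output files after a batch of simulations.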

Dealing with truncated compressed files

A situation sometimes encountered is that a compressed file is truncated (this most frequently occurs when a program aborts before a file buffer has been flushed). Naive application of gzip -d (or gunzip) then leads to an "unexpected end of file" error message. However, most of the data in this file can easily be recovered by decompressing "on the fly" and redirecting the data stream to a file. For example, if output.dat.gz is truncated, use

gzip -dc output.dat.gz > output.dat

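This yields the same error message as gzip -d output.dat.gz, but now the decompressed file contents (up to close to the truncation point) are saved to output.dat. The difference is that gzip -d removes its incomplete output file when it aborts, whereas with -c the data are written to standard output, which the shell has already redirected into output.dat.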
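The recovery procedure can be tried out safely on a throwaway file. This self-contained sketch assumes a POSIX shell with gzip, seq and head available; demo.dat.gz, truncated.dat.gz and recovered.dat are hypothetical names, and the truncation is simulated by keeping only the first half of the compressed bytes.

```shell
# Demonstration only: build a compressed file and cut it in half to mimic
# output from a program that aborted before its buffer was flushed.
seq 1 100000 | gzip > demo.dat.gz
size=$(wc -c < demo.dat.gz | tr -d ' ')
head -c $((size / 2)) demo.dat.gz > truncated.dat.gz

# Decompress to standard output and let the shell write it to a file.
# gzip still reports "unexpected end of file" and exits with a nonzero
# status, but everything it decoded before that point stays in recovered.dat.
gzip -dc truncated.dat.gz > recovered.dat
echo "gzip exit status: $?"

# Number of newline-terminated lines recovered, and the (possibly cut) last line.
wc -l < recovered.dat
tail -n 1 recovered.dat
```

Since the last recovered line may have been cut in the middle of a record, it is worth inspecting (or discarding) it before analysis.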

Avoiding decompression of compressed files

Ideally, compressed data files are not decompressed during analysis. Decompression requires additional disk space, as well as time to recompress the data after the analysis. Instead, analysis tools such as the generic analyzer and the autocorrelation tool can read compressed data directly. Even gnuplot can plot compressed data files; see the gnuplot usage notes.
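Outside dedicated analysis tools, the same idea works with ordinary shell pipelines: gzip -dc streams the decompressed data to standard output, so a filter such as awk can process it without a decompressed copy ever being written to disk. In this sketch, demo2.dat.gz is a hypothetical two-column file generated on the spot, and averaging the second column is just an example analysis.

```shell
# Demonstration only: create a small compressed two-column data file
# (demo2.dat.gz stands in for real simulation output).
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i, 0.5 * i }' | gzip > demo2.dat.gz

# Decompress on the fly and average the second column with awk;
# no uncompressed copy of the data is written to disk.
# prints: mean of column 2 = 250.25
gzip -dc demo2.dat.gz | awk '{ sum += $2; n++ } END { printf "mean of column 2 = %g\n", sum / n }'

# Quick look at the first few records, again without decompressing to disk.
gzip -dc demo2.dat.gz | head -n 3
```

The compressed file is left untouched, so there is nothing to recompress once the analysis is finished.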