Compressed data files
Revision as of 14:00, 29 October 2014
Many computer simulation packages offer the option to write data files in compressed form. This is strongly recommended.
Advantages
- Save disk/tape space
- Faster writing of data (the CPU time spent on compression is negligible compared to the time saved by writing smaller files)
- File corruption is detected more easily
- Easier transfer to other computer systems
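The point about corruption detection follows from the gzip format storing a CRC-32 checksum and the uncompressed length at the end of each file, so reading a file to its end verifies it. A minimal Python sketch of such a check, assuming gzip-compressed files (the function name gzip_is_intact is hypothetical, chosen for illustration):

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 16):
    """Return True if the gzip file at `path` decompresses without error.

    Reading through to the end makes the gzip module verify the CRC-32
    and length stored in the file trailer. Nothing is written to disk.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Discard the data; we only care whether reading succeeds.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad header or checksum mismatch,
        # EOFError covers a truncated file.
        return False
    return True
```

On the command line, gzip -t performs a comparable integrity test.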
Dealing with truncated compressed files
A situation sometimes encountered is that a compressed file is truncated (this most frequently occurs when a program aborts and a file buffer was not flushed). Naive application of gzip -d (or gunzip) will lead to an "unexpected end of file" error message. However, it is easy to access most of the data in this file by decompressing "on the fly" and redirecting the data stream to a file. For example, if output.dat.gz is truncated, use
gzip -dc output.dat.gz > output.dat
This will yield the same error message as gzip -d output.dat.gz, but now the decompressed file contents (until close to the truncation point) will be saved to output.dat.
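The same recovery can be done from within an analysis script. The following is a minimal Python sketch, assuming a single-member gzip file; the function name recover_truncated_gzip and its return convention are hypothetical, chosen for illustration. It feeds the file to a streaming decompressor, which returns whatever output is available instead of failing at the truncation point.

```python
import zlib


def recover_truncated_gzip(src_path, dst_path, chunk_size=1 << 16):
    """Decompress as much of a (possibly truncated) gzip file as possible.

    Returns (bytes_written, complete), where `complete` is True only if
    the end-of-stream marker was reached, i.e. the file was not truncated.
    Assumes a single-member gzip file.
    """
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(31)
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while not decomp.eof:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # input exhausted before end of stream: truncated
            data = decomp.decompress(chunk)
            dst.write(data)
            written += len(data)
    return written, decomp.eof
```

For example, recover_truncated_gzip("output.dat.gz", "output.dat") writes the recoverable part of the data to output.dat. As with the gzip -dc approach, the last line of the recovered file may be incomplete.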
Avoiding decompression of compressed files
Ideally, compressed data files are not decompressed during analysis. Decompression would require additional disk space, as well as time to compress the data again after the analysis. Instead, analysis tools such as the generic analyzer and autocorrelation can handle compressed data directly. Indeed, even gnuplot can plot compressed data files; see the gnuplot usage notes.
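Custom analysis scripts can follow the same approach. As a minimal Python sketch, assuming a gzip-compressed text file with whitespace-separated numeric columns and comment lines starting with "#" (the file layout and the function name column_mean are assumptions made for illustration), the data can be read line by line without ever writing a decompressed copy to disk:

```python
import gzip


def column_mean(path, column=0):
    """Return the mean of one column of a gzip-compressed text data file.

    Assumes whitespace-separated numeric columns; blank lines and lines
    starting with '#' are skipped. The file is decompressed in memory
    while it is being read.
    """
    total = 0.0
    count = 0
    with gzip.open(path, "rt") as handle:  # "rt" = read as text
        for line in handle:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue
            total += float(fields[column])
            count += 1
    if count == 0:
        raise ValueError("no data rows found in " + path)
    return total / count
```

For example, column_mean("output.dat.gz", column=1) averages the second column of the compressed file.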