Autocorrelation: Difference between revisions

From csml-wiki.northwestern.edu

Revision as of 08:37, 21 June 2014

Overview

The "autocorrelation" (AC) program is based upon the generic analyzer. It computes the integrated autocorrelation time for time series data, on a per-column basis. It is particularly useful for processing output from Monte Carlo and molecular dynamics simulations. The program is available on all local machines.
The autocorrelation program was originally written in 2002/2003 by Erik Luijten for efficiency comparisons involving the Geometric Cluster Algorithm.

General usage

 autocorrelation [options] filename starttime maxtime

filename is a plain text file. Lines starting with a '#' will be ignored (use this feature to insert column descriptions and other information into your simulation data). If filename ends with '.gz' the file is assumed to be compressed with gzip and will be decompressed on the fly. Note that this happens in memory; no decompressed version of the file is written to disk. This has the advantage that no additional disk space is required and that no additional time is required to compress the data again after the analysis.

To read from standard input instead of a file, specify STDIN as the filename.
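The input conventions above can be mimicked in a few lines of Python. This is only a sketch, not the program's own reader: the helper names `parse_columns` and `read_columns` are hypothetical, and skipping blank lines is an assumption that the description above does not state.

```python
import gzip
import sys


def parse_columns(lines):
    """Collect whitespace-separated numeric columns from an iterable of lines."""
    columns = []
    for line in lines:
        # Lines starting with '#' carry column descriptions and are ignored.
        # Skipping blank lines is an assumption of this sketch.
        if line.startswith("#") or not line.strip():
            continue
        values = [float(field) for field in line.split()]
        if not columns:
            columns = [[] for _ in values]
        for column, value in zip(columns, values):
            column.append(value)
    return columns


def read_columns(filename):
    """Read columns from a plain file, a '.gz' file, or standard input."""
    if filename == "STDIN":
        return parse_columns(sys.stdin)
    # gzip.open decompresses while reading; nothing is written to disk.
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rt") as stream:
        return parse_columns(stream)
```

Each returned list holds one column, so `read_columns("run.dat.gz")[0]` would be column 0 of a (hypothetical) compressed data file.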

When redirecting the output, note that the autocorrelation function for each column is written to standard output, whereas the autocorrelation time (and all other information) is written to standard error. See usage notes below.

starttime and maxtime are mandatory arguments. starttime is used to discard initial samples. Normally it is simply set to 0. maxtime sets the upper bound for the integral of the autocorrelation function.

Options

  • -a
    Compute the autocorrelation time by integrating the absolute autocorrelation function. This is useful to eliminate cancellations due to anticorrelations (which would result in an underestimation of the correlation time), but it also enhances noise in the tail of the autocorrelation function (where the function fluctuates around zero; normally such fluctuations cancel out).
  • -c n:m
    Compute the cross correlation of columns n and m. Note: columns are numbered from 0.
  • -e
    Disable truncation of the data set when using the FFT-based calculation (which in turn is enabled via the -f option). Normally, use of -f truncates the data set to a multiple of 256 (for more than 10240 samples) or a multiple of 16384 (for more than 10^6 samples). Note that disabling this truncation carries a significant speed penalty. However, it makes the results identical to those obtained without using the FFT.
  • -f
    Employ a Fast Fourier Transform to greatly reduce the computational effort, by analyzing the data as a convolution in the frequency domain. This works for the autocorrelation function (Wiener-Khinchin Theorem) as well as for the cross correlation (option -c) (Cross-Correlation Theorem) and the mean squared displacement (option -m). Note that this requires computing the function over the entire time domain; therefore, for very rapidly decaying functions, where maxtime can be kept small, conventional evaluation might be faster.
  • -m
    Calculate the mean squared displacement (MSD). See algorithm below for the relation between the MSD and the autocorrelation function.
  • -o filename
    Write autocorrelation function to filename.
  • -u
    Do not normalize the autocorrelation function. Without this option, the autocorrelation function C(t) is normalized by C(0). Note that this option disables calculation of the autocorrelation time (see algorithm below).
  • -x
    Only compute the cross term in the autocorrelation function, i.e., the first term in C(t) (see algorithm below).
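To illustrate the idea behind the -f option, the following Python sketch computes the lagged sums of products, S(t) = sum over i of x[i]*x[i+t], once directly and once through a Fourier transform (Wiener-Khinchin theorem). It is not the program's implementation: the function names are hypothetical, the small recursive radix-2 FFT is a stand-in for an optimized library, and zero-padding to a power of two is an assumption made here to keep the circular correlation from wrapping around.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def lagged_products_direct(x, maxtime):
    """S(t) for t = 0..maxtime by explicit summation, cost O(N * maxtime)."""
    n = len(x)
    return [sum(x[i] * x[i + t] for i in range(n - t))
            for t in range(maxtime + 1)]


def lagged_products_fft(x):
    """S(t) for all t = 0..N-1 at once, cost O(N log N)."""
    n = len(x)
    size = 1
    while size < 2 * n:  # zero-pad so that the circular correlation cannot wrap
        size *= 2
    spectrum = fft([complex(v) for v in x] + [0j] * (size - n))
    power = [complex(abs(c) ** 2) for c in spectrum]  # |X(f)|^2
    correlation = fft(power, inverse=True)
    return [correlation[t].real / size for t in range(n)]
```

The FFT route always produces every lag, whereas the direct route stops at maxtime. This is why, for rapidly decaying functions with small maxtime, the direct evaluation can remain competitive.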

Interpreting output

Topics (coming soon):

  • choosing maxtime
  • anticorrelations
  • negative time cross correlations
  • ...

Special usage notes

  • Note that, per standard GNU style, options can be combined. Thus, for example, '-a -e -f' can be specified as '-aef'.
  • (coming soon)
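Python's standard `getopt` module follows the same convention for combined single-letter options and can be used to see how such a command line is split. This is an illustration only and says nothing about how the program itself parses its arguments; `OPTION_STRING` and `parse_args` are hypothetical names.

```python
import getopt

# Option letters taken from the Options section; a colon marks the options
# that take an argument (-c n:m and -o filename).
OPTION_STRING = "ac:efmo:ux"


def parse_args(argv):
    """Split argv into (options, positional arguments)."""
    return getopt.getopt(argv, OPTION_STRING)
```

With this sketch, `parse_args(["-aef", "data.txt", "0", "100"])` and `parse_args(["-a", "-e", "-f", "data.txt", "0", "100"])` yield the same option list, followed by the three positional arguments.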

Algorithm

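If the autocorrelation function decays exponentially, $C(t) = \exp(-t/\tau)$, then $\tau$ can be found as

$$\tau = \int_0^\infty C(t)\,dt .$$

This concept is generalized by integrating the autocorrelation function irrespective of its functional form. Thus, the autocorrelation time is computed as

$$\tau = \int_0^{t_\mathrm{max}} C(t)\,dt ,$$

where $t_\mathrm{max}$ is set by maxtime and $C(t)$ is the autocorrelation function of a time series $x_i$ ($i = 1, \ldots, N$),

$$C(t) = \langle x_i x_{i+t} \rangle - \langle x_i \rangle \langle x_{i+t} \rangle ,$$

which can be evaluated for $0 \le t < N$ and, unless -u is specified, is normalized by $C(0)$.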
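A minimal Python sketch of this procedure follows; it is not the program's source. It assumes that the averages in C(t) run over the N - t overlapping pairs (x[i], x[i+t]), that C(t) is the lagged product average minus the product of the two window means, and that the integral is approximated by the trapezoidal rule with unit time step. The names `autocorrelation`, `integrated_time`, and the `absolute` flag (mimicking -a) are hypothetical.

```python
def autocorrelation(x, maxtime):
    """Normalized C(t) for t = 0..maxtime (requires maxtime < len(x))."""
    n = len(x)
    raw = []
    for t in range(maxtime + 1):
        m = n - t  # number of overlapping pairs (x[i], x[i+t])
        cross = sum(x[i] * x[i + t] for i in range(m)) / m
        mean_head = sum(x[:m]) / m  # <x_i> over the first m samples
        mean_tail = sum(x[t:]) / m  # <x_{i+t}> over the last m samples
        raw.append(cross - mean_head * mean_tail)
    return [c / raw[0] for c in raw]  # normalize by C(0)


def integrated_time(c, absolute=False):
    """Trapezoidal integral of C(t) for unit time steps (an assumed rule)."""
    values = [abs(v) for v in c] if absolute else list(c)
    return sum(values) - 0.5 * (values[0] + values[-1])
```

For a strongly anticorrelated series, such as one that alternates between +1 and -1, successive values of C(t) nearly cancel in the plain integral, while the integral of the absolute value does not; this is the cancellation that the -a option is meant to avoid.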

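For two time series $x_i$ and $y_i$, the cross correlation is defined as

$$C_{xy}(t) = \langle x_i y_{i+t} \rangle - \langle x_i \rangle \langle y_{i+t} \rangle .$$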
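As a hedged illustration, the cross correlation of two equally sampled series can be sketched in Python as the lagged product average minus the product of the two window means, taken over the overlapping pairs (x[i], y[i+t]). The estimator and the function name `cross_correlation` are assumptions of this sketch, not taken from the program.

```python
def cross_correlation(x, y, maxtime):
    """Unnormalized cross correlation of x and y for t = 0..maxtime."""
    n = min(len(x), len(y))
    result = []
    for t in range(maxtime + 1):
        m = n - t  # number of overlapping pairs (x[i], y[i+t])
        cross = sum(x[i] * y[i + t] for i in range(m)) / m
        mean_x = sum(x[:m]) / m
        mean_y = sum(y[t:n]) / m
        result.append(cross - mean_x * mean_y)
    return result
```

If y is a copy of x delayed by two samples, the result peaks at t = 2. Swapping the two arguments moves that peak to a negative time, which a range of non-negative t does not cover.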

If the variable is a time-dependent coordinate $x_i$, then the mean squared displacement (MSD) for a time interval (time difference) $t$ is defined as

$$\mathrm{MSD}(t) = \langle (x_{i+t} - x_i)^2 \rangle ,$$

which can be written as

$$\mathrm{MSD}(t) = \langle x_{i+t}^2 \rangle + \langle x_i^2 \rangle - 2 \langle x_i x_{i+t} \rangle .$$

Evaluation of the first two terms for $0 \le t < N$ requires $O(N)$ operations. The third term we recognize as proportional to the first term (or "cross term") in $C(t)$. This also implies that we can accelerate calculation of the MSD via FFT (use options -m -f), and evaluate the function for the entire time domain at cost $O(N \log N)$.
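The decomposition can be checked with a short Python sketch, given here under the assumption that the averages run over the N - t overlapping pairs; the function names are hypothetical. The two squared terms come from a single prefix sum of x[i]^2, so they cost O(N) for all t together, while the cross term is summed directly here and is the part an FFT would replace.

```python
def msd_direct(x, maxtime):
    """MSD(t) = <(x[i+t] - x[i])^2> by explicit summation."""
    n = len(x)
    return [sum((x[i + t] - x[i]) ** 2 for i in range(n - t)) / (n - t)
            for t in range(maxtime + 1)]


def msd_decomposed(x, maxtime):
    """Same MSD from <x[i+t]^2> + <x[i]^2> - 2 <x[i] x[i+t]>."""
    n = len(x)
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value * value)  # prefix[k] = sum of x[i]^2, i < k
    result = []
    for t in range(maxtime + 1):
        m = n - t
        tail_squares = prefix[n] - prefix[t]  # sum of x[i+t]^2 over the m pairs
        head_squares = prefix[m]              # sum of x[i]^2 over the m pairs
        cross = sum(x[i] * x[i + t] for i in range(m))  # the "cross term"
        result.append((tail_squares + head_squares - 2.0 * cross) / m)
    return result
```

For the series 0, 1, 3, 6, 10, 15 both functions return 0, 11 and 41 for t = 0, 1, 2.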

Download binary versions (Linux and OS X)

The current version is 4.0, dated September 2010. It is strongly recommended that you upgrade from any earlier version. You can download binary versions of this program here, but note that the program was created for internal lab use; we cannot provide any support.

(binaries coming soon)