Autocorrelation: Difference between revisions

Revision as of 06:33, 21 June 2014

Overview

The "autocorrelation" (AC) program is based upon the generic analyzer. It computes the integrated autocorrelation time for time series of data, on a per column basis. It is particularly useful for processing output from Monte Carlo and molecular dynamics simulations. The program is available on all local machines.
The autocorrelation program was originally written in 2002/2003 by Erik Luijten for efficiency comparisons involving the Geometric Cluster Algorithm.

General usage

 autocorrelation [options] filename starttime maxtime

filename is a plain text file. Lines starting with a '#' will be ignored (use this feature to insert column descriptions and other information into your simulation data). If filename ends with '.gz' the file is assumed to be compressed with gzip and will be decompressed on the fly. Note that this happens in memory; no decompressed version of the file is written to disk. This has the advantage that no additional disk space is required and that no additional time is required to compress the data again after the analysis.
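The input conventions described above can be mimicked in a few lines of Python. This is only a sketch for preparing or checking data files, not the program's own reader: the function name read_columns is hypothetical, and whitespace-separated columns and the skipping of blank lines are assumptions.

```python
import gzip


def read_columns(path):
    """Return the columns of a text data file as lists of floats.

    Assumptions (not confirmed by the program's documentation):
    columns are whitespace-separated and blank lines are skipped.
    """
    # gzip.open decompresses on the fly; no decompressed copy is written to disk.
    opener = gzip.open if path.endswith(".gz") else open
    rows = []
    with opener(path, "rt") as handle:
        for line in handle:
            stripped = line.strip()
            # Skip comment lines such as "# energy magnetization".
            if not stripped or stripped.startswith("#"):
                continue
            rows.append([float(field) for field in stripped.split()])
    # Transpose rows into columns, since the analysis is done per column.
    return [list(column) for column in zip(*rows)]
```

Each returned list then corresponds to one time series, matching the per-column analysis described in the overview.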

To read from standard input instead of a file, specify STDIN as the filename.

When redirecting the output, note that the autocorrelation function for each column is written to standard output, whereas the autocorrelation time (and all other information) is written to standard error. See usage notes below.

starttime and maxtime are mandatory arguments. starttime is used to discard initial samples. Normally it is simply set to 0. maxtime sets the upper bound for the integral of the autocorrelation function.

Options

-a
Compute the autocorrelation time by integrating the absolute value of the autocorrelation function. This eliminates cancellations due to anticorrelations (which would otherwise lead to an underestimate of the correlation time), but it also amplifies noise in the tail of the autocorrelation function, where the function fluctuates around zero and such fluctuations would normally cancel out.
-c n:m
Compute the cross correlation of columns n and m.
-e
Disable truncating the data set when using FFT-based calculation (which in turn is enabled via the -f option). Normally, use of -f truncates the data set to a multiple of 256 (for more than 10240 samples) or a multiple of 16384 (for more than 10⁶ samples). Note that disabling this truncation carries a significant speed penalty. However, it makes the results identical to those obtained without using the FFT.
-f
Employ a Fast Fourier Transform (FFT) to greatly reduce the computational effort, by evaluating the correlation sums as products in the frequency domain. This works for the autocorrelation function (Wiener-Khinchin Theorem) as well as for the cross correlation (option -c) (Cross-Correlation Theorem) and the mean squared displacement (option -m). Note that this requires computing the function over the entire time domain; therefore, for very rapidly decaying functions, where maxtime can be kept small, conventional evaluation might be faster.
-m
Calculate the mean squared displacement (MSD). See algorithm below for the relation between the MSD and the autocorrelation function.
-o filename
Write autocorrelation function to filename.
-u
Do not normalize the autocorrelation function. Without this option, the autocorrelation function C(t) is normalized by C(0). Note that this option disables calculation of the autocorrelation time (see algorithm below).
-x
Only compute the cross term in the autocorrelation function, i.e., the first term in C(t) (see algorithm below).
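As a rough illustration of the idea behind the -f option, the sketch below evaluates the lagged products $\sum_j a_j b_{j+t}$ for all lags at once through a zero-padded FFT. It is not the program's own code: the helper names fft and lagged_products_fft are hypothetical, and a simple recursive radix-2 transform stands in for whatever FFT routine the program actually uses.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def lagged_products_fft(a, b):
    """Return S(t) = sum_j a[j] * b[j + t] for t = 0 .. N-1 (equal-length inputs)."""
    n = len(a)
    # Zero-pad to at least 2N so that the circular correlation computed
    # by the FFT does not wrap around into the lags we keep.
    size = 1
    while size < 2 * n:
        size *= 2
    fa = fft([complex(x) for x in a] + [0j] * (size - n))
    fb = fft([complex(x) for x in b] + [0j] * (size - n))
    # Correlation theorem: conj(A) * B in the frequency domain.
    product = [x.conjugate() * y for x, y in zip(fa, fb)]
    s = fft(product, inverse=True)
    # The inverse transform above is unnormalized, hence the division by size.
    return [s[t].real / size for t in range(n)]
```

Calling lagged_products_fft(a, a) gives the sums needed for an autocorrelation; dividing entry t by N-t turns each sum into an average over the available pairs.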

Interpreting autocorrelation output

(coming soon)

Special usage notes

Note that, per standard GNU style, options can be combined. Thus, for example, '-a -e -f' can be specified as '-aef'.
(coming soon)

Algorithm

If the autocorrelation function decays exponentially, $C(t)=A\exp(-t/\tau )$ , then $\tau$ can be found as

$\tau =\int _{0}^{\infty }[C(t)/C(0)]dt$ .

This concept is generalized by integrating the autocorrelation function irrespective of its functional form. Thus, the autocorrelation time is computed as

$\tau =\sum _{i=0}^{\mathrm {maxtime} }[C(i)/C(0)]$ ,

where $C(t)$ is the autocorrelation function of a time series $a_{1},\ldots ,a_{N}$,

$C(t)={\frac {1}{N-t}}\sum _{j=1}^{N-t}a_{j}a_{j+t}-\left({\frac {1}{N-t}}\sum _{j=1}^{N-t}a_{j}\right)\left({\frac {1}{N-t}}\sum _{j=1}^{N-t}a_{j+t}\right)$ ,

which can be evaluated for $0\leq t<N$ .
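The definitions of $C(t)$ and $\tau$ above can be transcribed directly into Python. This is a minimal sketch using 0-based indices, not the program's implementation; the function names are hypothetical, and the absolute flag reflects an assumption about how the -a option is applied (taking the absolute value of each normalized term before summing).

```python
def autocorrelation(a, max_lag):
    """Return [C(0), ..., C(max_lag)] for the series a; requires max_lag < len(a)."""
    n = len(a)
    c = []
    for t in range(max_lag + 1):
        m = n - t  # number of pairs (a_j, a_{j+t})
        cross = sum(a[j] * a[j + t] for j in range(m)) / m
        mean_head = sum(a[:m]) / m  # average over a_1 .. a_{N-t}
        mean_tail = sum(a[t:]) / m  # average over a_{1+t} .. a_N
        c.append(cross - mean_head * mean_tail)
    return c


def autocorrelation_time(a, max_lag, absolute=False):
    """Sum C(i)/C(0) for i = 0 .. max_lag; C(0) must be nonzero."""
    c = autocorrelation(a, max_lag)
    terms = [value / c[0] for value in c]
    if absolute:
        # Assumed behaviour of -a: integrate |C(i)/C(0)| instead.
        terms = [abs(term) for term in terms]
    return sum(terms)
```

For a strictly alternating series such as 1, -1, 1, -1, ... the signed sum nearly cancels while the absolute sum does not, which is the trade-off described for the -a option.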

For two time series $a_{1},\ldots ,a_{N}$ and $b_{1},\ldots ,b_{N}$ , the cross correlation is defined as

$X(t)={\frac {1}{N-t}}\sum _{j=1}^{N-t}a_{j}b_{j+t}-\left({\frac {1}{N-t}}\sum _{j=1}^{N-t}a_{j}\right)\left({\frac {1}{N-t}}\sum _{j=1}^{N-t}b_{j+t}\right)$ .
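The cross correlation $X(t)$ can be transcribed in the same way. Again this is only a sketch with a hypothetical function name, using 0-based indices and direct summation.

```python
def cross_correlation(a, b, max_lag):
    """Return [X(0), ..., X(max_lag)] for two equal-length series a and b."""
    if len(a) != len(b):
        raise ValueError("both series must have the same length")
    n = len(a)
    x = []
    for t in range(max_lag + 1):
        m = n - t  # number of pairs (a_j, b_{j+t})
        cross = sum(a[j] * b[j + t] for j in range(m)) / m
        mean_a = sum(a[:m]) / m  # average over a_1 .. a_{N-t}
        mean_b = sum(b[t:]) / m  # average over b_{1+t} .. b_N
        x.append(cross - mean_a * mean_b)
    return x
```

Note that the definition is not symmetric in the two series: for $t>0$, swapping a and b generally gives a different result, because b is the series that is shifted forward in time.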

If the variable $a_{i}$ is a time-dependent coordinate $x$ , then the mean squared displacement (MSD) for a time interval (time difference) t is defined as

$M(t)={\frac {1}{N-t}}\sum _{j=1}^{N-t}\left(x_{j+t}-x_{j}\right)^{2}$ ,

which can be written as

$M(t)={\frac {1}{N-t}}\sum _{j=1}^{N-t}\left(x_{j}\right)^{2}+{\frac {1}{N-t}}\sum _{j=1}^{N-t}\left(x_{j+t}\right)^{2}-{\frac {2}{N-t}}\sum _{j=1}^{N-t}x_{j}x_{j+t}$ .

Evaluation of the first two terms for $t=0,\ldots ,N-1$ requires ${\mathcal {O}}(N)$ operations. The third term is proportional to the first term (or "cross term") in $C(t)$, with proportionality factor $-2$. This also implies that we can accelerate calculation of the MSD via FFT (use options -m -f), and evaluate the function for the entire time domain at cost ${\mathcal {O}}(N\log N)$.
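The decomposition of $M(t)$ into two sums of squares and a cross term can be checked numerically. The sketch below (hypothetical function names, 0-based indices) obtains the first two terms for all lags from one running sum of squares; the cross term is still summed directly here, and it is this part that an FFT would replace.

```python
def msd_direct(x):
    """M(t) for t = 0 .. N-1, straight from the definition."""
    n = len(x)
    return [
        sum((x[j + t] - x[j]) ** 2 for j in range(n - t)) / (n - t)
        for t in range(n)
    ]


def msd_decomposed(x):
    """M(t) for t = 0 .. N-1 from the three-term form."""
    n = len(x)
    # prefix[k] = x_1^2 + ... + x_k^2, built once in O(N).
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value * value)
    msd = []
    for t in range(n):
        m = n - t
        head = prefix[m]              # sum of x_j^2 for j = 1 .. N-t
        tail = prefix[n] - prefix[t]  # sum of x_{j+t}^2 for j = 1 .. N-t
        # Cross term: summed directly here, so this loop is O(N) per lag.
        cross = sum(x[j] * x[j + t] for j in range(m))
        msd.append((head + tail - 2.0 * cross) / m)
    return msd
```

Both functions should agree to within rounding error for any input series.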

Download binary versions (Linux and OS X)

The current version is 4.0, dated September 2010. It is strongly recommended that you upgrade from any earlier version. You can download binary versions of this program here, but note that they were created for internal lab use; we cannot provide any support.

(binaries coming soon)

