Autocorrelation
Overview
The "autocorrelation" (AC) program is based upon the generic analyzer. It computes the integrated autocorrelation time of time-series data on a per-column basis. It is particularly useful for processing output from Monte Carlo and molecular dynamics simulations. The program is available on all local machines.
The autocorrelation program was originally written in 2002/2003 by Erik Luijten for efficiency comparisons involving the Geometric Cluster Algorithm.
General usage
autocorrelation [options] filename starttime maxtime
filename is a plain text file. Lines starting with a '#' will be ignored (use this feature to insert column descriptions and other information into your simulation data). If filename ends with '.gz' the file is assumed to be compressed with gzip and will be decompressed on the fly. Note that this happens in memory; no decompressed version of the file is written to disk. This has the advantage that no additional disk space is required and that no additional time is required to compress the data again after the analysis.
To read from standard input instead of a file, specify STDIN as the filename.
When redirecting the output, note that the autocorrelation function for each column is written to standard output, whereas the autocorrelation time (and all other information) is written to standard error. See usage notes below.
starttime and maxtime are mandatory arguments. starttime is used to discard initial samples. Normally it is simply set to 0. maxtime sets the upper bound for the integral of the autocorrelation function.
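When the program is driven from a script, the two output streams can be captured separately. The following is a sketch under assumptions: the helper names `build_command` and `run_autocorrelation` are hypothetical, and the executable is assumed to be named `autocorrelation` and to be on the search path.

```python
import subprocess


def build_command(filename, starttime, maxtime, options=""):
    """Assemble: autocorrelation [options] filename starttime maxtime."""
    command = ["autocorrelation"]  # assumed executable name on PATH
    if options:
        command.append(options)  # for example "-af"
    command += [filename, str(starttime), str(maxtime)]
    return command


def run_autocorrelation(filename, starttime, maxtime, options=""):
    """Run the program and return (function_text, report_text)."""
    result = subprocess.run(
        build_command(filename, starttime, maxtime, options),
        capture_output=True,
        text=True,
        check=True,
    )
    # Standard output carries the autocorrelation function per column;
    # standard error carries the autocorrelation time and other information.
    return result.stdout, result.stderr
```

Keeping the command assembly in its own function makes it possible to inspect the argument list without running the program.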
Options
- -a
Compute the autocorrelation time by integrating the absolute value of the autocorrelation function. This eliminates cancellations due to anticorrelations (which would result in an underestimation of the correlation time), but it also enhances noise in the tail of the autocorrelation function (where the function fluctuates around zero; normally such fluctuations cancel out).
- -c n:m
Compute the cross correlation of columns n and m. Note: columns are numbered from 0.
- -d
Only print the function (autocorrelation, cross correlation, or mean squared displacement) up to the lag time specified, even when using an FFT-based calculation (which in turn is enabled via the -f option). Since the FFT-based calculation yields these functions over the entire domain at no additional computational cost, normally the maximum lag time is ignored for printing purposes; this option overrides this.
- -e
Disable truncating the data set when using an FFT-based calculation (which in turn is enabled via the -f option). Normally, use of -f truncates the data set to a multiple of 256 (for more than 10240 samples) or a multiple of 16384 (for more than 10^6 samples). Note that disabling this truncation carries a significant speed penalty (and is primarily used for debugging purposes). However, it makes the results identical to those obtained without using the FFT. (If the size of the original data set is already a multiple of 256 (or 16384), then the results with and without FFT are identical even without this option.)
- -f
Employ a Fast Fourier Transform to greatly reduce the computational effort, by analyzing the data as a convolution in the frequency domain. This works for the autocorrelation function (Wiener-Khinchin Theorem) as well as for the cross correlation (option -c) (Cross-Correlation Theorem) and the mean squared displacement (option -m). Note that this requires computing the function over the entire time domain; therefore, for very rapidly decaying functions, where maxtime can be kept small, conventional evaluation might be faster.
- -m
Calculate the mean squared displacement (MSD). See algorithm below for the relation between the MSD and the autocorrelation function.
- -o filename
Write the autocorrelation function to filename.
- -u
Do not normalize the correlation function. Without this option, the autocorrelation function C(t) or cross correlation function X(t) is normalized by C(0) or X(0), respectively. Note that this option disables calculation of the autocorrelation time (see algorithm below). This option has no meaning in conjunction with the -m option (MSD calculation).
- -x
Only compute the cross term in the correlation function, i.e., the first term in C(t) or X(t) (see algorithm below). This option has no meaning in conjunction with the -m option (MSD calculation).
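The idea behind the -f option can be illustrated with a small pure-Python sketch of the Wiener-Khinchin route: the sums of lagged products for all lag times follow from one forward and one inverse transform of the zero-padded series. This is an illustration only, not the AC implementation; the function names are hypothetical, and the simple radix-2 transform used here requires padding to a power of two, which differs from the truncation rule described under -e.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def lagged_products_fft(a):
    """Return S(t) = sum_j a[j] * a[j+t] for t = 0..N-1 using the FFT."""
    n = len(a)
    size = 1
    while size < 2 * n:
        size *= 2  # zero-pad so that the circular correlation does not wrap around
    spectrum = fft([complex(x) for x in a] + [0j] * (size - n))
    power = [abs(z) ** 2 for z in spectrum]  # power spectrum |A(f)|^2
    correlation = fft(power, inverse=True)
    return [correlation[t].real / size for t in range(n)]


def lagged_products_direct(a):
    """Same quantity by direct summation, at O(N^2) cost for all lags."""
    n = len(a)
    return [sum(a[j] * a[j + t] for j in range(n - t)) for t in range(n)]
```

Dividing S(t) by N - t gives the cross term of the correlation function. The direct version stops being competitive once all lag times are needed, which is the trade-off described under -f.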
Interpreting output
Topics (coming soon):
- choosing maxtime
- anticorrelations
- negative time cross correlations
- ...
Special usage notes
- Note that, per standard GNU style, options can be combined. Thus, for example, '-a -e -f' can be specified as '-aef'.
- (more tips coming soon)
Algorithm
For a discrete time series $a_1, \ldots, a_N$, the autocovariance is given by
$$C(t) = \frac{1}{N-t} \sum_{j=1}^{N-t} a_j a_{j+t} - \left(\frac{1}{N-t} \sum_{j=1}^{N-t} a_j \right) \left(\frac{1}{N-t} \sum_{j=1}^{N-t} a_{j+t} \right),$$
which can be evaluated for $0 \leq t < N$. The autocorrelation (or autocorrelation function) is the autocovariance normalized by the variance of $a$. If we assume time invariance, this variance is $C(0)$, so that the autocorrelation becomes $C(t)/C(0)$. (The special case $C(0)=0$, which arises for a constant time series, is caught by AC.) The time difference $t$ is also called the lag time. By default, the AC code computes the autocorrelation; the autocovariance is obtained via the -u option.
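The definition of $C(t)$ translates directly into code. The sketch below is a literal transcription of the formula, with both means taken over the $N-t$ overlapping samples; the function names are hypothetical, and raising an exception for a constant series is only one possible way to handle $C(0)=0$.

```python
def autocovariance(a, t):
    """C(t) for lag time t, with 0 <= t < len(a)."""
    n = len(a) - t
    head = a[:n]   # a_j,     j = 1..N-t
    tail = a[t:]   # a_{j+t}, j = 1..N-t
    cross = sum(x * y for x, y in zip(head, tail)) / n
    return cross - (sum(head) / n) * (sum(tail) / n)


def autocorrelation(a, t):
    """C(t)/C(0); a constant series has C(0) = 0 and is rejected."""
    c0 = autocovariance(a, 0)
    if c0 == 0.0:
        raise ValueError("constant time series: C(0) = 0")
    return autocovariance(a, t) / c0
```

For the series 1, 2, 3, 4 this gives $C(0) = 1.25$ and $C(1) = 20/3 - 2 \cdot 3 = 2/3$.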
If the autocorrelation function decays exponentially, $C(t)/C(0) = \exp(-t/\tau)$, then $\tau$ can be found as
$$\tau = \int_0^\infty [C(t)/C(0)] \, dt.$$
This concept is generalized by integrating the autocorrelation function irrespective of its functional form. Thus, the autocorrelation time is computed as
$$\tau = \sum_{i=0}^{\mathrm{maxtime}} [C(i)/C(0)].$$
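A short sketch may help when choosing between the plain sum and the absolute-value variant selected by -a. It operates on an already normalized function $C(i)/C(0)$ supplied as a list; the function name `integrated_time` and the two synthetic test functions are illustrative assumptions, not program output.

```python
import math


def integrated_time(rho, maxtime, absolute=False):
    """Sum rho[i] = C(i)/C(0) for i = 0..maxtime (inclusive).

    With absolute=True the absolute values are summed instead,
    mimicking the idea behind the -a option.
    """
    terms = rho[: maxtime + 1]
    if absolute:
        terms = [abs(value) for value in terms]
    return sum(terms)


# Purely exponential decay with tau = 10 (synthetic, for illustration).
tau = 10.0
exponential = [math.exp(-i / tau) for i in range(201)]
print(integrated_time(exponential, 200))

# Alternating (anticorrelated) decay: the terms partly cancel in the plain sum.
alternating = [(-0.5) ** i for i in range(60)]
print(integrated_time(alternating, 59))
print(integrated_time(alternating, 59, absolute=True))
```

For the exponential case the discrete sum converges to $1/(1 - e^{-1/\tau})$, which is close to $\tau + 1/2$ rather than $\tau$ itself, because the sum includes the full $i = 0$ term. For the alternating case the plain sum is about 0.667, whereas the absolute sum is about 2.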
Some more nomenclature: Sometimes the autocorrelation refers to the cross term $\langle a_j a_{j+t}\rangle$ (where the brackets indicate averaging over $j$), which is the first term in $C(t)$. In that case, $C(t)/C(0)$ is referred to as the autocorrelation coefficient or autocorrelation coefficient function. To compute the cross term in AC, specify the -x option (or -u -x to suppress normalization).
For two time series $a_1, \ldots, a_N$ and $b_1, \ldots, b_N$, the covariance is defined as
$$X(t) = \frac{1}{N-t} \sum_{j=1}^{N-t} a_j b_{j+t} - \left(\frac{1}{N-t} \sum_{j=1}^{N-t} a_j \right) \left(\frac{1}{N-t} \sum_{j=1}^{N-t} b_{j+t} \right).$$
The cross correlation is then $X(t)/X(0)$. AC computes cross correlations via the -c option. Moreover, as for the autocorrelation, it is possible to suppress normalization via -u, and to only compute the first term of the covariance via -x.
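The covariance of two columns can be sketched in the same way, with the columns addressed by index starting from 0 as in the -c n:m option. The function names are hypothetical, the columns are assumed to have equal length, and taking column n as $a$ and column m as $b$ is an assumption about the ordering.

```python
def cross_covariance(columns, n, m, t):
    """X(t) for columns n and m (numbered from 0), lag time t >= 0.

    Assumption: column n plays the role of a, column m the role of b.
    """
    a, b = columns[n], columns[m]
    count = len(a) - t
    head = a[:count]          # a_j,     j = 1..N-t
    tail = b[t:t + count]     # b_{j+t}, j = 1..N-t
    cross = sum(x * y for x, y in zip(head, tail)) / count
    return cross - (sum(head) / count) * (sum(tail) / count)


def cross_correlation(columns, n, m, t):
    """X(t)/X(0), the normalized form."""
    return cross_covariance(columns, n, m, t) / cross_covariance(columns, n, m, 0)
```

Passing the same index twice, as in `cross_covariance(columns, 0, 0, t)`, reduces to the autocovariance $C(t)$ of that column.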
Finally, if the variable $a_i$ is a time-dependent coordinate $x$, then the mean squared displacement (MSD) for a time interval (time difference) $t$ is defined as
$$M(t) = \frac{1}{N-t} \sum_{j=1}^{N-t} \left( x_{j+t} - x_j \right)^2,$$
which can be written as
$$M(t) = \frac{1}{N-t} \sum_{j=1}^{N-t} \left( x_j \right)^2 + \frac{1}{N-t} \sum_{j=1}^{N-t} \left( x_{j+t} \right)^2 - \frac{2}{N-t} \sum_{j=1}^{N-t} x_j x_{j+t}.$$
This can be computed by AC via the -m option.
Evaluation of the first two terms requires $\mathcal{O}(N)$ operations. The third term we recognize as proportional to the first term (or "cross term") in $C(t)$. This also implies that we can accelerate calculation of the MSD via FFT (use options -m -f), and evaluate the function for the entire time domain at cost $\mathcal{O}(N \log N)$.
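The decomposition of $M(t)$ can be checked numerically. In the sketch below (hypothetical function names, not the AC source), the two squared-sum terms are obtained for every lag time from a single running sum, which is one way to see why they are cheap; the cross term is still summed directly here, and it is the part that an FFT would replace.

```python
def msd_direct(x, t):
    """M(t) from the definition: mean of (x_{j+t} - x_j)^2 over j = 1..N-t."""
    n = len(x) - t
    return sum((x[j + t] - x[j]) ** 2 for j in range(n)) / n


def msd_all_lags(x):
    """M(t) for t = 0..N-1 using the three-term decomposition."""
    total = len(x)
    # prefix[k] = x_1^2 + ... + x_k^2, built once in O(N)
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value * value)
    result = []
    for t in range(total):
        n = total - t
        head_sq = prefix[n]                   # sum of x_j^2,     j = 1..N-t
        tail_sq = prefix[total] - prefix[t]   # sum of x_{j+t}^2, j = 1..N-t
        # Cross term: direct sum here, O(N) per lag; this is the FFT candidate.
        cross = sum(x[j] * x[j + t] for j in range(n))
        result.append((head_sq + tail_sq - 2.0 * cross) / n)
    return result
```

For the coordinates 0, 1, 3, 6, 10 both routes give $M(1) = (1 + 4 + 9 + 16)/4 = 7.5$.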
Download binary versions (Linux and OS X)
The current version is 3.96, dated May 2025. It is strongly recommended that you upgrade from any earlier version. You can download binary versions of this program here, but note that they were created for internal lab use; we cannot provide any support.
(binaries coming soon, pending minor improvements)