# Autocorrelation

### Overview

The "autocorrelation" (AC) program is based upon the generic analyzer. It computes the integrated autocorrelation time for time series of data, on a per column basis. It is particularly useful for processing output from Monte Carlo and molecular dynamics simulations. The program is available on all local machines.

The autocorrelation program was originally written in 2002/2003 by Erik Luijten for efficiency comparisons involving the Geometric Cluster Algorithm.

### General usage

autocorrelation [options] `filename` *starttime* *maxtime*

`filename` is a plain text file. Lines starting with a '#' will be ignored (use this feature to insert column descriptions and other information into your simulation data). If `filename` ends with '`.gz`' the file is assumed to be compressed with gzip and will be decompressed on the fly. Note that this happens in memory; no decompressed version of the file is written to disk. This has the advantage that no additional disk space is required and that no additional time is required to compress the data again after the analysis.

To read from standard input instead of a file, specify *STDIN* as the filename.
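The input conventions above can be mimicked in a few lines of Python. This is only an illustrative sketch, not the code used by the program: the function name `read_columns` is hypothetical, and it assumes that columns are separated by whitespace and that blank lines can be skipped, neither of which is stated above.

```python
import gzip
import sys


def read_columns(filename):
    """Return the data as a list of columns, each a list of floats.

    Assumptions (not taken from the program itself): whitespace-separated
    columns, blank lines skipped, every data line has the same column count.
    """
    if filename == "STDIN":
        handle = sys.stdin
    elif filename.endswith(".gz"):
        # Decompress on the fly; nothing is written to disk.
        handle = gzip.open(filename, "rt")
    else:
        handle = open(filename, "r")
    rows = []
    try:
        for line in handle:
            # Lines starting with '#' are comments and are ignored.
            if line.startswith("#") or not line.strip():
                continue
            rows.append([float(field) for field in line.split()])
    finally:
        if handle is not sys.stdin:
            handle.close()
    # Transpose rows into columns.
    return [list(column) for column in zip(*rows)]
```

Reading through `gzip.open` in text mode streams the decompressed data in memory, which matches the behaviour described for `.gz` files.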

When redirecting the output, note that the **autocorrelation function** for each column is written to standard output, whereas the **autocorrelation time** (and all other information) is written to standard error. See usage notes below.

*starttime* and *maxtime* are mandatory arguments. *starttime* is used to discard initial samples. Normally it is simply set to 0. *maxtime* sets the upper bound for the integral of the autocorrelation function.

### Options

`-a`

Compute the autocorrelation time by integrating the absolute value of the autocorrelation function. This eliminates cancellations due to anticorrelations (which would result in an underestimation of the correlation time), but it also amplifies noise in the tail of the autocorrelation function (where the function fluctuates around zero; normally such fluctuations cancel out).

`-c n:m`

Compute the cross correlation of columns `n` and `m`. Note: columns are numbered from 0.

`-e`

Disable truncating the data set when using FFT-based calculation (which in turn is enabled via the `-f` option). Normally, use of `-f` truncates the data set to a multiple of 256 (for more than 10240 samples) or a multiple of 16384 (for more than $10^{6}$ samples). Note that disabling this truncation carries a significant speed penalty (and is primarily used for debugging purposes). However, it makes the results identical to those obtained without using the FFT. (If the size of the original data set is already a multiple of 256 (or 16384), then the results with and without FFT are identical even without this option.)

`-f`

Employ a Fast Fourier Transform to greatly reduce the computational effort, by evaluating the correlation as a product in the frequency domain. This works for the autocorrelation function (Wiener-Khinchin Theorem) as well as for the cross correlation (option `-c`) (Cross-Correlation Theorem) and the mean squared displacement (option `-m`). Note that this requires computing the function over the entire time domain; therefore, for very rapidly decaying functions, where *maxtime* can be kept small, conventional evaluation might be faster.

`-m`

Calculate the mean squared displacement (MSD). See algorithm below for the relation between the MSD and the autocorrelation function.

`-o` *filename*

Write autocorrelation function to *filename*.

`-u`

Do not normalize the autocorrelation function. Without this option, the autocorrelation function C(t) is normalized by C(0). Note that this option disables calculation of the autocorrelation time (see algorithm below).

`-x`

Only compute the cross term in the autocorrelation function, i.e., the first term in C(t) (see algorithm below).
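To make the `-e` and `-f` descriptions concrete, here is a rough Python sketch. It is not the program's implementation. The helper names are hypothetical, `fft_truncated_length` assumes that truncation rounds *down* and that data sets of 10240 samples or fewer are left untouched, and the FFT routine uses zero padding to a power of two (a swapped-in textbook technique) rather than whatever transform sizes the program uses internally.

```python
import cmath


def fft_truncated_length(n_samples):
    """Sample count kept by -f without -e (assumed: rounding down)."""
    if n_samples > 10**6:
        block = 16384
    elif n_samples > 10240:
        block = 256
    else:
        return n_samples  # assumption: small data sets are not truncated
    return (n_samples // block) * block


def fft(values, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(values) must be a power of 2."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def lagged_products_direct(x, maxtime):
    """Sum over i of x[i] * x[i + t] for t = 0..maxtime, O(N * maxtime)."""
    n = len(x)
    return [sum(x[i] * x[i + t] for i in range(n - t))
            for t in range(maxtime + 1)]


def lagged_products_fft(x, maxtime):
    """Same sums via the Wiener-Khinchin theorem, O(N log N); maxtime < N."""
    n = len(x)
    size = 1
    while size < 2 * n:  # zero padding prevents circular wrap-around
        size *= 2
    padded = [complex(v) for v in x] + [0j] * (size - n)
    power = [abs(c) ** 2 for c in fft(padded)]
    sums = fft(power, inverse=True)
    return [sums[t].real / size for t in range(maxtime + 1)]
```

The FFT variant always produces the sums for every lag, which is why the direct loop can win when *maxtime* is small.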

### Interpreting output

Topics (coming soon):

- choosing maxtime
- anticorrelations
- negative time cross correlations
- ...

### Special usage notes

- Note that, per standard GNU style, options can be combined. Thus, for example, '`-a -e -f`' can be specified as '`-aef`'.
- (more tips coming soon)

### Algorithm

For a discrete time series $x_i$ ($i = 1, \ldots, N$), the **autocovariance** is given by

$$C(t) = \langle x_i x_{i+t} \rangle - \langle x_i \rangle \langle x_{i+t} \rangle ,$$

which can be evaluated for $0 \le t < N$. The **autocorrelation** (or autocorrelation *function*) is the autocovariance normalized by the variance of $x_i$. If we assume time invariance, this variance is $C(0)$, so that the autocorrelation becomes $C(t)/C(0)$. (The special case $C(0) = 0$, which arises for a constant time series, is caught by **AC**.) The time difference $t$ is also called the *lag time.* By default, the **AC** code computes the autocorrelation; the autocovariance is obtained via the `-u` option.
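The definition can be evaluated directly, as in the following minimal Python sketch. The function names are hypothetical, and the sketch assumes that each average runs over the pairs of samples that are a lag $t$ apart; the exact averaging convention used by **AC** is not specified here.

```python
def mean(values):
    return sum(values) / len(values)


def autocovariance(x, t):
    """C(t) = <x_i x_{i+t}> - <x_i><x_{i+t}>, averaged over overlapping pairs."""
    n = len(x)
    head = x[: n - t]  # the samples x_i
    tail = x[t:]       # the samples x_{i+t}
    cross = mean([a * b for a, b in zip(head, tail)])
    return cross - mean(head) * mean(tail)


def autocorrelation(x, maxtime):
    """Normalized function C(t)/C(0) for t = 0..maxtime."""
    c0 = autocovariance(x, 0)
    if c0 == 0.0:
        # A constant time series has zero variance and cannot be normalized.
        raise ValueError("constant time series: C(0) = 0")
    return [autocovariance(x, t) / c0 for t in range(maxtime + 1)]
```

Dropping the division by `c0` gives the unnormalized function that corresponds to the `-u` option.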

If the autocorrelation function decays exponentially, $C(t)/C(0) = \exp(-t/\tau)$, then $\tau$ can be found as

$$\tau = \int_0^\infty \frac{C(t)}{C(0)} \, dt .$$

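This concept is generalized by integrating the autocorrelation function irrespective of its functional form. Thus, the autocorrelation time is computed as

$$\tau = \int_0^{t_{\max}} \frac{C(t)}{C(0)} \, dt ,$$

with the upper bound $t_{\max}$ set by *maxtime*.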
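As a rough illustration of the integration step, the sketch below sums a normalized autocorrelation function with the trapezoidal rule. The trapezoidal rule is an assumption made for this example (the discretization used by the program is not documented here), and `autocorrelation_time` is a hypothetical name. The `absolute` flag mimics the idea behind the `-a` option.

```python
import math


def autocorrelation_time(acf, absolute=False):
    """Integrate a normalized autocorrelation function given at lags 0..maxtime.

    Uses the trapezoidal rule with unit lag spacing (an assumption).
    With absolute=True the absolute value of the function is integrated.
    """
    values = [abs(v) for v in acf] if absolute else list(acf)
    if len(values) < 2:
        return 0.0
    return 0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1]


# Check against a purely exponential decay with a known correlation time.
tau_exact = 20.0
maxtime = 400
exponential = [math.exp(-t / tau_exact) for t in range(maxtime + 1)]
tau_estimate = autocorrelation_time(exponential)
```

For the exponential test function the estimate agrees with `tau_exact` to within a fraction of a percent, provided *maxtime* is many times larger than the correlation time.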

Some more nomenclature: Sometimes the autocorrelation refers to the *cross term* $\langle x_i x_{i+t} \rangle$ (where the brackets indicate averaging over $i$), which is the first term in $C(t)$. In that case, $C(t)/C(0)$ is referred to as the **autocorrelation coefficient** or autocorrelation coefficient *function.* To compute the cross term in **AC**, specify the `-x` option (or `-u` `-x` to suppress normalization).

For two time series $x_i$ and $y_i$, the **covariance** is defined as

$$C_{xy}(t) = \langle x_i y_{i+t} \rangle - \langle x_i \rangle \langle y_{i+t} \rangle .$$

The **cross correlation** is then $C_{xy}(t) / \sqrt{C_{xx}(0) \, C_{yy}(0)}$. **AC** computes cross correlations via the `-c` option. Moreover, as for the autocorrelation, it is possible to suppress normalization via `-u`, and to only compute the first term of the covariance via `-x`.
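A direct evaluation of the cross correlation might look like the sketch below. It is illustrative only: the function names are hypothetical, and the normalization by the square root of the product of the two zero-lag variances is the standard textbook convention, assumed here rather than taken from the **AC** source.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y, t):
    """<x_i y_{i+t}> - <x_i><y_{i+t}> for a non-negative lag t."""
    n = min(len(x), len(y))
    head = x[: n - t]  # the samples x_i
    tail = y[t:n]      # the samples y_{i+t}
    cross = mean([a * b for a, b in zip(head, tail)])
    return cross - mean(head) * mean(tail)


def cross_correlation(x, y, t):
    """Covariance normalized by the zero-lag variances (assumed convention)."""
    norm = math.sqrt(covariance(x, x, 0) * covariance(y, y, 0))
    return covariance(x, y, t) / norm


# Two series with a single peak, the second delayed by two steps.
x = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
peak = cross_correlation(x, y, 2)
zero_lag = cross_correlation(x, y, 0)
```

The value at lag 2 is larger than at lag 0, reflecting the delay between the two series. For short series like this one, the finite-sample estimate can exceed 1 because the numerator and the normalization are averaged over different numbers of samples.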

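Finally, if the variable $x$ is a time-dependent coordinate $x_i$, then the **mean squared displacement** (MSD) for a time interval (time difference) $t$ is defined as

$$\mathrm{MSD}(t) = \langle (x_{i+t} - x_i)^2 \rangle ,$$

which can be written as

$$\mathrm{MSD}(t) = \langle x_{i+t}^2 \rangle + \langle x_i^2 \rangle - 2 \langle x_i x_{i+t} \rangle .$$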

This can be computed by **AC** via the `-m` option.

Evaluation of the first two terms for all $0 \le t < N$ requires $O(N)$ operations. The third term we recognize as proportional to the first term (or "cross term") in $C(t)$. This also implies that we can accelerate calculation of the MSD via FFT (use options `-m` `-f`), and evaluate the function for the entire time domain at cost $O(N \log N)$.
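The decomposition can be checked numerically with the sketch below, which uses hypothetical function names and assumes that every average runs over the pairs of samples a lag `t` apart. The two squared terms come from a single running sum of squares, so they cost a constant amount of work per lag. The cross term is evaluated with a plain loop here; it is the part that an FFT would accelerate.

```python
def msd_direct(x, t):
    """Mean squared displacement straight from the definition."""
    n = len(x)
    return sum((x[i + t] - x[i]) ** 2 for i in range(n - t)) / (n - t)


def msd_decomposed(x, maxtime):
    """MSD(t) = <x_{i+t}^2> + <x_i^2> - 2 <x_i x_{i+t}> for t = 0..maxtime."""
    n = len(x)
    # prefix[k] holds the sum of x[j]**2 over j < k.
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value * value)
    result = []
    for t in range(maxtime + 1):
        pairs = n - t
        later_squared = (prefix[n] - prefix[t]) / pairs  # <x_{i+t}^2>
        earlier_squared = prefix[pairs] / pairs          # <x_i^2>
        # Cross term: direct O(N) sum per lag in this sketch.
        cross = sum(x[i] * x[i + t] for i in range(pairs)) / pairs
        result.append(later_squared + earlier_squared - 2.0 * cross)
    return result


# Uniform motion: the displacement over t steps is exactly t.
trajectory = [float(i) for i in range(10)]
msd_values = msd_decomposed(trajectory, 4)
```

For the uniformly moving coordinate in the example, both routines give the lag squared.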

### Download binary versions (Linux and OS X)

The current version is 3.93, dated December 2013. It is strongly recommended that you upgrade from any earlier version. You can download binary versions of this program here, but note that they were created for internal lab use - we cannot provide any support.

(binaries coming soon, pending minor improvements)