# Long Memory Models

## Summary and Keywords

Long memory models are statistical models that describe strong correlation or dependence across time series data. This kind of phenomenon is often referred to as “long memory” or “long-range dependence.” It refers to persisting correlation between distant observations in a time series. For scalar time series observed at equal intervals of time that are covariance stationary, so that the mean, variance, and autocovariances (between observations separated by a lag j) do not vary over time, it typically implies that the autocovariances decay so slowly, as j increases, as not to be absolutely summable. However, it can also refer to certain nonstationary time series, including ones with an autoregressive unit root, that exhibit even stronger correlation at long lags. Evidence of long memory has often been found in economic and financial time series, where the noted extension to possible nonstationarity can cover many macroeconomic time series, as well as in such fields as astronomy, agriculture, geophysics, and chemistry.

As long memory is now a technically well developed topic, formal definitions are needed. But by way of partial motivation, long memory models can be thought of as complementary to the very well known and widely applied stationary and invertible autoregressive and moving average (ARMA) models, whose autocovariances are not only summable but decay exponentially fast as a function of lag j. Such models are often referred to as “short memory” models, because there is negligible correlation across distant time intervals. These models are often combined with the most basic long memory ones, however, because together they offer the ability to describe both short and long memory features in many time series.

Keywords: long memory, parametric models, semiparametric models, volatility models, nonstationary models

## Introductory Definitions and Discussion

Some basic notation must be introduced. Let ${x}_{t}$, $t=0,\pm 1,\dots ,$ be an equally spaced, real valued time series. We suppose initially that ${x}_{t}$ is covariance stationary, so that the mean

$$\mu =E({x}_{t})$$

and lag-*j* autocovariances (or variance when *j*=0)

$$\gamma (j)=E\left\{({x}_{t}-\mu )({x}_{t+j}-\mu )\right\}$$

do not depend on $t$. We further suppose that ${x}_{t}$ has a spectral density, denoted

$$f(\lambda )=\frac{1}{2\pi }\sum_{j=-\infty }^{\infty }\gamma (j){e}^{-ij\lambda },\quad -\pi <\lambda \le \pi ,$$

where $\lambda $ denotes “frequency.” Note that $f(\lambda )$ is a non-negative, even function. We might then say that ${x}_{t}$ has “long memory” if

$$f(0)=\infty ,$$

so that $f(\lambda )$ diverges at frequency zero. The extreme alternative that

$$f(0)=0$$

is, on the other hand, possible; this phenomenon is sometimes referred to as “negative dependence” or “anti-persistence.” The intermediate situation is

$$0<f(0)<\infty ,$$

when we say that ${x}_{t}$ has “short memory.” It is also possible that $f(\lambda )$ might diverge or be zero at one or more frequencies $\lambda $ in $(0,\pi ]$, possibly indicating seasonal or cyclic behavior. The modeling of such phenomena will be discussed, but the main focus is on behavior at zero frequency, which empirically seems the most interesting. An excellent textbook reference to theory and methods for long memory is Giraitis, Koul, and Surgailis (2012).

Nonparametric estimates of $f(\lambda )$ have been found to be heavily peaked around zero frequency in case of many economic time series, going back to Adelman (1965), lending support for the presence of long memory. Moreover, empirical evidence of long memory in various fields, such as astronomy, chemistry, agriculture, and geophysics, dates from much earlier times; see for example Fairfield Smith (1938) and Hurst (1951).

One feature of interest in early work was behavior of the sample mean,

$$\overline{x}=\frac{1}{n}\sum_{t=1}^{n}{x}_{t}.$$

If $f(\lambda )$ is continuous and positive at $\lambda =0$,

$$Var(\overline{x})\sim \frac{2\pi f(0)}{n}\quad \text{as }n\to \infty ,$$

but, for example, Fairfield Smith (1938) fitted a law ${n}^{-\alpha}$, $0<\alpha <1$, to spatial agricultural data, disputing the ${n}^{-1}$ law. At this point it is convenient to switch notation, to $d=(1-\alpha )/2$, because $d$, referred to as the “differencing” parameter, features more commonly in econometric modeling. Fairfield Smith’s (1938) law for the variance of the sample mean is thus ${n}^{2d-1}$, which arises if

$$\gamma (j)\sim {c}_{1}{j}^{2d-1}\quad \text{as }j\to \infty ,\tag{1.1}$$

for ${c}_{1}>0$. Under additional conditions (see Yong, 1974), (1.1) is equivalent to a corresponding power law for $f(\lambda )$ near zero frequency,

$$f(\lambda )\sim {c}_{2}{\left|\lambda \right|}^{-2d}\quad \text{as }\lambda \to 0,\tag{1.2}$$

for ${c}_{2}>0$. The behavior of the sample mean under such circumstances, and the form and behavior of the best linear unbiased estimate of the population mean, was discussed by Adenstedt (1974). He anticipated the practical usefulness of (1.2) in the long memory range $0<d<\frac{1}{2}$, but also treated the anti-persistent case $-\frac{1}{2}<d<0$. The sample mean tends to be highly statistically inefficient under anti-persistence, but for long memory Samarov and Taqqu (1988) found it to have remarkably good efficiency.

A number of explanations of how long memory behavior might arise have been proposed. Macroeconomic time series, in particular, can be thought of as aggregating across micro-units. Consider the random-parameter autoregressive model of order 1 ($AR(1)$),

$${X}_{t}(\alpha )=A(\alpha ){X}_{t-1}(\alpha )+{\epsilon}_{t}(\alpha ),$$

where $\alpha $ indexes micro-units, the ${\epsilon}_{t}(\alpha )$ are independent and homoscedastic with zero mean and variance ${\sigma}^{2}$ across $\alpha $, $t$, and $A(\alpha )$ is a random variable with support $(-1,1)$ or $[0,1)$. Then, conditional on $\alpha $, ${X}_{t}(\alpha )$ is a stationary $AR(1)$ sequence. Robinson (1978a) showed that the “unconditional autocovariance,” which we again denote by $\gamma (j)$, is given by

$$\gamma (j)={\sigma}^{2}E\left\{\frac{A{(\alpha )}^{\left|j\right|}}{1-A{(\alpha )}^{2}}\right\}={\sigma}^{2}\sum_{k=0}^{\infty }E\left\{A{(\alpha )}^{\left|j\right|+2k}\right\},\tag{1.3}$$

and that the “unconditional spectrum” $f(\lambda )$ at $\lambda =0$ is proportional to $E\left\{{(1-A(\alpha ))}^{-2}\right\}$, and thus infinite, if $A(\alpha )$ has a probability density with a zero at 1 of order less than or equal to 1. One class with this property considered by Robinson (1978a) was the (possibly translated) Beta distribution, for which Granger (1980) explicitly derived the corresponding power law behavior of the spectral density of cross-sectional aggregates ${x}_{t}={N}^{-\frac{1}{2}}{\displaystyle {\sum}_{i=1}^{N}}{X}_{t}({\alpha}_{i})$, where the ${\alpha}_{i}$ are independent drawings: clearly $Cov({x}_{t},{x}_{t+j})$ is $\gamma (j)$ due to the independence properties. Indeed, if $A(\alpha )$ has a Beta $(c,2-2d)$ distribution on $(0,1)$, for $c>0$, $0<d<\frac{1}{2}$, $E\left\{A{(\alpha )}^{k}\right\}$ decays like ${k}^{2d-2}$, so (1.3) decays like ${j}^{2d-1}$, as in (1.1). Intuitively, a sufficient density of individuals with close-to-unit-root behavior produces the aggregate long memory. For further developments in relation to more general models see, for example, Lippi and Zaffaroni (1998).
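The decay rate quoted for the Beta case can be checked numerically. The following Python sketch is purely illustrative: the function names and the parameter values in the usage note are assumptions, not taken from the cited papers. It relies on the standard closed form $E\left\{{A}^{k}\right\}=\Gamma (c+k)\Gamma (c+2-2d)/\left\{\Gamma (c)\Gamma (c+k+2-2d)\right\}$ for a Beta $(c,2-2d)$ variable, and measures the log-log slope of the moments, which should approach $2d-2$ for large $k$.

```python
import math

def beta_moment(k, c, d):
    """E{A^k} for A ~ Beta(c, 2 - 2d), computed via log-gamma for stability."""
    b = 2.0 - 2.0 * d
    log_m = (math.lgamma(c + k) + math.lgamma(c + b)
             - math.lgamma(c) - math.lgamma(c + k + b))
    return math.exp(log_m)

def loglog_slope(k1, k2, c, d):
    """Slope of log E{A^k} against log k between the lags k1 and k2."""
    rise = math.log(beta_moment(k2, c, d)) - math.log(beta_moment(k1, c, d))
    return rise / (math.log(k2) - math.log(k1))
```

For instance, with the arbitrary values $c=1$ and $d=0.3$, `loglog_slope(1000, 2000, 1.0, 0.3)` should lie close to $2d-2=-1.4$.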

## Parametric Models

The differencing parameter, $d$, concisely describes long memory properties, and much interest in the possibility of long memory or anti-persistence focuses on the question of its value. In practice $d$ is typically regarded as unknown, and so its estimation has been the focus of much research. Indeed, an estimate of $d$ is useful even in estimating the variance of the sample mean.

In order to estimate $d$ we need to consider the modeling of dependence in more detail. The simplest possible realistic model for a covariance stationary series is a parametric one that expresses $\gamma (j)$ for all $j$, or $f(\lambda )$ for all $\lambda $, as a parametric function of just two parameters, $d$ and an unknown scale factor. The earliest such model is “fractional noise,” which arises from considerations of self-similarity. A stochastic process $\{y(t);-\infty <t<\infty \}$ is self-similar with “self-similarity parameter” $H\in (0,1)$ if, for any $a>0$, $\{y(at);-\infty <t<\infty \}$ has the same distribution as $\{{a}^{H}y(t);-\infty <t<\infty \}$. If the differences ${x}_{t}=y(t)-y(t-1)$, for integer $t$, are covariance stationary, we obtain

$$\gamma (j)=\frac{\gamma (0)}{2}\left\{{\left|j+1\right|}^{2H}-2{\left|j\right|}^{2H}+{\left|j-1\right|}^{2H}\right\},\quad j=0,\pm 1,\dots .$$

This decays like ${j}^{2H-2}$ as $j\to \infty $, so on taking $H=d+\frac{1}{2}$ we have again the asymptotic law (1.1); $\gamma (0)$ is the unknown scale parameter in this model.
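A short numerical check may help to connect fractional noise with the earlier discussion of the sample mean. The sketch below is an illustration under stated assumptions rather than anything taken from the cited literature: the function names are hypothetical, and the autocovariance used is the standard fractional noise form $\gamma (j)=\frac{1}{2}\gamma (0)\{{\left|j+1\right|}^{2H}-2{\left|j\right|}^{2H}+{\left|j-1\right|}^{2H}\}$. Summing the autocovariances gives the variance of the sample mean, which for this model equals $\gamma (0){n}^{2H-2}=\gamma (0){n}^{2d-1}$ exactly.

```python
def fractional_noise_autocov(j, H, gamma0=1.0):
    """Lag-j autocovariance of fractional noise with self-similarity parameter H."""
    j = abs(j)
    return 0.5 * gamma0 * ((j + 1) ** (2 * H) - 2 * j ** (2 * H) + abs(j - 1) ** (2 * H))

def var_sample_mean(n, H, gamma0=1.0):
    """Variance of the sample mean: n^{-2} * sum over |j| < n of (n - |j|) * gamma(j)."""
    total = n * fractional_noise_autocov(0, H, gamma0)
    for j in range(1, n):
        total += 2 * (n - j) * fractional_noise_autocov(j, H, gamma0)
    return total / n ** 2
```

With $H=0.8$ (that is, $d=0.3$), `var_sample_mean(200, 0.8)` should agree with `200 ** (2 * 0.8 - 2)` up to rounding error, while $H=0.5$ recovers the familiar ${n}^{-1}$ law.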

This model was studied by Mandelbrot and Van Ness (1968) and others, but it extends less naturally to richer stationary series, and to nonstationary series, and has an unpleasant spectral form (see the discussion of Whittle estimates), so it has received less attention in recent years than another two-parameter model, the “fractional differencing” model proposed by Adenstedt (1974):

$$f(\lambda )=\frac{{\sigma}^{2}}{2\pi }{\left|1-{e}^{i\lambda }\right|}^{-2d},\quad -\pi <\lambda \le \pi .\tag{2.1}$$

When $d=0$, this is just the spectral density of a white noise series (with variance ${\sigma}^{2}$), while for $0<\left|d\right|<\frac{1}{2}$ both properties (1.1) and (1.2) hold, Adenstedt (1974) giving a formula for $\gamma (j)$ as well as other properties. Note that $d<\frac{1}{2}$ is necessary for integrability of $f(\lambda )$, that is for ${x}_{t}$ to have finite variance; this restriction is sometimes called the stationarity condition on $d$. Another mathematically important restriction is that of invertibility, $d>-\frac{1}{2}$.

The “typical spectral shape of an economic variable” was identified by Granger (1966) as entailing not only spectral divergence at zero frequency, but monotonic decay with frequency. Both “fractional differencing” and “fractional noise” models have this simple property. But even if monotonicity holds, as it may, at least approximately, in case of deseasonalized series, the notion that the entire autocorrelation structure can be explained by a single parameter, $d$, is highly questionable. Though $d$ determines the long-run or low-frequency behavior of $f(\lambda )$, greater flexibility in modeling short-run, high-frequency behavior may be desired. The model (2.1) was referred to as “fractional differencing” because it is the spectral density of ${x}_{t}$ generated by

$${(1-L)}^{d}{x}_{t}={e}_{t},\tag{2.2}$$

where $\left\{{e}_{t}\right\}$ is a sequence of uncorrelated variables with zero mean and variance ${\sigma}^{2}$, $L$ is the lag operator, $L{x}_{t}={x}_{t-1}$, and

$${(1-L)}^{d}=\sum_{j=0}^{\infty }\frac{\Gamma (j-d)}{\Gamma (-d)\Gamma (j+1)}{L}^{j},$$

where $\Gamma (.)$ denotes the gamma function. With $d=1$ (and a suitable initial condition), (2.2) would describe a random walk model. The model

$${(1-L)}^{d}a(L){x}_{t}=b(L){e}_{t}\tag{2.3}$$

was stressed by Box and Jenkins (1971), $d$ here being a positive integer, $a(L)$ and $b(L)$ being the polynomials

$$a(L)=1-\sum_{j=1}^{p}{a}_{j}{L}^{j},\quad b(L)=1+\sum_{j=1}^{q}{b}_{j}{L}^{j},$$

all of whose zeros are outside the unit circle, with $a(L)$ and $b(L)$ having no zero in common to ensure identifiability of the autoregressive (AR) order $p$ and the moving average (MA) order $q$. Granger and Joyeux (1980) considered instead fractional $d\in (-\frac{1}{2},\frac{1}{2})$ in (2.3), giving a fractional autoregressive integrated moving average model of orders $p,d,q$ (often abbreviated as $FARIMA(p,d,q)$ or $ARFIMA(p,d,q)$). It has spectral density

$$f(\lambda )=\frac{{\sigma}^{2}}{2\pi }{\left|1-{e}^{i\lambda }\right|}^{-2d}{\left|\frac{b({e}^{i\lambda })}{a({e}^{i\lambda })}\right|}^{2},\quad -\pi <\lambda \le \pi .\tag{2.4}$$

Granger and Joyeux (1980) principally discussed the simple $FARIMA(0,d,0)$ case (2.1) of Adenstedt (1974), but they also considered estimation of $d$, prediction, and simulation of long memory series. Further discussion of $FARIMA(p,d,q)$ models was provided by Hosking (1981), much of it based on Adenstedt’s (1974) model (2.1), but he also gave results for the general case (2.4), especially the $FARIMA(1,d,0)$.
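The fractional differencing operator is easy to compute in practice. In the expansion ${(1-L)}^{d}={\sum}_{j\ge 0}{\pi}_{j}{L}^{j}$ the coefficients are ${\pi}_{j}=\Gamma (j-d)/\{\Gamma (-d)\Gamma (j+1)\}$, which satisfy the recursion ${\pi}_{0}=1$, ${\pi}_{j}={\pi}_{j-1}(j-1-d)/j$. The Python sketch below is a minimal illustration with hypothetical function names; the filter is truncated at the start of the sample, which is an assumption made here for simplicity rather than a treatment of initial conditions.

```python
import math

def frac_diff_coeffs(d, n_terms):
    """Coefficients pi_0, ..., pi_{n_terms-1} of (1 - L)^d, by recursion."""
    coeffs = [1.0]
    for j in range(1, n_terms):
        coeffs.append(coeffs[-1] * (j - 1 - d) / j)
    return coeffs

def gamma_form(d, j):
    """The same coefficient from the gamma-function formula (non-integer d only)."""
    return math.gamma(j - d) / (math.gamma(-d) * math.gamma(j + 1))

def frac_diff(x, d):
    """Apply (1 - L)^d to x, truncating the filter at the first observation."""
    coeffs = frac_diff_coeffs(d, len(x))
    return [sum(coeffs[j] * x[t - j] for j in range(t + 1)) for t in range(len(x))]
```

With $d=1$ the coefficients reduce to $1,-1,0,0,\dots$, so `frac_diff` returns first differences (after the initial value), consistent with the random walk case; for fractional $d$ the coefficients decay slowly, like ${j}^{-d-1}$.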

An enduringly popular proposal for estimating $d$, or $H$, used the adjusted rescaled range $(R/S)$ statistic

$$R/S=\frac{\underset{1\le j\le n}{\mathrm{max}}\sum_{t=1}^{j}({x}_{t}-\overline{x})-\underset{1\le j\le n}{\mathrm{min}}\sum_{t=1}^{j}({x}_{t}-\overline{x})}{{\left\{\frac{1}{n}\sum_{t=1}^{n}{({x}_{t}-\overline{x})}^{2}\right\}}^{1/2}}$$

of Hurst (1951) and Mandelbrot and Wallis (1969). Large sample statistical properties of the R/S statistic were studied by Mandelbrot and Taqqu (1979), and Taqqu (1975), and it was considered in an economic context by Bloomfield (1972). But its limit distribution is nonstandard and difficult to use in statistical inference, while it has no known optimal efficiency properties with respect to any known family of distributions.
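For concreteness, one common form of the rescaled range can be computed as follows. This Python sketch is illustrative only: the function name is hypothetical, and it assumes the version in which the range of the partial sums of deviations from the sample mean is divided by the sample standard deviation with divisor $n$; other variants appear in the literature.

```python
import math

def rescaled_range(x):
    """R/S: range of mean-adjusted partial sums over the sample standard deviation."""
    n = len(x)
    mean = sum(x) / n
    partial, partial_sums = 0.0, []
    for value in x:
        partial += value - mean          # running sum of deviations from the mean
        partial_sums.append(partial)
    spread = max(partial_sums) - min(partial_sums)
    scale = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return spread / scale
```

For the toy input `[1, 2, 3, 4]` the partial sums are $-1.5,-2,-1.5,0$, so the range is 2 and the statistic is $2/\sqrt{1.25}$.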

Despite the distinctive features of long memory series, there is no overriding reason why traditional approaches to parametric estimation in time series should be abandoned in favor of rather special approaches like $R/S$. In fact, if ${x}_{t}$ is assumed Gaussian, the Gaussian maximum likelihood estimate (MLE) might be expected to have optimal asymptotic statistical properties and, unlike $R/S$, can be tailored to the particular parametric model assumed.

The literature on the Gaussian MLE developed first with short memory processes in mind (see, e.g., Whittle, 1951; Hannan, 1973). One important finding was that the Gaussian likelihood can be replaced by various approximations without affecting first order limit distributional behavior. Under suitable conditions, estimates maximizing such approximations, called “Whittle estimates,” are all $\sqrt{n}$-consistent and have the same limit normal distribution as the Gaussian MLE.

One particular Whittle estimate that seems particularly computationally advantageous is the discrete-frequency form. Suppose the parametric spectral density has form $f(\lambda ;\theta ,{\sigma}^{2})=({\sigma}^{2}/2\pi )h(\lambda ;\theta )$, where $\theta $ is an $r$-dimensional unknown parameter vector and ${\sigma}^{2}$ is a scalar as in (2.1). If ${\sigma}^{2}$ is regarded as varying freely from $\theta $, and ${\int}_{-\pi}^{\pi}\mathrm{log}h(\lambda ;\theta )d\lambda =0$ for all admissible values of $\theta $, then we have what might be called a “standard parameterization.” For example, we have a standard parameterization in (2.1) with $\theta =d$, and in (2.4) with $\theta $ determining the ${a}_{j}$, $1\le j\le p$ and ${b}_{j}$, $1\le j\le q$. Define also the periodogram

$$I(\lambda )=\frac{1}{2\pi n}{\left|\sum_{t=1}^{n}{x}_{t}{e}^{it\lambda }\right|}^{2}$$

and the Fourier frequencies ${\lambda}_{j}=2\pi j/n$. Denoting by ${\theta}_{0}$ the true value of $\theta $, the discrete frequency Whittle estimate of ${\theta}_{0}$ minimizes the following approximation to a constant minus the Gaussian log likelihood,

$$\sum_{j=1}^{n-1}\frac{I({\lambda}_{j})}{h({\lambda}_{j};\theta )}.\tag{2.5}$$

Hannan (1973) stressed this estimate. It has the advantages of using directly the form of $h$, which is readily written down in case of autoregressive moving average (ARMA) models, Bloomfield’s (1972) spectral model, and others; on the other hand, autocovariances, partial autocovariances, AR coefficients, and MA coefficients, which variously occur in other types of Whittle estimates, tend to be more complicated except in special cases; indeed, for (2.4) the form of autocovariances, for example, can depend on the question of multiplicity of zeros of $a(L)$. Another advantage of (2.5) is that it makes direct use of the fast Fourier transform, which enables the periodograms $I({\lambda}_{j})$ to be rapidly computed even when $n$ is very large. A third advantage is that mean-correction of ${x}_{t}$ is dealt with simply by omission of the frequency ${\lambda}_{0}=0$.
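As a rough illustration of the discrete-frequency Whittle estimate, the following Python sketch treats only the simplest case $h(\lambda ;d)={\left|1-{e}^{i\lambda }\right|}^{-2d}={\{2\mathrm{sin}(\lambda /2)\}}^{-2d}$ of model (2.1). It is a sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the periodogram is computed by a direct transform to stay dependency-free (an FFT gives the same ordinates far faster), and a golden-section search stands in for whatever numerical optimizer one would normally use. The search is justified here because the objective is a positively weighted sum of exponentials in $d$ and therefore convex.

```python
import cmath
import math

def periodogram(x):
    """I(lambda_j) = |sum_t x_t e^{i t lambda_j}|^2 / (2 pi n), lambda_j = 2 pi j / n, j = 1..n-1."""
    n = len(x)
    out = []
    for j in range(1, n):
        lam = 2.0 * math.pi * j / n
        s = sum(x[t] * cmath.exp(1j * lam * (t + 1)) for t in range(n))
        out.append(abs(s) ** 2 / (2.0 * math.pi * n))
    return out

def whittle_objective(d, pgram):
    """Sum over j = 1..n-1 of I(lambda_j) / h(lambda_j; d); frequency zero is omitted."""
    n = len(pgram) + 1
    total = 0.0
    for j, i_j in enumerate(pgram, start=1):
        h_inv = (2.0 * math.sin(math.pi * j / n)) ** (2.0 * d)  # 1 / h(lambda_j; d)
        total += i_j * h_inv
    return total

def whittle_d(pgram, lo=-0.49, hi=0.49, tol=1e-6):
    """Golden-section search for the minimizer of the objective over [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    e = a + ratio * (b - a)
    fc, fe = whittle_objective(c, pgram), whittle_objective(e, pgram)
    while b - a > tol:
        if fc < fe:                      # minimum lies in [a, e]
            b, e, fe = e, c, fc
            c = b - ratio * (b - a)
            fc = whittle_objective(c, pgram)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, e, fe
            e = a + ratio * (b - a)
            fe = whittle_objective(e, pgram)
    return 0.5 * (a + b)
```

A typical call would be `whittle_d(periodogram(x))` for an observed series `x`. Because the sum starts at $j=1$, adding a constant to `x` leaves the estimate unchanged, which is the mean-correction property noted above.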

A notable feature of Whittle estimates of ${\theta}_{0}$, first established in case of short memory series, is that while they are only asymptotically efficient when ${x}_{t}$ is Gaussian, their limit distribution (in case of “standard parameterizations”) is unchanged by many departures from Gaussianity. Thus the same, relatively convenient, statistical methods (hypothesis testing, interval estimation) can be used without worrying too much about the question of Gaussianity. Hannan established asymptotic statistical properties for several Whittle forms in case ${x}_{t}$ has a linear representation in homoscedastic stationary martingale differences having finite variance.

It is worth noting that Hannan (1973) established first consistency under only ergodicity of ${x}_{t}$, so that long memory was actually included here. However, for his asymptotic normality result, with $\sqrt{n}$-convergence, which is crucial for developing statistical inference, his conditions excluded long memory, and clearly (2.5) appears easier to handle technically in the presence of a smooth $h$ than of one with a singularity. Robinson (1978b) developed extensions to cover “nonstandard parameterizations,” his treatment hinting at how a modest degree of long memory might be covered. He reduced the problem to a central limit theorem for finitely many sample autocovariances, whose asymptotic normality had been shown by Hannan (1976) to rest crucially on square integrability of the spectral density; note that (2.1) and (2.4) are square integrable only for $d<\frac{1}{4}$. In fact for some forms of Whittle estimate, Yajima (1985) established the central limit theorem, again with $\sqrt{n}$-rate, in case of model (2.1) with $0<d<\frac{1}{4}$.

Fox and Taqqu (1986) provided the major breakthrough in justifying Whittle estimation in long memory models. Their objective function was not (2.5) but the continuous frequency form

$${\int}_{-\pi }^{\pi }\frac{I(\lambda )}{h(\lambda ;\theta )}d\lambda ,\tag{2.6}$$

but their basic insight applies to (2.5) also. Because the periodogram $I(\lambda )$ is an asymptotically unbiased estimate of the spectral density only at continuity points, it can be expected to “blow up” as $\lambda \to 0$. However, because $h(\lambda ;\theta )$ also “blows up” as $\lambda \to 0$ and appears in the denominator, some “compensation” can be expected. Actually, limiting distributional behavior depends on the “score” (the derivative in $\theta $ of (2.6) or (2.5)) at ${\theta}_{0}$ being asymptotically normal; Fox and Taqqu (1987) gave general conditions for such quadratic forms to be asymptotically normal, which then apply to Whittle estimates with long memory such that $0<d<\frac{1}{2}$.

Gaussianity of ${x}_{t}$ was assumed by Fox and Taqqu (1986), and by Dahlhaus (1989), who considered the actual Gaussian MLE and discrete-frequency Whittle estimate, and established asymptotic efficiency. For (2.6) Giraitis and Surgailis (1990) relaxed Gaussianity to a linear process in independent and identically distributed (iid) innovations, thus providing a partial extension of Hannan’s (1973) work to long memory. The bulk of this asymptotic theory has not directly concerned the discrete frequency form (2.5), and has focused mainly on the continuous frequency form (2.6), though the former benefits from the neat form of the spectral density in case of the popular $FARIMA(p,d,q)$ class (2.4); on evaluating the integral in (2.6), we have a quadratic form involving the Fourier coefficients of $h{(\lambda ;\theta )}^{-1}$, which are generally rather complicated for long memory models. Also, in (2.6) and the Gaussian MLE, correction for an unknown mean must be explicitly carried out, not dealt with merely by dropping zero frequency.

Other estimates have been considered. While Whittle estimation of the models (2.2) and (2.3) requires numerical optimization, Kashyap and Eom (1988) proposed a closed-form estimate of $d$ in (2.2) by a log periodogram regression (across ${\lambda}_{j}$, $j=1,\dots ,n-1$). This idea does not extend nicely to $FARIMA(p,d,q)$ models (2.3) with $p>0$ or $q>0$, but it does to

(see Robinson, 1994a), which combines (2.2) with Bloomfield’s (1972) short memory exponential model; Moulines and Soulier (1999) provided asymptotic theory for log periodogram regression estimation of (2.7). They assumed Gaussianity, which for technical reasons is harder to avoid when a nonlinear function of the periodogram, such as the log, is involved, than in Whittle estimation, despite this being originally motivated by Gaussianity. Whittle estimation is also feasible with (2.7); indeed Robinson (1994a) noted that it can be reparameterized as

taking ${\theta}_{1}={\beta}_{1}$, ${\theta}_{k}={\beta}_{k}-2/(k-1)$, $2\le k\le p-1$, from which it can be deduced that the limiting covariance matrix of Whittle estimates is desirably diagonal.

In econometrics, generalized method of moments (GMM) has been proposed for estimating many models, including long memory models. But GMM objective functions seem in general to be less computationally attractive than (2.5), require stronger regularity conditions in asymptotic theory, and do not deal so nicely with an unknown mean. Also, unless a suitable weighting is employed they will be less efficient than Whittle estimates in the Gaussian case, have a relatively cumbersome limiting covariance matrix, and are not even asymptotically normal under $d>\frac{1}{4}$. But note that $\sqrt{n}$-consistency and asymptotic normality of Whittle estimates cannot even be taken for granted, having been shown not to hold over some or all of the range $d\in (0,\frac{1}{2})$ for certain nonlinear functions ${x}_{t}$ of an underlying Gaussian long memory process (see, e.g., Giraitis & Taqqu, 1999).

Even assuming Gaussianity of ${x}_{t}$, nonstandard limit distributional behavior for Whittle estimates can arise in certain models. As observed, a spectral pole (or zero) could arise at a non-zero frequency, to explain a form of cyclic behavior. Gray, Zhang, and Woodward (1989) proposed the “Gegenbauer” model

$$f(\lambda )=\frac{{\sigma}^{2}}{2\pi }{\left|1-2{e}^{i\lambda }\mathrm{cos}\omega +{e}^{2i\lambda }\right|}^{-2d},\quad -\pi <\lambda \le \pi ,$$

for $\omega \in (0,\pi ]$. To compare with (2.1), $f(\lambda )$ diverges at frequency $\omega $ if $d>0$. When $\omega $ is known, the previous discussion of estimation and asymptotic theory applies. If $\omega $ is unknown, then Whittle procedures can be adapted, but it seems that such estimates of $\omega $ (but not of the other parameters) will be $n$-consistent with a nonstandard limit distribution. Giraitis, Hidalgo, and Robinson (2001) established $n$-consistency for an estimate of $\omega $ that, after being suitably standardized, cannot converge in distribution.

## Semiparametric Models

“Semiparametric” models for long memory retain the differencing parameter $d$ but treat the short memory component nonparametrically. Correct specification of $p$ and $q$ is very important in parametric fractional autoregressive integrated moving average $FARIMA(p,d,q)$ models. In particular, under-specification of $p$ or $q$ leads to inconsistent estimation of autoregressive (AR) and moving average (MA) coefficients, but also of $d$, as does over-specification of both, due to a loss of identifiability. Procedures of order-determination developed for short memory models, such as Akaike’s information criterion (AIC), have been adapted to FARIMA models, but there is no guarantee that the underlying model belongs to the finite-parameter class proposed. That an attempt to seriously model short-run features can lead to inconsistent estimation of long-run properties seems very unfortunate, especially if the latter happen to be the aspect of most interest.

Short-run modeling is seen from (1.1) and (1.2) to be almost irrelevant at very low frequencies and very long lags, where $d$ dominates. This suggests that estimates of $d$ can be based on information arising from only one or the other of these domains, and that such estimates should have validity across a wide range of short memory behavior. Because this robustness requires estimates to essentially be based on only a vanishingly small fraction of the data as sample size increases, one expects slower rates of convergence than for estimates based on a correct finite-parameter model. But in very long series, such as arise in finance, the degrees of freedom available may be sufficient to provide adequate precision. These estimates are usually referred to as “semiparametric,” though their slow convergence rates make them more akin to “nonparametric” estimates in other areas of statistics; indeed, some are closely related to the smoothed nonparametric spectrum estimates familiar from short memory time series analysis.

It is worth stressing that not just point estimation of $d$ is of interest, but also interval estimation and hypothesis testing. Probably the test of most interest to practitioners is a test of long memory, or rather, a test of short memory $d=0$ against long memory alternatives $d>0$, or anti-persistent alternatives $d<0$, or both, $d\ne 0$. For this we need a statistic with a distribution that can be satisfactorily approximated, and computed, under $d=0$, and that has good power. In a parametric setting, tests of $d=0$ (perhaps of Wald, Lagrange multiplier, or likelihood-ratio type) can be based on Whittle functions such as (2.5) and the $FARIMA(p,d,q)$ family. Actually, much of the limit distribution theory for Whittle estimation primarily concerned with stationary long memory, $0<d<\frac{1}{2}$, does not cover $d=0$, or $d<0$, but other earlier short memory theory, such as Hannan’s (1973), can provide null limit theory for testing $d=0$. Because the test statistic is based on assumed $p$ and $q$, the null limit distribution developed on this basis is generally invalid if $p$ and $q$ are misspecified, as discussed earlier; this can lead, for example, to mistaking unaccounted-for short memory behavior for long memory, and rejecting the null too often. The invalidity of tests for $d=0$ for the R/S statistic introduced previously in the presence of unanticipated short memory autocorrelation was observed by Lo (1991), who proposed a corrected statistic (using smoothed nonparametric spectral estimation at frequency zero) and developed its limit distribution under $d=0$ in the presence of a wide range of short memory dependence (described by mixing conditions), and tested stock returns for long memory.

The null limit theory of Lo’s (1991) modified R/S statistic is nonstandard. Any number of possible statistics has sensitivity to long memory. Of these, some have the character of “method-of-moments” estimates, minimizing a “distance” between population and sample properties. Robinson (1994b) proposed an “averaged periodogram” estimate of $d$, employing what would be a consistent estimate of $f(0)$ under $d=0$, establishing consistency under finiteness of only second moments and allowing for the presence of an unknown slowly varying factor $L(\lambda )$ in $f(\lambda )$, so that (1.2) is relaxed to

In this setting, Delgado and Robinson (1996) proposed data-dependent choices of the bandwidth number (analogous to the one discussed later in relation to log periodogram estimation, for example) that is required in the estimation, and Lobato and Robinson (1996) established limit distribution theory, which is complicated: the estimate is asymptotically normal for $0\le d<\frac{1}{4}$, but non-normal for $d\ge \frac{1}{4}$. Various other semiparametric estimates of $d$ share this latter property, which is due to $f(\lambda )$ not being square-integrable for $d\ge \frac{1}{4}$.

The traditional statistical practice of regression turns out to be fruitful. The asymptotic law (1.1) suggests two approaches: nonlinear regression of sample autocovariances on $c{j}^{2d-1}$, and ordinary least squares (OLS) linear regression of logged sample autocovariances on $\mathrm{log}j$ and an intercept, as proposed by Robinson (1994a). But the limit distributional properties of these estimates are as complicated as those for the averaged periodogram estimate, intuitively because OLS is a very ad hoc procedure in this setting, the implied “disturbances” in the “regression model” being far from uncorrelated or homoscedastic.

We can expect OLS to yield nice results only if the disturbances are suitably “whitened.” In case at least of short memory series the (Toeplitz) covariance matrix of ${x}_{1},\dots ,{x}_{n}$ is approximately diagonalized by a unitary transformation, such that normalized periodograms ${u}_{j}=\mathrm{log}\left\{I({\lambda}_{j})/f({\lambda}_{j})\right\}$ (cf. (2.4)), sufficiently resemble a zero-mean, uncorrelated, homoscedastic sequence. In case of long memory series, (1.2) suggests consideration of

$$\mathrm{log}I({\lambda}_{j})\simeq \mathrm{log}c-2d\mathrm{log}{\lambda}_{j}+{u}_{j},\tag{3.1}$$

for a positive constant $c$ and ${\lambda}_{j}$ close to zero, as pursued by Geweke and Porter-Hudak (1983), though they instead employed a narrow band version of the “fractional differencing” model (2.1), specifically replacing $\mathrm{log}{\lambda}_{j}$ by $\mathrm{log}\left|1-{e}^{i{\lambda}_{j}}\right|$. They carried out OLS regression over $j=1,\dots ,m$, where $m$, a bandwidth or smoothing number, is much less than $n$ but is regarded as increasing slowly with $n$ in asymptotic theory. (Geweke and Porter-Hudak’s [1983] approach was anticipated by a remark of Granger and Joyeux [1980].) Geweke and Porter-Hudak argued, in effect, that as $n\to \infty $ their estimate $\tilde{d}$ satisfies

$${m}^{1/2}(\tilde{d}-d){\to}_{d}N\left(0,\frac{{\pi}^{2}}{24}\right),\tag{3.2}$$

giving rise to extremely simple inferential procedures. But the heuristics underlying their argument are defective, and they, and some subsequent authors, did not come close to providing a rigorous proof of (3.2). One problem with their heuristics is that for long memory (and anti-persistent) series the ${u}_{j}$ are not actually asymptotically uncorrelated or homoscedastic for fixed $j$ with $n\to \infty $, as shown by Künsch (1986), and elaborated upon by Hurvich and Beltrao (1993) and Robinson (1995a). Robinson (1995a) showed that this in itself invalidates Geweke and Porter-Hudak’s (1983) argument. Even for $j$ increasing with $n$, the approximation of the ${u}_{j}$ by an uncorrelated, homoscedastic sequence is not very good, and this, and the nonlinearly involved periodogram, makes a proof of (3.2) non-trivial.

In Robinson (1995a), (3.2) was established, explicitly in case of the approximation (3.1) rather than Geweke and Porter-Hudak’s version, though indicating that the same result holds there. His result applies to the range $\left|d\right|<\frac{1}{2}$, providing simple interval estimates as well as a simple test of short memory, $d=0$. Robinson (1995a) assumed Gaussianity, but Velasco (2000) gave an extension to linear processes ${x}_{t}$, both authors employing Künsch’s (1986) suggestion of trimming out the lowest ${\lambda}_{j}$ to avoid the anomalous behavior of periodograms there, but Hurvich, Deo, and Brodsky (1998) showed that this was unnecessary for (3.2) to hold, under suitable conditions. These authors also addressed the issue of choice of the bandwidth, $m$, providing optimal asymptotic minimum mean-squared error theory. If $f(\lambda ){\lambda}^{2d}$ is twice differentiable at $\lambda =0$, the optimal bandwidth is of order ${n}^{4/5}$, but the multiplying constant depends on unknown population quantities. A consistent estimate of this constant was proposed by Hurvich and Deo (1999), and hence a feasible, data-dependent choice of $m$. Hurvich and Beltrao (1994) had related mean squared error to integrated mean squared error in spectral density estimation, and thence proposed cross-validation procedures for choosing both $m$ and the trimming constant. The “log-periodogram estimates” just discussed have been greatly used empirically, deservedly so in view of their nice asymptotic properties and strong intuitive appeal. But in view of the limited information they employ there is a concern about precision, and it is worth asking at least whether the information can be used more efficiently. In fact Robinson (1995a) showed that the asymptotic variance in (3.2) can indeed be reduced by “pooling” adjacent periodograms, prior to logging.
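The basic log periodogram regression, without pooling or trimming, amounts to an OLS fit of $\mathrm{log}I({\lambda}_{j})$ on an intercept and $-2\mathrm{log}{\lambda}_{j}$ over $j=1,\dots ,m$, with the slope estimating $d$. The Python sketch below illustrates this version only (not Geweke and Porter-Hudak’s variant with $\mathrm{log}\left|1-{e}^{i{\lambda}_{j}}\right|$). The function names and the convention that `pgram[j - 1]` holds $I({\lambda}_{j})$ are assumptions made for illustration, and the standard error uses the $N(0,{\pi}^{2}/24)$ limit for ${m}^{1/2}(\tilde{d}-d)$.

```python
import math

def log_periodogram_d(pgram, n, m):
    """OLS slope estimate of d from log I(lambda_j) = log c - 2 d log lambda_j + u_j, j = 1..m.

    pgram[j - 1] must hold I(lambda_j) at the Fourier frequency lambda_j = 2 pi j / n.
    """
    z = [math.log(2.0 * math.pi * j / n) for j in range(1, m + 1)]
    y = [math.log(pgram[j - 1]) for j in range(1, m + 1)]
    z_bar = sum(z) / m
    num = sum((zj - z_bar) * yj for zj, yj in zip(z, y))
    den = sum((zj - z_bar) ** 2 for zj in z)
    return -0.5 * num / den              # slope on log lambda_j is -2 d

def asymptotic_se(m):
    """Standard error implied by the N(0, pi^2 / 24) limit of m^{1/2}(d_tilde - d)."""
    return math.pi / math.sqrt(24.0 * m)
```

An approximate 95% interval would then be `d_tilde ± 1.96 * asymptotic_se(m)`. Note that the standard error depends only on the bandwidth $m$, not on $n$ or on $d$, which is what makes inference so simple.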

A proposal of Künsch (1987), however, leads to an alternative frequency-domain estimate that does even better. He suggested a narrow-band discrete-frequency Whittle estimate (cf. (2.5)). This essentially involves Whittle estimation of the “model” $f(\lambda )=C{\lambda}^{-2d}$ over frequencies $\lambda ={\lambda}_{j}$, $j=1,\dots ,m$, where $m$ plays a similar role as in log periodogram estimation. After that, $C$ can be eliminated by a side calculation (much as the innovation variance is eliminated in getting (2.5)), and $d$ is estimated by $\widehat{d}$, which minimizes

There is no closed-form solution to (3.3), but it is easy to handle numerically. Robinson (1995b) established that

For the same $m$ sequence, $\widehat{d}$ is then more efficient than the log periodogram estimate $\tilde{d}$ (cf. (3.2)), while the pooled log periodogram estimate of Robinson (1995a) has asymptotic variance that converges to $\frac{1}{4}$ from above as the degree of pooling increases. While $\widehat{d}$ is only implicitly defined, it is nevertheless easy to locate, and the linear involvement of the periodogram in (3.3) makes it possible to establish (3.4) under simpler and milder conditions than needed for (3.2), Robinson employing a linear process for ${x}_{t}$ in martingale difference innovations. This, and the coverage of all $d\in (-\frac{1}{2},\frac{1}{2})$, may have implications also for further development of the asymptotic theory of parametric Whittle estimates discussed earlier. An additional feature of the asymptotic theory of Robinson (1995a), and that of Robinson (1995b), is the purely local nature of the assumptions on $f(\lambda )$ and the way in which the theory fits in with earlier work on smoothed nonparametric spectral estimation for short memory series; (1.2) is refined to

where $\beta \in (0,2]$ is analogous to the local smoothness parameter involved in the spectral estimation work, and no smoothness, or even boundedness, is imposed on $f$ away from zero frequency. Note that the parameter $\beta $ also enters into rules for optimal choice of $m$; see Henry and Robinson (1996). Lobato and Robinson (1998) provided a Lagrange multiplier test of the short memory hypothesis $d=0$ based on (3.3) that avoids estimation of $d$.
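The narrow-band Whittle estimate $\widehat{d}$ can be sketched numerically as follows. The sketch assumes that the objective (3.3) has the standard concentrated form $\mathrm{log}\{{m}^{-1}\sum_{j}{\lambda}_{j}^{2d}I({\lambda}_{j})\}-2d{m}^{-1}\sum_{j}\mathrm{log}{\lambda}_{j}$ and that the limit variance in (3.4) is $\frac{1}{4}$. The grid search, the grid limits, and the function names are simplifications introduced here for illustration.

```python
import numpy as np


def local_whittle_objective(d, lam, I):
    """Concentrated narrow-band Whittle objective (assumed standard form of (3.3))."""
    return float(np.log(np.mean(lam ** (2.0 * d) * I)) - 2.0 * d * np.mean(np.log(lam)))


def local_whittle_from_periodogram(lam, I, grid=None):
    """Minimize the objective over a grid of d values inside (-1/2, 1/2)."""
    if grid is None:
        grid = np.linspace(-0.49, 0.49, 981)  # step 0.001; a numerical optimizer would also do
    values = [local_whittle_objective(d, lam, I) for d in grid]
    return float(grid[int(np.argmin(values))])


def local_whittle_estimate(x, m):
    """Return d_hat from the first m Fourier frequencies and the s.e. 1 / (2 sqrt(m))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n
    I = np.abs(np.fft.fft(x - x.mean())[j]) ** 2 / (2.0 * np.pi * n)
    return local_whittle_from_periodogram(lam, I), float(1.0 / (2.0 * np.sqrt(m)))
```

With the same $m$, the standard error $1/(2\sqrt{m})$ is smaller than the $\pi /\sqrt{24m}$ of the log-periodogram estimate, which is the efficiency gain described above.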

Various refinements to the semiparametric estimates $\tilde{d}$ and $\widehat{d}$, and their asymptotic theory, have been developed. Hurvich and Beltrao (1994) and Hurvich and Deo (1999) have proposed bias-reduced estimates, while Andrews and Guggenberger (2003) and Robinson and Henry (2003) have developed estimates that can further reduce the bias, and have smaller asymptotic minimum mean squared error, using, respectively, an extended regression and higher-order kernels, Robinson and Henry (2003) at the same time introducing a unified $M$-estimate class that includes $\tilde{d}$ and $\widehat{d}$ as special cases. Giraitis and Robinson’s (2003) development of an Edgeworth expansion for a modified version of $\widehat{d}$ also leads to bias reduction, and a rule for bandwidth choice. An alternative refinement of $\widehat{d}$ was developed by Andrews and Sun (2004). Additionally, Moulines and Soulier (1999, 2000) and Hurvich and Brodsky (2001) considered a broadband version of $\tilde{d}$ originally proposed by Janacek (1982), effectively extending the regression in (3.1) over all Fourier frequencies after including cosinusoidal terms, corresponding to the model (2.7) with $p$, now a bandwidth number, increasing slowly with $n$. These authors showed that if $f(\lambda ){\lambda}^{2d}$ is analytic over all frequencies, an asymptotic mean squared error of order $\mathrm{log}n/n$ can thereby be obtained, which is not achievable by the refinements to $\tilde{d}$ and $\widehat{d}$ discussed, though the latter require only local-to-zero assumptions on $f(\lambda )$.

Volatility Models

For financial time series, “long memory” has been found not so much in raw time series ${x}_{t}$ as in nonlinear instantaneous functions such as their squares, ${x}_{t}^{2}$. Thus, whereas we have so far presented long memory as purely a second-order property of a time series, referring to autocovariances or spectral structure, these do not completely describe non-Gaussian processes, where “memory” might usefully take on a rather different meaning. Passing a process through a nonlinear filter can change asymptotic autocovariance structure, and as Rosenblatt (1961) showed, if ${x}_{t}$ is a stationary long memory Gaussian process satisfying (1.1), then ${x}_{t}^{2}$ has autocovariance decaying like ${j}^{4d-2}$, so has “long memory” only when $\frac{1}{4}\le d<\frac{1}{2}$, and even here, because $4d-2<2d-1$, ${x}_{t}^{2}$ has “less memory” than ${x}_{t}$.
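A worked numerical check of this exponent comparison may help. It takes (1.1) to mean that the autocovariance of ${x}_{t}$ decays like ${j}^{2d-1}$, as the comparison $4d-2<2d-1$ suggests; the two values of $d$ are arbitrary illustrations.

```latex
\begin{aligned}
d = 0.4:\quad & \operatorname{cov}(x_t, x_{t+j}) \sim j^{2d-1} = j^{-0.2},
  \qquad \operatorname{cov}(x_t^2, x_{t+j}^2) \sim j^{4d-2} = j^{-0.4}
  \quad \text{(neither is summable)}, \\
d = 0.2:\quad & \operatorname{cov}(x_t, x_{t+j}) \sim j^{-0.6},
  \qquad \operatorname{cov}(x_t^2, x_{t+j}^2) \sim j^{-1.2}
  \quad \text{(only the squares are summable)}.
\end{aligned}
```

In the second case ${x}_{t}$ has long memory but ${x}_{t}^{2}$ does not, since $d<\frac{1}{4}$.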

Financial time series frequently suggest a reverse kind of behavior. In particular, asset returns, or logged asset returns, may exhibit little autocorrelation, as is consistent with the efficient markets hypothesis, whereas their squares are noticeably correlated. Whereas our previous focus on second order moments led to linear time series models, we must now consider nonlinear ones. There is any number of possibilities, but Engle (1982) proposed to model this phenomenon by the autoregressive conditionally heteroscedastic model of order $p$ ($ARCH(p)$), such that

$${x}_{t}={\sigma}_{t}{\epsilon}_{t},$$

where

$${\sigma}_{t}^{2}={\alpha}_{0}+\sum _{j=1}^{p}{\alpha}_{j}{x}_{t-j}^{2},\qquad (4.1)$$

with ${\alpha}_{0}>0$, ${\alpha}_{j}\ge 0$, $1\le j\le p$, and ${\epsilon}_{t}$ is a sequence of independent and identically distributed (iid) random variables (possibly Gaussian). Under suitable conditions on the ${\alpha}_{j}$, it follows that the ${x}_{t}$ are martingale differences (and thus uncorrelated), whereas the ${x}_{t}^{2}$ have an $AR(p)$ representation, in terms of martingale difference (but not conditionally homoscedastic) innovations. The model was extended by Bollerslev (1986) to the generalized autoregressive conditionally heteroscedastic model of index $p$, $q$ ($GARCH(p,q)$), which implies that the ${x}_{t}^{2}$ have an $ARMA\left(\mathrm{max}(p,q),q\right)$ representation in a similar sense.
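The contrast between uncorrelated levels and correlated squares can be seen in a small simulation. The sketch below assumes the usual $ARCH(p)$ form ${x}_{t}={\sigma}_{t}{\epsilon}_{t}$ with ${\sigma}_{t}^{2}={\alpha}_{0}+\sum_{j=1}^{p}{\alpha}_{j}{x}_{t-j}^{2}$; the standard normal ${\epsilon}_{t}$, the zero start-up values, the burn-in length, and the function names are illustrative assumptions.

```python
import numpy as np


def simulate_arch(alpha, n, burn=500, seed=0):
    """Simulate n observations from ARCH(p), with alpha = [alpha_0, alpha_1, ..., alpha_p]."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    p = alpha.size - 1
    eps = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(n + burn):
        # squared lags x_{t-1}^2, ..., x_{t-p}^2 (fewer are available at the start)
        past = x[max(t - p, 0):t][::-1] ** 2
        sigma2 = alpha[0] + np.dot(alpha[1:1 + past.size], past)
        x[t] = np.sqrt(sigma2) * eps[t]
    return x[burn:]  # discard the burn-in segment


def sample_autocorr(y, lag):
    """Sample autocorrelation of y at a positive lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    return float(np.dot(y[:-lag], y[lag:]) / np.dot(y, y))
```

For example, with `x = simulate_arch([1.0, 0.3], 20000)`, `sample_autocorr(x, 1)` should be close to zero while `sample_autocorr(x ** 2, 1)` should be clearly positive, in line with the $AR(1)$ representation of the squares.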

The ARCH and GARCH models have found considerable use in finance. But they imply that the autocorrelations of the squares ${x}_{t}^{2}$ either eventually cut off completely or decay exponentially, whereas empirical evidence of slower decay perhaps consistent with long memory has accumulated; see, for example, Whistler (1990) and Ding, Granger, and Engle (1993). Robinson (1991) had already suggested ARCH-type models capable of explaining greater autocorrelation in squares, so that (4.1) is extended to

$${\sigma}_{t}^{2}={\alpha}_{0}+\sum _{j=1}^{\infty}{\alpha}_{j}{x}_{t-j}^{2},\qquad (4.2)$$

or replaced by

$${\sigma}_{t}={\alpha}_{0}+\sum _{j=1}^{\infty}{\alpha}_{j}{x}_{t-j}.\qquad (4.3)$$

For both models, and in related situations, Robinson (1991) developed Lagrange multiplier or score tests of “no-ARCH” (which is consistent with ${\alpha}_{j}=0$, $j\ge 1$) against general parameterizations in (4.2) and (4.3); such tests should be better at detecting autocorrelation in ${x}_{t}^{2}$ that falls off more slowly than ones based on the $ARCH(p)$, (4.1), say.

We can formally rewrite (4.2) as

$${x}_{t}^{2}-\sum _{j=1}^{\infty}{\alpha}_{j}{x}_{t-j}^{2}={\alpha}_{0}+{\nu}_{t},\qquad (4.4)$$

where the ${\nu}_{t}={x}_{t}^{2}-{\sigma}_{t}^{2}$ are martingale differences. Robinson (1991) suggested the possibility of using for ${\alpha}_{j}$ in (4.4) the AR weights from the $FARIMA(0,d,0)$ model (see (2.1)), taking ${\alpha}_{0}=0$, and Whistler (1990) applied this version of his test to test $d=0$ in exchange rate series. This $FARIMA(0,d,0)$ case was further considered by Ding and Granger (1996), along with other possibilities, but sufficient conditions of Giraitis, Kokoszka, and Leipus (2000) for existence of a covariance stationary solution of (4.4) rule out long memory, though they do permit strong autocorrelation in ${x}_{t}^{2}$ that very closely approaches it, and Giraitis and Robinson (2001) have established asymptotic properties of Whittle estimates based on squares for this model. For $FARIMA(p,d,q)$ AR weights ${\alpha}_{j}$ in (4.2), ${x}_{t}^{2}$ is not covariance stationary when $d>0$, ${\alpha}_{0}>0$, and Baillie, Bollerslev, and Mikkelsen (1996) called this FIGARCH, a model that has since been widely applied in finance.
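The AR weights of the $FARIMA(0,d,0)$ model can be computed by a simple recursion. The sketch assumes that the model is ${(1-L)}^{d}{x}_{t}={e}_{t}$, so that with ${(1-L)}^{d}=\sum_{j\ge 0}{\pi}_{j}{L}^{j}$ the AR weights are ${\alpha}_{j}=-{\pi}_{j}$, $j\ge 1$; the function name is hypothetical.

```python
import numpy as np


def fractional_ar_weights(d, n_weights):
    """Return alpha_1..alpha_n with alpha_j = -pi_j, where (1 - L)^d = sum_j pi_j L^j."""
    pi = np.empty(n_weights + 1)
    pi[0] = 1.0
    for j in range(1, n_weights + 1):
        # binomial recursion: pi_j = pi_{j-1} * (j - 1 - d) / j
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return -pi[1:]
```

For $0<d<1$ the weights are positive, decay like ${j}^{-d-1}$, and have partial sums that approach one from below (since $\sum_{j\ge 0}{\pi}_{j}={(1-1)}^{d}=0$). For instance, `fractional_ar_weights(0.4, 5000)` starts with 0.4 and 0.12 and sums to a little under one.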

For model (4.3), Giraitis, Robinson, and Surgailis (2000) have shown that if the weights ${\alpha}_{j}$ decay like ${j}^{d-1}$, $0<d<\frac{1}{2}$, then any integral power ${x}_{t}^{k}$, such as the square, has long memory autocorrelation, satisfying (1.1) irrespective of $k$. This model also has the advantage over (4.2) of avoiding the non-negativity constraints on the ${\alpha}_{j}$, and an ability to explain leverage.

An alternative approach to modeling autocorrelation in squares, and other nonlinear functions, alongside possible lack of autocorrelation in ${x}_{t}$, expresses ${\sigma}_{t}^{2}$ directly in terms of past ${\epsilon}_{t}$, rather than past ${x}_{t}$, leading to a nonlinear MA form. Nelson (1991) proposed the exponential GARCH (EGARCH) model, where we take

$$\mathrm{log}{\sigma}_{t}^{2}={\alpha}_{0}+\sum _{j=1}^{\infty}{\alpha}_{j}g({\epsilon}_{t-j}),$$

$g$ being a user-chosen nonlinear function; for example, Nelson stressed $g(z)=\theta z+\gamma (\left|z\right|-E\left|z\right|)$, which is useful in describing a leverage effect. Nelson (1991) noted the potential for choosing the ${\alpha}_{j}$ to imply long memory in ${\sigma}_{t}^{2}$, but stressed short memory, ARMA, weights ${\alpha}_{j}$. On the other hand, Robinson and Zaffaroni (1997) proposed nonlinear MA models, such as

$${x}_{t}={\epsilon}_{t}\left({\alpha}_{0}+\sum _{j=1}^{\infty}{\alpha}_{j}{\epsilon}_{t-j}\right),\qquad (4.5)$$

where the ${\epsilon}_{t}$ are an iid sequence. They showed the ability to choose the ${\alpha}_{j}$ such that ${x}_{t}^{2}$ has long memory autocorrelation, and proposed use of Whittle estimation based on the ${x}_{t}^{2}$.

Another model, closely related to (4.5), proposed by Robinson and Zaffaroni (1998), replaces the first factor ${\epsilon}_{t}$ by ${\eta}_{t}$, where the ${\eta}_{t}$ are iid and independent of the ${\epsilon}_{t}$, and again long memory potential was shown. This model is a special case of

of which the short memory stochastic volatility model of Taylor (1986) is also a special case. Long memory versions of Taylor’s model were studied by Breidt, Crato, and de Lima (1998), choosing

the ${\alpha}_{j}$ being MA weights in the $FARIMA(p,d,q)$. They considered Whittle estimation based on squares, discussing its consistency, and applying the model to stock price data.

Asymptotic theory for ML estimates of models such as (4.5), (4.6), and (4.7) is considerably more difficult to derive; indeed, it is hard to write down the likelihood, given, say, Gaussian assumptions on ${\epsilon}_{t}$ and ${\eta}_{t}$. In order to ease mathematical tractability in view of the nonlinearity in (4.7), Gaussianity of ${\epsilon}_{t}$ was stressed by Breidt, Crato, and de Lima (1998). In that case, we can write the exponent of $h$ in (4.7) as ${\alpha}_{0}+{z}_{t}$, where ${z}_{t}$ is a stationary Gaussian, possibly long memory, process, and likewise the second factor in (4.5). Such models are all covered by modeling ${x}_{t}$ as a general nonlinear function of a vector unobservable Gaussian process ${\xi}_{t}$. Starting from an asymptotic expansion for the covariance of functions of multivariate normal vectors, Robinson (2001) indicated how long memory in nonlinear functions of ${x}_{t}$ depends on the long memory in ${\xi}_{t}$ and the nature of the nonlinearity involved, with application also to cyclic behavior, cross-sectional and temporal aggregation, and multivariate models. Allowance for quite general nonlinearity means that relatively little generality is lost by the Gaussianity assumption on ${\xi}_{t}$, while the scope for studying autocorrelation structure of functions such as $\left|{x}_{t}\right|$ can avoid the assumption of a finite fourth moment in ${x}_{t}$, which has been controversial.

Semiparametric models and methods for long memory in volatility have also been considered. In particular, Hurvich, Moulines, and Soulier (2005) investigated properties of the narrow-band discrete-frequency Whittle estimate based on the $\mathrm{log}{x}_{t}^{2}$ series.

Nonstationary Models

In time series econometrics, unit root models have been a major focus since the late 1980s. Prior to this, modeling of economic time series typically involved a combination of short memory, $I(0)$, series and ones that are nonstochastic, either in the sense of sequences such as dummy variables or polynomial time trends, or of conditioning on predetermined economic variables. On the other hand, unit root modeling starts from the random walk model, that is, (2.2) for $t\ge 1$ with $d=1$, ${e}_{t}$ white noise, and ${x}_{0}=0$, and then generalizes ${e}_{t}$ to be a more general $I(0)$ process, modeled either parametrically or nonparametrically; ${x}_{t}$ is then said to be an $I(1)$ process. Such models, often with the involvement also of nonstationary time trends, have been successfully used in macroeconometrics, frequently in connection with cointegration analysis.

One essential preliminary step is the testing of the unit root hypothesis. Numerous such tests have been proposed, often directed against $I(0)$ alternatives, and using classical Wald, Lagrange multiplier, and likelihood-ratio procedures; see, for example, Dickey and Fuller (1979). In classical situations, these lead to a null ${\chi}^{2}$ limit distribution, a non-central local ${\chi}^{2}$ limit distribution, Pitman efficiency, and a considerable degree of scope for robustness to the precise implementation of the test statistics, for example to the estimate of the asymptotic variance matrix that is employed. The unit root tests against $I(0)$ alternatives lose such properties; for example, the null limit distribution is nonstandard. This nonstandard behavior arises essentially because the unit root is nested unsmoothly in an AR system: in the $AR(1)$ case, the process is stationary with exponentially decaying autocovariance structure when the AR coefficient $\alpha $ lies between $-1$ and $1$, has unit root nonstationarity at $\alpha =1$, and is “explosive” for $\left|\alpha \right|>1$. The tests directed against AR alternatives seem not to have very good power against fractional alternatives, as the Monte Carlo investigation of Diebold and Rudebusch (1991) suggests.

Any number of models can potentially nest a unit root, and the fractional class turns out to have the “smooth” properties that lead classically to the standard, optimal asymptotic behavior referred to earlier. Robinson (1994c) considered the model

where ${u}_{t}$ is an $I(0)$ process with parametric autocorrelation and

where the ${\omega}_{j}$ are given distinct real numbers in $(0,\pi )$, and the ${d}_{j}$, $1\le j\le h$, are arbitrary real numbers. The initial condition in (5.1) avoids an unbounded variance, the main interest being in nonstationary ${x}_{t}$. Robinson (1994c) proposed tests for specified values of the ${d}_{j}$ against fractional alternatives in the class (5.2). For example, in the simplest case the unit root hypothesis $d=1$ can be tested, but against fractional alternatives ${(1-L)}^{d}$ for $d>1$, $d<1$ or $d\ne 1$. Some other null $d$ may be of interest, for example, $d=\frac{1}{2}$, this being the boundary between stationarity and nonstationarity in the fractional domain. The region $d\in [\frac{1}{2},1)$ has been referred to as mean-reverting, MA coefficients of ${x}_{t}$ decaying, albeit more slowly than under stationarity, $d<\frac{1}{2}$. Note that the models (5.1) and (5.2) also cover seasonal and cyclical components (cf. the Gegenbauer model (2.8)) as well as stationary and overdifferenced ones. Robinson (1994c) showed that his Lagrange multiplier tests enjoy the classical large-sample properties of such tests.

To intuitively explain this outcome, note that unlike in unit root tests against AR alternatives, the test statistics are based on the null differenced ${x}_{t}$, which are $I(0)$ under the null hypothesis. This would suggest that estimates of memory parameters ${d}_{j}$ in (5.1) and (5.2) and of parameters describing ${u}_{t}$, such as Whittle estimates, will also continue to possess the kind of standard asymptotic properties—$\sqrt{n}$-consistency and asymptotic normality—under nonstationarity as we have encountered in stationary circumstances. Beran (1995), in case $\phi (L)=(1-L{)}^{d}$ and ${u}_{t}$ white noise, indicated this, though the initial consistency proof he provides, an essential preliminary to asymptotic distribution theory for his implicitly defined estimate, appears to assume that the estimate lies in a neighborhood of the true $d$, itself a consequence of consistency; over a suitably wide range of $d$-values, the objective function does not converge uniformly.
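The “null differenced” series can be illustrated in the simplest case $\phi (L)={(1-L)}^{d}$. The sketch below applies ${(1-L)}^{d}$ to an observed stretch ${x}_{1},\dots ,{x}_{n}$, assuming that the initial condition in (5.1) sets ${x}_{t}=0$ for $t\le 0$ so that the filter truncates naturally; the function name is hypothetical.

```python
import numpy as np


def frac_diff(x, d):
    """Apply (1 - L)^d to x_1..x_n, treating x_t as zero for t <= 0 (assumed initial condition)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        # coefficients of (1 - L)^d: pi_j = pi_{j-1} * (j - 1 - d) / j
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    # y_t = sum_{j=0}^{t-1} pi_j x_{t-j}: keep the first n terms of the full convolution
    return np.convolve(pi, x)[:n]
```

With $d=1$ this reduces to first differencing (apart from the first value, which is kept as is), and applying the filter with ${d}_{1}$ and then ${d}_{2}$ is the same as applying it once with ${d}_{1}+{d}_{2}$. Under a null value ${d}_{0}$, `frac_diff(x, d0)` is the $I(0)$ series on which the test statistics are based.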

Velasco and Robinson (2000) adopted a somewhat different approach, employing instead the model

where $s$ denotes the integer part of $d+\frac{1}{2}$ and ${u}_{t}$ is a parametric $I(0)$ process such as white noise; ${v}_{t}$ is a stationary $I(d-s)$ process, invertible also unless $d=-\frac{1}{2}$. The distinction between the two definitions of nonstationary $I(d)$ processes, in (5.1) on the one hand and (5.3) and (5.4) on the other, was discussed by Marinucci and Robinson (1999); this entails, for example, convergence to different forms of fractional Brownian motion. Velasco and Robinson (2000) considered a version of discrete-frequency Whittle estimation (cf. (2.5)), but nonstationarity tends to bias periodogram ordinates. To sufficiently reduce this bias they in general (for $d\ge \frac{3}{4}$ and with (5.4) modified so that ${v}_{t}$ has an unknown mean) found it necessary to suitably “taper” the data, and then, in order to overcome the undesirable dependence this produces between neighboring periodograms, to use only Fourier frequencies ${\lambda}_{j}$ such that $j$ is a multiple of $p$. Here $p$ is the “order” of the taper, and $p\ge \left[d+\frac{1}{2}\right]+1$ is required for asymptotic normality of the estimates, with $\sqrt{n}$ rate of convergence; because $d$ is unknown a large $p$ can be chosen for safety’s sake, but the asymptotic variance is inflated by a factor varying directly with $p$. The theory is invariant to an additive polynomial trend of degree up to $p$.

For parametric versions of the form (5.1), such as the $FARIMA(p,d,q)$, Hualde and Robinson (2011) considered the conditional sum of squares type of estimate used by Box and Jenkins (1971) in an $ARMA$ context. This particular form, rather than other versions of Whittle estimate, turns out to be crucial in establishing asymptotic statistical properties without resorting to trimming; indeed, as a result, asymptotic efficiency is achieved. A notable feature of Hualde and Robinson (2011) is that it is not necessary to know in advance whether $d$ lies in the stationary, nonstationary, anti-persistent, or non-invertible regions.

In case of semiparametric models, tapering has played a larger role in covering nonstationarity. A model similar to (5.3) and (5.4) originated in Hurvich and Ray (1995), who required $s=1$ in (5.3), while in (5.4) they allowed $-\infty <d<\frac{1}{2}$, so that (unlike Beran, 1995, and Velasco & Robinson, 2000) they covered nonstationarity only up to $d<3/2$ (though this probably fits many applications) but, on the other hand, covered any degree of noninvertibility. Hurvich and Ray’s (1995) concern, however, was not with asymptotic theory for parameter estimates; rather, they found that asymptotic bias in $I({\lambda}_{j})$, for fixed $j$ as $n\to \infty $, could be notably reduced by use of a cosine bell taper, leading them to recommend use of tapering (and omission of frequency ${\lambda}_{1}$) in the log periodogram estimation of $d$ discussed earlier, in case nonstationarity is feared. Limit distribution theory was established by Velasco (1999a), analogous to that described for log periodogram estimates in case $d\ge \frac{1}{2}$, in a semiparametric version of (5.3) and (5.4) (so ${u}_{t}$ has nonparametric autocorrelation), using a general class of tapers. Further, Velasco (1999b) established analogous results for local Whittle estimates (cf. (3.4)). Once again, there is invariance to polynomial trends, but tapering, imposed for asymptotic normality when $d\ge \frac{3}{4}$ and for consistency when $d>1$, entails skipping frequencies and/or an efficiency loss. However, Hurvich and Chen (2000) proposed a taper, applied to first differences when $d<3/2$, that, with no skipping, loses less efficiency. Shimotsu and Phillips (2005) considered an “exact” form of local Whittle function, without tapering, based on (5.1), establishing asymptotic properties when the optimization covers an interval of width less than $9/2$. However, tapering can be a wise precaution when nonstationarity is believed possible, and can be useful in theoretical refinements even under stationarity; see, for example, Giraitis and Robinson (2003).
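The effect of tapering on periodogram leakage can be sketched as follows. The code uses one common form of the full cosine bell, ${h}_{t}=\frac{1}{2}\{1-\mathrm{cos}(2\pi (t+\frac{1}{2})/n)\}$, and normalizes the tapered periodogram by $2\pi \sum_{t}{h}_{t}^{2}$; both conventions, and the function names, are assumptions made for illustration and may differ in detail from those of the cited papers.

```python
import numpy as np


def cosine_bell(n):
    """Full cosine bell weights h_t = 0.5 * (1 - cos(2*pi*(t + 0.5)/n)), t = 0..n-1."""
    t = np.arange(n)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * (t + 0.5) / n))


def tapered_periodogram(x, taper=None):
    """Return lambda_j and |sum_t h_t x_t exp(-i t lambda_j)|^2 / (2 pi sum_t h_t^2), j = 1..n//2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # no taper means h_t = 1, which gives the ordinary periodogram
    h = np.ones(n) if taper is None else np.asarray(taper, dtype=float)
    j = np.arange(1, n // 2 + 1)
    dft = np.fft.fft(h * x)
    return 2.0 * np.pi * j / n, np.abs(dft[j]) ** 2 / (2.0 * np.pi * np.sum(h ** 2))
```

Applied to a pure linear trend, the tapered ordinates at ${\lambda}_{3}$ and beyond are far smaller than the untapered ones, which illustrates how tapering limits the contamination of low-frequency periodogram ordinates by trends and nonstationarity.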

Final Comments

Various ways of modeling long memory in economic and financial time series have been introduced, along with methods for estimating them, theoretical results that are useful in justifying the estimates, and methods for drawing statistical inferences on them, such as testing hypotheses and setting confidence intervals. The literature on long memory is now quite mature, and so we have chosen to focus on some basic literature for scalar time series. Space does not permit a full treatment of developments even in this scalar case, let alone in other settings where long memory can arise and has been studied, which are beyond the scope here. These include models for multivariate data, including regression and cointegration models, spatial models, panel data models, and functional time series models.

Acknowledgments

I am grateful for the comments of the two reviewers.

## References

Adelman, I. (1965). Long cycles: Fact or artefact? *American Economic Review*, *55*, 444–463.

Adenstedt, R. K. (1974). On large-sample estimation of the mean of a stationary random sequence. *Annals of Statistics*, *2*, 1095–1107.

Andrews, D. W. K., & Guggenberger, P. (2003). A bias-reduced log-periodogram estimator for the long memory parameter. *Econometrica*, *71*, 675–712.

Andrews, D. W. K., & Sun, Y. (2004). Adaptive local polynomial Whittle estimation of long-range dependence. *Econometrica*, *72*, 569–614.

Baillie, R. T., Bollerslev, T., & Mikkelsen, H. O. (1996). Fractionally integrated generalized autoregressive conditional heteroscedasticity. *Journal of Econometrics*, *74*, 3–30.

Beran, J. (1995). Maximum likelihood estimation of the differencing parameter for invertible short- and long-memory ARIMA models. *Journal of the Royal Statistical Society, Series B*, *57*, 659–672.

Bloomfield, P. (1972). An exponential model for the spectrum of a scalar time series. *Biometrika*, *60*, 217–226.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroscedasticity. *Journal of Econometrics*, *31*, 307–327.

Box, G. E. P., & Jenkins, G. M. (1971). *Time series analysis, forecasting and control*. San Francisco: Holden-Day.

Breidt, F. J., Crato, N., & de Lima, P. (1998). The detection and estimation of long memory in stochastic volatility. *Journal of Econometrics*, *83*, 325–334.

Dahlhaus, R. (1989). Efficient parameter estimation for self-similar processes. *Annals of Statistics*, *17*, 1749–1766.

Delgado, M. J., & Robinson, P. M. (1996). Optimal spectral bandwidth for long memory. *Statistica Sinica*, *6*, 97–112.

Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. *Journal of the American Statistical Association*, *74*, 427–431.

Diebold, F. X., & Rudebusch, G. D. (1991). On the power of Dickey-Fuller tests against fractional alternatives. *Economics Letters*, *35*, 155–160.

Ding, Z., & Granger, C. W. J. (1996). Modelling volatility persistence of speculative returns: A new approach. *Journal of Econometrics*, *73*, 185–215.

Ding, Z., Granger, C. W. J., & Engle, R. F. (1993). A long memory property of stock market returns and a new model. *Journal of Empirical Finance*, *1*, 83–106.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. *Econometrica*, *50*, 987–1007.

Fairfield Smith, H. (1938). An empirical law describing heterogeneity in the yields of agricultural crops. *Journal of Agricultural Science*, *28*, 1–23.

Fox, R., & Taqqu, M. S. (1986). Large sample properties of parameter estimates for strongly dependent stationary Gaussian time series. *Annals of Statistics*, *14*, 517–532.

Fox, R., & Taqqu, M. S. (1987). Central limit theorems for quadratic forms in random variables having long-range dependence. *Probability Theory and Related Fields*, *74*, 213–240.

Geweke, J., & Porter-Hudak, S. (1983). The estimation and application of long-memory time series models. *Journal of Time Series Analysis*, *4*, 221–238.

Giraitis, L., Hidalgo, F. J., & Robinson, P. M. (2001). Gaussian estimation of parametric spectral density with unknown pole. *Annals of Statistics*, *29*, 987–1023.

Giraitis, L., Kokoszka, P., & Leipus, R. (2000). Stationary ARCH models: Dependence structure and central limit theorem. *Econometric Theory*, *16*, 3–22.

Giraitis, L., Koul, H., & Surgailis, D. (2012). *Large sample inference for long memory processes*. London: Imperial College.

Giraitis, L., & Robinson, P. M. (2001). Whittle estimation of ARCH models. *Econometric Theory*, *17*, 608–631.

Giraitis, L., & Robinson, P. M. (2003). Edgeworth expansions for semiparametric Whittle estimation of long memory. *Annals of Statistics*, *31*, 1325–1375.

Giraitis, L., Robinson, P. M., & Surgailis, D. (2000). A model for long memory conditional heteroscedasticity. *Annals of Applied Probability*, *10*, 1002–1024.

Giraitis, L., & Surgailis, D. (1990). A central limit theorem for quadratic forms in strongly dependent random variables and its application to asymptotical normality of Whittle’s estimate. *Probability Theory and Related Fields*, *86*, 87–104.

Giraitis, L., & Taqqu, M. S. (1999). Whittle estimator for finite-variance non-Gaussian time series with long memory. *Annals of Statistics*, *27*, 178–203.

Granger, C. W. J. (1966). The typical spectral shape of an economic variable. *Econometrica*, *34*, 150–161.

Granger, C. W. J. (1980). Long memory relationships and the aggregation of dynamic models. *Journal of Econometrics*, *14*, 227–238.

Granger, C. W. J., & Joyeux, R. (1980). An introduction to long-memory time series and fractional differencing. *Journal of Time Series Analysis*, *1*, 15–29.

Gray, H. L., Zhang, N. I., & Woodward, W. A. (1989). On generalized fractional processes. *Journal of Time Series Analysis*, *10*, 233–257.

Hannan, E. J. (1973). The asymptotic theory of linear time-series models. *Journal of Applied Probability*, *10*, 130–145.

Hannan, E. J. (1976). The asymptotic distribution of serial covariances. *Annals of Statistics*, *4*, 396–399.

Henry, M., & Robinson, P. M. (1996). Bandwidth choice in Gaussian semiparametric estimation of long-range dependence. In P. M. Robinson & M. Rosenblatt (Eds.), *Athens conference on applied probability and time series analysis, Vol. II: Time series analysis. In memory of E. J. Hannan* (pp. 220–232). New York: Springer-Verlag.

Hosking, J. R. M. (1981). Fractional differencing. *Biometrika*, *68*, 165–176.

Hualde, J., & Robinson, P. M. (2011). Gaussian pseudo-maximum likelihood estimation of fractional time series models. *Annals of Statistics*, *39*, 3152–3181.

Hurst, H. (1951). Long term storage capacity of reservoirs. *Transactions of the American Society of Civil Engineers*, *116*, 770–799.

Hurvich, C. M., & Beltrao, K. I. (1993). Asymptotics for the low-frequency estimates of the periodogram of a long memory time series. *Journal of Time Series Analysis*, *14*, 455–472.

Hurvich, C. M., & Beltrao, K. I. (1994). Automatic semiparametric estimation of the memory parameter of a long-memory time series. *Journal of Time Series Analysis*, *15*, 285–302.

Hurvich, C. M., & Brodsky, J. (2001). Broadband semiparametric estimation of the memory parameter of a long-memory time series using fractional exponential model. *Journal of Time Series Analysis*, *22*, 221–249.

Hurvich, C. M., & Chen, W. W. (2000). An efficient taper for potentially overdifferenced long-memory time series. *Journal of Time Series Analysis*, *21*, 155–180.

Hurvich, C. M., & Deo, R. S. (1999). Plug-in selection of the number of frequencies in regression estimates of the memory parameter of a long-memory time series. *Journal of Time Series Analysis*, *20*, 331–341.

Hurvich, C. M., Deo, R. S., & Brodsky, J. (1998). The mean squared error of Geweke and Porter-Hudak’s estimates of the memory parameter of a long memory time series. *Journal of Time Series Analysis*, *19*, 19–46.

Hurvich, C. M., Moulines, E., & Soulier, P. (2005). Estimating long memory in volatility. *Econometrica*, *73*, 1283–1328.

Hurvich, C. M., & Ray, B. K. (1995). Estimation of the memory parameter for nonstationary or noninvertible fractionally integrated processes. *Journal of Time Series Analysis*, *16*, 17–41.

Janacek, G. J. (1982). Determining the degree of differencing for time series via log spectrum. *Journal of Time Series Analysis*, *3*, 177–183.

Kashyap, R., & Eom, K. (1988). Estimation in long-memory time series model. *Journal of Time Series Analysis*, *9*, 35–41.

Künsch, H. R. (1986). Discrimination between monotonic trends and long-range dependence. *Journal of Applied Probability*, *23*, 1025–1030.

Künsch, H. R. (1987). Statistical aspects of self-similar processes. In Y. A. Prohorov (Ed.), *Proceedings of the first world congress of the Bernoulli society, Tashkent, USSR, 1986* (pp. 67–74). Utrecht: VNU Science.

Lippi, M., & Zaffaroni, P. (1998). *Aggregation and simple dynamics: Exact asymptotic results*. Preprint.

Lo, A. W. (1991). Long-term memory in stock market prices. *Econometrica*, *59*, 1279–1313.

Lobato, I. G., & Robinson, P. M. (1996). Averaged periodogram estimation of long memory. *Journal of Econometrics*, *73*, 303–324.

Lobato, I. G., & Robinson, P. M. (1998). A nonparametric test for $I(0)$. *Review of Economic Studies*, *65*, 475–495.

Mandelbrot, B. B. (1972). Statistical methodology for non-periodic cycles: From the covariance to R/S analysis. *Annals of Economic and Social Measurement*, *1*, 259–290.

Mandelbrot, B. B., & Taqqu, M. S. (1979). Robust R/S analysis of long-run serial correlations. *Bulletin of the International Statistical Institute*, *48*(2), 69–104.

Mandelbrot, B. B., & Van Ness, J. W. (1968). Fractional Brownian motions, fractional noises and applications. *SIAM Review*, *10*, 422–437.

Mandelbrot, B. B., & Wallis, J. R. (1969). Robustness of the rescaled range R/S in the measurement of noncyclic long run statistical dependence. *Water Resources Research*, *5*, 967–988.

Marinucci, D., & Robinson, P. M. (1999). Alternative forms of fractional Brownian motion. *Journal of Statistical Planning and Inference*, *80*, 111–122.

Moulines, E., & Soulier, P. (1999). Broadband log periodogram regression of time series with long range dependence. *Annals of Statistics*, *27*, 1415–1439.

Moulines, E., & Soulier, P. (2000). Data driven order selection for projection estimates of the spectral density of time series with long range dependence. *Journal of Time Series Analysis*, *21*, 193–218.

Nelson, D. (1991). Conditional heteroskedasticity in asset returns: A new approach. *Econometrica*, *59*, 347–370.

Robinson, P. M. (1978a). Statistical inference for a random coefficient autoregressive model. *Scandinavian Journal of Statistics*, *5*, 163–168.

Robinson, P. M. (1978b). Alternative models for stationary stochastic processes. *Stochastic Processes and Their Applications*, *8*, 151–152.

Robinson, P. M. (1991). Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. *Journal of Econometrics*, *47*, 67–84.

Robinson, P. M. (1994a). Time series with strong dependence. In C. A. Sims (Ed.), *Advances in econometrics* (Vol. 1, pp. 47–95). Cambridge, UK: Cambridge University Press.

Robinson, P. M. (1994b). Semiparametric analysis of long-memory time series. *Annals of Statistics*, *22*, 515–539.

Robinson, P. M. (1994c). Efficient tests of nonstationary hypotheses. *Journal of the American Statistical Association*, *89*, 1420–1437.

Robinson, P. M. (1995a). Log-periodogram regression of time series with long range dependence. *Annals of Statistics*, *23*, 1048–1072.

Robinson, P. M. (1995b). Gaussian semiparametric estimation of long-range dependence. *Annals of Statistics*, *23*, 1630–1661.Find this resource:

Robinson, P. M. (2001). The memory of stochastic volatility models. *Journal of Econometrics*, *101*, 195–218.

Robinson, P. M., & Henry, M. (2003). Higher-order kernel semiparametric M-estimation of long memory. *Journal of Econometrics*, *114*(1), 1–27.

Robinson, P. M., & Zaffaroni, P. (1997). Modelling nonlinearity and long memory in time series. *Fields Institute Communications*, *11*, 161–170.

Robinson, P. M., & Zaffaroni, P. (1998). Nonlinear time series with long memory: A model for stochastic volatility. *Journal of Statistical Planning and Inference*, *68*, 359–371.

Rosenblatt, M. (1961). Independence and dependence. In *Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability* (pp. 411–443). Berkeley: University of California Press.

Samarov, A., & Taqqu, M. S. (1988). On the efficiency of the sample mean in long memory noise. *Journal of Time Series Analysis*, *9*, 191–200.

Shimotsu, K., & Phillips, P. (2005). Exact local Whittle estimation of fractional integration. *Annals of Statistics*, *33*, 1890–1933.

Taqqu, M. S. (1975). Weak convergence to fractional Brownian motion and to the Rosenblatt process. *Zeitschrift für Wahrscheinlichkeitstheorie*, *31*, 287–302.

Taylor, S. J. (1986). *Modelling financial time series*. Chichester, UK: Wiley.

Velasco, C. (1999a). Non-stationary log-periodogram regression. *Journal of Econometrics*, *91*, 325–371.

Velasco, C. (1999b). Gaussian semiparametric estimation of nonstationary time series. *Journal of Time Series Analysis*, *20*, 87–127.

Velasco, C. (2000). Non-Gaussian log-periodogram regression. *Econometric Theory*, *16*, 44–79.

Velasco, C., & Robinson, P. M. (2000). Whittle pseudo-maximum likelihood estimation for nonstationary time series. *Journal of the American Statistical Association*, *95*, 1229–1243.

Whistler, D. (1990). *Semiparametric models of daily and intra-daily exchange rate volatility* (Doctoral dissertation). University of London.

Whittle, P. (1951). *Hypothesis testing in time series analysis*. Uppsala: Almqvist.

Yajima, Y. (1985). On estimation of long-memory time series models. *Australian Journal of Statistics*, *27*, 303–320.

Yong, C. H. (1974). *Asymptotic behaviour of trigonometric series*. Hong Kong: Chinese University.