# The Covariance Matrix and White Noise

The two fundamental ingredients of Markowitz (1952) mean-variance optimization are the expected (excess) return on each asset (the portfolio manager’s ability to forecast) and the covariance matrix of asset returns (the risk control). Additional constraints (no short sales, etc.) are then added to the problem. Performance is usually measured against the benchmark of an equity market index with fixed weights. The sample covariance matrix has the advantages of being easy to compute and unbiased (i.e., its expected value equals the true covariance matrix). However, it contains estimation error that can severely perturb a mean-variance optimizer; the error builds up precisely when the number of data points is of *comparable or smaller* order than the number of individual assets. *Note that the following concerns the structure of risk in the stock market, not the structure of expected returns.*
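A quick numerical illustration (not from the text) of this build-up: with i.i.d. simulated returns whose true covariance matrix is the identity, the condition number of the sample covariance matrix explodes as the number of assets N approaches the number of observations T, and the matrix is singular once N exceeds T.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 60                                  # number of observations (e.g., months)
conds = {}
for N in (10, 50, 80):                  # number of assets
    returns = rng.normal(size=(T, N))   # i.i.d. returns; true covariance = I
    S = np.cov(returns, rowvar=False)   # unbiased sample covariance (T - 1)
    conds[N] = np.linalg.cond(S)        # ratio of extreme eigenvalues
    print(N, conds[N])
```

For N = 80 > T = 60 the sample covariance matrix has rank at most T - 1, so its smallest eigenvalues are numerically zero and the condition number is astronomical; any optimizer that inverts this matrix is dominated by estimation error.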

The solution is to impose some problem-specific structure on the estimator; for instance, in the case of asset returns, a low-dimensional structure such as a K-factor model with uncorrelated residuals. Here K controls how much structure we impose: the fewer the factors, the stronger the structure. However, when selecting portfolios with low out-of-sample variance, a factor model that performs well on one data set may not be the one that performs well on another. One approach is to use a combination of industry factors and risk indices, with the total number of factors on the order of 50; an example is BARRA’s U.S. Equity model. Another approach is to use statistical factors, such as principal components, with the total number of factors on the order of 5; a commercial vendor offering risk models based on statistical factors is APT. In practice, selecting the right set of factors is much more an art than a quantitative exercise.
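A minimal sketch of the statistical-factor approach (an assumed implementation for illustration, not any vendor’s model): keep the top K principal components of the sample covariance matrix as the systematic part, and make the residual covariance diagonal, i.e., residuals are assumed uncorrelated.

```python
import numpy as np

def pca_factor_cov(returns: np.ndarray, k: int) -> np.ndarray:
    """K-factor covariance estimate from the top-k principal components.

    returns : (T, N) matrix of asset returns
    k       : number of statistical factors; fewer factors = more structure
    """
    S = np.cov(returns, rowvar=False)      # (N, N) sample covariance
    eigval, eigvec = np.linalg.eigh(S)     # eigenvalues in ascending order
    top = eigvec[:, -k:] * eigval[-k:]     # columns scaled by top-k eigenvalues
    systematic = top @ eigvec[:, -k:].T    # rank-k systematic covariance
    # diagonal residual variances keep total asset variances unchanged
    residual = np.diag(np.diag(S - systematic))
    return systematic + residual
```

The result is full rank for k < N (the residual diagonal is nonnegative because it sums the discarded, nonnegative eigenvalue contributions), so it remains invertible even when the sample covariance matrix is not.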

However, imposing ad-hoc structure on the covariance matrix, such as diagonality or a factor model, without prior information about the true structure of the matrix will in general result in misspecification. In other words, the resulting estimator can be so biased that it bears little resemblance to the true covariance matrix.

#### Shrinkage of the Covariance Matrix

One way to obtain a well-conditioned structured estimator is to impose that all variances are equal and all covariances are zero. The estimator is then a weighted average of this structured estimator and the sample covariance matrix; the average inherits the good conditioning of the structured estimator. By choosing the weight optimally with respect to a quadratic loss function, the weighted average is more accurate than either of its components. However, the truly optimal weight depends on the true covariance matrix, which is unobservable.
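A minimal sketch of this weighted average, with the shrinkage weight passed in by hand rather than optimized (as noted above, the optimal weight depends on the unobservable true covariance matrix):

```python
import numpy as np

def shrink_to_identity(returns: np.ndarray, weight: float) -> np.ndarray:
    """Weighted average of the sample covariance matrix and mu * I.

    returns : (T, N) matrix of asset returns
    weight  : fraction placed on the structured target (equal variances,
              zero covariances); chosen arbitrarily here for illustration.
    """
    S = np.cov(returns, rowvar=False)
    mu = np.trace(S) / S.shape[0]          # average sample variance
    target = mu * np.eye(S.shape[0])       # equal variances, zero covariances
    return weight * target + (1.0 - weight) * S
```

Because the target’s smallest eigenvalue is mu > 0, any positive weight pulls the smallest eigenvalues of the average away from zero, which is exactly the good conditioning the structured estimator contributes.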

An alternative way to impose a factor structure is to take a weighted average of the sample covariance matrix and Sharpe’s (1963) single-index model estimator. The weight α (between zero and one) assigned to the single-index model controls how much structure we impose. Note that the optimal shrinkage intensity depends on the correlation between the estimation error of the sample covariance matrix and that of the shrinkage target. When the two are positively correlated, the target contributes less additional information; moreover, this correlation term resolves a conceptual inconsistency deriving from the fact that the prior is estimated from the sample data, yet is assumed to be independent of the sample data. See the Python code below:
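The sketch below builds the single-index target and blends it with the sample covariance matrix. For simplicity the shrinkage intensity α is passed in rather than estimated; an operational estimator of the optimal α (which involves the correlation term discussed above) is beyond this illustration.

```python
import numpy as np

def single_index_shrinkage(returns: np.ndarray, market: np.ndarray,
                           alpha: float) -> np.ndarray:
    """Return alpha * F + (1 - alpha) * S, F being the single-index target.

    returns : (T, N) matrix of asset returns
    market  : (T,) vector of market index returns
    alpha   : weight on the single-index model, between zero and one
    """
    S = np.cov(returns, rowvar=False)                    # sample covariance
    var_m = market.var(ddof=1)                           # market variance
    # betas from the single-index model r_i = a_i + b_i * r_m + e_i
    cov_with_m = ((returns - returns.mean(0)) *
                  (market - market.mean())[:, None]).sum(0) / (len(market) - 1)
    beta = cov_with_m / var_m
    F = var_m * np.outer(beta, beta)                     # systematic covariances
    np.fill_diagonal(F, np.diag(S))                      # keep sample variances
    return alpha * F + (1.0 - alpha) * S
```

At α = 0 this is the raw sample covariance matrix; at α = 1 it is the pure single-index estimator; intermediate values trade the former’s estimation error against the latter’s specification error.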