# Factor models based on linear regression

Factor models are applied by portfolio managers to analyze the potential returns on a portfolio of risky assets, to choose the optimal allocation of their funds to different assets and to measure portfolio risk. The theory of linear regression-based factor models applies to most portfolios of risky assets, excluding options portfolios but including alternative investments such as real estate, hedge funds, and volatility, as well as traditional assets such as commodities, stocks, and bonds.

#### Understanding returns

The expected return on each asset in the portfolio is approximated as a weighted sum of the expected returns to several market risk factors. The weights are called *factor sensitivities or factor betas* and are estimated by regression. Market risk factors could include broad market indices, industry factors, style factors (e.g. value, growth, momentum, size), economic factors (e.g. interest rates, inflation) or statistical factors (e.g. principal components). By inputting scenarios and stress tests on the expected returns and the volatilities and correlations of these risk factors, the factor model representation allows the portfolio manager to examine expected returns under different market scenarios.
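The weighted-sum representation above can be sketched in a few lines. This is a minimal illustration, not an estimated model: the betas, the intercept `alpha`, the factor names, and the two scenarios are all hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical factor betas for one asset (assumed values, not estimates).
# Assumed factor order: market index, value style factor, interest-rate factor.
betas = np.array([0.9, 0.3, -0.2])
alpha = 0.001  # assumed intercept (monthly)

# Hypothetical scenarios for the expected monthly factor returns.
scenarios = {
    "base": np.array([0.008, 0.002, 0.001]),
    "stress": np.array([-0.050, -0.010, 0.015]),
}

def expected_asset_return(alpha, betas, factor_means):
    """Expected asset return = alpha + sum over k of beta_k * E[factor_k]."""
    return alpha + betas @ factor_means

# Expected asset return under each scenario.
expected = {
    name: expected_asset_return(alpha, betas, mu)
    for name, mu in scenarios.items()
}
```

Changing the entries of `scenarios` is the code analogue of inputting a scenario or stress test on the expected factor returns.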

#### Understanding risk

The market risk management of portfolios has traditionally focused only on the **undiversifiable risk of a portfolio**.

Undiversifiable risk is the risk that cannot be reduced to zero by holding a large and diversified portfolio. In the context of a factor model, which aims to relate the distribution of a portfolio’s return to the distributions of its risk factor returns, we also call the undiversifiable risk the systematic risk.

The **specific risk** (also called idiosyncratic or residual risk) is the risk that is not associated with the risk factor returns. In a linear regression model of the asset return on risk factor returns, it is the risk arising from the variance of the residuals.
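As a rough sketch of this split, the example below regresses a simulated asset return on two simulated factors and separates the sample variance into a systematic part (explained by the factors) and a specific part (the residual variance). The data, the true betas, and the noise levels are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 60  # e.g. five years of monthly observations (assumed)

# Simulated factor returns and asset returns (hypothetical data).
factors = rng.normal(0.005, 0.04, size=(n_obs, 2))
true_betas = np.array([1.1, 0.4])
asset = 0.001 + factors @ true_betas + rng.normal(0.0, 0.02, size=n_obs)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n_obs), factors])
coef, *_ = np.linalg.lstsq(X, asset, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]
residuals = asset - X @ coef

# Systematic variance: beta' * Cov(factors) * beta.
factor_cov = np.cov(factors, rowvar=False, ddof=1)
systematic_var = beta_hat @ factor_cov @ beta_hat

# Specific variance: sample variance of the residuals. The same ddof is used
# everywhere so that the two parts add up exactly to the total sample variance.
specific_var = residuals.var(ddof=1)
total_var = asset.var(ddof=1)
```

Because the regression includes an intercept, the residuals are uncorrelated with the factors in sample, so `systematic_var + specific_var` reproduces `total_var`.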

#### How to use linear regression-based factor models

When asset managers employ a factor model such as the single index model (one asset and one factor), they use long histories of asset prices and benchmark values, measure returns at a weekly or monthly frequency, and assume that the true parameters are constant. In this case, standard regression is appropriate, and more data result in smaller sampling error. Three to five years of monthly or weekly data is typical. Note that the single index model is **an empirical description of stock returns that can represent a linear relationship with any economic variable relevant to the security**; **however, it assumes only one factor as the cause of the systematic risk affecting returns**. The regression outputs estimates of the alpha, the beta, and the residuals.
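A minimal sketch of the single index regression follows. The function name `single_index_fit` and the simulated inputs are hypothetical. The standard error of beta is included because it is the quantity that shrinks as more data are used.

```python
import numpy as np

def single_index_fit(asset_returns, benchmark_returns):
    """Estimate alpha, beta, and the standard error of beta by least squares."""
    y = np.asarray(asset_returns, dtype=float)
    x = np.asarray(benchmark_returns, dtype=float)
    # Beta is the covariance with the benchmark over the benchmark variance.
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    alpha = y.mean() - beta * x.mean()
    resid = y - alpha - beta * x
    # Residual variance with n - 2 degrees of freedom (two estimated parameters).
    resid_var = resid @ resid / (len(y) - 2)
    beta_se = np.sqrt(resid_var / np.sum((x - x.mean()) ** 2))
    return alpha, beta, beta_se

# Hypothetical monthly data: 48 months of benchmark and asset returns.
rng = np.random.default_rng(1)
benchmark = rng.normal(0.006, 0.045, size=48)
asset = 0.002 + 1.2 * benchmark + rng.normal(0.0, 0.03, size=48)

alpha_hat, beta_hat, beta_se = single_index_fit(asset, benchmark)
```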

When risk managers employ a multi-factor model for a portfolio of assets, they use shorter histories of portfolio and benchmark values, measure returns at a daily frequency, and do not assume that the true values of the parameters are constant. In this case, they employ time-varying estimation techniques such as exponentially weighted moving averages (EWMA) or generalized autoregressive conditional heteroscedasticity (GARCH) models. The portfolio can be studied in two ways:

- Define a hypothetical portfolio with specified weights and use the estimates of the alpha, beta, and residuals for each asset to infer the characteristics of that portfolio. By defining multiple hypothetical portfolios, an asset manager can compare many different portfolios for recommendation to investors.
- Consider a real portfolio owned by an investor and construct an artificial constant-weight returns history for it in order to assess the relative performance, the systematic risk, and the specific risk of the existing portfolio. However, such a reconstructed ‘constant weight’ series for the portfolio returns will not be the same as the actual historical returns series for the portfolio, unless the portfolio was rebalanced continually so as to keep the weights constant. By using current weights, a risk manager can forecast the portfolio's risk over a future risk horizon of a few days, weeks or months.
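The second approach can be sketched as follows: apply the current weights to the historical asset returns to obtain a constant-weight history, then run an EWMA variance recursion on it. The function names, the simulated returns, the smoothing constant `lam=0.94`, the seed value for the recursion, and the square-root-of-time scaling to a ten-day horizon are all assumptions made for illustration.

```python
import numpy as np

def constant_weight_returns(asset_returns, weights):
    """Portfolio simple returns, assuming rebalancing to `weights` every period."""
    return np.asarray(asset_returns, dtype=float) @ np.asarray(weights, dtype=float)

def ewma_variance(returns, lam=0.94):
    """EWMA recursion: var_t = lam * var_{t-1} + (1 - lam) * r_t**2."""
    r = np.asarray(returns, dtype=float)
    var = np.empty_like(r)
    var[0] = r[0] ** 2  # assumed seed: first squared return
    for t in range(1, len(r)):
        var[t] = lam * var[t - 1] + (1.0 - lam) * r[t] ** 2
    return var

# Hypothetical daily returns for three assets over roughly one trading year.
rng = np.random.default_rng(2)
daily_returns = rng.normal(0.0003, 0.01, size=(250, 3))
current_weights = np.array([0.5, 0.3, 0.2])

portfolio_history = constant_weight_returns(daily_returns, current_weights)
daily_vol = np.sqrt(ewma_variance(portfolio_history)[-1])

# Assumed square-root-of-time scaling to a ten-day risk horizon.
ten_day_vol = daily_vol * np.sqrt(10)
```

The reconstructed `portfolio_history` matches the actual portfolio history only if the weights really were held constant, which is the caveat stated in the second bullet.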

#### Application: Portfolio analysis and comparison

A monthly dataset of three assets (stocks, bonds, and cash) is obtained from a returns-generating process spanning about four years.

Multi-period returns are returns to an investment held over a horizon longer than one period. Consider an investment made at time t and held until time t+n (units are months). The simple multi-period return is the product of the n monthly gross returns (one plus each monthly simple return) minus 1; the log multi-period return is instead the sum of the n monthly log returns.
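A short numerical check of the two compounding rules is given below, using three hypothetical monthly simple returns. Simple returns are compounded through their gross returns, log returns are summed, and the two results describe the same growth.

```python
import numpy as np

# Hypothetical monthly simple returns over n = 3 months.
monthly_simple = np.array([0.02, -0.01, 0.03])

# Simple multi-period return: product of gross returns, minus 1.
multi_simple = np.prod(1.0 + monthly_simple) - 1.0

# Log multi-period return: sum of the monthly log returns.
monthly_log = np.log1p(monthly_simple)
multi_log = monthly_log.sum()

# The two are linked by multi_simple = exp(multi_log) - 1.
implied_simple = np.expm1(multi_log)
```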

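When units are not months, or when an investor has a cumulative return for a given period, even if it is a specific number of days, an annualized performance figure can be calculated as

$$R_{\text{annualized}} = \left(1 + R_{\text{cumulative}}\right)^{365/d} - 1$$

where $d$ is the number of days in the period. Note that 365 may be adjusted to 252 trading days.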
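As a sketch, the annualization can be written as a small helper. It assumes the usual compounding convention, in which the cumulative gross return is raised to the power of days-per-year over days-held. The function name and its parameters are hypothetical.

```python
def annualize(cumulative_return, days, days_per_year=365):
    """Annualize a cumulative simple return earned over `days` days.

    Assumes compounding: (1 + R) ** (days_per_year / days) - 1.
    Pass days_per_year=252 when `days` counts trading days.
    """
    return (1.0 + cumulative_return) ** (days_per_year / days) - 1.0

# A 10% cumulative return over exactly one calendar year is unchanged.
one_year = annualize(0.10, 365)

# A 5% cumulative return over 126 trading days (about half a trading year).
half_year = annualize(0.05, 126, days_per_year=252)
```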

In the case of a portfolio, the one-period simple return is a linear function of the portfolio weights, whereas the one-period log-return is a non-linear function of the weights. For this reason, simple returns are said to aggregate across assets, whereas log-returns aggregate across time. In practice, the portfolio simple multi-period return is the product of the n gross returns (one plus each simple return) minus 1, whereas the portfolio log multi-period return is the sum of the n log returns.

The linear definition works very well for portfolios over single periods, in the sense that the expected values and variances of portfolios can be derived from the expected values, variances, and covariances of the components, because the portfolio linear return over a time period is a linear combination of the returns of the portfolio components. For analogous reasons, the log definition works very well for single securities over time.
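The contrast between the two definitions at the portfolio level can be checked numerically. In the sketch below, the weights and one-period returns are hypothetical. The portfolio simple return is the weighted sum of the asset simple returns, while the weighted sum of asset log returns does not reproduce the portfolio log return.

```python
import numpy as np

# Hypothetical weights (stocks, bonds, cash) and one-period simple returns.
weights = np.array([0.6, 0.3, 0.1])
simple_returns = np.array([0.10, -0.05, 0.002])

# Portfolio simple return: linear in the weights.
portfolio_simple = weights @ simple_returns

# Portfolio log return: log of the portfolio gross return.
portfolio_log = np.log1p(portfolio_simple)

# Naive weighted sum of asset log returns: not equal to portfolio_log.
asset_log_returns = np.log1p(simple_returns)
naive_log = weights @ asset_log_returns

gap = portfolio_log - naive_log
```

The gap is positive here because the logarithm is concave, so the weighted average of the logs lies below the log of the weighted average.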

When considering the data at hand, continuous exponential compounding cannot be observed. As a result, the returns process can be very well approximated by geometric compounding. Therefore, the decision to use simple or log returns depends on the data. In practice, the data should be studied to understand whether returns are normally or log-normally distributed. However, when employing a returns-generating process, the choice can be defined a priori.