Factors are Not Commodities
(Part 2 of 2)

Author: Chris Meredith. Category: Investing

(See Part 1 of this article for background info on the importance of accurately constructing factor signals and how they can impact which stocks get selected into the portfolios.)

Alpha Signals

Quantitative managers tend to combine individual factors into themes like Value, Momentum, and Quality. But there are several ways that managers can combine factors into models for stock selection, and models can get very complicated. In the process of manager selection, allocators have the difficult task of gauging the effectiveness of these models. A common mistake is assuming that complexity equals effectiveness.

To demonstrate how complexity can degrade performance, let’s take five factors in the Large Stocks space and aggregate them into a Value theme: price/sales, price/earnings, EBITDA/enterprise value, free cash flow/enterprise value, and shareholder yield (a combination of dividend and buyback yield).

The most straightforward approach is an equally-weighted model: give every factor the same weight. This combination of the five factors generates an annual excess return of 4.06% in the top decile. An ordinary linear regression increases the weighting of free cash flow-to-enterprise value and lowers the weighting on price-to-earnings, because the latter was less effective over that time frame. This increases the apparent effectiveness by 15bps (annualized). That is not a lot, but remember this is Large Cap, where edge is harder to generate. Other regression techniques, like ridge or lasso, might be used for parameter shrinkage or variable selection to try to enhance these results.
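As a rough illustration of the equally-weighted approach, the sketch below ranks each stock on every factor and averages the ranks into one composite score. It is not the model behind the figures quoted here: the tickers, the numbers, and the helper names are hypothetical, only three of the five factors are shown, and it assumes every factor is expressed as a yield so that a higher value always means a cheaper stock.

```python
from statistics import mean

def percentile_ranks(values):
    """Map each value to a rank in (0, 1]; the highest value gets 1.0.
    Ties are not handled specially in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position / len(values)
    return ranks

def equal_weight_composite(factor_table):
    """factor_table maps factor name -> list of values, one per stock.
    Returns the plain average of each stock's factor ranks."""
    ranked = [percentile_ranks(vals) for vals in factor_table.values()]
    return [mean(stock_ranks) for stock_ranks in zip(*ranked)]

def top_decile(tickers, scores):
    """Return the tickers in the highest-scoring 10% (at least one)."""
    n = max(1, len(tickers) // 10)
    best = sorted(zip(scores, tickers), reverse=True)[:n]
    return [ticker for _, ticker in best]

# Hypothetical universe of ten stocks; all inputs are yields (higher = cheaper).
tickers = list("ABCDEFGHIJ")
factors = {
    "sales_to_price": [0.5, 1.2, 0.8, 2.0, 0.3, 0.9, 1.5, 0.7, 1.1, 0.4],
    "earnings_to_price": [0.04, 0.09, 0.06, 0.12, 0.02, 0.05, 0.10, 0.07, 0.08, 0.03],
    "shareholder_yield": [0.01, 0.05, 0.03, 0.06, 0.00, 0.02, 0.04, 0.035, 0.045, 0.005],
}
scores = equal_weight_composite(factors)
cheapest = top_decile(tickers, scores)
```

A regression-weighted variant would replace the plain average with fitted weights per factor, which is where the risk of fitting to the sample enters.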

Moving up the complexity scale, non-linear or machine learning models like Neural Networks, Support Vector Machines, or Decision Trees can be used to build the investment signal. There has been a lot of news around Big Data and the increased usage of machine learning algorithms to help predict outcomes. For this example, we’ve built an approach using a Support Vector Regression, a common non-linear machine-learning technique. At first look, the Support Vector Regression looks very effective, increasing the outperformance of selecting stocks on Value to 4.55%, almost half a percentage point of annualized return above the equally-weighted model.

[Chart: Cheapest Deciles (footnote 1), Excess Return vs. U.S. Large Stocks, 1964–2016]

The appeal of a machine-learning approach is strong. Intuitively, the complex process should do better than the simple, and the first pass results look promising. But this apparent edge does not hold up on examination.

The apparent edge comes from overfitting the model. Quantitative managers might have different ways of constructing factors, but we are all working with data that does not change as we research ideas: quarterly financial and pricing data back to 1963. As we build models, we can torture that data to create the illusion of increased effectiveness. Both the linear regression and the Support Vector Regression fit their weightings on the same data used to measure the results, so the fitted models will always look better on that sample.

A statistical method that helps guard against overfitting is bootstrapping. The process creates in-sample and out-of-sample tests by taking random subsamples of the dates, as well as subsets of the companies included in the analysis. Regression weightings are generated on an in-sample dataset and tested on an out-of-sample dataset. The process is repeated a hundred times to see how well the weighting process holds up.
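The loop described above can be sketched roughly as follows. This is a minimal outline under stated assumptions, not the procedure actually used for the results: it assumes a 50/50 random partition of both dates and companies, and `fit` and `evaluate` are hypothetical callables standing in for the weighting model and the top-decile excess return measurement. The toy versions at the bottom exist only so the sketch runs.

```python
import random

def random_split(items, in_sample_fraction, rng):
    """Randomly partition items into (in_sample, out_of_sample) lists."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * in_sample_fraction)
    return shuffled[:cut], shuffled[cut:]

def bootstrap_trials(dates, companies, fit, evaluate, n_trials=100, seed=0):
    """Repeat n_trials times: fit weights in-sample, then score the same
    weights on both the in-sample and the out-of-sample data."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        dates_in, dates_out = random_split(dates, 0.5, rng)      # assumed 50/50
        names_in, names_out = random_split(companies, 0.5, rng)  # assumed 50/50
        weights = fit(dates_in, names_in)
        results.append((
            evaluate(weights, dates_in, names_in),    # in-sample result
            evaluate(weights, dates_out, names_out),  # out-of-sample result
        ))
    return results

# Toy placeholders (hypothetical): a real study would fit factor weights
# and measure the excess return of the cheapest decile.
def toy_fit(dates, names):
    return {"n_obs": len(dates) * len(names)}

def toy_evaluate(weights, dates, names):
    return len(dates) * len(names)

trials = bootstrap_trials(list(range(1964, 2017)), ["A", "B", "C", "D"],
                          toy_fit, toy_evaluate)
```

Comparing the two numbers in each trial, across all trials, is what shows whether a fitted weighting keeps its edge on data it has not seen.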

In the bootstrapped results, you can see how the unfitted equally-weighted model maintains its effectiveness at about the same level. The in-sample data looks just like the first analysis: the linear regression does slightly better and the Support Vector Regression (SVR) does significantly better. When the highly-fitted SVR is applied to the out-of-sample data, the effectiveness inverts. Performance degrades at a statistically significant level once you invest in stocks that weren’t part of your training data.

[Chart: Cheapest Deciles by Value, Excess Return vs. U.S. Large Stocks, 1964–2016]

This doesn’t mean that all weighted or machine learning models are broken; rather, complex model construction comes with the risk of overfitting to the data, which can dilute the edge of factors. Overfitting is not intentional, but a by-product of having dedicated research resources that are constantly looking for ways to improve upon their process. When evaluating the factor landscape, understand the model used to construct the seemingly similar themes of Value, Momentum, or Quality. Complexity in itself is not an edge for performance, and it makes the process less transparent to investors, creating a “black box” from the density of the mathematics. Simple models are more intuitive and more likely to hold up in the true out-of-sample dataset: the future.

Multi-Factor Signals

Multi-factor ETFs have a lot of moving parts: the definition of factors, the construction process of building investment themes, as well as the portfolio construction techniques. Market-capitalization ETFs are very straightforward in comparison. Different products use broad, similar universes and weight on a single factor. And market capitalization has one of the most common definitions used for investing: shares outstanding multiplied by the price per share. The result is that different products by different managers have extremely similar results, and these products can be substitutes for one another.
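The market capitalization definition is simple enough to write down in a few lines. The sketch below uses hypothetical tickers and numbers to compute market caps and the resulting index weights; it ignores refinements such as float adjustment that real index providers may apply.

```python
def market_caps(shares_outstanding, prices):
    """Market cap = shares outstanding multiplied by price per share."""
    return {t: shares_outstanding[t] * prices[t] for t in shares_outstanding}

def cap_weights(caps):
    """Weight each stock by its share of total market capitalization."""
    total = sum(caps.values())
    return {t: cap / total for t, cap in caps.items()}

# Hypothetical three-stock universe.
shares = {"AAA": 100, "BBB": 50, "CCC": 200}
prices = {"AAA": 10.0, "BBB": 40.0, "CCC": 5.0}

caps = market_caps(shares, prices)
weights = cap_weights(caps)
```

Because every manager starts from essentially the same two inputs, there is little room for two cap-weighted products on similar universes to diverge.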

The table here shows the 2016 returns for three of the most popular market cap ETFs: the SPDR® S&P 500 ETF (SPY), the iShares Russell 1000 ETF (IWB), and the Vanguard S&P 500 ETF (VOO). These are widely held and have almost $300 billion in combined assets as of December 30, 2016. For 2016, the returns of these three ETFs are within 17bps of each other. When looking at the annualized daily tracking error for the year, we can see that they track one another very closely. Looking at these returns, it makes sense that the key selection criterion between the funds would be the lowest fee.
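For readers unfamiliar with the measure, annualized daily tracking error is commonly computed as the standard deviation of the daily return differences between two funds, scaled up to a year. The sketch below assumes 252 trading days per year and uses made-up return series; it is not necessarily the exact calculation behind the table.

```python
from math import sqrt
from statistics import stdev

TRADING_DAYS = 252  # assumed annualization convention

def annualized_tracking_error(returns_a, returns_b):
    """Sample standard deviation of daily return differences,
    scaled by the square root of the number of trading days."""
    diffs = [a - b for a, b in zip(returns_a, returns_b)]
    return stdev(diffs) * sqrt(TRADING_DAYS)

# Hypothetical daily returns for two funds that track each other closely.
fund_a = [0.010, -0.005, 0.003, 0.007, -0.002]
fund_b = [0.009, -0.004, 0.003, 0.006, -0.002]

te = annualized_tracking_error(fund_a, fund_b)
```

Two funds with identical daily returns have a tracking error of zero, so the smaller the number, the more interchangeable the products.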

[Table: 2016 Market Cap ETFs (footnote 2)]

For a comparison, let’s examine four multi-factor ETFs that were launched in 2015: iShares Edge MSCI Multifactor USA ETF (LRGF), the SPDR® MSCI USA StrategicFactors™ ETF (QUS), the Goldman Sachs ActiveBeta U.S. Large Cap Equity ETF (GSLC), and the JPMorgan Diversified Return U.S. Equity ETF (JPUS). Each fund uses a broad large cap universe and then selects or weights stocks based on a combination of factor themes: Value, Momentum, and Quality metrics. At first glance, it looks like these should be very similar to one another.

Each fund tracks an index with a publicly stated construction methodology. Digging through these methodologies, you start to see that different factors are used in building the themes. The only Value metric common to all four is price-to-book. Two funds use price-to-sales, but otherwise each fund uses one or two metrics that none of its competitors use. QUS does not include momentum, and the other three funds use different expressions of momentum, with two conditioning on volatility. The most common Quality metric is return on equity, used in three funds, followed by debt-to-equity, used in two. Even though most of these funds use the equally-weighted approach in building their investment themes of Value, Momentum, and Quality, the different inputs mean the stock selection will be very different.

[Table: Multi-Factor ETFs]

These different rankings are then used for stock selection and weighting in different portfolio construction techniques. Comparing holdings as of December 30, 2016, the number of securities held in each fund ranges from 139 to 614 stocks. Maximum weights range from 3.3% to 0.6%, with the top 25 securities accounting for anywhere from 43% to 14% of total assets. Each fund uses different techniques and risk models with unique constraints to shape weightings, leading to widely different portfolios. Comparing these four funds with each other and with the S&P 500 fund SPY, they can have higher active share with each other than they do with the overall market.
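Active share between two portfolios is conventionally defined as half the sum of the absolute differences in their holding weights, so 0 means identical portfolios and 1 means no overlap. The sketch below applies that definition, along with a top-N concentration measure, to two hypothetical portfolios; the tickers and weights are invented for illustration.

```python
def active_share(weights_a, weights_b):
    """Half the sum of absolute weight differences across all names
    held in either portfolio. A name not held counts as weight 0."""
    names = set(weights_a) | set(weights_b)
    return 0.5 * sum(
        abs(weights_a.get(name, 0.0) - weights_b.get(name, 0.0))
        for name in names
    )

def top_n_weight(weights, n=25):
    """Combined weight of the n largest holdings."""
    return sum(sorted(weights.values(), reverse=True)[:n])

# Hypothetical portfolios; weights in each sum to 1.
fund_x = {"AAA": 0.50, "BBB": 0.30, "CCC": 0.20}
fund_y = {"AAA": 0.25, "BBB": 0.25, "DDD": 0.50}

share_xy = active_share(fund_x, fund_y)
concentration_x = top_n_weight(fund_x, n=2)
```

Running the same function on each pair of funds, and on each fund against the market portfolio, produces the kind of pairwise comparison discussed here.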

[Table: Active Shares, as of 12/31/16]

These differences in signal, construction, and holdings lead to very different investment results. When comparing the results for 2016, the highest return out of all the funds was 12.96% while the lowest returned 8.73% — a return gap of 423bps for the year. Also, when looking at the daily tracking error between the products, they generate a wider difference of returns with each other than they do with the market.

[Table: 2016 Multi-Factor ETFs (footnote 3)]

Keep in perspective that this is a single year. This is not an indictment of GSLC; most likely, GSLC was caught in the underperformance of the volatility factor, given that it focuses on low volatility names in both its Volatility and Momentum ActiveBeta® indexes. Confirming that would require running the holdings through a factor attribution framework.

The central point is that, even though these four funds look very similar, they generate very different results. Factor products that generate several hundred basis points of difference in a single year are not commoditized and should not be chosen because of a few basis points in fees. Cost leadership is the key feature for generic market-capitalization weighted schemes, but product differentiation and focus, weighed in the context of fees, should be the reasons for investing in multi-factor products.

SUMMARY

There is significant edge in how factor signals are constructed. The difficulty is creating transparency around this edge for investors. Complexity in stock selection and construction methodology decreases transparency, almost as much as active quantitative managers who create a “black box” around their stock ranking methodologies. This leaves investors at a disadvantage when trying to differentiate between quantitative products. This inability to differentiate is why price wars are starting between products that have strong differences in features and results.

Investors need education on this differentiation so they’re not selecting only on the lowest fees. Sophisticated manager selection groups at allocators focus on people, philosophy, and process as well as performance. These things will still matter in understanding a factor portfolio, but these groups now need to add expertise in factors and portfolio construction. Manager selection groups at large institutions and investment consultants will face the difficulty of adding top-tier quantitative investment staff to help with this differentiation. Smaller groups and individual investors will have to advance their own understanding of how quantitative products are constructed. For all of these factor investors, it will help to build trusted partnerships with quantitative asset managers willing to share insights on the factor investing landscape.




  1. For example, the cheapest 10% of stocks by P/E.
  2. SPDR® S&P 500 ETF (SPY), the iShares Russell 1000 ETF (IWB), and the Vanguard S&P 500 ETF (VOO)
  3. iShares Edge MSCI Multifactor USA ETF (LRGF), SPDR® MSCI USA StrategicFactors™ ETF (QUS), Goldman Sachs ActiveBeta U.S. Large Cap Equity ETF (GSLC), JPMorgan Diversified Return U.S. Equity ETF (JPUS), SPDR® S&P 500 ETF (SPY)