The narrative of Smart Beta products is that factors are becoming an investment commodity. Factors are not commodities, but unique expressions of investment themes. One Value strategy can be very different from another, and can lead to very different results. There are many places that factor portfolios can differ. The difficulty for asset allocators is in identifying how factor strategies differ from one another, when they all purport to use the same themes: Value, Momentum and Quality.
Over the last couple of years, several Multi-factor funds that combine Value, Momentum and Quality have been launched. As these products compete to gather assets, price competition has started among rivals. In December, BlackRock cut fees on its smart beta ETFs in competition with Goldman Sachs, which has staked out a cost leadership position in the market space. Michael Porter, the expert in competitive strategy, wrote in 1980 that there are three generic strategies that any business can use to establish a competitive advantage: cost leadership, differentiation or focus. Cost leadership can be an effective strategy, but a price war only makes sense when the products are near-perfect substitutes for one another, as commodities are. This paper focuses on how quantitative asset managers can differ widely in factor definitions, in how they combine factors into themes, and in portfolio construction techniques, leading to a wide range of investment experiences in multi-factor investment products.
Value investing through ratios seems to be very straightforward. Price/Earnings ratios are quoted widely as a common metric to gauge the relative cheapness of one stock to another. “Information Content of Equity Analyst Reports” by Asquith, Mikhail and Au found that 99% of equity analyst reports use earnings multiples in analyzing a company. The P/E ratio is used widely because it is straightforward and makes intuitive sense: as an equity owner you are entitled to the residual earnings of the company after expenses, interest and taxes. A ratio of price to earnings tells you how much you’re paying for every dollar of earnings.
Getting a P/E ratio is as simple an exercise as opening up a web browser and typing in a search. But if you’ve ever compared P/E ratios from multiple sources, you know you can get very different numbers for the same company. Take Allergan (NYSE: AGN) as an example. As of January 12th, 2017, Yahoo! Finance had AGN with a P/E of 6.06. But Google Finance had 15.84. If you have access to a Bloomberg terminal, Bloomberg had it at a P/E of 734. FactSet had no P/E ratio. You can feel like you’re stuck in Segal’s Law, as quoted by Arthur Bloch: “A man with a watch knows what time it is. A man with two watches is never sure.”
These discrepancies happen because there are a lot of different ways to put together a P/E ratio. One could use Earnings per Share divided by the price of the stock. If so, should you use basic or diluted EPS? There’s a difference if you switch to the LTM Net Income divided by the total Market Cap of the company, as shares can change over a given quarter. But the reason for Allergan’s different ratios is that some financial information providers use bottom-line earnings while others take Income before Extraordinaries and Discontinued Operations. On August 2nd, Teva (NYSE: TEVA) acquired Allergan’s generics business “Actavis Generics” for $33.4 billion in cash and 100 million shares of Teva, generating $16bn in earnings from Discontinued Operations. After unwinding this, the company actually lost $1.7bn in the third quarter. Hence no P/E ratio. Depending on whether an adjustment is made for this, Allergan will appear either as a top percentile cheap stock on Earnings Yield (the inverse of the P/E ratio) or in the 94th percentile.
Accounting adjustments for Extraordinaries and Discontinued Operations aren’t the only items affecting earnings ratios. When considering earnings, you want to measure the available economic surplus that flows to the holder of the common equity. If preferred stock exists, its claims supersede those of common shareholders. Fannie Mae (OTC: FNMA) is a great example of how preferred dividends can absorb earnings from common shareholders. During the 2008 crisis, Fannie Mae issued a senior tranche of preferred stock that is owned by the U.S. Treasury and pays a $9.7bn dividend out of the $9.8bn in earnings the company generates. There is also a junior preferred tranche held by investors like Pershing Square and the Fairholme funds, which is currently not receiving dividends; those investors are pursuing legal challenges to receive some portion of the earnings. This leaves common shareholders behind a long line of investors with a prioritized claim on earnings. But some methodologies subtract preferred dividends from earnings while others do not, creating the difference between a P/E of 2.3 (an Earnings Yield of 43%) and a P/E of 185 (an Earnings Yield of 0.5%).
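The adjustments discussed above can be sketched in a few lines of Python. This is a minimal illustration, not any data vendor's actual methodology: the function name `earnings_yield`, its parameters, and the example figures are all hypothetical.

```python
def earnings_yield(market_cap, net_income, discontinued_ops=0.0,
                   extraordinary_items=0.0, preferred_dividends=0.0):
    """Earnings yield (E/P) on earnings available to common shareholders.

    All inputs are in the same currency unit (for example, $bn). Leaving the
    three adjustment arguments at zero gives the unadjusted, bottom-line yield.
    """
    if market_cap <= 0:
        raise ValueError("market_cap must be positive")
    adjusted_earnings = (net_income - discontinued_ops
                         - extraordinary_items - preferred_dividends)
    return adjusted_earnings / market_cap


# Hypothetical company (not Allergan or Fannie Mae): $20bn market cap and
# $10bn of bottom-line net income, $9bn of which is a one-off gain booked
# under discontinued operations.
raw = earnings_yield(20.0, 10.0)
adjusted = earnings_yield(20.0, 10.0, discontinued_ops=9.0)
print(f"unadjusted E/P: {raw:.1%}, adjusted E/P: {adjusted:.1%}")
```

Working in yield terms (E/P) rather than P/E keeps the metric defined when adjusted earnings are zero or negative, which is the situation where a vendor reports no P/E at all.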
These comments are not about the cheapness of Allergan or Fannie Mae, but rather about the importance of your definition of “earnings” and the adjustments you apply. If these considerations sound like fundamental investing, it’s because they are. Fundamental analysts consider these adjustments in the analysis of a company. Factor investors work through the same accounting issues as fundamental investors, with the additional burden of adjusting systematically to create the best metric that accounts for the accounting differences across thousands of companies. Investing results can be very different based on these adjustments. In the U.S. Large Stocks Universe, there is a +38bps improvement on the best decile of Earnings Yield if you adjust for Discontinued Items, Extraordinaries and Preferred Dividends. To set some scale, in the eVestment Universe the difference between a median and a top quartile manager is just +60bps a year.
Compustat Large Stocks Universe, 1963-2016
Adjustments to Value signals are not limited to Price-to-Earnings. Book Value can be adjusted for the accounting of Goodwill and Intangibles. Dividend Yield can be calculated using the dividends paid over the trailing 12 months, or by annualizing the most recent payment. In 2004, Microsoft paid out $32 billion of its $50 billion in cash in a one-time $3 per share dividend when the stock was trading at around $29. Should you include that dividend in calculating yield, knowing that future investors won’t receive similar dividend streams?
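To see how far the two dividend yield definitions can diverge, here is a small sketch. The $3.00 special dividend and the roughly $29 share price come from the example above; the $0.08 regular quarterly dividend and both function names are hypothetical placeholders chosen only for illustration.

```python
def trailing_dividend_yield(dividends_last_12m, price):
    """Sum of all per-share dividends paid over the trailing 12 months / price."""
    return sum(dividends_last_12m) / price


def annualized_dividend_yield(latest_regular_dividend, payments_per_year, price):
    """Most recent regular per-share payment, annualized, divided by price."""
    return latest_regular_dividend * payments_per_year / price


price = 29.0
regular = 0.08   # hypothetical regular quarterly dividend per share
special = 3.00   # one-time dividend per share, from the text

# Trailing yield counts the special dividend; the annualized yield ignores it.
trailing = trailing_dividend_yield([regular, regular, regular, regular + special], price)
annualized = annualized_dividend_yield(regular, 4, price)
print(f"trailing: {trailing:.2%}, annualized: {annualized:.2%}")
```

Under these assumptions the trailing definition makes the stock look like a high yielder, while the annualized definition does not, so the same company can land in very different yield deciles.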
Differences in signal construction are not limited to Value factors. Momentum investors know that there are actually three phenomena observed in past price movement: short-term reversals in the first month, medium-term momentum over the next 12 months and long-term reversals over a 3 to 5-year period. Get two momentum investors into a room, and they will disagree over whether to delay the momentum signal by one month to avoid reversals (the “12-months minus 1-month” signal). Quality investors argue over the usage of non-current balance sheet items, or over the loss of effectiveness in investing on changes in analyst estimates. Volatility can be measured using raw volatility, beta, or idiosyncratic vol, to name just a few methods.
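The disagreement over skipping the most recent month is easy to make concrete. The sketch below assumes a list of month-end prices ordered oldest first; the function `momentum`, its `lookback` and `skip` parameters, and the price series are illustrative assumptions rather than any manager's definition.

```python
def momentum(prices, lookback=12, skip=1):
    """Price return over `lookback` months, measured up to `skip` months ago.

    `prices` holds month-end prices, oldest first. skip=0 gives plain 12-month
    momentum; skip=1 gives the "12 minus 1" variant, which leaves out the most
    recent month to sidestep short-term reversal.
    """
    if skip < 0 or lookback <= skip:
        raise ValueError("need 0 <= skip < lookback")
    if len(prices) < lookback + 1:
        raise ValueError("not enough price history")
    start = prices[-1 - lookback]   # price `lookback` months ago
    end = prices[-1 - skip]         # price `skip` months ago
    return end / start - 1.0


# Hypothetical stock: a steady climb, a spike, then a sharp drop last month.
prices = [100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 130, 117]
plain = momentum(prices, skip=0)
twelve_minus_one = momentum(prices, skip=1)
print(f"12-month: {plain:.1%}, 12-minus-1: {twelve_minus_one:.1%}")
```

The two variants rank this stock very differently, which is enough to move it in or out of a concentrated momentum portfolio.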
Factors are constructed as unique expressions of an investment idea and are not the same for everyone. Small differences can have a large impact on which stocks get into the portfolios. These effects are more significant when using an optimizer, which can maximize errors, or when concentrating portfolios, which gives more weight to individual names. This is far from simply grabbing a P/E ratio from a Bloomberg data feed. There is skill in constructing factors.
Quantitative managers tend to combine individual factors together into themes like Value, Momentum and Quality. But there are several ways that managers can combine factors into models for stock selection. And models can get very complicated. In the process of manager selection, allocators have the difficult task of gauging the effectiveness of these models. The common mistake is assuming complexity equals effectiveness.
To demonstrate how complexity can degrade performance, we can take five factors in the Large Stocks space and aggregate them into a Value theme: Sales/Price, Earnings/Price, EBITDA/EV, Free Cash Flow/Enterprise Value and Shareholder Yield (a combination of dividend and buyback yield).
The most straightforward is an equally-weighted model: give every factor the same weight. This combination of the five factors generates an annual excess return of 4.06% in the top decile. An ordinary linear regression increases the weighting of Free Cash Flow to Enterprise Value and lowers the weighting on Earnings/Price, because it was less effective over that time frame. This increases the apparent effectiveness by +15bps annualized, not a lot, but remember this is Large Cap where edge is harder to generate. Other linear regressions, like ridge or lasso, might be used for parameter shrinkage or variable selection and try to enhance these results.
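As a rough sketch of what combining factors into a theme can look like, the code below ranks each factor cross-sectionally and averages the ranks. The ranking scheme, the function names `percentile_ranks` and `composite_score`, and the sample values are assumptions for illustration; the paper does not specify its exact combination procedure. Passing a `weights` dictionary stands in for regression-derived weights.

```python
def percentile_ranks(values):
    """Map raw factor values to ranks in [0, 1]; a higher value earns a higher
    rank, and tied values share the average of their positions."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two stocks to rank")
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_position = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_position / (n - 1)
        i = j + 1
    return ranks


def composite_score(factors, weights=None):
    """Weighted average of percentile ranks across factors.

    factors: dict of factor name -> list of values, one per stock, all in the
             same stock order and oriented so that higher means cheaper.
    weights: optional dict of factor name -> weight; defaults to equal weights.
    """
    names = list(factors)
    n_stocks = len(factors[names[0]])
    if any(len(factors[name]) != n_stocks for name in names):
        raise ValueError("every factor needs one value per stock")
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    ranked = {name: percentile_ranks(factors[name]) for name in names}
    return [sum(weights[name] * ranked[name][s] for name in names) / total
            for s in range(n_stocks)]


# Hypothetical values for four stocks on two of the five factors.
factors = {
    "earnings_to_price": [0.08, 0.02, 0.05, 0.11],
    "sales_to_price": [1.5, 0.4, 2.0, 0.9],
}
equal = composite_score(factors)
tilted = composite_score(factors, {"earnings_to_price": 0.75, "sales_to_price": 0.25})
print(equal)
print(tilted)
```

With equal weights, three of the four stocks tie on the composite; tilting the weights breaks the tie, which shows how a fitted weighting scheme changes stock selection even when the inputs are identical.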
Moving up the complexity scale, non-linear or machine learning models like Neural Networks, Support Vector Machines or Decision Trees can be used to build the investment signal. There has been a lot of news around Big Data and the increased usage of machine learning algorithms to help predict outcomes. For this example, we’ve built an approach using a Support Vector Regression, a common non-linear machine-learning technique. At first look, the Support Vector Regression looks very effective, increasing the outperformance of selecting stocks on Value to 4.55%, almost a half of a percent annualized return over the equally weighted model.
Compustat Large Stocks Universe, 1963-2016
The appeal of the machine-learning approach is strong. Intuitively, the complex process should do better than the simple, and the first pass results look promising. But the excess returns do not hold up on examination. This apparent edge is from overfitting a model. Quantitative managers might have different ways of constructing factors, but we are all working with data that does not change as we research ideas: quarterly financial and pricing data back to 1963. As we build models, we can torture that data to create the illusion of increased effectiveness. The linear regression and support vector machines are creating weightings out of the same data used to generate the results, which will always look better.
The statistical method to help guard against overfitting is bootstrapping. The process creates in-sample and out-of-sample tests by taking random subsamples of the dates, as well as subsets of the companies included in the analysis. Regression weightings are generated on an in-sample dataset and tested on an out-of-sample dataset. The process is repeated a hundred times to see how well the weighting process holds up.
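The resampling loop described above can be sketched as follows. This is an assumption-laden outline, not the authors' code: the text does not say whether sampling is done with or without replacement, so the sketch simply partitions dates and companies at random, and `fit` and `score` are hypothetical callables that a user would supply. The toy data is pure noise, so any in-sample "edge" is overfitting by construction.

```python
import random


def split_in_out(items, in_fraction, rng):
    """Randomly partition `items` into in-sample and out-of-sample lists."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * in_fraction)
    return shuffled[:cut], shuffled[cut:]


def bootstrap_test(dates, companies, fit, score, n_trials=100,
                   in_fraction=0.5, seed=0):
    """Fit on a random in-sample block and score on the disjoint remainder.

    fit(dates, companies) -> model
    score(model, dates, companies) -> float
    Returns a list of (in_sample_score, out_of_sample_score) pairs.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        in_dates, out_dates = split_in_out(dates, in_fraction, rng)
        in_cos, out_cos = split_in_out(companies, in_fraction, rng)
        model = fit(in_dates, in_cos)
        results.append((score(model, in_dates, in_cos),
                        score(model, out_dates, out_cos)))
    return results


# Toy data: returns are random noise, so there is no real signal to find.
noise = random.Random(42)
dates = list(range(60))
companies = list("ABCDEFGHIJ")
returns = {(d, c): noise.gauss(0.0, 0.05) for d in dates for c in companies}


def mean_return(ds, cs):
    return sum(returns[(d, c)] for d in ds for c in cs) / (len(ds) * len(cs))


def fit(ds, cs):
    # The "model" is just the direction (+1 long, -1 short) that worked in-sample.
    return 1.0 if mean_return(ds, cs) >= 0 else -1.0


def score(model, ds, cs):
    return model * mean_return(ds, cs)


results = bootstrap_test(dates, companies, fit, score)
avg_in = sum(r[0] for r in results) / len(results)
avg_out = sum(r[1] for r in results) / len(results)
print(f"average in-sample score: {avg_in:.4f}")
print(f"average out-of-sample score: {avg_out:.4f}")
```

Because the toy model is chosen to fit the in-sample block, its in-sample score can never be negative, even though the data contains no signal. Only the out-of-sample column says anything about whether the fitted model generalizes.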
In the bootstrapped results, you can see how the unfitted equally weighted model maintains its effectiveness at about the same level. The in-sample data looks just like the first analysis: the linear regression does slightly better and the SVR does significantly better. When applying the highly-fitted Support Vector Regression to the out-of-sample data, the effectiveness inverts. Performance degrades at a statistically significant level once you implement on investments that weren’t part of your training data.
Compustat Large Stocks Universe, 1963-2016
This doesn’t mean that all weighted or machine learning models are broken, but rather that complex model construction comes with the risk of overfitting to the data and can dilute the edge of factors. Overfitting is not intentional, but a by-product of having dedicated research resources that are constantly looking for ways to improve upon their process. When evaluating the factor landscape, understand the model used to construct the seemingly similar themes of Value, Momentum or Quality. Complexity in itself is not an edge for performance, and it makes the process less transparent to investors, creating a “black box” from the density of mathematics. Simple models are more intuitive and more likely to hold up in the true out-of-sample dataset, the future.
Multifactor ETFs have a lot of moving parts: the definition of factors, the construction process of building investment themes, as well as the portfolio construction techniques. Market-capitalization ETFs are very straightforward in comparison. Different products use broad, similar universes and weight on a single factor. And market capitalization has one of the most common definitions used for investing: shares outstanding multiplied by the price per share. The result is that different products by different managers have extremely similar results, and these products can be substitutes for one another.
The following two tables show the 2016 returns for three of the most popular market cap ETFs: the SPDR® S&P 500 ETF (SPY), the iShares Russell 1000 ETF (IWB) and the Vanguard S&P 500 ETF (VOO). These are widely held, and as of December 30th, 2016 together have almost $300 billion in assets. For 2016, the returns of these three ETFs are within 17bps of each other. When looking at the annualized daily tracking error for the year, we can see that they track one another very closely. Looking at these returns, it makes sense that the key selection criteria between the funds would be based on the lowest fee.
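For reference, annualized daily tracking error between two funds is commonly computed as the standard deviation of the daily return differences, scaled by the square root of the number of trading days. The sketch below assumes 252 trading days per year and a sample standard deviation; both are conventions rather than details taken from the paper, and the return series are made up.

```python
import math
import statistics


def annualized_tracking_error(returns_a, returns_b, periods_per_year=252):
    """Sample standard deviation of the per-period return differences between
    two funds, scaled to an annual figure."""
    if len(returns_a) != len(returns_b):
        raise ValueError("return series must have the same length")
    if len(returns_a) < 2:
        raise ValueError("need at least two observations")
    differences = [a - b for a, b in zip(returns_a, returns_b)]
    return statistics.stdev(differences) * math.sqrt(periods_per_year)


# Hypothetical daily returns for two funds over one week.
fund_a = [0.010, -0.004, 0.003, 0.007, -0.002]
fund_b = [0.009, -0.005, 0.004, 0.006, -0.001]
te = annualized_tracking_error(fund_a, fund_b)
print(f"annualized tracking error: {te:.2%}")
```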
For a comparison, we can examine four multifactor ETFs that were launched in 2015: the iShares Edge MSCI Multifactor USA ETF (LRGF), the SPDR® MSCI USA StrategicFactors™ ETF (QUS), the Goldman Sachs ActiveBeta U.S. Large Cap Equity ETF (GSLC) and the JP Morgan Diversified Return U.S. Equity ETF (JPUS). Each fund uses a broad large cap universe, and then selects or weights stocks based on a combination of Factor themes: Value, Momentum and Quality metrics. At first glance, it looks like these should be very similar to one another.
Each fund is based on an index with a publicly stated construction methodology. When digging through the construction methodologies, you start seeing that different factors are used in building these themes. The only common Value metric used across all four is Book-to-Price. Two funds do use Sales to Price, but otherwise each fund uses one or two metrics that its competitors do not. QUS does not include momentum, but the other three funds use different expressions of momentum, with two conditionalizing on volatility. The most common Quality metric is Return on Equity, used in three funds, followed by Debt-to-Equity, used in two. Even though most of these funds use the equally-weighted approach in building their investment themes of Value, Momentum and Quality, the different inputs mean that stock selection will be very different.
These different rankings are then used for stock selection and weighting in different portfolio construction techniques. When comparing holdings as of December 30th, 2016, the breadth of securities held ranges anywhere from 139 to 614 stocks per fund. Maximum weights range from 3.3% to 0.6%, with the top 25 securities accounting for anywhere from 43% to 14% of total assets. The funds each use different techniques and risk models with unique constraints to shape weightings, leading to widely different portfolios. Comparing these four funds with each other and with the SPDR S&P 500 ETF (SPY), the multifactor funds can have higher active share with each other than they do with the overall market.
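Active share between two portfolios is conventionally half the sum of the absolute differences in holding weights. The sketch below applies that standard definition to two made-up portfolios; the function name, tickers and weights are hypothetical.

```python
def active_share(weights_a, weights_b):
    """Half the sum of absolute weight differences across every name held by
    either portfolio. Each argument maps ticker -> weight, with the weights
    in each portfolio summing to 1."""
    tickers = set(weights_a) | set(weights_b)
    return 0.5 * sum(abs(weights_a.get(t, 0.0) - weights_b.get(t, 0.0))
                     for t in tickers)


# Two hypothetical portfolios that overlap in two of their holdings.
fund_x = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
fund_y = {"AAA": 0.4, "BBB": 0.1, "DDD": 0.5}
share = active_share(fund_x, fund_y)
print(f"active share: {share:.0%}")
```

A value of 0% means identical portfolios and 100% means no overlap at all, so the same calculation works for fund against fund as for fund against benchmark.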
These differences in signal, construction and holdings lead to very different investment results. When comparing the results for 2016, the best fund had a return of 12.96% while the worst returned 8.73%, a return gap of 423bps for the year. Also, when looking at the daily tracking error between the products, they show a wider difference in returns with each other than they do with the market.
Keep in perspective that this is a single year. Low performance in 2016 is not an indictment of GSLC; it’s most likely that GSLC was caught in the underperformance of volatility given that it focuses on low volatility names in both its Volatility and Momentum ActiveBeta® Indexes. To confirm that, you would want to run the holdings through a factor attribution framework.
The central point is that even though these four funds look very similar, they generate very different results. Factor products that differ by several hundred basis points in a single year are not a commoditized product, and should not be chosen because of a few basis points in fees. Cost leadership is the key feature for generic market-capitalization weighted schemes, but product differentiation and focus, weighed in the context of fees, should be the reasons for choosing multifactor products.
There is significant edge in how factor signals are constructed. The difficulty is creating transparency around this edge for investors. Complexity in stock selection and construction methodology decreases transparency, almost as much as active quantitative managers who create a “black box” around their stock ranking methodologies. This leaves investors at a disadvantage in trying to differentiate between quantitative products. This inability to differentiate is why price wars are starting between products that have strong differences in features and results.
Investors need education on this differentiation so they’re not selecting only on the lowest fees. Large institutional and investment consultant manager selection groups will face the difficulty of adding top-tier quantitative investment staff to help with this differentiation. Smaller groups and individual investors will have to advance their own understanding of how quantitative products are constructed. For the entire range of factor investors, it will help to build trusted partnerships with quantitative asset managers willing to share insights and help them understand the factor investing landscape.