The Benchmarking Game

July 01, 2011, 1:00 a.m. EDT

With the growing number of investment tools, we have many options when choosing a benchmark. Yet the most appropriate choice usually isn't obvious, for two reasons: Portfolios rarely match any single index closely, and they change over time.

A large part of my work is creating custom benchmarks for foundations and trusts. By the time I'm done, I almost always find that both my clients and their money managers are surprised at the results. A key lesson is that the ubiquitous comparisons to the S&P 500 are often a disservice to your clients — if not dishonest. It's important to understand that, akin to other areas of investing, the choice of benchmarks is influenced by emotion and bias.

EVERYBODY'S ABOVE AVERAGE

Like the vast majority of drivers, I think I'm above average. You may have heard of the "Lake Wobegon" effect, the human tendency to overestimate ourselves. It's named for Garrison Keillor's fictional town where "all the women are strong, all the men are good looking and all the children are above average." I often ask audiences if their driving is above average and get a big show of hands. I next ask if they have below-average ethics. Occasionally, someone raises his hand.

One way we fool ourselves into believing we're above average is by using different standards. I think I'm an excellent driver because my benchmark is saving time, getting from point A to point B as fast as humanly possible. My wife's standard is getting from point A to point B while obeying all laws, such as the speed limit. My benchmark may be wrong, but I'm unlikely to admit it.

In that regard, consider a hypothetical U.S. stock portfolio managed by a firm named Alpha Dog that returned 15.5% after fees in 2010. Since the S&P 500 returned 12.8%, Alpha Dog is touting its stock picking abilities, claiming it added 2.7 percentage points of alpha.

The first caveat: The S&P 500 Index strips out dividends. The total return of the stocks in the index, with dividends, was 15.1%. Alpha Dog's returns typically include dividends, so it actually beat the S&P 500 by less than half a point.
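
The arithmetic behind that caveat can be sketched in a few lines of Python. The function and variable names are illustrative only; the figures are the 2010 numbers quoted above.

```python
def excess_return(portfolio_pct, benchmark_pct):
    """Difference in percentage points, rounded to one decimal place."""
    return round(portfolio_pct - benchmark_pct, 1)

ALPHA_DOG = 15.5     # after-fee return, dividends included
SP500_PRICE = 12.8   # index return with dividends stripped out
SP500_TOTAL = 15.1   # same stocks, dividends included

claimed = excess_return(ALPHA_DOG, SP500_PRICE)        # what the firm touts
like_for_like = excess_return(ALPHA_DOG, SP500_TOTAL)  # dividends on both sides
print(claimed, like_for_like)
```

Most of the claimed 2.7-point edge comes from counting dividends on one side of the comparison and not the other.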

Next question: Does Alpha Dog's portfolio include any small-cap or mid-cap stocks? In 2010, the Vanguard Total Stock Market ETF earned 17.5%, a bit less than the 17.9% the Wilshire 5000 turned in. As is often the case, small- and mid-cap stocks bested the large-caps, driving the total U.S. market above the S&P 500 total return. In 2010, mid-cap U.S. stocks earned 25.7% and small-cap stocks earned 28.1%. If the Alpha Dog portfolio has some small- and mid-cap stocks, the more appropriate benchmark is the total market, and its return of 15.5% was a full two points below the 17.5% the Vanguard ETF earned.

The comparison could look even worse if Alpha Dog happens to be solely a small- and mid-cap manager. If its fund holds 60% mid-cap and 40% small-cap stocks, the weighted average benchmark performance comes to 26.7%, meaning Alpha Dog underperformed by 11.2 points. Time to fire Alpha Dog!
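
The weighted average in the 60/40 example can be reproduced with a short Python sketch. The helper name `blended_benchmark` is hypothetical; the weights and returns are the ones given above.

```python
def blended_benchmark(weights, returns):
    """Weighted average of asset-class returns; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * returns[name] for name, w in weights.items())

RETURNS_2010 = {"mid_cap": 25.7, "small_cap": 28.1}  # percent, from the text
WEIGHTS = {"mid_cap": 0.60, "small_cap": 0.40}       # the hypothetical fund mix
ALPHA_DOG = 15.5                                     # percent, after fees

benchmark = blended_benchmark(WEIGHTS, RETURNS_2010)
shortfall = benchmark - ALPHA_DOG
print(f"benchmark {benchmark:.1f}%, shortfall {shortfall:.1f} points")
```

The same manager and the same 15.5% return go from a claimed 2.7-point win to an 11.2-point loss; only the yardstick has changed.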

APPLES TO ORANGES

In many cases, when managers compare their performance to the S&P 500 Index, they're not just comparing apples to oranges. They're comparing apples to parts of oranges. When I ask managers why they're using this ubiquitous benchmark, the answer is that it's the industry standard. Everyone does it. But the fact that everyone uses a misleading benchmark doesn't make it right. Not surprisingly, benchmark returns vary widely (see "Stiff Competition" above).

It's human to pick a benchmark that will make us look good, notes Duke behavioral economist Dan Ariely, author of The Upside of Irrationality: The Unexpected Benefits of Defying Logic at Work and at Home. However, he adds, "Using the raw S&P 500 Index to benchmark total returns enters the realm of being evil, since it can only be misleading."

I sought insight from behavioral finance expert Meir Statman, a visiting professor at Tilburg University in the Netherlands and author of What Investors Really Want: Know What Drives Investor Behavior and Make Smarter Financial Decisions. I made the point that I break speed limits, like many people do, but he didn't think the comparison was apt. "You must be kidding," he said. "Exceeding the speed limit? How about drunk driving?"

Indeed, when I politely point out to managers that the S&P 500 Index may not be the correct yardstick, I typically get responses varying from irritation to outrage. Their reaction might be based on the perception that I'm not questioning their benchmark choices, but their integrity.

According to Matt Hougan, president of ETF Analytics at Index Universe, money managers pick the easiest-to-beat benchmark that seems reasonable to their clients, usually the S&P 500 or Russell 2000. "We'd see a lot more people benchmarked to the MSCI All Country World Investable Market Index total return if they were honest about it," he says. Both Ariely and Statman believe that to avoid conflicts of interest, someone other than the money manager or the person choosing the money managers should be selecting the benchmarks.

PRACTICAL BENCHMARKING

One approach, suggested by investment manager and author William Bernstein, is to create benchmarks based on beta, company size (market capitalization) and value versus growth, the factors in what is commonly known as the Fama-French Three Factor Model. "Why stop at three factors?" Statman asks.

We could also take into account momentum, liquidity and even prestige. In my practice, I start with the broadest benchmarks of basic asset classes. Rather than use a theoretical index, I select broad index funds as a practical, after-fee, passive alternative portfolio. I rely heavily on Vanguard: Total Stock Market, Total Bond Market, REIT, Precious Metals and Mining (which is not an index fund, but has low costs and low turnover), and Money Market as well as the FTSE All World Ex-U.S. Index. Vanguard has the broadest funds and the longest history, which yields more data.

Using any hindsight in weighting the benchmarks creates a bias against the portfolio. I typically use the average portfolio weighting over the longest period for which my client has data. For example, if my client has four years of data, I might take the portfolio weighting as of June 30 each year and average those four snapshots. I don't stick to calendar years unless the client has more detail on their Dec. 31 statements.
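
As a rough illustration of that averaging step, here is a minimal Python sketch. The four June 30 allocations are invented for the example, not client data, and the function name `average_weights` is hypothetical.

```python
def average_weights(snapshots):
    """Average each asset class's weight across equally spaced snapshots."""
    classes = snapshots[0].keys()
    return {c: sum(s[c] for s in snapshots) / len(snapshots) for c in classes}

# Invented June 30 allocations for four consecutive years.
JUNE_30_SNAPSHOTS = [
    {"us_stock": 0.50, "intl_stock": 0.20, "bond": 0.30},
    {"us_stock": 0.45, "intl_stock": 0.25, "bond": 0.30},
    {"us_stock": 0.30, "intl_stock": 0.15, "bond": 0.55},
    {"us_stock": 0.35, "intl_stock": 0.20, "bond": 0.45},
]

benchmark_weights = average_weights(JUNE_30_SNAPSHOTS)
for asset_class, weight in benchmark_weights.items():
    print(f"{asset_class}: {weight:.0%}")
```

Because the benchmark uses one fixed set of weights derived from the whole period, no single year's allocation decision, good or bad, is baked into the yardstick.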

Data from both TD Ameritrade and Charles Schwab indicate that most advisors were heavily invested in stocks going into 2008 but very conservatively invested after the plunge. In other words, most advisors demonstrated extremely poor market timing. Since I'm using a constant average weighting, advisors who made those moves do poorly against the benchmark, which I rebalance annually.
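
To see why a constant-weight, annually rebalanced benchmark penalizes that kind of timing, consider a minimal Python sketch. The two-year returns and the advisor's allocations are invented for illustration and are not drawn from the TD Ameritrade or Charles Schwab data; the function name `growth_of_dollar` is hypothetical.

```python
def growth_of_dollar(yearly_weights, yearly_returns):
    """Compound $1, applying each year's weights at the start of that year."""
    value = 1.0
    for weights, returns in zip(yearly_weights, yearly_returns):
        value *= 1.0 + sum(weights[c] * returns[c] for c in weights)
    return value

# Invented figures: a crash year followed by a rebound year.
RETURNS = [
    {"stock": -0.40, "bond": 0.05},
    {"stock": 0.30, "bond": 0.05},
]
# Hypothetical advisor: stock-heavy into the crash, conservative afterward.
ADVISOR_WEIGHTS = [
    {"stock": 0.80, "bond": 0.20},
    {"stock": 0.20, "bond": 0.80},
]
# Benchmark: the advisor's own average weighting, reset every year.
n = len(ADVISOR_WEIGHTS)
AVERAGE = {c: sum(w[c] for w in ADVISOR_WEIGHTS) / n for c in ADVISOR_WEIGHTS[0]}

advisor = growth_of_dollar(ADVISOR_WEIGHTS, RETURNS)
benchmark = growth_of_dollar([AVERAGE] * n, RETURNS)
print(f"advisor {advisor:.3f}, benchmark {benchmark:.3f}")
```

With these made-up numbers, the advisor ends with about 76 cents per dollar and the benchmark with about 97 cents, even though both held the same average mix of stocks and bonds.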

As of May 31, a moderately aggressive portfolio that averaged 40% U.S. stocks, 20% international stocks and 40% bonds should be 11.9% above its value at the 2007 year-end, pre-crash market close (see "Better Benchmarks" chart). This tends to shock investors and planners alike.

Admittedly, this benchmark methodology is not without flaws. For example, the benchmark bond portfolio follows the Barclays Capital Aggregate Bond Index, which is made up of 71% U.S. government bonds and 29% investment-grade corporate bonds, with an overall duration of 5.1 years. If you buy lower-quality or longer-duration bonds, you'll beat this index easily.

When I perform independent benchmarking, I almost always get two results:

* The portfolio badly underperforms the weighted average benchmarks. The problem is both stock selection and poor market timing.

* The actual portfolios are far riskier than my benchmarks. This means that, if some of the other factors Statman suggested were included in the benchmark, the portfolio would do even worse by comparison.

As I said, everyone is surprised — both clients and money managers. One manager had bragged about beating the market for 14 out of the last 15 years, implying the foundation whose money he managed should be thrilled. Actually, he badly lagged the benchmark for at least the past few years. Eventually, the foundation replaced him with the benchmark index funds.

My approach isn't perfect, but it's so simple it's difficult to dispute. Usually my clients fire the money managers, though I occasionally get an "Oh, you don't understand" response. But when the client is asking me for an independent benchmark, they are already somewhat suspicious.

William Sharpe used simple arithmetic to prove that alpha had to be a zero-sum game before costs. This remains true everywhere, except in Lake Wobegon.

Allan S. Roth, founder of Wealth Logic in Colorado Springs, Colo., writes the Irrational Investor column for CBS MoneyWatch.com and is an adjunct faculty member at Colorado College and the University of Denver.