Warren Buffett, China Syndrome, and how to $p$-hack a rawdata plot
(Code available at harningle/useful-scripts)
An extremely common and powerful way to motivate (or distort) a story is to (cherry-)pick stylised facts. This very post is motivated by a chart I saw recently on Twitter comparing Warren Buffett’s Berkshire Hathaway with the S&P 500. I was a bit shocked: over the past 20 years, Buffett underperformed the S&P 500.
What does the “rawdata” look like
First of all, I try to collect the data behind the chart above. The Berkshire Hathaway website has all we need. I plot the gains had we invested 1 dollar in Berkshire and in the S&P 500 back in 1965, when Buffett took control of Berkshire.
Scales of Axes Matter a Lot, Visually. Panel (A) shows the vanilla line chart, and Berkshire completely destroys the S&P 500. However, simply changing the $y$-axis to $\log$ scale easily “reduces” the performance gap between the two in Panel (B). This kind of trick is very common in econ papers, e.g. showing bar charts with a $y$-axis that does not start from $0$, stretching the $y$-axis to make coefficient plots look nicer, transforming the $x$-axis into ticks with unequal intervals, etc.
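To see why the log axis “shrinks” the gap, here is a minimal sketch in plain Python. The 20% and 10% annual returns are round hypothetical numbers I picked for illustration, not the actual Berkshire or S&P 500 figures, and `grow` is just a helper name I made up. On a linear axis the vertical distance between two lines is the dollar difference, which explodes under compounding; on a log axis it is the log of the ratio, which only grows linearly in time.

```python
import math

def grow(rate, years, start=1.0):
    """Value of `start` dollars at the end of each year under a constant annual return."""
    values = [start]
    for _ in range(years):
        values.append(values[-1] * (1 + rate))
    return values

# Hypothetical round-number returns, NOT the real Berkshire / S&P 500 data.
fast = grow(0.20, 50)
slow = grow(0.10, 50)

# Linear axis: the visual gap is the difference in dollars.
linear_gap = [f - s for f, s in zip(fast, slow)]
# Log axis: the visual gap is the difference in logs, i.e. the log of the ratio.
log_gap = [math.log10(f) - math.log10(s) for f, s in zip(fast, slow)]

for year in (10, 30, 50):
    print(f"year {year}: linear gap = {linear_gap[year]:,.1f} dollars, "
          f"log10 gap = {log_gap[year]:.2f}")
```

With these made-up returns, the dollar gap after 50 years runs into the thousands, while the gap on a $\log_{10}$ axis stays below two units. Same data, very different visual impression.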
Cherry pick the sample period
Just to make sure there is no factual error, let’s zoom into 2004-2023: Panel (A) below is effectively the same chart as the one on Twitter. So the graph on Twitter does faithfully plot the original rawdata. More interestingly, I find many other time spans where the S&P 500 was more profitable than Berkshire.
(A) 2004-2023
(B) 2023 full year
Source: Buffett's letter to shareholders of Berkshire Hathaway Inc., 2023, p.17, Yahoo Finance (S&P 500, Berkshire Hathaway)
Even Rawdata Plots Are Carefully Picked. Unlike the Matray constant, cherry picking can be totally legal and very natural. If you read
“we look at the performance of xxx over the past 20 years”, or
“last year, xxx outperformed yyy”,
do you think these lines are natural, or do you have any doubts? Keep in mind that everything we see is cherry-picked. Cherry picking and $p$-hacking are not necessarily stilted; authors always try their best to make them flow as naturally as possible, so as to fool the referees and editors and get the paper published.
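Finding a flattering sample period is embarrassingly easy to automate. Below is a rough sketch, assuming we have two lists of annual returns: it brute-forces every contiguous window and keeps those where the second series beats the first. The numbers are made up for illustration (they are not Berkshire or S&P 500 returns), and `cumulative` and `windows_where_b_wins` are hypothetical helper names, not something from my actual scripts.

```python
def cumulative(returns):
    """Compound a list of annual returns (0.10 means +10%) into a total gain."""
    total = 1.0
    for r in returns:
        total *= 1 + r
    return total - 1

def windows_where_b_wins(years, a, b, min_len=1):
    """Return (start_year, end_year) for every contiguous window where b out-earns a."""
    hits = []
    n = len(years)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            if cumulative(b[i:j]) > cumulative(a[i:j]):
                hits.append((years[i], years[j - 1]))
    return hits

# Made-up annual returns, for illustration only.
years = [2001, 2002, 2003, 2004, 2005]
fund = [0.30, 0.25, -0.05, 0.02, 0.20]
index = [0.10, 0.08, 0.04, 0.06, 0.09]

print("full period:", round(cumulative(fund), 3), "vs", round(cumulative(index), 3))
hits = windows_where_b_wins(years, fund, index)
print("windows where the index wins:", hits)
```

In this toy example the fund clearly wins over the full period, yet several shorter windows still favour the index. Any one of them could be introduced with a perfectly natural-sounding “over the past $n$ years”.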
The China Syndrome?
Cherry picking the sample period reminds me of the seminal paper by Autor et al. (AER 2013). They basically blame China for unemployment in the US. The first figure in their paper indicates a strong negative correlation between US imports from China and employment in the US. I successfully reproduce the chart (with some differences) from scratch, i.e. without using their replication package.[1]
(A) Figure 1 in Autor et al. (AER 2013)
(B) our replication
Source: U.S. Bureau of Economic Analysis, U.S. Bureau of Labor Statistics, U.S. Census Bureau (Civilian Labor Force Level [CLF16OV], All Employees, Manufacturing [MANEMP], Gross Domestic Product [GDP], Imports of Goods and Services [IMPGS], Exports of Goods and Services [EXPGS], U.S. Imports of Goods by Customs Basis from China [IMPCH]), retrieved from FRED, Federal Reserve Bank of St. Louis
Notes: U.S. Imports of Goods by Customs Basis from China [IMPCH] is not seasonally adjusted, while the other series are. I take a quick and dirty 12-month moving average to “remove” the seasonality, and then aggregate it to the quarterly level to match the frequency of the other series. These notes apply to the figures below as well.
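For the curious, the smoothing in the notes can be sketched roughly as follows. This is plain Python on a toy series, not my actual script. Two choices here are assumptions, since the notes do not pin them down: the moving average is trailing (it could just as well be centred), and the quarterly aggregation takes the mean of the three months (a sum would also be defensible). `trailing_mean` and `to_quarterly` are hypothetical helper names.

```python
def trailing_mean(values, window=12):
    """Trailing moving average; the first window-1 entries are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

def to_quarterly(months, values):
    """Average monthly values within each (year, quarter); drop incomplete quarters."""
    buckets = {}
    for (year, month), v in zip(months, values):
        key = (year, (month - 1) // 3 + 1)
        buckets.setdefault(key, []).append(v)
    return {
        k: sum(vs) / 3
        for k, vs in buckets.items()
        if len(vs) == 3 and None not in vs
    }

# Toy monthly series with a December spike and no trend (made-up numbers).
months = [(y, m) for y in (2000, 2001) for m in range(1, 13)]
raw = [100 + (50 if m == 12 else 0) for _, m in months]

smoothed = trailing_mean(raw)
quarterly = to_quarterly(months, smoothed)
for key in sorted(quarterly):
    print(key, round(quarterly[key], 2))
```

Because every 12-month window contains exactly one December, the spike is averaged out and the smoothed series is flat. The price is that the first 11 months are lost, and any genuine within-year movement gets smeared out too.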
However, if we look at the entire time series of US employment, the decline seems to have nothing to do with China; the downward trend has been there since day 1…
Source: Same as above
Now the full picture. If I plot the entire time series of both US employment and imports from China, do you still think there is any correlation between them?
Source: Same as above
That being said, I still like Autor et al. (AER 2013) very much. Wang et al. (NBER Working Paper 2018) is also worth reading: once supply chains and general equilibrium effects are taken into consideration, imports from China actually boost employment in the US.
-
[1] I think I made a mistake somewhere in my replication. The trend/slope of my import penetration line (in blue) is almost identical to the one in the original paper, but the level is 10x smaller than theirs. Maybe it is a unit conversion mistake on my side, but I wasn’t able to figure out where.