Last modified: Tue Dec 5 11:21:25 2000

The True Relationship Between Votes, County Size, and County Composition

Go straight to the math

My name is Jonah B. Gelbach, and I am an Assistant Professor of Economics at the University of Maryland, College Park. If you're looking for my usual web page, it may be found here. If you are looking for a set of links I posted about the election, you can find those at this location.

For a list of myths about the election, go to the MoveOn Website.

If you are looking for some humor in this absurd time, go to this BET site and try to vote (you'll need shockwave flash).

Note: On this page, I do not purport to tell you how many Buchanan votes were erroneous. I also do not purport to demonstrate the real-world impossibility of the Buchanan vote total in Palm Beach. I may get around to these issues later. In the meantime, a number of people have written to me suggesting that my analysis is incorrect because I seem to have implied otherwise. So let me be clear. What I do do is:



Let me summarize the issues as well as what can be demonstrated:
  1. Rob Shimer has questioned the accuracy of the widespread report that Reform Party candidate Pat Buchanan received disproportionately many votes in Palm Beach County. His initial argument concerned spurious correlation, which occurs when a variable omitted from the analysis is systematically related to both dependent and independent variables. In my earlier page, I argued that this critique does not hold water. The reason is that the Palm Beach outlier effect continues to exist even when Buchanan's votes are plotted against County size. County size cannot be said to be omitted from the analysis if it is on the x-axis.

  2. Professor Shimer's web page no longer refers to spurious correlation. Rather, he makes a number of arguments related to two other issues:

    1. Uncertainty regarding the proper functional form of the relationship between Buchanan votes and any other candidate's votes (which also implies uncertainty regarding the relationship with County size). Prof. Shimer argues that this issue is important because of what he refers to as the small number of counties, and particularly the small number of large counties, in Florida.

    2. Non-normality of the residuals in whatever relationship is correct. This issue is potentially important because statistical inference regarding the number of votes that mistakenly went to Buchanan rather than Gore requires knowledge of the distribution of residuals.


  3. I believe that Prof. Shimer is wrong on the first point above, and I believe the second point to be irrelevant, given that the distribution of the number of votes for a candidate is known to be binomial. In fact, the normal distribution is likely a very good approximation to the binomial for a county as large as Palm Beach (over 431,621 votes), though the generally low probability of a Buchanan vote complicates the use of the normal approximation. A detailed and somewhat technical discussion of these issues may be found as an attachment in your choice of HTML, PostScript, DVI, or PDF format. But let me summarize the argument:

    1. In a very deep sense, the true relationship between county size and the number of votes one would expect for Pat Buchanan simply has to be linear. The intuition is very simple: whatever the true expected number of Buchanan votes in a county of a given size and composition, if we added another county of exactly the same size and composition, we would expect to get twice as many Buchanan votes. (Strictly, this tells us only that the true relationship is homogeneous of degree one; the above-referenced note proves the linearity of the relationship.) Simple as it is, if you don't believe this argument, then you can't believe in the laws of conditional expectation.

      Does that mean that all counties do have the same size and composition? Of course not! But the point is that any systematic errors in exploring a linear relationship between candidate votes and county size have to do with omitted variables, in particular, the number of people of the relevant kind living in a county (see the attached note referenced above). Potential errors have absolutely nothing to do with county size per se. That is, county size can cause statistical problems only insofar as it is correlated with other variables in the analysis. One might say that this is the same thing Prof. Shimer is saying, but one would be wrong: whether or not county size is correlated with other characteristics is an empirical matter, not a theoretical statistical one. The point here is that size does not matter; correlation does. I make this argument in detail in the attached note.

    2. Unlike the other analyses I have seen, the analysis presented here relies directly on statistical theory (Greg Adams and Chris Fastnow do use levels-levels, though they don't derive the theoretical justification). Doing so allows me to demonstrate that the true relationship is linear.

    3. As a corollary, it follows that the ad hoc relationships estimated and plotted by Prof. Shimer, as well as others, are mis-specified. The relationships they posit are inconsistent with statistical theory.

    4. The observation that Palm Beach's status as an outlier is reduced when one plots Buchanan's vote share against either of the other candidates' shares is perfectly consistent with the fact that Palm Beach is an outlier. In fact, moving to shares necessarily makes Palm Beach look like less of an outlier -- precisely because of Palm Beach's large size! The reason is heteroskedasticity related to county size: larger counties must have compressed distributions for the deviation of realized shares from their conditional mean. Reports to the contrary are not only greatly exaggerated, they are exactly inverted.

    5. A very rough analysis suggests that the probability that Pat Buchanan could have gotten more than about 1300 votes is -- literally -- 0. Here is a graph plotting the probability that Buchanan's vote total is at least a given number (Stata code and data).

      In response to comments, I'd like to note for clarification's sake that the analysis summarized in this graph assumes homogeneity of preferences, which is indefensible in practice. The point of assuming homogeneity is to demonstrate that Palm Beach can be shown not to be an extreme outlier only if it really is a Buchanan stronghold.

      Evidence on previous elections discussed on the web page of Professor Christopher Carroll of Johns Hopkins University clearly suggests otherwise. To the extent that this analysis ignores important heterogeneity issues, it is therefore probably too generous to Buchanan, and hence methodologically conservative.


  4. That said, here is the very simple graph I presented earlier:



    Quite clearly, Palm Beach is an extreme outlier. And one can hardly argue that this is due to spurious correlation with county size, since County size is on the x-axis!

    Side note: is Palm Beach County a Buchanan "stronghold"?

    Further side note: Dade and Broward look like outliers because they are heavily Democratic; they are Buchanan "weakholds". This apparently casts doubt on the linearity of the relationship (though it further strengthens the Palm Beach anomaly conclusion). It does not invalidate linearity; it just says that more variables (like the number of likely Buchanan voters) would be needed to get a correct statistical relationship.

    Lest anyone believe that Palm Beach is just weird in general, consider the following graphs of Bush votes against county size and Gore votes against county size (note that they are mirror images of each other almost by construction):

    There seems little out of order in these graphs, and Palm Beach is certainly no great outlier.
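
To give a rough sense of the tail-probability calculation mentioned in item 3 above, here is a minimal Python sketch of P(X >= k) for a binomial vote count under the homogeneity assumption. It is not the Stata code referenced above. The per-voter probability `P_BUCHANAN` is a hypothetical round figure chosen purely for illustration, not an estimate taken from this page; `N_VOTES` is the Palm Beach figure quoted above; the function names are illustrative.

```python
import math

N_VOTES = 431_621      # Palm Beach vote count quoted in the text
P_BUCHANAN = 0.0025    # ASSUMPTION: hypothetical per-voter probability, illustration only


def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1, via log-gamma."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), by direct summation of the pmf."""
    return math.fsum(math.exp(log_binom_pmf(k, n, p)) for k in range(k_min, n + 1))


if __name__ == "__main__":
    mean = N_VOTES * P_BUCHANAN
    sd = math.sqrt(N_VOTES * P_BUCHANAN * (1 - P_BUCHANAN))
    print(f"expected votes: {mean:.0f}, standard deviation: {sd:.1f}")
    for k in (1100, 1200, 1300):
        print(f"P(X >= {k}) = {binom_tail(k, N_VOTES, P_BUCHANAN):.3e}")
```

The exact binomial sum is used rather than the normal approximation because, as noted above, the low probability of a Buchanan vote complicates the normal approximation. A different assumed probability shifts the vote total at which the tail becomes negligible.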



As I prove formally in the attachment, this situation is one in which the simple figures may be the most revealing, and in any case the more complicated approaches cannot be defended using statistical theory.
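
The linearity argument summarized above can be written out in a few lines. The notation here (K voter types, counts N_k, per-type probabilities p_k, composition shares \pi_k) is an assumption introduced for illustration and need not match the notation of the attachment.

```latex
% Assumed notation: a county has N voters of K types, with N_k voters of type k,
% each of whom votes for Buchanan with probability p_k. V is the Buchanan vote count.
\begin{align*}
E[V \mid N_1, \dots, N_K] &= \sum_{k=1}^{K} p_k N_k, \\
\pi_k \equiv \frac{N_k}{N} \quad\Longrightarrow\quad
E[V \mid N, \pi_1, \dots, \pi_K] &= N \sum_{k=1}^{K} p_k \pi_k .
\end{align*}
% Holding composition (\pi_1, ..., \pi_K) fixed, expected votes are proportional to N,
% so doubling N doubles E[V].
% If composition is omitted and votes are related to N alone with slope
% \bar{p} = \sum_k p_k \bar{\pi}_k, the leftover term is
% N \sum_k p_k (\pi_k - \bar{\pi}_k), which matters only when \pi_k varies with N.
```

In this sketch county size enters only as a multiplier; any systematic misfit comes from the composition terms, which is the omitted-variables point made in item 3.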

Professor Shimer previously wrote on his web page that:

In my opinion, academicians should be very careful about making bold statements at this time, and be sure that the statistics really back them up.

I agree wholeheartedly. I also wish to note by way of summary that arguments regarding county size and statistical artefacts are impossible to back up with statistics, because they simply aren't true.
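
The earlier point about vote shares and county size can be checked with a short calculation. Under a binomial model the standard deviation of a county's vote share is sqrt(p(1-p)/N), so the same gap in share terms amounts to many more standard deviations in a large county than in a small one. The following is a minimal Python sketch; the probability, the share gap, and the two county sizes are hypothetical values chosen for illustration and do not come from the Florida data.

```python
import math


def share_sd(n_votes, p):
    """Standard deviation of the vote share V/N when V ~ Binomial(N, p)."""
    return math.sqrt(p * (1.0 - p) / n_votes)


def share_gap_z(gap, n_votes, p):
    """Number of standard deviations a given share gap represents at this county size."""
    return gap / share_sd(n_votes, p)


# ASSUMPTIONS: all values below are hypothetical, chosen only for illustration.
P = 0.0025                     # common per-voter probability
GAP = 0.005                    # the same excess share in both counties
SMALL, LARGE = 5_000, 400_000  # two county sizes

if __name__ == "__main__":
    for n in (SMALL, LARGE):
        print(f"N={n:>7}: share sd={share_sd(n, P):.6f}, "
              f"z for gap={share_gap_z(GAP, n, P):.1f}")
```

The ratio of the two z-scores equals sqrt(LARGE / SMALL), which is why a plot of raw shares understates how unusual a large county's deviation is.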

For further discussion:



Please send comments or questions to me at gelbach@glue.umd.edu