
# A Few Thoughts on Randomness, Probability, and Social Science

Practically every student who has taken a course in regression analysis, or in statistics more generally, will recall that phenomena are conceptualized as having both systematic and stochastic variation. As a running example, let us consider a model we might build to explain the causes of democratization. We might plausibly claim that whether or not a country democratizes in a given year depends on the country’s wealth, population, and average level of education. These three explanatory variables would then account for the systematic variation in the outcome of interest, i.e., whether or not a country democratizes. Yet every model will also contain an error term that captures stochastic variation: the inherent randomness with which the social world is imbued. In our example, we would not presume to assert that our set of explanatory variables (wealth, population, education) captures all the variation in democratization, which is why democratization itself is treated as a random variable. It can be explained, but not perfectly and not in every case.
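
To make the distinction concrete, here is a minimal simulation sketch. It assumes, purely for illustration, a probit-style latent-variable model: standardized wealth, population, and education enter with made-up coefficients, a normally distributed error term is added, and a country democratizes when the sum is positive. None of the numbers are estimates. The point is that even an analyst who knew the true systematic component exactly would not predict every case correctly.

```python
import random

# Hypothetical coefficients on standardized explanatory variables;
# the numbers are illustrative assumptions, not estimates from any data.
INTERCEPT = -0.5
BETA = {"wealth": 0.8, "population": -0.2, "education": 0.5}


def simulate_country_year(rng):
    """Draw one country-year; return (systematic component, whether it democratizes)."""
    x = {name: rng.gauss(0.0, 1.0) for name in BETA}
    systematic = INTERCEPT + sum(BETA[name] * x[name] for name in BETA)
    stochastic = rng.gauss(0.0, 1.0)  # the error term
    democratizes = systematic + stochastic > 0  # probit-style threshold rule
    return systematic, democratizes


def accuracy_of_true_model(n=10_000, seed=1):
    """Share of cases predicted correctly by someone who knows the systematic part exactly."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        systematic, democratizes = simulate_country_year(rng)
        hits += (systematic > 0) == democratizes
    return hits / n


print(f"accuracy with the true systematic component: {accuracy_of_true_model():.3f}")
```

Predicting "democratizes" whenever the systematic part is positive is the best possible rule in this toy model, so the shortfall from perfect accuracy comes entirely from the stochastic term.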

A curious student, however, will soon start to question the notion of randomness and perhaps ponder the meaning of probability itself. A reasonable line of thought might run as follows. When a coin is tossed, there is a fifty per cent probability that we will observe heads and a fifty per cent probability that we will see tails. Indeed, if we toss a coin one thousand times, we will likely see roughly five hundred heads and five hundred tails. Yet the outcome of any single toss is said to be random and unpredictable. But, the curious student will ask, is this really true? Can the outcome of a single coin toss be truly random even if we knew every single force that influences the spinning coin? If every effect has a cause, as scientists ordinarily assume, then if we knew everything there is to know about a coin and the environment in which it spins, should we not know quite precisely whether we will observe heads or tails? Does randomness not simply describe a situation in which our information is limited and our predictions imperfect? Such thoughts sound very reasonable, and to this day they inform the philosophical foundations of science. King, Keohane, and Verba summarize the two broad views of randomness succinctly:

> Perspective 1: A Probabilistic World. Random variation exists in nature and the social and political worlds and can never be eliminated. Even if we measured all variables without error, collected a census (rather than only a sample) of data, and included every conceivable explanatory variable, our analyses would still never generate perfect predictions. A researcher can divide the world into apparently systematic and apparently nonsystematic components and often improve on predictions, but nothing a researcher does to analyze data can have any effect on reducing the fundamental amount of nonsystematic variation existing in various parts of the empirical world.
>
> Perspective 2: A Deterministic World. Random variation is only that portion of the world for which we have no explanation. The division between systematic and stochastic variation is imposed by the analyst and depends on what explanatory variables are available and included in the analysis. Given the right explanatory variables, the world is entirely predictable.[1]

Although King, Keohane, and Verba make clear that most scholars fall somewhere between these two extremes, the two perspectives seem to me to mark out the philosophical space quite well. The curious student from the previous discussion would naturally lean towards the second perspective. Yet perspective 1 seems to dominate social science, and thinking about the social world in purely deterministic terms creates problems of its own: among other things, it denies the individual her free will and sits poorly with the dynamism we observe in societies. The looming question thus remains: who is right, who is wrong, and is there a way of reconciling the two perspectives?

I believe there is. It seems to me that although perspective 1 is what a researcher without supernatural powers (who can never capture all the relevant variables) will end up using, perspective 2, when stated with a little more precision and a little less dogmatism, is likely closer to the truth. Even a researcher who accepts the second, deterministic view of the world will have to admit that he is not God and cannot possibly survey all the relevant variables that would enable him to predict the outcome on every occasion without fail. He, too, will include an error term in his models, adding that with a great deal more luck, the need for such a term would disappear. Those who adhere to the first, probabilistic view of reality may feel vindicated, but they, too, will face unanswered questions. For a start, they will have to admit, in one form or another, that things happen at least in part without apparent causes. What is more, they will have to divide the world into “apparently systematic” and “apparently nonsystematic” components, and each time they realize that their division was flawed (which will happen whenever they improve their models), they will have to justify why there will come a day when the apparently nonsystematic component cannot be reduced an inch further.
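
The deterministic researcher's position can be sketched in the same spirit. In the hypothetical world below, the outcome is an exact function of four causes with no noise at all, but the analyst measures only three of them. An ordinary least-squares fit (written out via the normal equations so the sketch needs only the standard library) then leaves a residual that looks exactly like stochastic variation; once the fourth, unmeasured cause is included, the residual collapses to floating-point noise. The coefficients and the fourth cause are assumptions chosen for illustration.

```python
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y, via Gauss-Jordan elimination."""
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(k):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def residual_variance(rows, y):
    """Mean squared residual: what the analyst ends up calling the error term."""
    b = ols(rows, y)
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(rows, y)]
    return sum(e * e for e in resid) / len(resid)


# A fully deterministic world: the outcome is an exact function of four causes.
# The coefficients and the unmeasured fourth cause (h) are hypothetical.
rng = random.Random(42)
cases = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]
outcome = [1.0 + 0.8 * w - 0.2 * p + 0.5 * e + 0.7 * h for w, p, e, h in cases]

observed = [[1.0, w, p, e] for w, p, e, _ in cases]      # what the analyst measures
complete = [[1.0, w, p, e, h] for w, p, e, h in cases]   # everything that matters

rv_three = residual_variance(observed, outcome)  # roughly 0.7**2 = 0.49 in expectation
rv_four = residual_variance(complete, outcome)   # floating-point noise only
print(f"error variance, three measured causes: {rv_three:.3f}")
print(f"error variance, all four causes:       {rv_four:.1e}")
```

Nothing in the three-variable fit itself tells the analyst whether the leftover variance is irreducible or merely unexplained, which is precisely the point on which the two perspectives disagree.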

I would argue that the crux of the matter lies in our scientific ambition to build generalizable theories. A useful theory of social science will necessarily assume at least some homogeneity across cases, over time, or both. Because such homogeneity is unlikely to hold, we observe randomness. In fact, it would seem that the extent of randomness in our theories corresponds to the extent to which the social world deviates from this assumption of homogeneity. Let me illustrate what I have in mind with the running example of democratization. In order to develop a theory of democratization, we first have to assume, on the basis of either evidence or belief, that there is such a thing as democratization. In other words, we have to assume that what happens to countries that move from autocratic (or other) rule to democracy is fundamentally similar and comparable across cases, over time, or both. This concerns conceptual homogeneity across cases. Often, too, we will need to assume that a country, say the United States in 1989, is fundamentally similar to its later self, say the United States in 1990. Yet democratization in the Czech Republic is likely to differ from democratization in Namibia; if so, we are talking not about democratization but about democratizations. Furthermore, democratization in the Czech Republic in 1989 could differ from democratization in that same country a year later.
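
The homogeneity argument can also be illustrated with a small simulation, again under loudly hypothetical assumptions: the world is fully deterministic, but cases come in two kinds, and wealth affects the outcome with slope `1 + gap` in one kind and `1 - gap` in the other. A single pooled regression (one theory of democratization) has to treat the difference as error, while separate regressions for each kind (a theory of democratizations) fit perfectly. The group labels, the `gap` parameter, and the slopes are illustrative inventions, not claims about real cases.

```python
import random


def fit_line(x, y):
    """One-predictor least squares; returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(x, y):
    """Mean squared residual of the best-fitting line: the part the model calls 'random'."""
    slope, intercept = fit_line(x, y)
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)) / len(x)


def simulate(gap, n=4000, seed=7):
    """Deterministic toy world with two hypothetical kinds of democratization.

    Each case belongs to group +1 or -1 and has a wealth score; the effect of wealth
    is 1 + gap in one group and 1 - gap in the other. There is no error term.
    """
    rng = random.Random(seed)
    groups = [rng.choice((1, -1)) for _ in range(n)]
    wealth = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcome = [w * (1 + gap * g) for w, g in zip(wealth, groups)]
    return groups, wealth, outcome


def pooled_and_grouped(gap):
    """Residual variance of one pooled theory vs. the worse of two group-specific theories."""
    groups, wealth, outcome = simulate(gap)
    pooled = mse(wealth, outcome)
    grouped = max(
        mse([w for w, g in zip(wealth, groups) if g == k],
            [y for y, g in zip(outcome, groups) if g == k])
        for k in (1, -1)
    )
    return pooled, grouped


for gap in (0.0, 0.5, 1.0):
    pooled, grouped = pooled_and_grouped(gap)
    print(f"gap={gap}: pooled residual {pooled:.4f}, per-group residual {grouped:.2e}")
```

Because least squares is linear in the outcome and the same seed reproduces the same cases, the pooled residual variance here grows with the square of `gap` and vanishes when the cases are homogeneous: in this toy setting, the apparent randomness is exactly the heterogeneity the pooled theory assumed away.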

Ultimately, one reaches the conclusion that no two complex processes in the social world are likely to be exactly alike. Yet to the extent that they are alike, we can build useful theories that help us abstract and learn about the world inferentially. In this sense, randomness is the price we pay for the luxury of building theories, not necessarily something that can never be eliminated.

[1] King, Gary, Robert O. Keohane, and Sidney Verba. *Designing Social Inquiry: Scientific Inference in Qualitative Research*. Princeton, NJ: Princeton University Press, 1994. ISBN 978-0-691-03471-3, p. 59.