This will be a collection of hypothetical lectures that I might have delivered over the course of my academic career, but didn’t. The goal of this course of lectures is to introduce a broad array of tools, ideas, or weapons for attacking reasoning problems, drawing on a broad range of disciplines. These are meant to be introductory: readily understood by intelligent laypeople who have never studied those disciplines, and representing general-purpose methods that might become available to anyone who does study them at an undergraduate level. So, this collection is envisioned as a kind of Swiss-army knife for your brain. While that is my intention, I do not pretend to cover all the major disciplines; I emphasize those which have had a substantial impact on my intellectual life.
I have taken inspiration from two prolific and excellent writers of articles for Scientific American, A.K. Dewdney and Martin Gardner. In partial consequence of their inspiration, these lectures are somewhat loosely connected; they are intended to be largely intelligible independently of one another, although cross-references will guide the reader through some kinds of dependencies. While this is not intended to be scholarly in the sense of detailing every historical line of thought behind these lectures, or attributing all details to their originators, I do indicate where readers might turn for additional information on these ideas.
The top-level topics I am covering (in tentative order) include: Philosophy, Bayesian Reasoning, Argumentation, Mathematics and Computer Science, Physical Thinking, Modeling and Simulation, Evolution Theory, Information, Ethics, Politics, Cognition and Inference. Posts will be “collected” using the tag #ReasoningWell.
Here I put together some of the key arguments for some of the important issues concerning the Covid-19 pandemic (alternatively, the SARS-CoV-2 pandemic, since that is the virus causing Covid-19). (Nota Bene: Much of this was written well before the date of publication. Rather than update the content, which would take some time, I now fill it out and publish as is, since I believe it still makes a contribution.)
The arguments themselves are mostly quite simple. The disagreements about the issues largely lie in disagreements about what the underlying facts are, with covid deniers mostly using unreliable sources of information (I’ve had unsourced youtube videos offered as scientific evidence) or misunderstanding statistical reasoning or scientific methods. The fundamental solution, or mental repair work, has to do with learning methods of critical reasoning, properly checking sources, learning scientific and statistical methods, etc. I will point out specific problems of this kind, but readers may also wish to consult general guides to such matters. (I had written one, which Extinction Rebellion deleted without making any backup; I will recreate it someday.)
Some active commentators think that critical reasoning means rejecting anything “the authorities” might have to say, calling this “healthy skepticism”. In fact, it is unhealthy skepticism. Critical reasoning involves testing relevant propositions, neither rejecting them because you don’t like the source nor accepting them because you do. To be sure, critical reasoning is also compatible with deferring to others on the grounds of time and effort: no one can become an expert in every scientific field, which is why we have experts, and why sciences and other social institutions establish vetting and review processes to test and publicize their own standards of reliability. (If you’d like to learn about critical reasoning, the Stanford Encyclopedia of Philosophy article “Critical Thinking” is a good place to start.)
For my part, I give proper references, unlike conspiracy theorists.
It doesn’t help that both the CDC and WHO have lost a good deal of credibility on Covid-19. The CDC appears to have been captured by the Trump administration and is now taking political orders instead of (or more exactly, in addition to) promoting science-based policy. There are, of course, many good scientists remaining in the CDC, but their bosses are owned politically, with the result that pronouncements by CDC are more suspect than ever before. (See also CDC Director Redfield’s letter to governors of 17 August 2020, effectively announcing vaccines will be considered safe prior to Phase 3 trials.) The WHO depends upon financial support from member nations, with the result that their pronouncements are subject to influence by those nations. The silver lining to the US’s withdrawal from the WHO is that the US no longer has such influence.
Early doubts about masks expressed by US and WHO health authorities were partially motivated by political aims rather than science, such as Dr Anthony Fauci’s publicly stated goal of reserving better masks for health care workers. Unfortunately, he actually said, falsely, that there was no scientific evidence supporting the public use of masks. One of the major principles taught in public health education is to tell the public the truth: losing the public’s confidence is one of the sure-fire ways of losing the public health war. Dr Fauci violated the public trust. That does not, of course, mean that his subsequent statements are also false. For the most part, they appear to be accurate. Similarly, the WHO publicly repeated messages from the Chinese government uncritically, in particular claiming that there was no evidence that covid-19 is transmitted between humans and also claiming there was no evidence that covid-19 can be transmitted by pre- or asymptomatic people (e.g., “WHO Comments Breed Confusion Over Asymptomatic Spread of COVID-19“). Both claims were known to be false at the time. The WHO has, of course, retracted those comments, but only after much damage was done.
Where I reference the CDC or WHO below, I have found their comments to be well sourced in the case at hand; the reader can always follow those sources. I now briefly treat some of the more contentious public health claims about Covid-19.
Common flus have R0s ranging from 0.9 to 2.1 (Coburn, Wagner and Blower, 2009), which, while lower than that of SARS-CoV-2, are generally enough to cause problems. The main relevant differences between these flus and Covid-19 are: there is considerable partial immunity to influenza through prior exposure in the population; there are vaccines to help protect vulnerable subpopulations; and the virulence, in both mortality and morbidity, is far less (multiple studies support an estimate of around 0.5% for the infection fatality rate of Covid-19; e.g., the meta-analysis by Meyerowitz-Katz and Merone, 2020).
Much of the outcry over public health measures is fueled by a denial that the mortality rate for Covid-19 is as high as some have claimed. The very first point to make is that this claim, even if true, would be insufficient to make their case that the common health measures, including wearing masks, are unnecessary. It entirely ignores the very large morbidity of the disease. To be sure, we do not yet know the long-term damage this disease does to survivors. But the simple-minded assumption that asymptomatic, or subclinical, victims bear no consequences (e.g., Trump claiming children are virtually immune) is, at best, willful ignorance. On the contrary, the growing weight of the evidence is that subclinical victims suffer significant health damage (see, e.g., “Asymptomatic COVID: Silent, but Maybe Not Harmless”).
Schools should be open since children do not suffer significant harm from Covid-19
A recent BMJ study (27 Aug 2020) reinforces others showing that children and young people have less severe acute covid-19 than adults. Some early reports indicated that very few spreading events had been traced to schools; however, that has less evidentiary value than it might seem, since early on many schools were shut, and so could not have been sources of spreading events. Nevertheless, studies have shown that: when infected, children carry viral loads comparable to adults (Jones et al., 2020); children appear to spread the disease and have been the source of superspreader events (Kelly, 2020). Furthermore, the studies showing a high morbidity load for Covid-19 sufferers, including those with few or no symptoms, do not bode well for the future health of infected children. The disease affects every major organ in the body in many cases (e.g., Robba et al., 2020). Imposing those burdens on the children, and on their families and communities, is not a step to be taken lightly. Of course, as with all public health measures, the choice is not automatic; there must be a weighing up of benefits and harms. If the testing and contact tracing regime in a region or country is sufficiently robust, then schools may well be the first institution worth opening up.
Economics trumps health
It is widely and loudly argued that the health of the economy, affecting everyone and especially the poor, should come before the health of the few and, in particular, the health of the old and frail. The welfare of the 0.5% should not be allowed to dictate the lives of the remaining 99.5%.
This argument is fundamentally simply ignorant. The first thing it ignores is the very heavy morbidity load imposed on society by unchecked Covid-19. Subclinical sufferers may continue to work, but only by way of spreading the disease to coworkers. Assuming that’s not what “open economy” advocates have in mind, then subclinical victims will be out of the economy for the duration of their infectiousness only, one or two weeks. That’s around 40-50% of those infected. The rest will be out for the duration of their symptoms, ranging from a couple of weeks to many months. And there’s a very large tail of “long covid” patients who are incapacitated for at least months, perhaps years (Marshall, 2020). The “open economy” option implies allowing the spread of the disease, its consequent damage to the health of a very large percentage of the population, resulting in severe economic disruption for at least the duration of the pandemic.
The alternative view, one endorsed by many economists, is that caring for the health and well-being of society is the first step to sustaining, or rebuilding, the economy. A simulation study of the economics of pandemics by Barrett et al. (2011) directly supports this view. So too does the history of the 1918 Spanish Flu: a study of US cities shows that those which had more aggressive public health interventions, including masks and lockdowns, performed better economically (Hatchett, Mecher and Lipsitch, 2007).
Wearing masks is an individual choice, so the state has no right to mandate them
Assuming masks are effective in slowing a deadly pandemic, and that a deadly pandemic exists, this amounts to the claim that public health interests cannot override individual freedoms. Extreme libertarians might be enamored of such an argument, although libertarianism traditionally does not endorse the right to harm others, which violating mask mandates in these circumstances certainly can do. For example, the Stanford Encyclopedia of Philosophy article on Libertarianism states:
While people can be justifiably forced to do certain things (most obviously, to refrain from violating the rights of others) they cannot be coerced to serve the overall good of society, or even their own personal good.
Infecting others with a deadly disease violates others’ rights, of course. There is no accepted principle that absolutely asserts public health rights over individual rights, or vice versa. Society as a whole, through its institutions and public opinion, must adjudicate particular cases. But the claim of some that their individual freedoms always trump public health orders is simply stupid.
Masks are ineffective
Of course, mandating masks is pointless, an arbitrary and unnecessary restriction of people’s choices, if they have no effect on the disease. However, we have known for around one hundred years that they are effective in slowing and reducing the spread of many respiratory diseases such as Covid-19. The history of the 1918 flu epidemic includes an interesting episode of the response in San Francisco (see also Anti-Mask League of San Francisco). The short version is that mask wearing was accepted initially, and the first wave of the flu was bad enough, but after the rules were relaxed a second wave came, when resistance to masks was much greater. Partly as a result, the second wave was far more devastating.
More direct evidence has become available in the meantime. Respiratory diseases such as Covid-19 are spread in the first instance by air, through water droplets ranging from large to extremely small, the former generally being called “droplets” and the latter “aerosols”. There are notable differences between masks, with some being more effective than others. So, any claim that masks are helpful in reducing Covid-19 spread most likely is making some restricted claim about a subset of possible masks. Finding that, say, a shawl or balaclava doesn’t help does not negate the claim.
Most masks have been proven effective at inhibiting the spread of larger droplets (see the CDC’s Considerations for Wearing Masks).
UCSF has an overview report on the effectiveness of masks that is worth reading, “Still Confused About Masks? Here’s the Science Behind How Face Masks Prevent Coronavirus.” To be sure, their update, indicating that valved masks are ineffective, is mistaken on multiple points. First, they (along with the CDC and various other health authorities) ignore the simple and obvious point that if you do effective “sink control”, eliminating transmission at the recipient end, then you eliminate transmission. It takes two to tango. Second, there is in fact no evidence that significant (infectious) amounts of SARS-CoV-2 escape through the valves; this is possible, but the evidence is thin. (Here is an interesting Salon article on this subject.) On other matters, however, the UCSF report is solid, in my opinion.
Masks are dangerous
Granted that masks are effective, some have claimed that they are dangerous. The danger may well counterbalance, or overbalance, the benefits, so, if true, this would make existing mask advice and mandates suspect. On the face of it, the claim is absurd, since medical practitioners have been wearing masks without observed ill effect for over one hundred years. Beneath the face of it, the claim is still absurd. You can read this Fact Check put together by the BBC.
Coburn, B. J., Wagner, B. G., & Blower, S. (2009). Modeling influenza epidemics and pandemics: insights into the future of swine flu (H1N1). BMC Medicine, 7, 30. doi:10.1186/1741-7015-7-30 https://pubmed.ncbi.nlm.nih.gov/19545404/
Hatchett, R. J., Mecher, C. E., & Lipsitch, M. (2007). Public health interventions and epidemic intensity during the 1918 influenza pandemic. Proceedings of the National Academy of Sciences, 104(18), 7582-7587. https://www.pnas.org/content/104/18/7582
Many politicians and media personalities continue to cast doubt on the idea that anthropogenic global warming (AGW) – the primary driver of current global climate change – could possibly be behind the growing frequency and severity of extreme weather events – the droughts, heatwaves, flooding, etc. that are every year breaking 100 year or greater historical records. This takes the form not just of a straightforward denial of climate change, but also of a more plausible denial of a connection between climate change and individual extreme events. Until five or ten years ago, many climate scientists themselves would have agreed in rejecting such a connection, and some journalists and politicians followed them then and continue to follow them now, even though the scientists have since stopped leading anyone in that direction (see box below). Climate scientists have stopped agreeing because in the meantime a new subdiscipline has been developed specifically for attributing extreme weather events to AGW or to natural variation, depending upon the specifics of the case. While it may suit the political preferences of some commentators to ignore this development, it is not in the general interest. Here I present a brief and simple introduction to the main ideas in current work on attributing individual events to global warming. (An even simpler introduction to attribution science, emphasizing legal liability, can be found in Colman, 2019.)
Climate versus Weather
It has become a commonplace to point out that weather is not climate: climate refers to a long-term pattern of weather, not individual events. Usually the point meant is that some hot, or cold, weather is not evidence for, or against, anthropogenic global warming or significant climate change. That, however, is not true. Long-term patterns influence short-term events, whether or not the short-term events are classified as “extreme”. As one of the original researchers on weather attribution put it:
In practice, all we can ever observe directly is weather, meaning the actual trajectory of the system over the climate attractor during a limited period of time. Hence we can never be sure, with finite observations and imperfect models, of what the climate is or how it is changing. (Allen, 2003)
This actually describes the relation between theories (or models, or simulations) and evidence in science quite generally. Claims about the state of the climate are theoretical, rather than observational. Theoretical claims cannot be directly observed to be true or false; but they do give rise to predictions whose probabilities can be calculated and whose outcomes can be observed. The probabilities of those outcomes provide support for and against our theories. There is always some uncertainty, but the uncertainty pertaining to the earth’s orbit around the sun, the harmfulness of bleeding sick patients, and the reality of AGW has been driven to near zero.
Certainly, larger and more frequent storms are one of the consequences that the climate models and climate scientists predict from global warming but you cannot attribute any particular storm to global warming, so let’s be quite clear about that. And the same scientists would agree with that. – Australian PM Malcolm Turnbull, 2016
It is problematic to directly attribute individual weather events, such as the current heatwave, to climate change because extreme weather events do occur as a part of natural climate variability. – Climate Change Minister Greg Combet, 2013
The only special difficulty in understanding the relation between climate and weather lies in the high degree of variability in the weather; discerning the signal buried within the stochastic noise is non-trivial (aka “the detection problem”), which is one reason why climate science and data analysis should be relied upon instead of lay persons’ “gut feels”. Denialists often want to play this distinction both ways: when the weather is excessively hot, variability means there is no evidence of AGW; when the weather is excessively cold, that means AGW is not real.
What matters is what the overall trends are, and the overall trends include increasing numbers of new high temperatures being set and decreasing numbers of new low temperatures being set at like locations and seasons, worldwide. For example, that ratio is 2:1 in the US from 2000-2010 (Climate Nexus, 2019). Or more generally, we see this in the continuing phenomenon of the latest ten years including nine of the ten hottest years globally on record (NOAA “Global Climate Report 2018”).
The analogy with the arguments about tobacco and cancer is a strong one. For decades, tobacco companies claimed that since the connection between smoking and cancers is stochastic (probabilistic, uncertain), individual cases of cancer could never be attributed to smoking, so liability in individual cases could not be proven (aka “the attribution problem”). The tobacco companies lost that argument: specific means of causal attribution have been developed for smoking (e.g., “relative risk”, which is closely related to the methods discussed below for weather attribution; O’Keefe et al., 2018). Likewise, there are now accepted methods of attributing weather events to global warming, which I will describe below.
Rejecting the connection between weather and climate, aside from often being an act of hypocrisy, implies a rejection of the connection between evidence and theory: ultimately, it leads to a rejection of science and scientific method.
Weather Severity Is Increasing
Logically before attributing extreme weather to human activity (“attribution”) comes finding that extreme weather is occurring more frequently than is natural (“detection”). Denialism regarding AGW of course extends to denialism of such increasing frequency of weather extremes. There are two main kinds of evidence of the worsening of weather worldwide.
Direct evidence includes straightforward measurements of weather. For example, measurements of the worldwide average temperature anomalies (departures from the mean temperature over some range of years) themselves have the extreme feature of showing ever hotter years, as noted above (NOAA “Global Climate Report 2018”). Simple statistics will report many of these kinds of measurements as exceedingly unlikely on the “null hypothesis” that the climate isn’t changing. More dramatic evidence comes in the form of increased frequency and intensity of flooding, droughts, etc. (IPCC AR5 WG2 Technical Summary 2014, Section A-1). There is considerable natural variability in such extremes, meaning there is some uncertainty about some types of extreme weather. The NOAA, for example, refuses to commit to there being any increased frequency or intensity of tropical storms; however, many other cases of extreme weather are clear and undisputed by scientists, as we shall see.
Indirect evidence includes claims and costs associated with insuring businesses, private properties and lives around the world. While the population size and the size of economies around the world have been increasing along with CO2 in the atmosphere – resulting in increased insurance exposure – the actual costs of natural disasters have increased at a rate greater than the simple economic increase would explain (see Figure 1). In consequence, for example, “many insurers are throwing out decades of outdated weather actuarial data and hiring teams of in-house climatologists, computer scientists and statisticians to redesign their risk models.” (Hoffman, 2018). The excess increase in costs, i.e., that beyond the underlying increase in the value of infrastructure and goods, can be attributed to climate change, as can the excess increase (beyond inflation) in the rates charged by insurers.
Another category of indirect argument for the increasing severity of weather comes from the theory of anthropogenic global warming itself. AGW implies a long-term shift in weather as the world heats, which in turn implies a succession of “new normals” – more extreme weather becoming normal until even more extreme weather replaces that norm – and hence a greater frequency of extreme weather events from the point of view of the old normal. In other words, everything that supports AGW, from validated general circulation models (GCMs) to observations, supports a general case that a variety of weather extremes is growing in frequency, intensity or both.
Is Anthropogenic Global Warming Real?
So, AGW implies an increase in many kinds of extreme weather; hence evidence for AGW also amounts to evidence that increases in extreme weather are real. That raises the question of AGW and the evidence for it. This article isn’t the best place to address this issue, so I’d simply like to remind people of a few basic points, in case, for example, you’re talking with someone rational:
Skepticism and denialism are not the same. Skeptics test claims to knowledge; denialists deny them. No (living) philosophical skeptic, for example, would refuse to look around before attempting to cross a busy road.
Science lives and breathes by skeptical challenges to received opinions. That’s not the same as holding all scientific propositions in equal contempt. Our technical civilization – almost everything about it – was generated by applying established science. It is not activists who are hypocrites for using trains, the internet and cars to spread their message; the hypocrites are those who use the same technology, but deny the science behind that technology.
Denialism requires adopting the belief that thousands of scientists from around the world are conspiring together to perpetrate a lie upon the public. David Grimes has an interesting probabilistic analysis of the longevity of unrevealed conspiracies (in which insiders have not blabbed about it), estimating that a climate conspiracy of this kind would require about 400,000 participants and its probability of enduring beyond a year or two is essentially zero (Grimes, 2016). The lack of an insider revealing such a conspiracy is compelling evidence that there is no such conspiracy, in other words.
The Detection of Extreme Weather
The first issue to consider here is what to count as extreme weather – effectively a “Detection Problem” of distinguishing the “signal” of climate change from the “noise” of natural variation. The usual answer is to identify some probability threshold such that a kind of event having that probability on the assumption of a “null hypothesis” of natural variation would count as extreme. Different researchers will identify different thresholds. We might take, for example, a 1% chance of occurrence in a time interval under “natural” conditions as a threshold (which is not quite the same as a 1-in-100 interval event, by the way). “Natural” here needs to mean the conditions which would prevail were AGW not happening; ordinarily the average pre-industrial climate is taken as describing those conditions, since the few hundred years since then is too short a time period for natural processes to have changed earth’s climate much, based on historical observations (chapter 4, Houghton, 2009). The cycle of ice ages works, for example, on periods of tens of thousands of years.
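The parenthetical point, that a 1% chance per interval is not the same thing as one occurrence per 100 intervals, can be made concrete with a little arithmetic; here is a minimal sketch (the 1% figure is just the illustrative threshold from the text):

```python
# Chance that a "1% per year" event occurs at least once in 100 years.
p_annual = 0.01
years = 100

# Complement of "no occurrence in any of the 100 years".
p_at_least_once = 1 - (1 - p_annual) ** years

print(round(p_at_least_once, 3))  # about 0.634, not 1.0
```

So even under purely natural variation, a threshold-level event is more likely than not to show up somewhere in a century of observation, which is why single events carry evidential weight only in comparison with their probability under the warming hypothesis.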
Of course, a one percent event will happen eventually. But the additional idea here, which I elaborate upon below, is to compare the probability of an event happening under the assumption of natural variation to its probability assuming anthropogenic global warming. The latter probability I will write P(E|AGW) – the probability of event E assuming that AGW is known to be true; the former I will write P(E|¬AGW) – the probability of E assuming that AGW is known to be false. These kinds of probabilities (of events given some hypothesis) are called likelihoods in statistics. The likelihood ratio of interest is P(E|¬AGW)/P(E|AGW); the extent to which this ratio falls short of 1 (assuming it does) is the extent to which the occurrence of the extreme event supports the anthropogenic global warming hypothesis versus the alternative no warming (natural variation only) hypothesis. (The inverse ratio is also known as “relative risk” in, e.g., epidemiology, where analogous attribution studies are done.) A single such event may not make much of a difference to our opinion about global warming, but a glut of them, which is what we have seen over the decades, leaves adherence to a non-warming world hypothesis simply a manifestation of irrationality. As scientists are not, for the most part, irrational, that is exactly why the scientific consensus on global warming is so strong.
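The point about a glut of events can be sketched with Bayes’ theorem in odds form: each (assumed independent) extreme event multiplies the prior odds on AGW by the inverse of the likelihood ratio above. The numbers here are purely illustrative, not drawn from any attribution study:

```python
# Odds-form Bayesian updating over a series of extreme events.
# Illustrative assumption: each event is 10x more likely under AGW.
p_e_given_agw = 0.10
p_e_given_not_agw = 0.01
bayes_factor = p_e_given_agw / p_e_given_not_agw  # inverse likelihood ratio

prior_odds = 1.0  # even odds on AGW vs. natural variation only
n_events = 5
posterior_odds = prior_odds * bayes_factor ** n_events

posterior_prob = posterior_odds / (1 + posterior_odds)
print(round(posterior_odds), round(posterior_prob, 5))
```

A handful of such events drives the posterior odds to roughly 100,000 to 1, which is the formal counterpart of the claim that persisting with the no-warming hypothesis after decades of such events is irrational.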
Varieties of Extreme Weather
There is a large variety of types of extreme weather which appear likely to have been the result of global warming. A recent IPCC study found the following changes at the global scale likely or very likely to have been caused by AGW: increases in the length and frequency of heat waves, increases in surface temperature extremes (both high and low), and increased frequency of floods. They express low confidence in observed increases in the intensity of tropical cyclones – which does not mean that they disbelieve it, but that the evidence, while supporting the claim, is not sufficiently compelling. On the other hand, there is no evidence for increased frequency of cyclones (Seneviratne et al., 2017). They don’t address other extremes, but the frequency (return period) and intensity of droughts, increases in ocean temperature extremes, and increases in mean land and ocean temperatures have elsewhere been attributed to AGW (some references below).
In addition to measurements of extreme events, there is some theoretical basis for predicting their greater occurrence. For example, changes to ocean temperatures, and especially ice melt changing the density of water in the Arctic, are known to affect ocean currents, which, depending upon the degree of change, will likely have effects on weather patterns (e.g., NOAA, 2019). Again, warmer air is well known to hold more water vapor, leading to larger precipitation events, resulting in more floods (Coumou and Rahmstorf, 2012). Warmer water feeds cyclonic storms, likely increasing their intensity, if not their frequency (e.g., Zielinski, 2015).
Causal Attribution Theory
If we can agree that detection has occurred – that is, that weather extremes are increasing beyond what background variability would explain – then we need to move on to attribution, explaining that increase. There will always be some claiming that individual events that are “merely” probabilistically related to causes can never be explained in terms of those causes. For example, insurers and manufacturers and their spokespersons can often be heard to say such things as that, while asbestos (smoking, etc.) causes cancer – raising its probability – this individual case of cancer could never be safely attributed to the proposed cause. This stance is contradicted by both the theory and practice of causal attribution.
What is Causation?
The traditional philosophy of causation, going back arguably to Aristotle and certainly to David Hume, was a deterministic theory that attempted to find necessary and sufficient conditions for one event to be a cause of another. That analytic approach to philosophy was itself exemplified in Plato’s Socratic dialogues, which, ironically, were mostly dialogues showing the futility of trying to capture concepts in a tight set of necessary and sufficient conditions. Nevertheless, determinism dominated both philosophy and society at large for many centuries. It took until the rise of probabilistic theories within science, and especially that of quantum theory, before a deterministic understanding of causality began to lose its grip, first to the wholly philosophical movement of “probabilistic causality” and subsequently to the development of probabilistic artificial intelligence – Bayesian network technology – which subsumed probabilistic causal theories and applied computational modeling approaches to the philosophical theory of causality. Formal probabilistic theories of causal attribution have flowed out of this research. The defenses of inaction, or refusals to pay out insurance, that rely upon deterministic causality are at least a century out of date.
Instead I will describe an accepted theory of causal attribution in climate science, which provides a clear criterion for ascribing extreme weather events to AGW.
The most widely used attribution method for extreme weather is the Fraction of Attributable Risk (FAR) for ascribing a portion of the responsibility of an event to AGW (Stott et al., 2004). It has a clear interpretation and justification, and it has the advantage of presenting attribution as a percentage of responsibility, similar to percentages of explained variation in statistics (as Sewall Wright, 1934, pioneered). That is, it can apportion, e.g., 80% of the responsibility of a flooding event to AGW and 20% to natural variation (¬AGW) in some particular case, which makes intuitive sense. So, I will primarily discuss FAR in reference to attributing specific events to AGW. It should be borne in mind, however, that there are alternative attribution methods with good claims to validity (including my own, currently in development, based upon Korb et al., 2011), as well as some criticism of FAR in the scientific literature. The methodological science of causal attribution is not as settled as the science of global warming more generally, but is clear enough to support the claims of climate scientists that extreme weather is increasing due to climate change and in many individual cases can be directly attributed to that climate change.
FAR compares the probability of an extreme event E under AGW – i.e., P(E|AGW) – and under a “null hypothesis” of no global warming (the negation of AGW, i.e., ¬AGW), by taking their ratio in:
FAR = 1 – P(E|¬AGW)/P(E|AGW)
As is common in statistics, E is taken as the set of events of a certain extremity or greater. For example, if there is a day in some region, say Sydney, Australia, with a high temperature of 48.9°C, then E would be the set of days with highs ≥ 48.9°C.
Assuming there are no “acts of god”, any event can be 100% attributed to prior causes; that is, the maximum proportion of risk that could possibly be explained is 1. FAR splits that attribution into two parts, that reflecting AGW and that reflecting everything else, i.e., natural variation in a pre-industrial climate (e.g., Schaller et al., 2016); it does so by subtracting from the maximum 1 that proportion that can fairly be allocated to the null hypothesis. To take a simple example (see Figure 2), suppose we are talking about an event with a 1% chance, assuming no AGW; i.e., P(E|¬AGW) = 0.01. Suppose that in fact AGW has raised the chances ten-fold; that is, P(E|AGW) = 0.1. Then the proportion FAR attributes to the null hypothesis is 0.01/0.1 = 0.1, and the fraction FAR attributes to AGW is the remainder, namely 0.9. Since AGW has raised the probability of events of this particular extremity – of E’s kind – 10 fold, it indeed seems fair to attribute 10% of the causation to natural variation and 90% to unnatural variation.
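The arithmetic of this worked example is simple enough to sketch in a few lines of Python (the function names here are mine, for illustration, not standard names from the attribution literature):

```python
# Fraction of Attributable Risk: FAR = 1 - P(E|not-AGW) / P(E|AGW)
def far(p_event_no_agw: float, p_event_agw: float) -> float:
    """FAR for an extreme event, given its likelihood without and with AGW."""
    return 1.0 - p_event_no_agw / p_event_agw

# The worked example above: a 1%-probability event made ten times more likely.
print(far(0.01, 0.10))  # 0.9, i.e., 90% of the responsibility goes to AGW

# Equivalently, if a study reports only the risk ratio RR = P(E|AGW)/P(E|not-AGW),
# then FAR = 1 - 1/RR; e.g., "twice as likely" gives FAR = 0.5.
def far_from_risk_ratio(rr: float) -> float:
    return 1.0 - 1.0 / rr

print(far_from_risk_ratio(10.0))  # 0.9 again
```

The second function is handy because many attribution reports state their results as risk ratios ("the event was made k times more likely") rather than as FARs directly.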
In order to compute FAR, we first need these probabilities of the extreme event. It’s natural to wonder where they come from, since we are talking about extreme events, and thus unlikely events that we wouldn’t have had the time and opportunity to measure. (To be sure, if good statistics have been collected historically, they may be used, especially for estimating P(E|¬AGW); some studies cited below have done that.) In fact, however, these likelihoods are derivable from the theories themselves, or simulations that represent such theories. GCMs are used to model anthropogenic global warming scenarios with different assumptions about the extent to which human economic behavior changes in the future, or fails to change. If we are interested in current extreme events, we can use such a model without any of the future scenarios: sampling the GCM model for the present will tell us how likely events of type E will be under current circumstances, with AGW. But we can also use the model to estimate P(E|¬AGW) by running it without the past human history of climate forcing, to see how likely E would be without humanity’s contributions. Since the GCMs are well validated, this is a perfectly good way to obtain the necessary likelihoods. (However, some caveats are raised below.)
Since individual weather events occur in specific locations, or at least specific regions, in order to best estimate the probabilities of such events, GCMs are typically used in combination with regional weather models, which can achieve greater resolutions than GCMs alone. (GCMs can also be modified to have finer resolutions over a particular region.) Regional models have been improving more rapidly than GCMs in recent years, which is one reason that FAR attributions are becoming both more accurate and more common (e.g., Black et al., 2016).
Attribution of Individual Weather Events
Thus, there is a growing body of work attributing specific extreme weather events to anthropogenic global warming using FAR, which represents the “fraction” of responsibility for an event of the given extremity, or greater, that can be attributed to anthropogenic global warming rather than to natural variation in a pre-industrial climate. Much of this work is being coordinated and publicized by the World Weather Attribution organization, which is a consortium of research organizations around the world.
I note some recent examples of FAR attributions (with confidence intervals for the estimates when reported up front). I do not intend to explain these specific attributions here; you can follow the links, which lead to summary reports explaining them. Those summaries cite the formal academic publications, which detail the methods and simulations used and the relevant statistics concerning the results.
Flooding from tropical storm Imelda in September, 2019: FAR of 0.505 (± 0.12) (World Weather Attribution, 2019). [Note: This was not reported as FAR, but in likelihoods; conversion to FAR is straightforward. Links are to specific reports, which themselves link to academic publications.]
Heatwave in Germany and the UK, July, 2019: FAR between 0.67 and 0.9. The FARs for other parts of Europe were higher (but not specified in their summary) (World Weather Attribution, 2019).
Drought in the Western Cape of South Africa from 2015-2017, leading to a potential “Day Zero” for Cape Town, when the water would run out (averted by rainfall in June, 2018). This extreme drought had an estimated FAR of about 0.67 (World Weather Attribution, 2019).
Extreme rainfall events in New Zealand from 2007-2017: FARs ranging from 0.10 to 0.40 (± 0.20 in each case). These fractions accounted for NZ$140.5M in insured costs, computed by multiplying the FARs by the actual recorded costs (Noy, 2019). [NB: uninsured and non-dollar costs are ignored.] The application of FARs by economists to compute responsibility for insurance costs is a new initiative.
The 2016 marine heatwave that caused severe bleaching of the Great Barrier Reef was estimated to have a FAR of about 0.95 for maximum temperature and about 0.99 for duration of the heatwave by Oliver et al. (2018). Their report is part of an (approximately) annual report in the Bulletin of the American Meteorological Society that reports on a prior year’s extreme weather events attributable to human factors, the latest of which is Herring et al. (2018), a collection of thirty reports on events of 2016.
A recent review – re-examining FAR calculations via new simulations – of three dozen studies of droughts, heat waves, cold waves and precipitation events found numerous substantial FARs, ranging up to 0.99 in many cases, as well as a few with inverted FARs, indicating some events made less likely by anthropogenic global warming (Angélil et al., 2017).
The recent fires in Australia are being given a FAR analysis as I write this (see https://www.worldweatherattribution.org/bushfires-in-australia-2019-2020/). There is widespread agreement that the intensity of wildfires is increasing, and that the fire seasons in which they take place are lengthening. Fire simulation models capable of incorporating the observed consequences of climate change (droughts, heatwaves, etc.) are in use and can be applied to this kind of estimation, although that is not yet being done. The forthcoming analysis is limited to the precursors of the fires, drought and heat, but also includes the Forest fire Weather Index (from a personal communication).
Despite the apparent precision of some of these FAR estimates, they all come with confidence intervals, i.e., ranges within which we would expect to find the true value. They are not all recorded above, but those who wish to find them can go to the original sources.
Another kind of uncertainty applies to these estimates, concerning the variations in the distributions used to estimate FARs such as those of Figure 2. Some suggest that AGW itself brings a greater variation in the weather, fattening the tails of any probability distribution over weather events, and so making extremes on both sides more likely. So, for example, Figure 2 might more properly show a flatter (fatter) distribution associated with AGW, in addition to being shifted to the right of the distribution for ¬AGW. This, however, would not affect the appropriateness of a FAR estimation: whether the likelihood ratio for E is determined by a shift in mean, a change in the tails, or both, that ratio nevertheless correctly reports the probabilities of the observed weather event relative to each alternative.
A potentially more pointed criticism is that GCMs may be more variable than the real weather (e.g., Osborn, 2004). Higher variability implies reaching extremes more often (on both ends of the scale). This is exacerbated if using multiple GCMs in an ensemble prediction. Such increased variance may apply more to simulations of AGW than to ¬AGW, although that’s unclear. In any case, this is a fair criticism and suggests somewhat greater uncertainty in FAR attributions than may have been reported. It would be best addressed by improved validation of GCMs, whether individually or in ensemble. The science of weather attribution is relatively new and not entirely settled; nevertheless, the methods and results in qualitative terms are well tested and clear. Many individual extreme weather events can be attributed largely to human-induced climate change.
The Future of Extreme Weather
The future of extreme weather appears to be spectacular. Given the overwhelming scientific evidence for the existence and continued development of anthropogenic global warming, and the clear evidence of tepid commitment or positive opposition to action from political leaders around the world, climate change is not just baked in for the next few decades, but is likely to be accelerating during that time. The baking period will be the few hundred years thereafter. Extreme pessimism, however, should be discouraged. It really does matter just when, and how, national, regional and global activities to reduce or reverse greenhouse gas emissions are undertaken. Our choices could well determine whether we face only severe difficulties, or instead global chaos, or perhaps civilizational collapse, or even human extinction. It is certain that earth’s biosphere will recover to some equilibrium eventually; it’s not so certain whether that equilibrium will include us.
For the short term, at least, climate science will continue to make progress, including improved understanding of weather attribution. Our current understanding is already good enough to give strong support to the case for action, as put in a recent excellent review of the state of the art in weather attribution circa 2015 or so:
Event attribution studies … have shown clear evidence for human influence having increased the probability of many extremely warm seasonal temperatures and reduced the probability of extremely cold seasonal temperatures in many parts of the world. The evidence for human influence on the probability of extreme precipitation events, droughts, and storms is more mixed. (Stott et al., 2016)
As I’ve shown above, since that review, attribution research has been extended to show considerable human influence on many cases of extreme rainfall, droughts and storms. While uncertainties remain, as regional and dynamic circulation models continue to improve, it seems certain that extreme weather attributions to anthropogenic causes will become both more pervasive and more definite in the near future. These improvements will enable us to better target our efforts at adaptation, as well as better understand the moral and legal responsibility for the damage done by unabated emissions.
Despite well-funded and entrenched opposition, we must push ahead with parallel projects to reduce, reverse and adapt to the drivers of climate change, in order to minimize the damage to our heirs, as well as to our future selves.
I would like to acknowledge the helpful comments of Steven Mascaro, Erik P Nyberg, Bruce Marcot, Lloyd Allison and anonymous reviewers on earlier versions of this article.
Allen, M. (2003). Liability for climate change. Nature, 421(6926), 891.
Angélil, O., Stone, D., Wehner, M., Paciorek, C. J., Krishnan, H. and Collins, W. (2017). An independent assessment of anthropogenic attribution statements for recent extreme temperature and rainfall events. Journal of Climate, 30, 5–16, doi:10.1175/JCLI-D-16-0077.1.
Bindoff, N.L., Stott, P.A., AchutaRao, K.M., Allen, M.R., Gillett, N.G., Gutzler, D., Hansingo, K., Hegerl, G., Hu, Y., Jain, S., Mokhov, I.I., Overland, J., Perlwitz, J., Sebbari, R., & Zhang, X. (2013). Detection and attribution of climate change: from global to regional climate. In T. Stocker, D. Qin, G.-K. Plattner et al. (Eds.), Climate Change 2013 – The Physical Science Basis: Working Group I Contribution to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (pp. 867-952). Cambridge, UK: Cambridge University Press.
Black, M. T., Karoly, D. J., Rosier, S. M., Dean, S. M., King, A. D., Massey, N. R., … & Otto, F. E. (2016). The weather@home regional climate modelling project for Australia and New Zealand. Geoscientific Model Development, 9(9).
Grimes, D. R. (2016). On the viability of conspiratorial beliefs. PloS one, 11(1), e0147905.
Handfield, T., Twardy, C. R., Korb, K. B., & Oppy, G. (2008). The metaphysics of causal models. Erkenntnis, 68(2), 149-168.
Herring, S. C., Christidis, N., Hoell, A., Kossin, J. P., Schreck III, C. J., & Stott, P. A. (2018). Explaining extreme events of 2016 from a climate perspective. Bulletin of the American Meteorological Society, 99(1), S1-S157.
Houghton, J. (2009). Global warming: the complete briefing. Cambridge University Press.
IPCC (2014). AR5 Climate Change 2014: Impacts, Adaptation, and Vulnerability.
Korb, K. B., Nyberg, E. P., & Hope, L. (2011). A new causal power theory. Illari, Russo and Williamson (Eds) Causality in the Sciences, Oxford University Press, pp. 628-652.
McAneney, J., Sandercock, B., Crompton, R., Mortlock, T., Musulin, R., Pielke Jr, R., & Gissing, A. (2019). Normalised insurance losses from Australian natural disasters: 1966–2017. Environmental Hazards, 1-20.
Noy, I. (2019). The economic costs of extreme weather events caused by climate change. Australasian Bayesian Network Modelling Society Conference, Wellington, New Zealand, 13-14 November, 2019.
Oliver, E. C., Perkins-Kirkpatrick, S. E., Holbrook, N. J., & Bindoff, N. L. (2018). Anthropogenic and natural influences on record 2016 marine heat waves. Bulletin of the American Meteorological Society, 99(1), S44-S48.
O’Keeffe, L. M., Taylor, G., Huxley, R. R., Mitchell, P., Woodward, M., & Peters, S. A. (2018). Smoking as a risk factor for lung cancer in women and men: a systematic review and meta-analysis. BMJ open, 8(10), https://bmjopen.bmj.com/content/8/10/e021611.
Osborn, T. J. (2004). Simulating the winter North Atlantic Oscillation: the roles of internal variability and greenhouse gas forcing. Climate Dynamics, 22(6-7), 605-623.
Pearl, J. (2000). Causality: Models, Reasoning and Inference. Cambridge: MIT Press.
Schaller, N., Kay, A. L., Lamb, R., Massey, N. R., Van Oldenborgh, G. J., Otto, F. E., Sparrow, S. N., Vautard, R., Yiou, P., Ashpole, I., Bowery, A., Crooks, S. M., Haustein, K., Huntingford, C., Ingram, W. J., Jones, R. G., Legg, T., Miller, J., Skeggs, J., Wallom, D., Weisheimer, A., Wilson, S., Stott, P. A., Allen, M. R. (2016). Human influence on climate in the 2014 southern England winter floods and their impacts. Nature Climate Change, 6(6), 627.
Stott, P. A., Christidis, N., Otto, F. E., Sun, Y., Vanderlinden, J. P., Van Oldenborgh, G. J., … & Zwiers, F. W. (2016). Attribution of extreme weather and climate‐related events. Wiley Interdisciplinary Reviews: Climate Change, 7(1), 23-41.
Stott, P. A., Stone, D. A., & Allen, M. R. (2004). Human contribution to the European heatwave of 2003. Nature, 432(7017), 610.
US Global Change Research Program (2017). Climate Science Special Report: Fourth National Climate Assessment, Volume I [Wuebbles, D.J., D.W. Fahey, K.A. Hibbard, D.J. Dokken, B.C. Stewart, and T.K. Maycock (eds.)] doi: 10.7930/J0J964J6.
Wallace, C. S. (2005). Statistical and Inductive Inference by Minimum Message Length. Springer Verlag.
Woodward, J. (2005). Making Things Happen: A Theory of Causal Explanation. Oxford University Press.
Analysing arguments is a hard business. Throughout much of the 20th century many philosophers thought that formal logic was a key tool for understanding ordinary language arguments. They spent an enormous amount of time and energy teaching formal logic to students before a slow accumulation of evidence showed that they were wrong and, in particular, that students were little or no better at dealing with arguments after training in formal logic than before (e.g., Nisbett, et al., 1987). Beginning around 1960 a low-level rebellion began, leading to inter-related efforts in understanding and teaching critical thinking and informal logic (e.g., Toulmin, 1958).
Argument mapping has long been a part of this alternative program; indeed it predates it. The idea behind argument mapping is that while formal logic fails to capture much about ordinary argument that can help people’s understanding, another kind of syntax might: graphs. If the nodes of a graph represent the key propositions in an argument and arrows represent the main lines of support or critique, then we might take advantage of one of the really great tools of human reasoning, namely, our visual system. Perhaps the first systematic use of argument maps was due to Wigmore (1913). He presented legal arguments as trees, with premises leading to intermediate conclusions, and these to a final conclusion. This simple concept of a tree diagram representing an argument or subargument – possibly enhanced with elements for indicating confirmatory and disconfirmatory arguments and also whether lines of reasoning function as alternatives or conjunctively – has been shown to be remarkably effective in helping students to improve their argumentative skills (Alvarez, 2007).
However effective and useful argument maps have been shown to be, there is one central aspect of most arguments that they entirely ignore: degrees of support. In deductive logic there is no room for degrees of support: arguments are either valid or invalid; premises are simply true or false. While that suffices for an understanding of Aristotle’s syllogisms, it doesn’t provide an insightful account, say, of arguments about global warming and what we should do about it. Diagnoses of the environment, human diseases or the final trajectory of our universe are all uncertain, and arguments about them may be better or worse, raising or lowering support, but very few are simply definitive. An account of human argument which does not accommodate the idea that some of these arguments are better than others, and that all of them are better than the arguments of flat-earthers, is simply a failure. Argument mapping cannot be the whole story.
Our counterproposal begins with causal Bayesian networks (CBNs). These are a proper subset of Bayesian networks, which have proved remarkably useful for decision support, reasoning under uncertainty and data mining (Pearl, 1988; Korb & Nicholson, 2010). CBNs apply a causal semantics to Bayesian networks: whereas BNs interpret an arc as representing a direct probabilistic dependency between variables, CBNs interpret an arc as representing both a direct probabilistic and a direct causal dependency, given the available variables (Handfield, et al., 2008). When arguments concern the state of a causal system, past, present or future, the right approach to argumentation is to bring to bear the best evidence about that state to produce the best posterior probability for it. When a CBN incorporates the major pieces of evidence and their causal relation to the hypothesis in question, that may already be sufficient argument for a technologist used to working with Bayesian networks. For the rest of us, however, there is still a large gap between a persuasive CBN and a persuasive argument. So, our argumentation theory ultimately will need to incorporate also a methodology for translating CBNs into a natural language argument directed at a target audience.
Consider the following simple argument:
We believe that Smith murdered his wife. A large proportion of murdered wives turn out to have been murdered by their husbands. Indeed, Smith’s wife had previously reported to police that he had assaulted her, and many murderers of their wives have such a police record. Furthermore, Smith would have fled the scene in his own blue car, and a witness has testified that the car the murderer escaped in was blue.
Unlike many informal arguments, this one is already simple and clear: the conclusion is stated upfront, the arguments are clearly differentiated, and there is no irrelevant verbiage. Like most informal arguments, however, it is a probabilistic enthymeme: it supports the conclusion probabilistically rather than deductively and relies on unstated premises. So, it’s hard to give a precise evaluation of it until we make both probabilities and premises more explicit, and combine them appropriately.
We can use this simple CBN to assess the argument:
Wife reported assault → Smith murdered wife → Car blue → Witness says car blue
The arrows indicate a direct causal influence of one variable on the probability distribution of the next variable. In this case, these are simple Boolean variables, and if one variable is true then this raises the probability that the next is true, e.g., if Smith did assault his wife, then this caused him to be more likely to murder his wife. (It could be that spousal assault and murder are actually correlated by common causes, but this wouldn’t alter the probabilistic relevance of assault to murder, so we can ignore the possibility here.)
First, we can do some research on crime statistics to find that 38% of murdered women were murdered by their intimate partners, and so get our probability prior to any other evidence.†
Second, we can establish that 30% of women murdered by their intimate partners had previously reported to police being assaulted by those partners (based upon Olding and Benny-Morrison, 2015). Admittedly, as O. J. Simpson’s lawyer argued, the vast majority of husbands who assault their wives do not go on to murder them. However, his lawyer was wrong to claim that Simpson’s assault record was therefore irrelevant! We just need to add some additional probabilities, which a CBN forces us to find, and combine them appropriately, which a CBN does for us automatically. Suppose that in the general population only 3% of women have made such reports to police, and this factor doesn’t alter their chance of being murdered by someone else (based on Klein, 2009). Then it turns out that the assault information raises the probability of Smith being the murderer from 38% to 86%.
Third, suppose we accept that if Smith did murder his wife, then the probability of him using his own blue car is 75–95%. Since this is imprecise, we can set it at 85% (say) and vary it later to see how much that affects the probability of the conclusion (in a form of sensitivity analysis).
Fourth, we can test our witness to see how accurate they are in identifying the color of the car in similar circumstances. When a blue car drives past, they successfully identify it as blue 80% of the time. Should we conclude that the probability that the car was blue is 80%? That would be an instance of the Base Rate Fallacy made infamous by Tversky and Kahneman: ignoring prior probabilities. In fact, we also need to know how successfully the witness can identify non-blue cars as non-blue (say, 90%) and the base rate of blue cars in the population (say, 15%). Then it turns out that the witness testimony alone would raise the probability that Smith was the murderer from 38% to 69%. Combining the witness testimony with the assault information, the updated probability that Smith is the murderer rises to 96%.
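These updates can be checked directly with Bayes’ theorem. The following is a minimal sketch of the chain above, with one added assumption: if someone other than Smith was the murderer, the escape car is blue at the 15% base rate.

```python
# CBN chain: Smith murdered wife -> escape car blue -> witness says blue.
# Parameters from the text, plus one assumption: a non-Smith murderer's
# car is blue at the general 15% base rate.
prior = 0.38          # P(Smith murdered wife), prior to other evidence
p_blue_s = 0.85       # P(car blue | Smith), midpoint of the 75-95% range
p_blue_not_s = 0.15   # assumed base rate of blue cars
hit = 0.80            # P(witness says blue | car blue)
false_alarm = 0.10    # P(witness says blue | car not blue) = 1 - 0.90

# Marginalize out the car's color to get the likelihood of the testimony.
p_says_s = p_blue_s * hit + (1 - p_blue_s) * false_alarm
p_says_not_s = p_blue_not_s * hit + (1 - p_blue_not_s) * false_alarm

# Witness testimony alone:
post_witness = prior * p_says_s / (
    prior * p_says_s + (1 - prior) * p_says_not_s)

# Combined with the assault report (independent given the hypothesis,
# as the chain structure implies):
lr_assault = 0.30 / 0.03
num = prior * lr_assault * p_says_s
post_both = num / (num + (1 - prior) * p_says_not_s)
print(round(post_witness, 2), round(post_both, 2))
```

Under these assumptions the sketch yields roughly 0.68 and 0.95, in line with the 69% and 96% quoted above once rounding in the intermediate figures is taken into account.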
Even this toy example illustrates that building a CBN forces one to think about how the main factors are causally related and to investigate all the necessary probabilities. Assuming the CBN is correct for the variables considered, and is built in one of many good BN software tools, it acts as a useful calculator: it combines these probabilities appropriately to calculate the probability of our conclusion. Thus, it helps prevent much of the vagueness and fallacious reasoning that are widespread, even in important legal arguments.
Alternative Techniques for Argument Analysis
Although there are genuine difficulties in using this technique, we believe that much of the resistance to it is based on imaginary difficulties, while the rival techniques below have difficulties of their own.
In our toy example, the prose version of the argument doesn’t quantify the probabilities involved, doesn’t specify the missing premises, doesn’t indicate how the various factors are related to each other, and it’s far from clear how to compute an appropriate probability for the conclusion. The fact that the probabilities and premises aren’t specified doesn’t really make the argument non-probabilistic, it just makes it vague. Prose is often the final form of presenting an argument, but it is far from ideal for the prior analysis of an argument.
Resorting to techniques from formal logic, diagrammatic or otherwise, requires even more effort than CBN analysis, while typically losing information. It is really appropriate only for the most rigorous possible examination of essentially deductive arguments.
A more recent approach with some promising empirical backing is the use of argument maps. These are typically un-parameterized non-causal tree structures in which the conclusion is the trunk and all branches represent lines of argument leading to it. (See Tim van Gelder’s ‘Critical Thinking on the Web’.) Arguably, these are equivalent to a restricted class of Bayesian network without explicit parameters (as in the qualitative probabilistic networks of Wellman, 1990). Thus, they have many of the advantages of BNs, but they don’t provide much guidance in computing probabilities, so they can be vague and subject to the kinds of fallacious reasoning that are avoided with actual BNs. Also, as they are typically not causal, they can actually encourage misunderstanding of the scenario.
There are many common objections to the use of Bayesian networks, or causal Bayesian networks, for argumentation. Here we address some of these.
1) Bayesian network tools are difficult to use.
This is true for those who are not experienced with them. “Fluency” with BN tools requires training, something on the order of the training required to become a reasonably good argument analyst using any tool. (In our experience, some philosophers get fed up with Bayesian network tools when they fail to represent an argument effectively within the first ten minutes of use!)
There are other options besides training. For specific applications, easy-to-use GUIs have been developed. Also, Bayesian network tools can be (and should be) enhanced to support features that would make them easier for argument analysis, such as allowing nodes to be displayed with the full wording of a proposition which they represent. But that’s up to tool developers. In the meantime, serious argument analysts would profit from learning how to use the tools, not just for the sake of argumentation, but also for the wide range of other tasks they have been developed for, such as decision analysis.
2) BNs force you to put in precise numbers for priors and likelihoods; this is a kind of false precision. Argument maps are better because they are qualitative.
Certainly, numbers need to be entered to use the automated updating via Bayes’ theorem. As quantities, they are precise (at least to whatever limited-precision arithmetic the tool supports). That doesn’t mean that the precision need be false, meaning falsely interpreted. The user can be fully aware of their limits. Indeed, all BN tools support sensitivity analysis, the ability to test the BN’s behavior across a range of values. So, if the analyst is unsure of just what the probability of something is, she or he can try out a range of numbers to see what effect the variation has on other variables of interest. If the conclusion can be substantially weakened by pushing the probability of premises around within reasonable limits, then it’s correct to infer that the argument is not compelling, and, otherwise, the argument may be compelling. This kind of investigation of the merits of the argument — and uncertainty of our beliefs — is not possible with qualitative maps alone.
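As an illustration, here is the kind of sensitivity analysis just described, applied to the toy Smith example from earlier (with the same figures, and the assumption that a non-Smith murderer’s car is blue at the 15% base rate): sweep P(car blue | Smith is the murderer) across its stated 75–95% range and watch the conclusion.

```python
# Sensitivity analysis: vary P(car blue | Smith is the murderer) across
# its stated 75-95% range, holding the other toy parameters fixed.
prior, lr_assault = 0.38, 0.30 / 0.03
hit, false_alarm, base_blue = 0.80, 0.10, 0.15

results = {}
for p_blue_s in [0.75, 0.85, 0.95]:
    p_says_s = p_blue_s * hit + (1 - p_blue_s) * false_alarm
    p_says_not_s = base_blue * hit + (1 - base_blue) * false_alarm
    num = prior * lr_assault * p_says_s
    results[p_blue_s] = num / (num + (1 - prior) * p_says_not_s)
    print(p_blue_s, round(results[p_blue_s], 3))
```

The posterior only moves between about 0.949 and 0.958 across the whole range, so in this case the conclusion is robust to the imprecision in that premise.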
Forcing one to obtain numbers is actually an advantage, as the example above indicated: the analyst is forced to learn enough about the domain to model it effectively.
3) Where do the numbers come from?
This is an objection any Bayesian will have encountered repeatedly. Since we are here talking about causal Bayesian networks, the ultimate basis for these probabilities must be physical dispositions of causal systems. Practically speaking, they will be sourced using the same means that Bayesian network modellers use in all the applied sciences, a combination of sample data (using data mining tools) and expert opinion (see Korb and Nicholson, 2010, Part III for an introduction to such techniques).
4) Naive Bayesian networks (NBNs) have been used effectively for argument analysis and are much simpler, e.g., by Peter Sturrock (2013) in his “AKA Shakespeare”. Why not just use them?
NBNs for argumentation simplify by requiring that pieces of evidence be independent of each other given one or another of the hypotheses at issue. If the problem really has that structure, then there’s nothing wrong with expressing it in an NBN. However, distorting arguments into that structure when they don’t fit causes problems, rather than resolving them. In Sturrock’s case, he suggested, for example, that the Stratford Shakespeare not having left behind a corpus of unpublished writing, not having written for aristocrats for pay, and not having engaged in extensive correspondence with contemporaries are all independent items of evidence, meaning that their joint likelihood is obtained by multiplying their likelihoods together (and then multiplied again with the likelihoods of all other items of evidence he advanced). The result was that he found that the probability that the writings of Shakespeare came from the eponymous guy from Stratford ranged from 10⁻¹⁵ all the way down to 10⁻²¹! As Neil Thomason pointed out to us, this means that you would be more likely to encounter the author of those works by randomly plucking any human off the planet at the time (or since!), rather than arranging to meet that Will Shakespeare from Stratford! While the simplicity of NBNs is appealing, this is a case of making our models simpler than possible. Real dependencies and interrelatedness of evidence cannot be ignored.
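The damage done by a false independence assumption is easy to demonstrate with made-up numbers: if one underlying fact is restated as ten “independent” items of evidence, each with a likelihood ratio of 0.1 against the hypothesis, naive multiplication drives the posterior to absurdity.

```python
# Hypothetical illustration: ten pieces of "evidence" that are in fact
# perfectly correlated (they all restate one underlying fact), each with
# likelihood ratio 0.1 against the hypothesis.
prior_odds = 1.0   # even prior odds on the hypothesis
lr = 0.1

# Treating them as independent multiplies all ten likelihood ratios:
naive_odds = prior_odds * lr ** 10            # 1e-10
naive_posterior = naive_odds / (1 + naive_odds)

# Treating them correctly as a single piece of evidence:
correct_odds = prior_odds * lr
correct_posterior = correct_odds / (1 + correct_odds)   # about 0.091

print(naive_posterior, correct_posterior)
```

The naive model counts the same fact ten times over, turning mildly unfavorable evidence into a posterior of about one in ten billion; the correct model leaves the hypothesis merely improbable.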
5) Some arguments are not about causal processes, but have a structure that can only be illuminated otherwise.
Here’s a famous case:
Socrates was a human.
All humans are mortal.
Therefore, Socrates was mortal.
While Bayesian networks can certainly represent deductive arguments, they will not be causal. Furthermore, their probabilistic updating will be uninformative. A reasonable conclusion is that BNs are ill suited for analysing deductive arguments. Argument maps may or may not be helpful; at least, their lack of quantitative representation will do no harm in such cases.
This concession is not exactly painful: our advocacy of CBNs was always only about cases where causal reasoning does figure in the assessment of a thesis. Slightly more problematic are cases where the core reasoning might be claimed to be associative rather than causal. For example, yellow-stained fingers are associated with lung cancer, but staining your fingers yellow is not a leading cause of lung cancer. That implies we can make meaningful arguments from one outcome to the other without following a causal chain. (The inference of a causal chain from such associations is frequently derided as the “post hoc ergo propter hoc” fallacy.)
In such cases, however, we are still reasoning causally, and it is best to have that causal reasoning made explicit:
Yellow Fingers ← Smoking → Lung Cancer
With the correct causal model, we can follow the dependencies, and we can also figure out the conditional independencies in the situation (screening off relations). Without the causal model available, we will only be using our intuitions to assess dependencies, and we will often get things wrong.
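The screening-off relation can be checked numerically. Here is a minimal Python sketch of the common-cause model, with all probabilities invented for illustration:

```python
# A minimal common-cause model: Yellow Fingers <- Smoking -> Lung Cancer.
# All probabilities are invented for illustration.
P_s = 0.3                                # P(Smoking)
P_y_given_s = {True: 0.9, False: 0.05}   # P(Yellow Fingers | Smoking)
P_c_given_s = {True: 0.2, False: 0.01}   # P(Lung Cancer | Smoking)

def joint(s, y, c):
    """P(S=s, Y=y, C=c): Y and C are independent *given* S."""
    ps = P_s if s else 1 - P_s
    py = P_y_given_s[s] if y else 1 - P_y_given_s[s]
    pc = P_c_given_s[s] if c else 1 - P_c_given_s[s]
    return ps * py * pc

def p_c_given_y(y):
    """Marginal P(Lung Cancer | Yellow Fingers = y), summing out Smoking."""
    num = sum(joint(s, y, True) for s in (True, False))
    den = sum(joint(s, y, c) for s in (True, False) for c in (True, False))
    return num / den

def p_c_given_s_y(s, y):
    """P(Lung Cancer | Smoking = s, Yellow Fingers = y)."""
    return joint(s, y, True) / (joint(s, y, True) + joint(s, y, False))

# Marginally, yellow fingers are evidence for cancer (dependence):
print(p_c_given_y(True) > p_c_given_y(False))                               # True

# But given Smoking, Yellow Fingers are screened off (independence):
print(abs(p_c_given_s_y(True, True) - p_c_given_s_y(True, False)) < 1e-12)  # True
```

With the causal model written down, both the marginal dependence and the conditional independence fall out of the arithmetic; without it, we would be guessing.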
6) There are generally very many equally valid ways of modeling a causal system. How can one choose between them?
This is certainly correct. For example, between smoking and lung cancer there are a great many low-level causal processes required to damage lung cells and produce a malignant cancer. Whether we choose to model them or not depends upon our interests (pragmatics). If we are not arguing about the low-level processes, then we shall probably not bother to model them, as they would simply be a distraction. In general, there will always be multiple correct ways of modeling a causal system, meaning that the probabilistic (and causal) dependencies between the variables used are correctly represented. Which one you use will depend in part upon your argumentative purpose and in part upon your taste.
If we are to know that our argument methods are good, we shall need methods of assessing them, built upon justifiable methods for assessing individual arguments. Arguments may be evaluated either as probabilistic predictions (if they are quantitative) or as natural language arguments or both. Here we will address quantitative evaluation. Evaluation of arguments in terms of their intelligibility, etc. we will leave to a future discussion.
One of the leading experts on probabilistic prediction in the social sciences, Philip Tetlock, has said “it really isn’t possible to measure the accuracy of probability judgment of an individual event” (Tetlock, 2015). This is not correct. To be sure, in context Tetlock points out that it is possible to measure the accuracy of probability judgments within a reference class, by accumulating the scores of individual predictions and using their average as a measure of judgment in like circumstances. Of course, if that is true, then such a measure applies equally to individual judgments within the reference class (one cannot accumulate the scores of individual predictions if there are no such scores!), so Tetlock’s point reduces to the banal observation that you can “always” defend a failed probabilistic prediction. For example, if an event fails to occur that you have predicted with probability 99.9999%, you can shrug your shoulders and say “shit happens!” But that is a defence you cannot use very often.
Tetlock suggests that the whole problem of assessing probabilistic predictions is a deep mystery. But his real problem is just the score he uses to assess predictions, namely the Brier score. It is a seriously defective measure of probabilistic predictions, which ought to be surprising, since the real work in solving how to assess predictions was done half a century ago. But communication between the sciences is slow and painful.
In most of statistical science an even worse measure of predictive adequacy is used: predictive accuracy. Predictive accuracy is defined as the number of correct predictions divided by the number of predictions. How can you do better in measuring predictive accuracy than using predictive accuracy? Of course, that’s why we slipped in the phrase “predictive adequacy” in place of “predictive accuracy”.
The problem with predictive accuracy is that it ignores the fact that prediction is inherently uncertain and so probabilistic. We should like our predicted probabilities to match the actual frequencies of outcomes that arise in similar circumstances. If, for example, we were using a true (stochastic) model to make our predictions, such a match would be guaranteed by the Law of Large Numbers. Predictive accuracy takes a probabilistic prediction’s modal value and effectively rounds it up to 1. For example, in measuring predictive accuracy, a predicted probability of 0.51 that a mushroom is poisonous counts the same as a predicted probability of 1. But that they should not be assessed as the same is obvious! The problem is what cognitive psychologists call “calibration”: if your probabilistic estimates match real frequencies on average, then you are well calibrated. Most of us are overconfident, pushing probabilities near 1 or 0 even nearer to 1 or 0. Nate Silver, for example, reports that events turning up 15% of the time are routinely said to be “impossible” (Silver, 2012). Another way of pointing this out is that predictive accuracy is not a strictly proper scoring rule: it will reward the true probability distribution for events maximally, but it will also reward many incorrect distributions equally. For example, if you take every modal value and revise its probability to be maximal, you will have an incorrect distribution that is rewarded identically to the correct distribution.
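The point can be made concrete in a few lines of Python (the forecasts and outcomes are invented):

```python
# Predictive accuracy rounds each prediction's modal value to certainty,
# discarding the stated confidence entirely.
outcomes      = [1, 1, 1, 0]               # 1 = poisonous, 0 = edible
cautious      = [0.51, 0.6, 0.55, 0.45]    # hedged, well-calibrated-looking forecasts
overconfident = [1.0, 1.0, 1.0, 0.0]       # same modal values, pushed to the extremes

def accuracy(preds, outs):
    # a prediction "counts" if its modal value matches the outcome
    return sum((p > 0.5) == bool(o) for p, o in zip(preds, outs)) / len(outs)

print(accuracy(cautious, outcomes))        # 1.0
print(accuracy(overconfident, outcomes))   # 1.0 -- indistinguishable from the hedger
```

Both forecasters score perfectly, even though one claimed near-certainty and the other claimed bare plausibility: accuracy cannot tell them apart.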
Tetlock’s Brier score is strictly proper, but that doesn’t make it strictly correct. Propriety is a kind of minimum standard: if you can beat (or match) the truth with a false distribution, then the scoring function isn’t telling us what we want. The Brier score reports the average squared deviation of the actual outcomes from the predicted probabilities, so the goal is to minimize it (it is a form of mean squared error). If we have the true distribution in hand, we cannot be beaten (any deviation from the actual probability will be punished over the long run). However, the Brier score, while punishing deviant distributions, does so insufficiently in many cases. Consider the extreme case of predicting a mushroom’s edibility with probability 1. This will be punished when false with a penalty of 1. While such a penalty is maximal for a single prediction, in a long run of predictions it may be washed out by other, better predictions. From a Bayesian point of view, this is highly irrational: a predicted probability of 1 corresponds to strictly infinite odds against any alternative occurring! That kind of bet is always irrational, and if it goes wrong, it should be punished by losing everything in the universe; that is, recovery should be impossible. The Brier score punishes all mistakes in the range [0.9, 1] much the same, even though a shift from a prediction of 0.99 to 1 is qualitatively distinct from a shift from 0.9 to 0.91: a “step” from finite to infinite odds! Extreme probabilities need to be treated as extreme for a scoring function to correctly reward calibration and penalize miscalibration.
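A short Python sketch (with invented predictions) shows how little the Brier score cares about a failed bet at infinite odds:

```python
# Brier score: mean squared deviation of the outcomes from the predicted
# probabilities (lower is better).
def brier(preds, outs):
    return sum((p - o) ** 2 for p, o in zip(preds, outs)) / len(preds)

# A categorical prediction (probability 1) that goes wrong costs at most 1,
# and a run of decent predictions washes the mistake out:
preds = [1.0] + [0.9] * 99   # first prediction: complete certainty... and it fails
outs  = [0]   + [1] * 99
print(brier(preds, outs))    # ~0.0199: the failed bet at infinite odds barely registers
```

The single maximal penalty of 1 is simply averaged away; nothing in the score marks the qualitative difference between a bold finite bet gone wrong and an infinite one.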
As we said, this problem was solved some time ago, beginning with the work of Claude Shannon (Shannon and Weaver, 1949). Shannon proposed measuring the information in a “message” by using an efficient code book to encode it and reporting the length of the encoding. An efficient code is one which allocates −log₂ P(message) bits to each possible message.
It turns out that log scores based upon Shannon’s information measure have all the properties we should like for scoring predictions. I.J. Good (1952) proposed as a score the number of bits required to encode the actual outcome given a Shannon efficient code based on the predicted outcome. That is, Good’s reward for binary predictions is:

reward = 1 + log₂ P(actual outcome)
This is the negation of the number of bits needed to report the actual outcome using the code efficient for the predictive distribution, plus 1. The addition of 1 just renormalizes the score, so that 0 reports complete ignorance, positive numbers report predictive ability above chance, and negative numbers report performance worse than chance, relative to a prior probability of 0.5 for a binomial event. Hope and Korb (2004) generalized Good’s score to multinomial predictions.
Nothing will be able to beat the true distribution in encoding actual outcomes with an efficient code over the long run; indeed, nothing will match it, so the score is strictly proper. But the penalty for mistakes is straightforwardly related to the odds one would take to bet against the winning proposition. Infinite odds imply an outcome that is impossible, meaning in information-theoretic terms, an infinite message describing the outcome. No matter how long a sequence of predictions is scored, an infinite penalty added to a finite number of successes will remain an infinite penalty. So, irrationality is appropriately punished.
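Good’s score is easy to compute. A minimal Python sketch, following the renormalized form described above:

```python
from math import log2

# I.J. Good's (1952) log score for binary predictions:
#     reward = 1 + log2(p)
# where p is the probability the forecaster assigned to the outcome
# that actually occurred.
def good_score(p_actual):
    if p_actual == 0:
        return float('-inf')   # infinite odds gone wrong: no recovery possible
    return 1 + log2(p_actual)

print(good_score(0.5))   # 0.0: no better than ignorance (a fair coin)
print(good_score(0.9))   # ~0.848: predictive ability above chance
print(good_score(0.0))   # -inf: a "certain" prediction that failed
```

Unlike the Brier score, the penalty here grows without bound as the predicted probability of the actual outcome approaches zero, so a failed bet at infinite odds can never be averaged away.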
All of this refers to the usual circumstance of scoring or assessing predictions, where we know the outcome, but we are uncertain of the processes which bring it about. Supposing that we actually know how the outcomes are produced is supposing that we have an omniscient, God-like perspective on reality. But, in fact, in special cases we do have a God-like perspective, namely when the events we are predicting are the outcomes of a computer simulation that we know, because we built it. In such cases, we can score our models more directly than by looking at their predictions and comparing them to outcomes. We can simply compare a model, produced, say, by some argumentative method, with the simulation directly. In that case, another information-theoretic construct recommends itself: cross entropy (or, subtracting out the true model’s entropy, the Kullback-Leibler divergence). Cross entropy reports the expected number of bits required to efficiently encode an outcome from the true model (the simulation, above) using a code built for the learned model instead of the true model. In other words, since we have both models (true and learned), we can compare their probability distributions directly, in information-theoretic terms, rather than taking a lengthy detour through their outcomes and predicted outcomes.
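With both distributions in hand, the comparison is a one-liner. A minimal Python sketch with invented distributions:

```python
from math import log2

# Kullback-Leibler divergence KL(p || q): the expected number of *extra* bits
# needed to encode outcomes drawn from the true distribution p when using a
# code built for the learned distribution q instead of the true one.
def kl_divergence(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

true_model    = [0.5, 0.25, 0.25]   # e.g. the simulation we built ourselves
learned_model = [0.4, 0.4, 0.2]     # a model recovered by some argument method

print(kl_divergence(true_model, learned_model))  # > 0: the learned model pays a price
print(kl_divergence(true_model, true_model))     # 0.0: only the truth scores zero
```

The divergence is zero exactly when the learned model matches the true one, and positive otherwise, so no detour through sampled outcomes is needed.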
In Search of a Method
CBNs are an advantageous medium for addressing other common issues in argument analysis. Active open-mindedness suggests we can minimize confirmation bias by proactively searching out alternative points of view and arguments. This can be supported by constructing CBNs with sources of evidence and lines of causal influence additional to those which might at first satisfy us, and, in particular, which might be expected to cut against our first conclusion. In view of confirmation bias (and anchoring, etc.), it might be useful to give the task of constructing an alternative CBN to a second party.
Another benefit of using CBNs is the direct computational support for assessing the confirmatory power of different pieces of evidence relative to one another: how “diagnostic” evidence is in picking out one hypothesis amongst many. While Bayes factors (the likelihood of the evidence under one hypothesis relative to another) have long been recommended for assessing confirmation, once an argument is coded into a CBN the diagnostic merits of evidence for the hypotheses in play are trivially computable, and computed, by the CBN itself. Hence, the merits of each line of argument can be clearly and quickly assessed, whether in isolation or in any combination.
All of the above does not provide a complete theory of argumentation using CBNs. These uses of causal Bayesian networks must sit within a larger method. This must include deciding when CBNs are appropriate and effective, and when not. When they are not effective, alternative techniques will need to be applied, such as deductive logic or argument mapping. A rich theory of argumentative context and audience analysis is needed in order to understand such issues as which lines of argument can be left implicit (enthymematic) and which sources of premises are acceptable. And guidance needs to be developed in how to translate a CBN, which only represents arguments implicitly, into an explicit formulation in ordinary language.
The required techniques in which CBN-based argumentation is embedded are largely just those employed in critical thinking and argument analysis generally. It is a substantial, but achievable, research program, ranging across disciplines, to develop these to the point where trained analysts might produce similar, and similarly effective, arguments from the same starting points.
† The figure of 38% is a worldwide statistic from the WHO (“Domestic Violence”, Wikipedia). If the argument were specific to a country or region, other statistics might be more appropriate. The figure we have used is a reasonable one for the argument as stated, that is, without a specific context. Uncertainty for specific numbers can be treated via sensitivity analysis, as we discuss below.
In a recent blog post, Tim Wilson, the Australian Human Rights Commissioner, has defended Tony Abbott’s new rules restricting public servants in their political speech. In particular, he argues that it is not a genuine limitation of their speech and that it is a reasonable rule to impose on their employment. Here I will illustrate the process of argument analysis by a treatment of his argument. A prior caveat, however: there are always multiple, distinct ways of analysing arguments; and they will often be equally defensible. The goal of argument analysis is not to find a single, definitive argument which conclusively establishes a correct conclusion. (The plea for “proof” is a pretty good indicator of an absence of integrity in an argument!) The goal is to improve your argumentation and your thinking. Finality is a goal best reserved for the grave.
Tim Wilson’s Arguments
For the sake of brevity I will paraphrase Wilson’s arguments here. While excluding what is irrelevant to these two arguments in particular, the paraphrase is pretty accurate, as is easily determined by reference to the original. Also, I number the assertions and put them in blockquotes, although they are not literal quotes.
Argument 1: The New Rule Does Not Limit Free Speech
(1) The Department of Prime Minister and Cabinet has released new social media protocols. (2) The protocols limit the capacity of public servants to make political statements that are harsh or extreme in their criticism. (3) Employment codes are not law, and (4) so cannot constitute a legal limit on free speech. (5) Defending the universal human right of free speech is about the legal limits of speech.
Argument 2: The New Rule Is a Reasonable Employment Rule
(1) Codes of conduct provide an important civilizing role in filling gaps left by the law. For example, (2) codes of conduct restrict homophobic behavior. (3) Employment codes are not limiting, (4) since an employee may at any time resign. (5) What is specifically precluded by the new code is harsh and extreme criticism in areas that are related to their work.
I will apply the AA process only to the first argument, in order to keep this illustration of method reasonably short and clear.
Step 1: Clarify Meanings
Tim Wilson begins his post by pointing out that we should know something of what we talk about prior to opening our mouths: “Before anyone screams ‘free speech’, they should actually know what they are talking about.” The implied criticism of his critics, that they don’t know what they are talking about, is nowhere substantiated by Mr Wilson. However, the challenge is worth accepting.
So, what is free speech? Literally taken, it might be a right to say whatever you have the urge to say. In practice, however, as Wilson and every other commentator has noted, there are accepted limits upon speech. So, whatever right to speech we may be referring to is, and always has been, a limited right.
Freedom of speech as a right certainly has been recognized from long ago, for example, in the English Bill of Rights of 1689 and before that in ancient Greece, as John Milton noted in his famous defence of free speech, in Areopagitica. Free speech is recognized as fundamental in the Universal Declaration of Human Rights. It is notable also that the very first amendment in the Bill of Rights in the United States explicitly protects freedom of speech and a free press. Every democracy depends upon a free debate over public policy and principles, so attacks upon free speech are indirectly attacks upon democracy as well.
Nevertheless, it is perfectly well and widely accepted that there are proper limits on free speech. Speech that is likely to be hazardous or harmful to others is generally prohibited. Defamation and libel are also generally prohibited. And contracts may prohibit certain kinds of speech, such as the disclosure of proprietary information, as Wilson specifically notes. So, there is a real question whether Wilson’s defence of Abbott’s new rules is legitimate or not. Any reflex dismissal of it is a wrong reflex.
I have no particular unclarities about Wilson’s language, although I will return to some of the semantics later. I will also note that Wilson makes no distinction between “legal limits” on speech and “limits” on speech. That is, his post equivocates between them, attempting to support the claim that there are no limits imposed on free speech by Abbott’s actions because they do not impose any such limits in law. That inference is specious nonsense, of course.
There is a relevant background to this issue. Tony Abbott and his government now have a track record of restricting freedom of speech and the flow of relevant public information in ways that at least suggest they fear public scrutiny of their actions. When the ABC reported on evidence of the mistreatment of refugees by the Royal Australian Navy, Abbott labeled them “Un-Australian”; many of his ministers also condemned the ABC, and they have suggested its funding and role should be curtailed. On any matters connected to dealing with refugees, Border Protection Minister Scott Morrison routinely invokes the cover of protecting military “operations” in refusing to address many questions, perhaps out of fear, for example, that smugglers might learn whether they have sent a boat to Australia. It seems likely that putting border protection and the handling of refugees under military control was, in part, designed to restrict public knowledge of the government’s activities. But, of course, issues of sovereignty and support for international law are pretty central to the public policy of a democracy. If anything is Un-Australian, it would have to be suppressing public debate about public policy.
Step 2: Identify Propositions
Step 3: Graph the Argument
Argument 1 might be graphed as:
This shows its radical incompleteness. (1) is just setting context, identifying what protocols are at issue. The conclusion here is implicit, so the graph is quite fragmentary; the conclusion is in the argument’s title, so just numbering that (6) and making obvious connections we get a much better representation of the argument:
A few observations on graphing are in order. This graph is just a quick Google hack, but there are more sophisticated tools for the purpose, such as Austhink’s Rationale. That tool will give you some syntactic sugar that you may find useful; for example, it colors supporting links green and contrary arguments red. Here I’m inventing two small pieces of syntax: a dotted line for context setting that’s not really part of the argument; arrows joining together to show that a conjunction of premises is required for support. To be sure, (2) is also required for the inference to (6), but it is less closely associated with (4) and (5). If you have a disjunctive argument, such as “X or Y → Z”, you might want to show that clearly as well, using color or dotted lines, etc.
Step 4: Make it Valid
We now tackle the argument one subargument at a time. (3) → (4) is presumably not controversial, but it is certainly not, strictly speaking, valid. Dr Neil Thomason likes to invoke his “Rabbit Rule”: you can’t pull a rabbit out of a hat unless it was already in there. The premise (3) doesn’t even mention limits or free speech, so it cannot be valid to conclude anything about them, as (4) does. What we need is some innocuous hidden premise to get us there, such as: (A) only laws can constitute legal limits on free speech. Since (A) is innocuous, this step hasn’t revealed anything surprising; but it is all part of the AA process.
(2) (4) (5) → (6) is much the bigger problem. First, let’s just look at (4) (5) → (6) in isolation. We have a Rabbit problem here as well: the conclusion says the new rules don’t limit free speech, whereas the premises are about legal limits only. This is not my artifact: the equivocation lies in the original, as you can see for yourself. We shall have to fix it, by some kind of bridge that will allow a valid inference. A plausible candidate would be: (B) that which does not constitute a legal limit on free speech does not constitute a limit on free speech. From this it validly follows that there is no limit on free speech, given the premise that the new APS rules do not constitute a legal restriction on speech. There is, however, an immediate problem with (B), which is that it is obviously false. When you appear to be compelled to introduce an obvious falsehood as a missing premise, that tends to be a bad sign. There is no help to be found in Wilson’s post, since he there recognizes no distinction between legal and other limits on speech, sliding over any problem. This is where (2) comes in, at least in my thinking. It (and related text that I have not copied) appears to suggest that employment codes can be legally relevant, in particular by violating the law. The laws that might be both relevant and violated here are not gone into, but the qualification that it is only harsh and extreme criticism that is being suppressed suggests some such qualification. Therefore, I shall adopt as the missing premise (B′): that is, (B) qualified to hold only so long as no more than harsh or extreme critical speech is limited. The subargument in question then becomes (with some modest rephrasing):
(2) The new rules limit employees’ political speech that is harsh or extreme in its criticism. (3) Employment codes cannot constitute a legal limit on free speech, if they only limit harsh or extreme criticism. (5) Free speech is about the legal limits of speech. (B’) That which is not a legal limit on free speech also does not limit free speech, so long as it at most limits harsh or extreme critical speech. (6) Therefore, the new rules do not limit free speech.
Our graph at this point is:
I accept this as valid, or near enough, but that’s hardly the end of the story.
Step 5: Counterargue
Tim Wilson’s suggestion that the right to free speech only concerns limits in law is one key issue. This certainly does reflect, for example, the first amendment to the US Constitution, which restricts what laws the US Congress may make. It also reflects the underlying motivation for many declarations about human rights in general and free speech in particular; the underlying motivation is to not tolerate governments which attack such freedoms. What it does not reflect, however, is the ability of governments to attack freedoms indirectly and implicitly. A government may, for example, attack free speech by financing those who openly support its policies and deny financing to those who openly criticize its policies. While this may not violate explicitly the Universal Declaration of Human Rights, taken to an extreme it can be just as effective and pernicious as government actions which do openly violate that Declaration. More directly, “limiting free speech” is ordinary English, not legalese: Tim Wilson has neither the right nor the ability to arrogate its meaning for his own purposes. Telling people they cannot say something is limiting free speech, whatever pathetic spin Wilson cares to put on it. The only legitimate issue is whether the limitation is warranted or not, and on that count also Wilson is very much on the wrong side.
Wilson has gone to some pains to present his view as quite moderate. The only limitation of speech is that by an employment contract, and that speech must be extreme or harsh before any cause to dismiss can be found. So reads Wilson’s blog. And no ordinary person would expect to use extreme or harsh criticism of their employers in public and get away with it. Hence, the objectors must just be more of the chattering classes, of the latte-sipping variety. But there are a few points Wilson neglected, best considered with a latte in hand.
First of all, there is pre-existing policy that current APS employees might reasonably expect to be enforced. The APS employment policy states:
It is quite acceptable for APS employees to participate in political activities as part of normal community affairs. APS employees may become members of or hold office in any political party.
Clearly, it follows from this that criticism of the existing government by opposition members who are a part of the public service is legitimate and protected, whether distributed via social media or otherwise. Of course, that does not mean that “harsh” or “extreme” criticism must be protected. Or, then again, perhaps it does. Presumably, since public servants are encouraged to run for public office, they are not meant to be severely handicapped relative to the incumbents they run against. But under the new Abbott rules that is the case: Abbott and other incumbents can be as obnoxious, harsh or extreme as they like in attacking their opponents, but if their opponents are also public servants, they cannot return in kind. If I were a public servant campaigning against the likes of Abbott, I would first resign. But that is irrelevant: the fact remains that Abbott’s rules clearly violate the intent of the existing code of conduct by restricting otherwise free political speech. Unfortunately, matters are even worse than what I have just written.
The exact wording of the new rules is, in fact, relevant. Specifically, they restrict opinions posted in social media, whether acting professionally or not, which are “so harsh or extreme in their criticism of the Government, Government policies, a member of parliament from another political party, or their respective policies, that they could raise questions about the employee’s capacity to work professionally, efficiently or impartially” (my emphasis). This covers, for example, scientist public servants who may want to raise questions about George Brandis’ preposterous declamations on the climate change debate. Oh my! Were I a public servant, perhaps I would be fired tomorrow for that last sentence! It is certainly true that I hold my current political masters in contempt! Nevertheless, the standard being set here for public servants being called to account is simply absurdly low. Under what circumstances can the pack of Brandis, Abbott, Morrison, Hockey, Turnbull and the rest possibly raise questions about the professionalism of those who oppose them? I will leave it to your imagination. But if you are a public servant, you will have no difficulty answering the question and keeping your mouth firmly shut. Which is just what your masters want.
Steps 6 and 7: Consider Alternatives and Evaluate
I will illustrate these steps in the negative, by omission. As pure pedagogy it is not necessary, since it repeats the first five steps on new arguments; as a positive example, it may be necessary. I plead my case as a matter of time: I’ve taken a fair bit to do this much and need to get to other things. Perhaps, in future I shall return to this and complete it, however. Also, perhaps reader comments will help fill the gap.
I will, however, quickly comment on Wilson’s second argument. Codes of conduct may either be civilizing or barbarous. This new code might count as civilizing were the enormous leeway in its interpretation taken away. Wilson’s implicit suggestion that they are limited to work matters is at best misleading, however, since both political campaigns and scientific publications are explicitly mentioned as being circumscribed by the new rules. That the rules do not take away an employee’s right to quit work and face unemployment hardly means that employees’ rights to free speech are thereby unimpaired. A kidnap victim’s “right” to refuse an order and thereby get shot in the head doesn’t make such an event the victim’s fault, nor does its availability restore the victim’s freedom. Abbott’s rules demonstrate, as if further demonstration were needed, that all of his impulses are against transparency and freedom of speech. Barbarity is the New World Order.
To do a thorough analysis of an argument requires a certain discipline. The best approach I know of is due to Michael Scriven and specifically his book Reasoning (1976), which is out of print. Here I present my own version of this process, in compressed form, in seven steps. Roughly, the idea is to first build up the argument into its strongest possible form and then to try to tear the argument asunder. The result should be a good understanding of both the strengths and weaknesses of the argument as it was stated.
I present this as a process for analysing someone else’s argument as it might appear in some ordinary text. However, it may be applied elsewhere, for example, to your own arguments, with a view to improving them. Also, I present this as a kind of ideal. Since in reality we are all constrained by time, it’s unlikely that anyone will continually apply all of the steps to every argument of interest. It’s worthwhile applying all of them to some arguments, however.
The AA Seven Step Program
1. Clarify Meanings.
The first step to critiquing an argument is to understand it. Words that are new to you, or used in unusual ways, might require you to use a dictionary or an encyclopedia. This step may require a certain amount of detective work, for example, learning more about the author and the argument’s history, so as to understand the context and to disambiguate some of the expressions used. Lewis Carroll’s Humpty Dumpty was of course wrong to say that a word means “just what I choose it to mean”, but that doesn’t mean that what authors think they mean is irrelevant. This is also a step where there may be some opportunity to identify whether equivocation is playing some role in the argument; that is, to identify whether some words or phrases are being used in multiple senses, and perhaps misleadingly.
“Before anyone screams ‘free speech’,” Tim Wilson writes, “they should actually know what they are talking about.” In this, Tim Wilson is exactly right. He writes this, however, in the context of a defence of his Prime Minister’s recent move to restrict the freedom of political speech by public servants. I shall post an analysis of this issue soon after this post, partly to debunk Wilson’s posturing and partly to illustrate the methods explained here.
2. Identify Propositions.
Propositions are assertions about the world, ruling some possible states of affairs in or out. In an argument, some one or more propositions will be premises — assumptions of the argument — while some one or more other propositions will be final conclusions. In between will be intermediate conclusions that are derived, directly or indirectly, from the premises and which are further used to derive other propositions. The rest of the argument will be, in effect, chaff — rhetorical flourishes, irrelevancies, noise. In this step all the relevant propositions should be identified and tentatively classified as premise, intermediate conclusion, or final conclusion.
Propositions and sentences may or may not correspond. A sentence may easily contain many propositions. For example, in the sentence “Boat people harbor terrorists and criminals and are not our kind of people” one might find three propositions. Propositions may also be spread over multiple sentences, where the sentences are complete grammatically, but somehow incomplete conceptually.
There are, of course, certain words and phrases which may introduce or indicate a role. Statements about observation, testimony, etc., would tend to suggest that premises are being discussed, while “thus” and “therefore” would tend to indicate conclusions. This kind of syntactic marker won’t carry us very far, however, since meanings can be stretched (I can “observe” a conclusion) and, importantly, intermediate conclusions are both premises and conclusions, giving rise to both kinds of marker. The real objective in tagging the propositions is to make the best possible sense of the argument, so while graphing the argument (in the next step) you may well decide on a different way of classifying propositions as conclusions and premises.
3. Graph the Argument.
Graphing an argument, with each proposition appearing as a node and inferential steps as arrows relating premises and conclusions, is usually a useful exercise. It forces you to make the argumentative steps explicit. The most common way arguments go wrong is by leaving some, or much, of the reasoning implicit, where, unexamined, its imperfections remain unexposed. Per Louis Brandeis, “sunlight is said to be the best of disinfectants.”
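The graph structure itself is simple enough to sketch in code. The following is a minimal illustration in Python, with hypothetical propositions standing in for a real analysis: premises are nodes with no incoming arrows, final conclusions have no outgoing arrows, and intermediate conclusions have both.

```python
# A minimal sketch of an argument graph. The proposition labels are
# hypothetical; each arrow runs from a premise to the conclusion it supports.
edges = [
    ("P1: Gertie is a swan", "C1: Gertie is white"),
    ("P2: All swans are white", "C1: Gertie is white"),
    ("C1: Gertie is white", "C2: Gertie is visible against dark water"),
]

nodes = {n for edge in edges for n in edge}
has_incoming = {child for _, child in edges}
has_outgoing = {parent for parent, _ in edges}

premises = nodes - has_incoming                      # unsupported: assumed
final_conclusions = nodes - has_outgoing             # support nothing further
intermediate = nodes - premises - final_conclusions  # both premise and conclusion

print("Premises:", sorted(premises))
print("Intermediate conclusions:", sorted(intermediate))
print("Final conclusions:", sorted(final_conclusions))
```

Classifying nodes by their arrows in this way makes the tentative tagging of Step 2 mechanical once the graph is drawn.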
While graphing you will need to think about which premises go together to support which conclusions. The goal here will not be to make these subarguments (parents and their immediate children) valid, but to put them together so as to be as strong as possible, given the propositions actually in hand. If they are not, you have done a bad job.
Argument mapping has become more popular as computer tools for doing it have become available. For example, Tim van Gelder’s Rationale is widely used in teaching critical thinking. A good alternative, especially once the basics of argument analysis and mapping have been learned, is to use Bayesian networks for laying out arguments and assessing their merits. Bayesian networks have the distinct advantage over pure mapping tools that they can reflect the degrees of strength that premises confer on conclusions. Netica would be a good place to start investigating Bayesian nets for argument analysis, having a relatively friendly GUI; although there is a licence fee, the free download can be used for small maps without any licence.
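To see how a Bayesian network can reflect degrees of strength, here is a hand-rolled sketch in Python (stdlib only, not Netica; all the numbers are invented for illustration). The conditional probabilities encode how strongly the premises, when true, support the conclusion, and uncertainty about the premises propagates through to the conclusion.

```python
from itertools import product

# Our confidence in each premise (hypothetical figures).
p_premise = {"P1": 0.9, "P2": 0.7}

# Strength of the inferential step: P(conclusion | P1, P2) for each
# combination of premise truth values.
p_conclusion_given = {
    (True, True): 0.95,
    (True, False): 0.40,
    (False, True): 0.30,
    (False, False): 0.05,
}

# Marginalize over the premises to get the probability of the conclusion.
p_c = 0.0
for p1, p2 in product([True, False], repeat=2):
    joint = (p_premise["P1"] if p1 else 1 - p_premise["P1"]) * \
            (p_premise["P2"] if p2 else 1 - p_premise["P2"])
    p_c += joint * p_conclusion_given[(p1, p2)]

print(f"P(conclusion) = {p_c:.3f}")  # prints P(conclusion) = 0.729
```

Note how the conclusion inherits the premises’ uncertainty: even with a strong inferential step (0.95), doubt about the premises pulls the conclusion’s probability well below it.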
4. Make it Valid.
This is perhaps the most interesting and challenging of the steps. First, however, a qualification: many arguments are intrinsically or intentionally inductive (probabilistic). Their premises are meant to make a conclusion probable, and not certain. For a trivial example consider a classic enumerative induction: In our history the sun has risen every morning; therefore, tomorrow the sun also rises. There is no certainty, but plenty of probability. Good inductive arguments are already good arguments and don’t need to be made valid. Of course, inductive arguments can also be rendered valid, for you can always add a premise such as “Those things which are probable are true.” But that is really a pointless step. You may as well simply make it as good an inductive argument as you can.
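The sunrise induction can even be quantified. Laplace's rule of succession gives, under a uniform prior, the probability of a further success after n successes in n trials as (n+1)/(n+2), which is always short of certainty. A sketch:

```python
from fractions import Fraction

# Laplace's rule of succession: after `successes` successes in `trials`
# trials, with a uniform prior, the probability of another success is
# (successes + 1) / (trials + 2).
def rule_of_succession(successes, trials):
    return Fraction(successes + 1, trials + 2)

n = 10_000                         # mornings observed, every one with a sunrise
print(rule_of_succession(n, n))    # prints 10001/10002: high, but not certain
```

However many sunrises we pile up, the probability approaches but never reaches 1, which is exactly the character of a good inductive argument.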
To make each subargument valid, enthymemes (hidden premises) will need to be found and filled in. They shall have to be sufficient to render the conclusion necessarily true, given all of its premises (which is what a valid argument is). In general, they should not be more than sufficient. That is, you should not be building a “Straw Man” — an argument which asserts far more than what its author meant. For example, if an argument’s validity requires that some boat people are terrorists, you wouldn’t want to fill in as hidden premise the assertion that all boat people are terrorists.
This is often called the Principle of Charity. To be sure, charity can be taken too far; for example, if the argument already states that all boat people are terrorists, then, even if the argument doesn’t need it, a fair presentation will include it. Attending to what the author wrote, what the author meant, what the author implied or connoted by what was written, and what the author thought was meant, are all a part of filling in the argument.
Counterexampling is a key technique for this step: imagine some possible world where all of the stated premises are true and yet somehow, perhaps amazingly, the conclusion is false. That is a possible world demonstrating that the argument is not yet valid. You need to add some premise which will rule out that possible world and try again. For example, suppose someone asserts that Gertie, being a swan, must be white. Then we should try to imagine a possible world that includes black swans (that doesn’t take much imagination, since we live in such a possible world :-), in order to note that the argument is assuming that all swans are white.
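For simple propositional arguments, counterexampling can even be mechanized: enumerate the possible worlds (truth assignments) and look for one in which every premise holds while the conclusion fails. A minimal sketch in Python, using the swan example with hypothetical atoms:

```python
from itertools import product

# An argument is valid iff no assignment of truth values makes every
# premise true and the conclusion false. Premises and the conclusion are
# given as functions of a "world" (a dict mapping atoms to booleans).
def counterexamples(atoms, premises, conclusion):
    """Yield every world that refutes the argument's validity."""
    for values in product([True, False], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(p(world) for p in premises) and not conclusion(world):
            yield world

atoms = ["gertie_is_swan", "gertie_is_white"]

# "Gertie is a swan; therefore Gertie is white" lacks the hidden premise
# "all swans are white", so a black-swan world refutes it.
bad = list(counterexamples(
    atoms,
    premises=[lambda w: w["gertie_is_swan"]],
    conclusion=lambda w: w["gertie_is_white"],
))

# With the hidden premise filled in, no counterexample remains.
fixed = list(counterexamples(
    atoms,
    premises=[lambda w: w["gertie_is_swan"],
              lambda w: not w["gertie_is_swan"] or w["gertie_is_white"]],
    conclusion=lambda w: w["gertie_is_white"],
))

print(bad)    # one world: a swan that is not white
print(fixed)  # [] -- the repaired argument is valid
```

The counterexample found is precisely the "black swan" world, and adding the hidden premise is exactly the repair the text describes.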
5. Criticize the Premises.
Having the best version of the argument that we can produce before us, we should now criticize it. Since it is now valid, we can hardly criticize the inferential steps. But what making it valid has done is expose the argument’s weaknesses, if there are any, in its premises. So we can now canvass the premises, new (hidden) and old (explicit), for those we might find implausible. Generally, arguers prefer to hide their arguments’ weaknesses, consciously or not, and so the implausibilities will be found in the previously hidden premises, now exposed. We can follow our judgments of implausibility, hunting down and constructing the best arguments we can against those premises, applying the Seven Steps recursively as often as we may need. The result of this step should be a pretty thorough accounting of the merits and demerits of the argument.
6. Consider Alternatives.
If you want a good understanding of the issue at hand, then you will need to survey the relevant literature at least for the main alternative points of view and run them through the first five steps as well. That said, the end of Step 5 is a natural stopping point: the argument may already be assessed on its own terms. If you have built a Bayesian network for it, for example, you may be able to assess precisely the weight given to its conclusion by its premises.1 If you extend the network to a Bayesian decision network with actions and utilities, you will be able to assess relevant actions or interventions, as well.
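As a sketch of the decision-network idea, here is a toy expected-utility calculation in Python, with invented probabilities and utilities: each action is scored by its probability-weighted utility, and the rational choice is the action with the highest score.

```python
# A toy expected-utility calculation, the core of a decision network.
# All numbers are hypothetical: the conclusion's probability would come
# from the argument's Bayesian network, the utilities from the decision maker.
p_conclusion = 0.73

utility = {                        # utility[action][conclusion_is_true]
    "act":     {True: 100, False: -40},
    "abstain": {True: 0,   False: 0},
}

def expected_utility(action):
    """Probability-weighted utility of taking `action`."""
    u = utility[action]
    return p_conclusion * u[True] + (1 - p_conclusion) * u[False]

best = max(utility, key=expected_utility)
print(best, round(expected_utility(best), 1))  # prints act 62.2
```

Even with rough numbers, this sort of computation makes clear which interventions the argument actually supports, rather than leaving the upshot implicit.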
While you can stop with Step 5, gathering alternative arguments and mapping them should not be considered frosting on the cake. Even though an argument in isolation can be assessed on its own merits, considering the alternatives, especially those put by those who are your ideological contraries, will often lead to a reassessment of your analytical work to this point. Indeed, if you are an open-minded Fox, perhaps they will typically lead to a reassessment.
With multiple argument maps in front of you — or better still, Bayesian networks modeling the main relevant arguments — you can interrogate them to find which conclusions are genuinely supported by the available premises. By this time the premises should include only those which are themselves reasonably well justified. In particular, the weak premises of the initial argument should have been exposed as weak by your recursive analysis, which drills down below them (so they are no longer premises); hence, they should no longer confer any phony support on the original conclusion.
I shall be illustrating the Seven Step program, or the products of it, on this blog with many arguments. I rather enjoy ripping the common arguments in political speech to shreds.
1 Of course, the probabilities entered into a Bayesian network representing an argument may or may not be meant precisely. If they are rough estimates — corresponding, say, to high, medium and low — then you should use the network to assess only the more general aspects of the argument, rather than precise probabilities. Bayesian nets can be used to reason Bayesianly whether or not the probabilities are precise, contrary to canards presented by some anti-Bayesians.
In this first substantive post I shall sketch out what I am after in general terms: the use of good arguments to further our understanding. Bad arguments dominate public debate. Most media commentators and politicians indulge regularly in them. How could they not? Were there a better understanding of what good and bad arguments are, they might desist of their own volition, and the public might also compel them to lift their game.
Critical thinking1 is just as regularly endorsed as a central theme of education. Our students should leave school, or university, with the ability to think for themselves, with tools for critically analysing and assessing complicated arguments, and the ability to avoid being seduced by the many dread Fallacies. They are thought to be aided in this by being able to spot and identify examples of the many species of Fallacies. The result is the widespread abuse of people and their arguments for being “fallacious” — itself often a kind of argumentum ad baculum (i.e., a form of bullying)! I shall have more to say about the fallacies on other occasions; indeed, since fallacies are frequently and abusively identified in perfectly good arguments, for the purpose of bad-mouthing them, I shall endeavour to unpick and expose many of them as perfectly good arguments in the future. But first we shall have to decide what good and bad arguments are.
As Monty Python famously pronounced, “An argument is a connected series of statements intended to establish a proposition.” There’s not much dispute about that definition, but it tells us little about how to distinguish good from bad arguments. The traditional account of the goodness of arguments (the “alethic”, or truth-oriented, account), taught for many years in philosophy departments, has been that good arguments are those with true premises and valid inferences (a “sound” argument). Validity refers to arguments that are so strong that their premises necessitate their conclusion: if their premises are true, it is impossible for their conclusions to be false. On this definition, a good argument is at least a pretty good thing — it guarantees that you arrive at the truth!2 It is not, however, good enough.
As Charles Hamblin, and others, pointed out there are pragmatic and rhetorical aspects to a good argument (Hamblin, 1970). An argument is not as good as possible if it fails to persuade its intended audience. While persuasive power is clearly insufficient as a mark of a good argument — at least so long as we refuse to acknowledge the arguments of Hitler and Mussolini as good — it is also necessary that they have some persuasive effect. Arguers need to attend to their audiences. Indeed, it is incumbent upon them to understand the cultural background and presuppositions of their audiences so as to frame their arguments. Amongst other things, arguments need to be grounded in premises that will be accepted by both parties. Good argument, therefore, necessarily engages one in considerations of practical psychology, sociology, culture and pragmatics, at least to some basic level. I will discuss some of these issues, including audience analysis, in future posts.
Another failing of the “true premises” test is that it is possible simply to luck into true premises, but intuitively good arguments are not lucked into. Instead we naturally expect good premises to be responsibly sourced. That is, either we obtain them from a recognized authority, or a reliable witness, or we have determined their truth for ourselves and can testify to them ourselves. This speaks to the normative side of judging arguments: they must be both persuasive in fact and rationally persuasive. The final ingredient is one already proposed in the alethic account, that of validity.
I suggest then that a good argument is: (1) one that persuades its target audience; (2) draws only upon acceptable premises — those that are themselves drawn from a reliable source; and (3) whose premises validly imply its conclusion.
I shall be demonstrating just how far short of this standard many of our political and public policy debates fall.
1 For more on critical thinking I strongly recommend the web pages of Tim van Gelder:
His critical thinking blog provides a great many useful explanations and provides links to a host of related resources around the web.
TvG’s argument mapping page describes the use of “maps” to understand arguments and leads to his computer program Rationale. (I’ve just noticed an argument mapping freeware alternative, Argumentative, at Sourceforge; I’ll have a look at it sometime.)
For reasons not entirely clear to me, TvG has had more success than anyone else at teaching critical thinking. His advice on teaching it is worth a look, since it necessarily also provides good advice on how to do it.
2 A simple adaptation for inductive arguments replaces necessitation with probabilification, i.e., rendering the conclusions probable.