Born Inquisitive
A blog of independent thinking and evidence-based inquiry

Anecdotes Are Not Evidence

January 31, 2021
7,817 words (~39 minutes)
Tags: fallacies statistics psychology third-person

Anecdotal evidence has flaws that make it useless for inferences about populations or about cause and effect. Valid uses of anecdotes exist, but anecdotes should be scrutinized even in such cases.

Introduction

If enough time is spent around those doing empirical inquiry, one is liable to encounter the maxim “anecdotes are not evidence.” Anecdotes are stories that consist either of one’s own personal experiences or the retelling of the personal experiences of others. The maxim “anecdotes are not evidence” is a healthy reminder that the use of anecdotes as evidence has numerous flaws.

However, the maxim “anecdotes are not evidence,” like all slogans, is a short, catchy phrase and is devoid of any substantially informative content. While such a catchphrase can be used as a mnemonic device to recall deeper understanding, it can also be used without any cognizance of the issues to which it alludes. When used in this latter way, the maxim can be interpreted in ways that lead to false implications.

One such false implication that stems from an extremely literal interpretation of the maxim “anecdotes are not evidence” is that nothing at all can be learned from anecdotes.

Such a literal interpretation can be used to undermine the maxim itself. For instance, one could note a case in which a serial sex offender was caught due to the testimony of victims, which is necessarily anecdotal, and conclude that those who subscribe to “anecdotes are not evidence” would throw out this testimony, setting the sex offender free. This is such a morally repugnant proposition for so many that it could be used as a way to discredit those who are fond of the maxim “anecdotes are not evidence.”

This literal interpretation can also be used as an obstacle to understanding. For instance, if one hears stories from friends that a certain road is congested with traffic during rush hour, one might dismiss these stories, citing “anecdotes are not evidence.” One would have no one to blame but oneself when one finds oneself stuck in traffic on the very same road, waiting for the rush hour congestion to clear.

Bumper-to-bumper cars on a highway with brake lights visible.
Congested traffic during rush hour, which can sometimes be avoided by way of anecdotes.
“Rush Hour” by MSVG is licensed under CC BY 2.0 and is not modified.

Surely, because of these implications, this overly literal interpretation does not capture whatever utility the maxim “anecdotes are not evidence” has. This is a flaw not just with this particular maxim, but with slogans in general, which threaten to become shibboleths when used incognizantly in this way.

The purpose of this article is to articulate the issues in using anecdotes as evidence in a more substantial way than can be done with a pithy maxim. It begins by discussing the sort of inference for which anecdotes are entirely useless; this is balanced with later discussion of valid uses of anecdotes.

Misuses of Anecdotes

Anecdotal evidence is useless for two kinds of inference: inference about a larger population and inference about cause and effect.

Inference about a Larger Population

While some number of individuals that can be directly observed might sometimes be of interest, interesting questions often pertain to a larger population than those that are directly observed. For instance, an epidemiologist is usually professionally interested in how many people contracted the seasonal flu in a given geographic area, not how many of the epidemiologist’s friends contracted the seasonal flu. This is a question about a larger population, i.e., all of the people in the given geographic area, that is answered using a smaller, representative sample drawn from sources such as surveys and hospital records. Anecdotes shed no light on these kinds of questions.

Inference about Cause and Effect

Practical questions often pertain to whether or not something causes an effect and the size of such an effect. For instance, in clinical trials, a potential therapy is given to volunteers in order to determine whether or not the therapy causes an improvement in a disease or an increase in survival time. It is not enough to know how many people receiving the treatment improve or to what extent they improve. Indeed, there are many diseases from which people will often recover of their own accord. Rather, the question that is at issue in a clinical trial is whether or not the potential therapy caused any measured improvement and what the size of this effect was. Again, anecdotes are no help in answering this kind of question.

An African man wearing a lab coat and looking through a microscope.
A person looking through a microscope while doing work as part of a clinical trial, for which anecdotal evidence is not useful.
“GSK PULSE staff working at the GSK-sponsored Kombewa malaria clinical trial site in western Kenya” by GlaxoSmithKline is licensed under CC BY-NC 2.0 and is not modified.

Flaws of Anecdotes as Evidence

Flaws in using anecdotes as a kind of evidence can broadly be grouped into two categories: statistical and psychological. The statistical flaws are what specifically preclude anecdotes from being useful for inference about a larger population and for inference about cause and effect, but do not prevent anecdotes from being useful in other ways. The psychological flaws affect anecdotal evidence no matter what the use, and so these flaws must be acknowledged and accepted during the valid uses of anecdotes articulated later.

Statistical Flaws

Insufficient Sample Size

Returning to the example of the seasonal flu, suppose one is interested in how effective vaccination was in preventing individuals from contracting the flu in the United States during the 2019-2020 flu season. Suppose there are 10 friends and family that one keeps in touch with regularly. Of these 10, 6 were vaccinated against the flu, and 1 caught the flu this year. The 1 who caught the flu was not vaccinated. What estimate should one make of how effective this year’s vaccine was?

A rough estimate for the probability of catching the flu when not vaccinated could be 1 / 4 = 25%. But what about the probability that someone who was vaccinated caught the flu? None of one’s friends and family that were vaccinated caught the flu this year, so continuing this method of rough estimation leads to an estimate of 0 / 6 = 0%.

Thus, one might be tempted to say that, given these observations, this year’s flu vaccine was absolutely effective. However, this is mistaken. Flu vaccines are effective at decreasing the risk that a vaccinated individual has in catching the flu, but they do not decrease the risk to zero. (“Vaccine Effectiveness: How Well Do the Flu Vaccines Work?” 2020)

The issue this example highlights is that anecdotal evidence rarely consists of enough observations to make an inference to a larger population or to make an inference about cause and effect. Given a sample size of only 10, it is more likely than not that there would not be a single individual in the sample who both was vaccinated and who caught the flu.[1] However, in a larger sample of, for instance, 1,000 individuals, there would be many individuals who were vaccinated and who caught the flu.[2]
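The sample-size point can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that a vaccinated person has a 5% chance of catching the flu in a season and that each person's outcome is independent of the others; neither assumption comes from the cited source. Under those assumptions, it computes the chance that none of the 6 vaccinated friends and family caught the flu, and the expected number of vaccinated flu cases in a sample of 1,000 with the same 60% vaccination share.

```python
# Hypothetical seasonal risk of flu for a vaccinated person. This value is an
# assumption chosen for illustration, not a figure from the article's sources.
ASSUMED_RISK_VACCINATED = 0.05


def prob_no_cases(n_people: int, risk: float) -> float:
    """Probability that none of n_people independent individuals catch the flu."""
    return (1.0 - risk) ** n_people


def expected_cases(n_people: int, risk: float) -> float:
    """Expected number of flu cases among n_people individuals."""
    return n_people * risk


if __name__ == "__main__":
    # 6 of the 10 friends and family were vaccinated.
    print(f"P(no vaccinated cases among 6):   {prob_no_cases(6, ASSUMED_RISK_VACCINATED):.2f}")

    # The same 60% vaccination share in a sample of 1,000 gives 600 vaccinated.
    print(f"P(no vaccinated cases among 600): {prob_no_cases(600, ASSUMED_RISK_VACCINATED):.2e}")
    print(f"Expected vaccinated cases in 600: {expected_cases(600, ASSUMED_RISK_VACCINATED):.0f}")
```

With these assumed numbers, seeing zero vaccinated cases among 6 people is the most likely outcome rather than a sign of perfect protection, whereas zero vaccinated cases among 600 people would be extraordinarily unlikely.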

In this case, the anecdotal evidence of one’s friends and family does not consist of a large enough number of observations to calculate even a crude estimate of vaccine efficacy.[3]

Furthermore, note that any estimate of the probability of catching the flu based on these anecdotes would be prone to wild fluctuations. If another unvaccinated friend happened to catch the flu during the 2019-2020 season, the estimated probability of unvaccinated individuals catching the flu would jump from 25% to 2 / 4 = 50%. This small change in the sample would result in a large change in the estimate about the larger population, which is an indication that the inference is susceptible to being inaccurate due to chance.
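One way to put a number on how unstable such an estimate is would be to attach an interval to it. The sketch below uses the Wilson score interval, a standard textbook technique that is not part of the original discussion, with the conventional z = 1.96 for roughly 95% coverage. The comparison sample of 250 cases out of 1,000 is hypothetical and chosen only because it has the same 25% point estimate as 1 out of 4.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # 1 of 4 and 2 of 4 are the anecdotal samples from the text;
    # 250 of 1,000 is a hypothetical larger sample with the same 25% estimate.
    for cases, total in [(1, 4), (2, 4), (250, 1000)]:
        low, high = wilson_interval(cases, total)
        print(f"{cases}/{total}: estimate {cases / total:.0%}, interval {low:.0%} to {high:.0%}")
```

The interval for 4 observations spans most of the range between 0% and 100%, which is another way of saying that the estimate carries almost no information about the larger population.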

The preceding example is generous inasmuch as at least several individuals were observed. Many times, anecdotes consist of a sample size of just 1.

For instance, one might hear from a friend, “I started the Acme Juice Boost program, and ever since then I haven’t caught a cold or flu or anything.” What sort of estimate could be inferred from this? With a sample size of 1, there are two possibilities: that 0 out of 1 individual experiences the outcome or that 1 out of 1 individual experiences the outcome. Given what has been seen about preventative measures against communicable disease, estimating efficacy at 100% is quite reckless. In this example, one cannot infer from a sample size of 1 how effective the Acme Juice Boost is at preventing communicable disease or even if it has any effect at all, since many people go for long stretches of time without catching a cold or the flu.

Lack of Comparisons

An anecdote that comprises a sample size of 1 brings with it another issue, one that can occur even with larger sample sizes: anecdotes often do not include any observations to compare against. For instance, suppose that one learns that several of one’s friends use Acme Snake Oil pills whenever they catch the common cold. These friends relate how after taking Acme Snake Oil, one of them recovered from the cold after 3 days, another after 7 days, and a third after 2 days. An estimate of the average time to recover from a cold for someone taking Acme Snake Oil is therefore (3 + 7 + 2) / 3 = 4 days.

The question remains: do Acme Snake Oil pills improve cold recovery times? In order to answer this question, one would need an estimate of how many days it takes for people not taking Acme Snake Oil to recover from the common cold. Only after comparing these two average times can one evaluate the effect, if there is any, of taking Acme Snake Oil. Unfortunately, in this example, the anecdotes of one’s friends do not come with any such information in order to make a comparison.
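The missing comparison can be made concrete in a few lines. In the sketch below, the recovery times are the three from the example; the function name `mean_difference` is hypothetical, and the function simply declines to report an effect when no comparison group is supplied.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_difference(treated_days: Sequence[float],
                    untreated_days: Optional[Sequence[float]]) -> Optional[float]:
    """Return mean(treated) - mean(untreated), or None if there is no comparison group."""
    if not untreated_days:
        return None
    return mean(treated_days) - mean(untreated_days)


if __name__ == "__main__":
    snake_oil_days = [3, 7, 2]  # recovery times reported by the three friends
    print("Mean recovery with pills:", mean(snake_oil_days))
    # The anecdotes include no one who skipped the pills, so no effect can be computed.
    print("Estimated effect:", mean_difference(snake_oil_days, None))
```

Even if a comparison group were available, a difference in means from a handful of self-selected friends would still suffer from the other flaws discussed in this section.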

This issue can be exploited by those trying to sell something. Marketing testimonials often come with success stories of people who have used a product and had a desired outcome. However, such testimonials can easily leave out anecdotes about both those who have had unfavorable outcomes when using the product and those who had the favorable outcome without using the product. Anecdotes used in this way are an instance of the fallacy of listing examples, which is discussed in its own article.

Unrepresentative Samples

Perhaps there is an individual so popular that polling the individual’s friends and family results in a large enough number of observations for inference about a larger population and includes observations to make comparisons. Even if this were the case, another issue remains that prevents such an anecdotal sample from being useful. The sample created from such polling is not representative of a larger population. Instead, the sample is heavily biased towards the idiosyncrasies of the individual in question.

Returning to the example of flu vaccine efficacy in the United States during the 2019-2020 season, suppose that a particularly extroverted individual keeps in regular contact with not just 10, but 200 friends and family. However, suppose the extroverted individual caught the flu during the 2019-2020 season.

In this case, the sample might be large enough for estimation of vaccine efficacy, but this sample will be biased because the extroverted individual was in contact with the 200 other people in the sample and so exposed them to the flu virus. Therefore, the prevalence of flu illness in this sample is expected to be higher than in the general population, which includes individuals who had no direct exposure to the flu virus. Because of this, any estimations regarding the size of the protective effect caused by the vaccine will be incorrect.
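The direction of this bias can be illustrated with a toy calculation. All of the numbers below are hypothetical: a 10% chance of catching the flu from the community at large and a 20% chance of catching it from the infected extrovert, with the two sources treated as independent. The function name `risk_among_contacts` is likewise introduced only for this sketch.

```python
def risk_among_contacts(baseline_risk: float, transmission_prob: float) -> float:
    """Chance that a contact catches the flu from the community or from the index case.

    Assumes the two sources of infection act independently.
    """
    return 1.0 - (1.0 - baseline_risk) * (1.0 - transmission_prob)


if __name__ == "__main__":
    baseline = 0.10      # hypothetical risk for someone with no exposure to the extrovert
    transmission = 0.20  # hypothetical added risk from contact with the infected extrovert
    print(f"General population: {baseline:.0%}")
    print(f"Extrovert's contacts: {risk_among_contacts(baseline, transmission):.0%}")
```

Under these assumptions, the flu would be far more common among the extrovert's contacts than in the general population, even though nothing about the vaccine differs between the two groups.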

This phenomenon is likely to occur whenever sampling is based on personal associations from one individual. There is a similar but opposite issue when starting with an extroverted individual that did not have the flu. The sample created from this individual’s personal associations will likely under-represent, rather than over-represent, the incidence of flu when compared with the general population. This is a result of the fact that the starting individual is known not to have had the flu and to have been in contact with all the other individuals in the sample, and so there is an indication that there is a lack of contagious individuals in the sample.[4]

Fallacy of Celebrated Cases

The situation is even worse for anecdotes that come not from personal associations, but by way of the news media. Much of what qualifies a story as news is that it is rare, unusual, or shocking enough to be of interest. Thus, news stories collectively form an intentionally unrepresentative sample of what is happening in a larger population. This phenomenon was illustrated dramatically by the media frenzy around shark attacks in 2001.

While shark attacks on humans can result in a personal tragedy of injury or death for those involved that should neither be minimized nor made light of, the media representation of shark attacks in 2001 was misleading and irresponsible. A particularly graphic shark attack case was covered on major news media outlets in July of 2001. This was soon followed by story after story of sharks attacking humans. Time magazine ran a cover story labeling the summer of 2001 the “Summer of the Shark.”

Time magazine cover prominently displaying a shark in turbulent water with its head peeking above the surface and its mouth open.
The July 30, 2001 cover of Time magazine, which labeled the summer of 2001 the “Summer of the Shark,” an incident that has since become a case study in bad journalism.

This created an impression that shark attacks were increasing at an alarming rate during 2001 and that shark attacks were a particularly prevalent risk. However, it was soon discovered that 2001 had only an average number of shark attacks; indeed, the number of shark attacks and the number of fatalities from shark attacks were actually lower in 2001 than in 2000. (Summer Of The Shark In 2001 More Hype Than Fact, New Numbers Show 2002)

The so-called “Summer of the Shark” was not a sudden and pronounced increase in shark attacks, but a sudden and pronounced increase in news coverage of shark attacks. Shark attacks are infrequent compared with other risks to which humans expose themselves on a daily basis, so this spike in news coverage encouraged unsubstantiated fears. Nonetheless, the sensational coverage of shark attacks during 2001 did not end until the September 11, 2001 terrorist attacks took over the news cycle. Since then, the so-called “Summer of the Shark” has become an object lesson in bad journalism. (Dempsey 2016)

This alludes to a broader fallacy than just bad journalism. Whenever one is attempting to understand a large population – for instance, in order to make judgments about a society or come up with public policy ideas – basing such understanding exclusively on stories that get the most attention is foolish. Such stories form a highly anecdotal, unrepresentative sample that is not informative about the population at large. Indeed, many such stories receive attention precisely because they are so extreme. This can be labeled the “fallacy of celebrated cases.”

Summary

Using anecdotes as evidence is flawed in three ways pertaining to statistical inference: anecdotes constitute sample sizes that are prone to wild fluctuations and are often too small to make precise estimates; anecdotes often come without any observations for comparison; and anecdotes form unrepresentative samples that bias any estimation based on them. For these reasons, anecdotes are not at all useful either for conclusions about a larger population or for conclusions about cause and effect.

In addition to these concerns, there are other flaws in anecdotal evidence pertaining to how human minds work.

Psychological Flaws

Contrary to popular belief, human memory does not “work like a video camera, accurately recording the events we see and hear so that we can review and inspect them later,” as 63% of respondents in a demographically representative sample of 1,500 people in the United States believed in a recent telephone poll. (Simons and Chabris 2011) Instead, human memory is flawed. People often fail to remember true things about past experiences or believe they remember false things about past experiences. Furthermore, the social experience of retelling memories can influence how past events are remembered.

A close-up of a video camera at an event, with camera in focus and the background blurred.
A video camera, which 63% of respondents in a recent poll incorrectly believed represents how memory works.

Flaws of Memory

The psychological literature around the topic of memory is quite large, and a full review of what has been learned about the fallibility of memory recall is beyond the scope of this article. Instead, this article summarizes a few seminal experiments in the field in order to establish the core point that memory recall, according to the empirical evidence, is quite fallible.

Misinformation Effect

The work of psychologist Elizabeth F. Loftus is a particularly conspicuous part of the body of research on memory fallibility as it relates to anecdotal evidence, in part because much of this work has specifically focused on issues with witness testimony and because of her engagement with the legal system, testifying as an expert witness in numerous high-profile cases.

An early experiment of hers examined 45 students at the University of Washington who were shown the same video of a traffic accident. (Elizabeth F. Loftus and Palmer 1974) The test subjects were then asked to estimate how fast the cars were going at the time of the accident, but the question was phrased using different words such as “smashed,” “collided,” “bumped,” “hit,” or “contacted.” The mean estimated speed, averaged over different test subjects, for those asked with the word “smashed” was 40.8 miles per hour, whereas the mean estimated speed was only 31.8 miles per hour for those asked with the word “contacted.”

More remarkably, in a follow-up, similar experiment with 150 student participants, the word choice of “smashed” versus “hit” appeared to affect whether a detail was falsely remembered. The 150 test subjects were shown a video of a multiple car traffic accident. After watching the video, 50 test subjects were given a questionnaire that included the question “About how fast were the cars going when they smashed into each other?” Another 50 test subjects were given a similar questionnaire that asked how fast the cars were going when they “hit” each other, and a control group of another 50 test subjects were not asked about the speed of the cars.

All the test subjects were given a second questionnaire 1 week later in which they were asked whether they recalled seeing broken glass. There was no broken glass in the original video. Among those who were asked the “smashed” question, 16 of the 50 (32%) replied “yes” to seeing broken glass, whereas only 7 out of 50 (14%) of those asked the “hit” question and 6 out of 50 (12%) of the control group reported seeing broken glass.
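The reported counts can be turned into rates and raw gaps with a few lines of arithmetic. The sketch below uses only the counts given above; the names `GROUPS`, `percent`, and `percentage_point_gap` are hypothetical, and the gaps are simple differences that make no adjustment for the speed estimates.

```python
# Counts from the second questionnaire: (answered "yes" to broken glass, group size).
GROUPS = {
    "smashed": (16, 50),
    "hit": (7, 50),
    "control": (6, 50),
}


def percent(yes: int, total: int) -> float:
    """Share of a group answering "yes", as a percentage."""
    return 100.0 * yes / total


def percentage_point_gap(group_a: str, group_b: str) -> float:
    """Raw difference in "yes" rates between two groups, in percentage points."""
    return percent(*GROUPS[group_a]) - percent(*GROUPS[group_b])


if __name__ == "__main__":
    for name, (yes, total) in GROUPS.items():
        print(f"{name}: {yes}/{total} = {percent(yes, total):.0f}%")
    print("smashed vs. hit gap:", percentage_point_gap("smashed", "hit"))
    print("smashed vs. control gap:", percentage_point_gap("smashed", "control"))
```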

This effect of an increased probability of incorrectly remembering broken glass was mediated in part by the greater speed estimates of the “smashed” group, but there was still an effect that averaged to 12 percentage points independent of speed estimate.

Shards of glass of various sizes lying on a sidewalk.
Broken glass, which was a falsely remembered detail measured in an experiment by Loftus and Palmer (1974).

This phenomenon in which subjects’ subsequent recall could be modified was further investigated with several experiments that focused on the effect of leading questions. (Elizabeth F. Loftus 1975) In one experiment, 150 students at the University of Washington were shown another video of an automobile accident. The test subjects were given a questionnaire immediately following watching the video, in which half were given the question “How fast was the white sports car going when it passed the barn while traveling down the country road?” while half were given a similar question that did not mention a barn. There was, in fact, no barn in the video.

A week later, the test subjects were given another questionnaire that included the question “Did you see a barn?” Of the ones who had a variant of the earlier questionnaire that mentioned a barn, 13 out of 75 (17.3%) replied that they had seen a barn, whereas only 2 out of 75 (2.7%) whose earlier questionnaire did not mention a barn reported having seen a barn on the later questionnaire.

A wood barn with a red roof in the countryside next to a small gravel and dirt road.
A barn next to a road, which was a falsely remembered detail measured in an experiment by Loftus (1975).

It is one thing to recall false details such as broken glass or a barn that was not originally there, but this research would go on to show that false memories of an entire experience that did not happen could be inculcated in test subjects.

In one such experiment, 24 pairs of volunteers ranging in age from 18 to 53 years were recruited by University of Washington students. (Elizabeth F. Loftus and Pickrell 1995) Each pair had an older “relative” volunteer who had knowledge of the younger “subject” volunteer. The subjects were given a booklet that contained 4 stories from their childhood experience. While all of the stories were based on knowledge provided by the relatives, 3 of the stories were true, and 1 was a false story about getting lost while on a shopping trip.

The booklet contained a cover letter that instructed subjects to write what they remembered about the events in the space provided or to indicate they did not remember the event if they did not recall it. Later, the subjects were interviewed twice, with 1 to 2 weeks elapsing before each interview, and again asked to provide whatever details they could recall about each event. During the interviews, they were also asked to score the clarity of their memories on a scale from 1 to 10.

Subjects remembered 49 out of 72 (68%) of the total true events throughout the experiment. During the booklet phase of the experiment, 7 out of 24 (29%) of the total false events were remembered as having occurred, but 1 test subject changed her answer during the interview phase, decreasing this total to 6 out of 24 (25%).

The mean clarity rating was higher for the true events (6.3) than for the false events (2.8) during the first interview. However, during the second interview, mean clarity remained roughly the same for true events (6.3), but increased for the false events (3.6).

Even after being told that 1 of the 4 events was fabricated, 5 out of the 24 (21%) identified a true event as the story they believed was invented instead of the actual false event.

The results of this and other similar experiments were timely in the 1990s because of criminal prosecutions based on individuals allegedly recalling repressed memories of childhood abuse, memories that were only discovered after extensive therapy sessions. Such experiments showed how such prosecutions could be based on false memories inculcated by therapists inadvertently using similar methods as in the experiments. (Elizabeth F. Loftus 1993)

Word Lists

While the research of Loftus and others had direct implications for the courtroom, this line of research focused on memories affected by misinformation encountered after the original experience. (Elizabeth F. Loftus 2005) Other lines of research explored false recall of memories not tainted in this way.

Useful tools for this purpose are Deese-Roediger-McDermott (DRM) lists. These lists consist of some number of words that are all associated with a critical word that does not appear on the list. For instance, a DRM list might consist of the words “thread,” “pin,” “eye,” “sewing,” “sharp,” “point,” “pricked,” “thimble,” “haystack,” “pain,” “hurt,” and “injection.” The critical unpresented word for this list is “needle.”

In one experiment, DRM lists were prepared with 15 words each, and 14 lists were used in recall and recognition exercises with 30 undergraduate students at Rice University.[5] (Roediger and McDermott 1995) For every list, each word was spoken out loud from a tape recording at a rate of about 1 word every 1.5 seconds. After each list was read aloud, half the test subjects did a free recall exercise writing down as many words as they could remember, and half the test subjects did math problems as a filler exercise. At the end of the exercises, each test subject had done the free recall exercise for 7 lists and had listened to but not done a free recall exercise for 7 lists.

Finally, the test subjects were given a recognition test that consisted of a written list of 96 words – 48 of which had been presented to them during the experiment – and asked to identify which words they had encountered and which they had not. The 48 unpresented words on the recognition test included the 14 critical unpresented words. For words that the test subjects identified as presented, they were additionally asked about their phenomenological experience: words were labeled “remembered” if the test subject could mentally relive the experience of encountering the word, and words were labeled “known” if not.

During the free recall, test subjects recalled 62% of the words presented to them. They mistakenly recalled the critical unpresented words 55% of the time. This is even more remarkable because the rate at which presented words were recalled varied depending on their position on the lists, and presented words in positions 4 through 11 were recalled 47% of the time, a rate even lower than the critical unpresented words.

During the recognition test, the critical unpresented words were falsely recognized 72% of the time for lists for which a free recall had not been done, and 81% of the time for lists for which a free recall had been done. About 53% of the critical unpresented words for non-recall lists were described as remembered rather than known, and 72% of the critical unpresented words for recall lists were identified as remembered.

This experiment demonstrated that even without misinformation contaminating memory, human beings can be subject to systematic and measurable false recall. Furthermore, it suggested that all memory, even exercises as rote as remembering lists of words, is reconstructive, leveraging preexisting schemas of association already present in a mind before an experience is had. Finally, even in such a banal setting, a nontrivial proportion of false recall involved phenomenological remembering rather than knowing, in which test subjects relived an experience that never actually occurred.

Flaws of Serial Reproduction

Anecdotes are not just recalled from memory and told to one other person. Sometimes the person told an anecdote retells it to yet another person. Indeed, by the time one encounters it, an anecdote may have been retold any number of times. This is a situation not unlike a children’s game called “telephone” in the United States and by many other names in other parts of the world.

Four children in a classroom, with one child whispering into the ear of another.
A group of children sitting in a row playing the “telephone” game.
Pressmaster/Shutterstock.com

In this game, one child is told a story. This child is then tasked with repeating it exclusively to a second child, often by whispering. The second child repeats it only to a third child, and so on, until a final child is told the story. At the end, the story that the final child tells is compared with the original story. The amusement of the game comes in discovering how the final version of the story has been mangled and altered from its original version.

In the psychological literature, this phenomenon is called “serial reproduction.” This is in contrast to what is called “repeated reproduction” in which the same individual recalls a memory again and again. Several experiments throughout the twentieth century confirmed the moral of the telephone game, i.e., that serial reproduction is error prone and distorts information. More recently, a study quantified how much more error prone serial reproduction can be compared to repeated reproduction. (Roediger et al. 2014)

In this experiment, 60 undergraduate students at Washington University were presented with DRM word lists by way of computers, asked to do multiplication problems for 30 seconds, then prompted to recall the DRM word list.[6] For the repeated reproduction portion of the experiment, a test subject was simply asked to recall the original DRM word list 4 times in this manner. For the serial reproduction portion of the experiment, the list recalled by one test subject was given to another test subject. These serially reproduced lists were passed along a chain of 4 different test subjects, with the first given the original DRM list and the next 3 given the previous test subject’s recalled list.

The mean proportion of words correctly recalled from the original DRM word lists, averaged across test subjects, remained relatively constant around 50% during repeated reproduction. However, the mean proportion of words correctly recalled declined at each step in serial reproduction, starting around 50% and ending up at just above 20% after 4 iterations. Furthermore, the mean proportion of recalled words that were actually falsely recalled critical unpresented words remained relatively constant around 5% during repeated reproduction, but increased at each step during serial reproduction, winding up at over 10% after 4 iterations.

What is interesting about the worse performance of serial reproduction in this experiment is that at each step in the serial reproduction, test subjects tended to perform better than in the previous step, inasmuch as they were recalling a greater proportion of the lists that they had been presented. This should not be surprising as each step in serial reproduction tended to start with a shorter list than the previous, and it is generally easier to remember fewer things.

What caused the overall worse performance of serial reproduction compared with repeated reproduction in this experiment was that each step in serial reproduction carried over the mistakes from the previous steps. Once a word that really did appear on the original DRM word list was forgotten, it would not appear on the input list for subsequent serial steps and thus be lost for the rest of the process. Once a critical unpresented word was falsely recalled, it would be put on the input list for the subsequent serial step and so be indistinguishable from words that actually appeared on the original DRM list.
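The compounding can be sketched as a simple product. In the toy model below, each person in the chain recalls some fraction of the genuine words on the list that person receives, and a forgotten word never returns. The per-step rates of 50%, 70%, 75%, and 80% are hypothetical, chosen only so that the chain starts near 50% and ends just above 20% as in the reported results; they are not figures from the study, and the model ignores falsely recalled words.

```python
from typing import List, Sequence


def cumulative_retention(step_rates: Sequence[float]) -> List[float]:
    """Share of the ORIGINAL list still present after each serial step.

    step_rates[i] is the fraction of its own input list that person i recalls.
    Forgotten words never come back, so the shares multiply.
    """
    remaining = 1.0
    history = []
    for rate in step_rates:
        remaining *= rate
        history.append(remaining)
    return history


if __name__ == "__main__":
    # Hypothetical per-step recall rates that improve as the lists get shorter.
    rates = [0.50, 0.70, 0.75, 0.80]
    for step, share in enumerate(cumulative_retention(rates), start=1):
        print(f"After person {step}: {share:.0%} of the original list remains")
```

Each person in this toy chain does better than the last relative to the list received, yet the share of the original list that survives still falls at every step.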

This highlights a danger in the retelling of anecdotes. Specifically, any errors of recall are compounded as anecdotes are told and retold along a sequence of individuals.

Implications

The implications of these psychological flaws for anecdotal evidence are straightforward: human memory is not like a video camera, accurately recording events. Because memory is demonstrably fallible, no anecdote of any given experience told from memory can be blindly trusted. Instead, all such anecdotes ought to be scrutinized. Furthermore, because of the compounding effects of serial reproduction, anecdotes told secondhand ought to be given even more scrutiny.

However, the evidence does not imply that all anecdotes must be thrown out completely, either. While 29 out of 150 test subjects reported seeing broken glass when there was none, 121 did not. (Elizabeth F. Loftus and Palmer 1974) While 55% of critical unpresented words on DRM lists were falsely recalled, 45% were not. (Roediger and McDermott 1995) Thus, in order to separate truth from falsity in encounters with anecdotes, one can neither accept an anecdote without scrutiny nor arbitrarily reject an anecdote without scrutiny.

Furthermore, such scrutiny does not imply that the person telling the anecdote is “crazy,” “hysterical,” or mentally ill. The test subjects in most of these memory experiments were taken from populations of healthy adults who functioned at a high enough level at least to be attending an institution of higher education. Indeed, even if it turns out a given person’s anecdote is false, that does not imply the individual is “crazy.” Rather, it only implies that the person has a normally functioning human brain that, like all other human brains, is imperfect in its memory function.

Such scrutiny also does not necessarily imply suspicion of deception or malicious intent in the person telling the anecdote. The test subjects in these memory experiments had no incentive to lie. Intentional deception is just one possible reason for falsity in anecdotes. The manipulations done in these experiments to induce effects of increased false recall illustrate other reasons, such as the misinformation effect or the effect of leveraging preexisting mental associations.

Therefore, the evidence on psychological flaws is such that all anecdotes ought to be scrutinized and that enlightened individuals ought not to take offense at such scrutiny. Indeed, if one were truly interested in the truth being known and were aware of the evidence on memory fallibility, one would welcome such scrutiny.

Uses of Anecdotes

Earlier it was seen that because anecdotes constitute samples of insufficient size or unrepresentative samples or both, anecdotes are not useful for drawing conclusions about larger populations or about cause and effect. However, this does not preclude other, valid uses of anecdotes.

Investigation of Specific, Singular Incident

Sometimes, one is interested in the details of a single incident. When investigating a specific incident, most of the evidence that is available tends to be anecdotal. Thus, while anecdotes are useless for broader questions about a larger population or for questions about cause and effect, they are often the main sort of evidence for questions about a specific, singular incident.

For instance, if one were interested in how frequently theft occurs in the country in which one lives, it would be foolish to investigate this question by asking one’s neighbors if any of their possessions had been stolen. Such anecdotal evidence would provide no insight into this question about a larger population. However, if one has discovered that one’s bicycle has gone missing from the Springfield Apartments parking garage on the afternoon of Tuesday, January 19, 2021, then it would be prudent to ask Springfield Apartments residents if they saw any suspicious activity that afternoon. In this latter case, anecdotal evidence is very relevant for questions about a specific, singular incident.

A bike rack attached to a brick wall, with a bicycle wheel fastened to the bike rack with a chain lock.
The wheel that was left behind from a bicycle that has been stolen.

While anecdotal evidence is, out of necessity, relevant for investigation of a specific incident, anecdotes still come with all the psychological flaws of memory and of serial reproduction examined above. Therefore, it is important that the anecdotes be scrutinized. An example of a human activity that grapples with these issues is the court system.

Many courts do not allow hearsay to be admitted as evidence. For instance, the United States Courts (and the courts of many of the several states of the United States) use the Federal Rules of Evidence, which specifically identify hearsay as inadmissible in court. According to these rules, “hearsay” is defined as a statement made not while testifying at the current trial and offered into evidence to prove the truth of the matter asserted in the statement. (“Article VIII - Hearsay” 2021)

Hearsay is anecdotal. However, other forms of anecdotal evidence, such as witness testimony, are admissible according to the Federal Rules of Evidence. (“Article VI - Witnesses” 2021) What sets hearsay apart is that hearsay constitutes anecdotes that would be entered into evidence without any scrutiny. Granted, how effective the century-old traditions of the courts actually are at scrutinizing witnesses’ anecdotes is a question in and of itself.7 Regardless, the court system at least recognizes that anecdotes require some kind of scrutiny.

Hearsay is not just anecdotal, but usually comprises anecdotes told secondhand.8 Thus, not only does hearsay constitute a kind of anecdote that avoids scrutiny, but it is a form of anecdote told via serial reproduction, which was seen above to be even more error prone than direct memory recall and so deserving of even more scrutiny.

Courts are by no means perfect arbiters of truth and falsity, but the procedures established in the courts at least illustrate the basic principles of handling anecdotal evidence. In questions about specific, singular incidents, anecdotal evidence is relevant, so witness testimony is allowed. However, anecdotal evidence – especially when delivered via serial reproduction – must be scrutinized, so anecdotes that avoid scrutiny such as hearsay are inadmissible.

Counterexample to Universal Assertion

Another, more trivial example of a valid use of anecdotal evidence is in providing a counterexample to a universal assertion. Universal assertions are statements that claim all of some population have some property or none of some population have a property. This is a trivial usage of anecdotal evidence because universal assertions are easily disproven. All that is required is that a single individual in the population contradict the universal assertion for the assertion to be false.

For example, if it is claimed that all swans are white, witnessing a single non-white swan is sufficient to demonstrate that the universal assertion “all swans are white” is false. Stories of seeing a black swan thus form a counterexample to this universal assertion, even though they constitute anecdotal evidence.
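The logic of a counterexample can be put in a few lines of code. The following is a minimal sketch with hypothetical data and a hypothetical helper name, `first_counterexample`. It shows only the asymmetry involved: one contradicting observation is enough to refute a universal assertion, whereas finding none among the observations at hand does not prove it.

```python
def first_counterexample(observations, has_property):
    """Return the first observation lacking the property, or None if there is none."""
    for obs in observations:
        if not has_property(obs):
            return obs
    return None

# Hypothetical reported swan colors, for illustration only.
reported_colors = ["white", "white", "white", "black", "white"]

# The universal assertion "all swans are white" expressed as a per-swan check.
counterexample = first_counterexample(reported_colors, lambda color: color == "white")

# The assertion survives these reports only if no counterexample was found.
all_white_claim_survives = counterexample is None
```

The sketch takes each report at face value, which is exactly the step that calls for scrutiny when the reports are anecdotes.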

As in the investigation of a specific, singular incident, even though anecdotes may be relevant in finding counterexamples to universal assertions, they are still subject to all the psychological flaws of memory, flaws which can be exacerbated by serial reproduction. Thus, the anecdotes ought to be scrutinized, and it is better to get anecdotes directly from the witnesses themselves rather than secondhand. Fortunately, when looking for counterexamples to universal assertions, it is typically easy to find corroborating physical evidence. For instance, as a counterexample to the assertion “all swans are white,” a picture of a black swan goes a long way.

A swan with a red beak and black feathers floating on water.
A black swan, the existence of which contradicts the universal assertion “all swans are white.”

Precipitation of Further Investigation

While anecdotes themselves are not useful for drawing conclusions regarding a larger population or regarding cause and effect, they can be useful in prompting a search for better evidence.

Returning to an example from the introduction, if one hears stories from friends that a certain road is congested with traffic during rush hour, one can rightfully identify these stories as anecdotal evidence such that no conclusion can be inferred from them regarding the larger population of traffic conditions on the road over time. It would be foolish to avoid using the road based on these stories alone. For instance, one might be commuting at a different time of day than one’s friends or one’s friends might be mistakenly remembering a different road. Drawing a conclusion from anecdotal evidence could lead one to avoid a road that might improve one’s daily commute.

However, it would be equally foolish not to investigate the issue further before including the road as part of one’s daily commute. This is especially true in modern times, in which sources of systematic, non-anecdotal data regarding traffic patterns are available to the general public. If the anecdotal evidence happens to be consistent with a larger trend, then one could find oneself stuck in traffic that one could have avoided if one had looked into the matter further.

This illustrates a use of anecdotes as a precipitation of further investigation. While the traffic example did not have broader implications for other individuals, anecdotal evidence precipitating further investigation is an important phenomenon for those in positions of authority to set policy, such as executives in a company or administrators of a school.

For instance, if stories began circulating around a company or school about various assaults or other criminal behavior occurring, it would be glib and foolish to dismiss these stories while quoting “anecdotes are not evidence” and do nothing more. However, it would also be foolish to attempt to set policy based on the details of whatever anecdote happened to come down the chain of serial reproduction to one’s attention. Rather, the prudent thing to do would be to investigate matters more rigorously, setting policy based on information about the whole population of individuals affected by the policy.

Illustrative Example

Finally, it is not strictly fallacious to include an anecdotal story with the presentation of information gleaned from more rigorous evidence. For instance, after one has done an empirical study of bicycle thefts in a certain jurisdiction, one might include a story of a specific bicycle theft in one’s report. This can have the effect of making more of an impression upon those who remember stories better than plots, tables, and distributions.

As long as the conclusions are drawn from the empirical evidence and not from the example anecdote, this practice is not in and of itself fallacious. However, there is a danger inherent in this practice. The example anecdote cannot describe what is happening in the entire population, and these illustrative examples can mislead the audience if the stories are remembered instead of the conclusions of the investigation. Therefore, they ought to be used with caution.

Conclusion

Anecdotal evidence is flawed. Anecdotes constitute samples that are unrepresentative, are often of insufficient size, and often lack needed comparisons. Because of this, anecdotes provide no insight about larger populations or about cause and effect.

Still, anecdotes are not without their uses. In particular, anecdotal evidence is relevant when determining the details of specific incidents. Anecdotal evidence can also be used to contradict universal assertions. However, anecdotes are subject to issues that affect memory, and these issues with memory are compounded when anecdotes are retold serially, so even in cases in which they are useful, anecdotes should be scrutinized and not trusted verbatim.

Anecdotes can also be used in non-evidentiary ways, such as to precipitate further investigation or to provide illustrative examples, so long as no conclusions are drawn from the anecdotes themselves.

This article illustrates a fundamental issue with maxims. The maxim “anecdotes are not evidence” hints at some useful ideas, but when those ideas are elucidated, it can be seen that the state of affairs around anecdotes and their role in and around evidence is much more complicated than the four words of the maxim can explain. The maxim “anecdotes are not evidence” is an example of why critical thinking is best served by a thoughtful and thorough exploration of the issues surrounding a matter, rather than appeals to slogans.

Citations

“Article VI - Witnesses.” 2021. Federal Rules of Evidence. https://www.rulesofevidence.org/article-vi/.
“Article VIII - Hearsay.” 2021. Federal Rules of Evidence. https://www.rulesofevidence.org/article-viii/.
Dempsey, Amy. 2016. “Summer of the Shark Was a Story Media Could Sink Their Teeth Into.” Thestar.com. https://www.thestar.com/news/insight/2016/08/28/summer-of-the-shark-was-a-story-media-could-sink-their-teeth-into.html.
“Estimated Influenza Illnesses, Medical Visits, Hospitalizations, and Deaths in the United States — 2019–2020 Influenza Season.” 2020. Centers for Disease Control and Prevention. https://www.cdc.gov/flu/about/burden/2019-2020.html.
“Flu Vaccination Coverage, United States, 2019–20 Influenza Season.” 2020. Centers for Disease Control and Prevention. https://www.cdc.gov/flu/fluvaxview/coverage-1920estimates.htm.
Hirst, William, and Gerald Echterhoff. 2012. “Remembering in Conversations: The Social Sharing and Reshaping of Memories.” Annual Review of Psychology 63 (1): 55–79. https://doi.org/10.1146/annurev-psych-120710-100340.
Loftus, Elizabeth F. 1975. “Leading Questions and the Eyewitness Report.” Cognitive Psychology 7 (4): 560–72. https://doi.org/10.1016/0010-0285(75)90023-7.
Loftus, Elizabeth F. 1993. “The Reality of Repressed Memories.” American Psychologist 48 (5): 518–37. https://doi.org/10.1037/0003-066X.48.5.518.
———. 2005. “Planting Misinformation in the Human Mind: A 30-Year Investigation of the Malleability of Memory.” Learning & Memory 12 (4): 361–66. https://doi.org/10.1101/lm.94705.
Loftus, Elizabeth F., and John C. Palmer. 1974. “Reconstruction of Automobile Destruction: An Example of the Interaction Between Language and Memory.” Journal of Verbal Learning and Verbal Behavior 13 (5): 585–89. https://doi.org/10.1016/S0022-5371(74)80011-3.
Loftus, Elizabeth F., and Jacqueline E. Pickrell. 1995. “The Formation of False Memories.” Psychiatric Annals 25 (12): 720–25. https://doi.org/10.3928/0048-5713-19951201-07.
“QuickFacts: United States.” n.d. U.S. Census Bureau. Accessed January 14, 2021. https://www.census.gov/quickfacts/fact/table/US.
Roediger, Henry L., and Kathleen B. McDermott. 1995. “Creating False Memories: Remembering Words Not Presented in Lists.” Journal of Experimental Psychology: Learning, Memory, and Cognition 21 (4): 803.
Roediger, Henry L., Michelle L. Meade, David A. Gallo, and Kristina R. Olson. 2014. “Bartlett Revisited: Direct Comparison of Repeated Reproduction and Serial Reproduction Techniques.” Journal of Applied Research in Memory and Cognition 3 (4): 266–71. https://doi.org/10.1016/j.jarmac.2014.05.004.
Simons, Daniel J., and Christopher F. Chabris. 2011. “What People Believe about How Memory Works: A Representative Survey of the U.S. Population.” PLOS ONE 6 (8): e22757. https://doi.org/10.1371/journal.pone.0022757.
“Summer Of The Shark In 2001 More Hype Than Fact, New Numbers Show.” 2002. University of Florida News. https://news.ufl.edu/archive/2002/02/summer-of-the-shark-in-2001-more-hype-than-fact-new-numbers-show.html.
“Vaccine Effectiveness: How Well Do the Flu Vaccines Work?” 2020. Centers for Disease Control and Prevention. https://www.cdc.gov/flu/vaccines-work/vaccineeffect.htm.

Footnotes


  1. This is based on some back-of-the-envelope math. The U.S. Census Bureau estimates the population of the United States to be approximately 328,000,000 people. (“QuickFacts: United States” n.d.) The proportion of people in the United States who received the flu vaccine during the 2019-2020 season is estimated to be 63.8% for all age groups. (“Flu Vaccination Coverage, United States, 2019–20 Influenza Season” 2020) Therefore, there were approximately 209,300,000 vaccinated individuals and 118,700,000 unvaccinated individuals in the United States during the 2019-2020 flu season.

    The Centers for Disease Control and Prevention estimates there were 38,000,000 flu illnesses in the United States during the 2019-2020 flu season. (“Estimated Influenza Illnesses, Medical Visits, Hospitalizations, and Deaths in the United States — 2019–2020 Influenza Season” 2020)

    A rough estimate of the relative risk for flu vaccination of 50% is reasonable. (“Vaccine Effectiveness: How Well Do the Flu Vaccines Work?” 2020)

    This results in estimated probabilities of 17.0% of catching the flu among the unvaccinated and 8.5% of catching the flu among the vaccinated.

    The probability a randomly selected individual both was vaccinated and caught the flu is therefore 63.8% × 8.5% ≈ 5.42%, and the probability a random sample of 10 individuals does not contain someone who both was vaccinated and caught the flu is (1 − 5.42%)^10 ≈ 57.3%.↩︎

  2. Using the back-of-the-envelope calculations above, there would be around 54 such individuals in a random sample of 1,000.↩︎

  3. This is a plausible example of the phenomenon that anecdotal observations often do not comprise a sufficient sample size for inference, one which does not require knowledge of statistical methods. For the more mathematically minded, this phenomenon can be explored further by way of power computations at various sample sizes, which can be found in relevant statistics texts.↩︎

  4. Using statistical jargon, the fundamental issue here is that samples formed in this way do not result in independent observations.↩︎

  5. Actually, 16 DRM lists were presented, but 2 were dropped from the study.↩︎

  6. Serial reproduction is just one aspect of the social phenomena of memory. By using a computer program, the experiment by Roediger et al. (2014) abstracted away many of the social influences on memory. These social influences have complex effects, with some of them aiding in memory recall and some of them causing more errors in memory. (Hirst and Echterhoff 2012)↩︎

  7. In the English-speaking world, these include such things as having the witness swear an oath to testify truthfully, making the witness available for cross-examination, and requiring the physical presence of the witness so that the witness’ demeanor can be judged.↩︎

  8. If one were testifying about one’s statement made outside of court, this would technically be hearsay. In this case, however, one could just make the statement again as part of one’s testimony.↩︎
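The back-of-the-envelope arithmetic in footnotes 1 and 2 can be reproduced with a short script. This is a sketch using only the rounded inputs quoted in footnote 1. The variable and function names are illustrative, and the model assumes, as the footnote implicitly does, that the unvaccinated and vaccinated groups together account for all estimated flu illnesses, with the vaccinated at half the risk. Because the script carries unrounded intermediate values, its results can differ from the footnotes in the last digit.

```python
# Inputs quoted in footnote 1 (rounded figures from the cited sources).
population = 328_000_000
coverage = 0.638          # proportion vaccinated
flu_cases = 38_000_000
relative_risk = 0.5       # risk for the vaccinated relative to the unvaccinated

vaccinated = population * coverage
unvaccinated = population - vaccinated

# Solve flu_cases = p_unvax * unvaccinated + (relative_risk * p_unvax) * vaccinated.
p_unvax = flu_cases / (unvaccinated + relative_risk * vaccinated)
p_vax = relative_risk * p_unvax

# Probability that a randomly selected person was vaccinated AND caught the flu.
p_both = coverage * p_vax

def prob_none_in_sample(n, p=p_both):
    """Probability that a random sample of n people contains no such person."""
    return (1 - p) ** n

def expected_in_sample(n, p=p_both):
    """Expected number of such people in a random sample of n."""
    return n * p

print(f"P(flu | unvaccinated) = {p_unvax:.3f}")
print(f"P(flu | vaccinated)   = {p_vax:.3f}")
print(f"P(vaccinated and flu) = {p_both:.4f}")
print(f"P(none in sample of 10) = {prob_none_in_sample(10):.3f}")
print(f"Expected in sample of 1,000 = {expected_in_sample(1000):.1f}")
```

Calling `prob_none_in_sample` with other sample sizes shows how quickly the chance of missing such a person falls as the sample grows.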