No, you can’t read a debate scoresheet

Chuan-Zheng Lee
Jul 8, 2018


I posted this quiz in a few debating groups a couple of months ago. Although the quiz was light-hearted, it was sparked by two errors in the tab for USUDC 2018, the national title tournament for British Parliamentary debating in the United States. Indeed, the images are all from scoresheets from that USUDC. Many thanks to the 2,121 people who took the quiz, and to the ten chair adjudicators who accidentally lent their handwriting to the cause.

On the off-chance that you were here for the proposals rather than the numbers, the last section (“What can we do about this?”) discusses a few ideas that some fellow tabbers and I have for alleviating problems with ambiguous handwriting.

Overall score

The average score was 4.24 out of 10; the median was 4. Here’s a histogram:

No-one scored 9 out of 10. Congratulations to the seven people who scored zero. That small blip at 10 is two respondents; I’m not sure whether they actually hit 10, or were people retaking the quiz.

(I excluded the nine entries with fewer than two answers.)

Breakdown by question

The “correct answer” is the number intended by the adjudicator. In some cases I could use ballot math to infer this; in the others, I asked the chair responsible. Rows highlighted in green are questions that a majority of respondents answered correctly.

You might notice how the same numbers seem to come up a lot.

Contentious questions

Some digits proved particularly divisive, either in evenness of split, or in the number of different responses.

You can probably see why these are the most ambiguous, but these aggregate statistics obscure some of the story. Take image (c), for example, with continental Europeans separated out:

If you’re familiar with how continental Europeans write their 1s (or if you’re continental European, then how the rest of the world writes theirs) then it’s not too hard to see how this might’ve come about. The author of this scoresheet is American; 85% of her compatriots guessed her intentions right.

Similarly, east Asians (mostly mainland Chinese and Hongkongers) struggled with question (h):

Those are the starkest regional divides, but there are plenty more. Even in that ambiguous 4-or-9 in (a), the author’s home region was split almost evenly, 47%–53% between 4 and 9, whereas across the Pacific, 74% of their Oceanian counterparts thought (incorrectly) he meant a 9. Image (d) proved misleading all around (30% correct), but especially so for Germans (5%, n = 66) and Dutch (9%, n = 43).

Why am I telling you this? It’s to tell you that, while this is undoubtedly about your handwriting being illegible, there’s more to it than just bad handwriting. Numbers that might seem clear to you can be ambiguous to others. At large internationals (say, WUDC), having to use the home country of the chair to decipher whether it’s a 1 or a 7 is a recipe for delays if not errors. For this reason, chairs, I beg you to redouble your efforts to make sure your numbers can only possibly mean one thing.

Results by country and region

There’s not much instructional value in this, but I thought the chart was fun. When interpreting this data, remember that it’s based on scoresheets submitted at a United States national championship. More precisely, six of the images are courtesy of Americans, three of Canadians and one of a Mexican, which matters if you expect people to be at all better at deciphering the handwriting of their compatriots. (The data doesn’t really bear this hypothesis out.)

The number of respondents from each country n is on the right; error bars are 95% confidence intervals. Only countries with at least 15 respondents are shown. I wouldn’t read anything into small differences, particularly those within the error bars of each other.

Average score by country (error bars are 95% CIs)

Here’s the same chart, grouped by region, so as not to leave out countries with n < 15. You can find the regional groupings I used in this Google Spreadsheet.

Average score by region (error bars are 95% CIs)
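For anyone wanting to reproduce error bars like these, here is a minimal sketch of one common way to get a 95% confidence interval for a mean score. It assumes a normal approximation (mean ± 1.96 standard errors), which may not be the method used for the charts above. The function name `mean_with_ci` and the sample scores are made up for illustration.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval.

    Assumption: z = 1.96 gives roughly 95% coverage when the sample is
    large enough for the normal approximation to be reasonable.
    """
    n = len(scores)
    mean = statistics.mean(scores)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(n)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical quiz scores (out of 10) for one country; not real data.
example_scores = [4, 5, 3, 6, 4, 2, 5, 4, 3, 6, 4, 5, 3, 4, 5]
mean, low, high = mean_with_ci(example_scores)
print(f"mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With only 15 respondents the interval is wide, which is why small differences between countries shouldn’t be over-read.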

Results by debating and tabbing background

The short of it is that there’s nothing interesting here. No differences were statistically significant. Judges with a major break and retirees sort of did worse than everyone else, but not by enough for it to be convincingly anything other than noise.

Well… almost nothing interesting. A full 249 people, or 12% of respondents, said they’ve tabbed a BP tournament of at least 50 teams. This means the tabbing community’s much larger than I thought. We should catch up in Cape Town at some sort of forum. Watch this Facebook group for updates?

More detailed results

This post just presents some highlights. If you’d like to see more statistical detail, or even play with the raw data, here are links to my working spreadsheet and Jupyter notebook:

I’m not naming the ten adjudicators whose handwriting formed the basis for this exercise. They’re far from alone, so it would seem inaccurate and unfair to make them the faces of this campaign.

What can we do about this?

If poor handwriting is a perennial problem on the debate circuit that repeatedly causes errors and delays, what can we do?

It’s standard practice in tab rooms to have extensive systems for catching ambiguities or errors. But because resolving each one requires us to chase down chairs for clarification, every such effort necessarily exacerbates delays. At USUDC 2018, we summoned about five chairs every round; after rounds 1 and 5, when one such chair had gone astray, this cost us 30 and 17 minutes, respectively.

Avoiding the trade-off between accuracy and timeliness requires us to reduce ambiguity at its source. You might wonder why we insist that chairs do totals and ranks themselves, when we could infer both from speaker scores. It’s partly to prevent accidents, but it also serves as a source of confidence in cases that are almost readable.
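To make the “source of confidence” point concrete, here is a rough sketch of the ballot math a tab assistant does in their head. It assumes a BP ballot where each team total is the sum of its two speaker scores and ranks follow totals with no ties. The function names, team labels and numbers are illustrative only, not taken from any real tab software or scoresheet.

```python
def consistent_readings(candidates, partner_score, written_total):
    """Keep the readings of an ambiguous score that match the written total."""
    return [c for c in candidates if c + partner_score == written_total]


def ranks_from_totals(totals):
    """Rank teams 1st to 4th by total, highest first; None if any totals tie."""
    if len(set(totals.values())) != len(totals):
        return None
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {team: place for place, team in enumerate(ordered, start=1)}


# Hypothetical ballot: a score that could be 74 or 79, next to a clear 76,
# with a written team total of 155.
readings = consistent_readings([74, 79], partner_score=76, written_total=155)
print(readings)

# Hypothetical totals and the ranks the chair wrote down.
totals = {"OG": 155, "OO": 151, "CG": 158, "CO": 149}
written_ranks = {"OG": 2, "OO": 3, "CG": 1, "CO": 4}
print(ranks_from_totals(totals) == written_ranks)
```

If exactly one reading survives and the ranks agree, the ballot can probably be entered without summoning the chair. If none or several survive, the redundancy hasn’t helped and someone still has to go find them.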

A few tabbing colleagues and I have been thinking about strategies to help. Here are a few ideas we have in mind.

Electronic ballots

Counterintuitively, we don’t believe electronic ballots will help. Obviously, they’ll eliminate handwriting problems, because they don’t have handwriting. But we have good reason to believe they’re likely to just move mistakes from paper, where tab teams can detect them, to online forms, where we can’t.

At both Dutch WUDC 2017 and WUDC México 2018, chairs were asked to fill out both paper and electronic ballots. The 2017 tab team found that, where an inconsistency between the two had to be resolved by summoning the chair, the paper ballot prevailed about half the time. The 2018 tab team found that the paper ballot was the correct version almost all the time.

There might be other reasons to switch to electronic ballots, of course, but the avoidance of errors isn’t one.

Complete double-blind entry

Handwriting ambiguities should never go unnoticed. Tab assistants are instructed to reject ambiguous scoresheets, and each one is double-checked by a different person. However, as I learnt at USUDC 2018, an enter-and-check system still seems too susceptible to human error.

We intend to add complete double-blind entry to Tabbycat: two different people would type in the entire scoresheet, including totals and ranks. (Currently, the first person types in just speaker scores, then checks totals and ranks against the computer’s calculation, and the second person checks everything.) If there’s any inconsistency, the system would reject the ballot. The second person wouldn’t see what the first person had typed in, to avoid confirmation bias in interpreting ambiguous handwriting.
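As a rough sketch of the idea (not Tabbycat’s actual code or data model), double-blind entry amounts to comparing two independently typed versions of the same scoresheet and rejecting the ballot if any field differs. The `BallotEntry` structure and `reconcile` function below are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BallotEntry:
    """One person's typed version of a scoresheet (hypothetical structure)."""
    speaker_scores: tuple  # eight speaker scores, in speaking order
    totals: tuple          # four team totals
    ranks: tuple           # four team ranks


def reconcile(first, second):
    """Accept the ballot only if both blind entries agree on every field.

    Returns (accepted_entry, mismatched_fields); accepted_entry is None
    when the ballot is rejected.
    """
    mismatches = [
        name for name in ("speaker_scores", "totals", "ranks")
        if getattr(first, name) != getattr(second, name)
    ]
    if mismatches:
        return None, mismatches
    return first, []


# Hypothetical example: the two typists read the first score differently.
entry_a = BallotEntry((79, 76, 75, 76, 80, 78, 74, 75),
                      (155, 151, 158, 149), (2, 3, 1, 4))
entry_b = BallotEntry((74, 76, 75, 76, 80, 78, 74, 75),
                      (155, 151, 158, 149), (2, 3, 1, 4))
accepted, mismatches = reconcile(entry_a, entry_b)
print(accepted, mismatches)
```

The important design point is that the second typist never sees `entry_a`; the comparison happens only after both entries are complete.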

The catch is that this just converts errors to delays: because it doesn’t reduce ambiguity at the source, each error caught is a chair that needs to be summoned.

Ballot design

Noting that handwriting ambiguities are almost always in the last digit, another idea is to have chairs confirm the last digit of their score by circling a choice on the scoresheet, like this:

This might help clarify chairs’ intentions, particularly for common in-betweens like 4/9. On the other hand, it might just introduce more sources of ambiguity. We don’t really know which effect will play out yet, so we’ll be experimenting with this design at some upcoming tournaments.

Runners rejecting ballots

If we stop ambiguous ballots and insist on fixing them at the point of collection, we can greatly reduce the delay in tracking down the responsible chair. It therefore makes sense for runners to be the first line of defence; for some tabbers, this is already standard practice.

The flipside is that it can be intimidating for runners, who are already under orders to hurry chairs up after 15 minutes have elapsed, to lay down the law on powerful independent adjudicators, some of whom consider themselves to be immune from runners’ requests, and not always politely. (By the way, please don’t be one of them. The WUDC judging manual isn’t joking when it calls runners representatives of the adjudication core.)

Example bank

At tournaments that members of the WUDC Cape Town 2019 tab team run between now and WUDC, we’ll set aside incomplete and illegible ballots to compile some “how not to fill out a ballot” slides. We’re considering running these slides in the briefing room at WUDC, to provide constant reminders to chairs about how ballots can be ambiguous in ways that they might not expect. Unlike with this quiz, we may name the chairs who provide these examples. Consider yourself warned.
