Cory Doctorow: Teaching Computers Shows Us How Little We Understand About Ourselves

A quote variously attributed to Richard Feynman and Albert Einstein has it that ‘‘If you can’t explain it to a six year old, you don’t really understand it.’’ Most of us have encountered this in our lives: you think you really know something and understand it, and then you try to teach it and realize that you never understood it in the first place.

Computers are the children of the human race’s mind, and as they become intimately involved in new aspects of our lives, we keep stumbling into semantic minefields, where commonly understood terms turn out to have no single, well-agreed-upon meaning across all parts of society. These conflicts all have a quiet drama, because on the definition of these ‘‘commonly understood’’ terms turn questions of social control with profound implications for our human lives.

Take names. When Google rolled out its Facebook-a-like service Google Plus in 2011, it stirred up controversy by declaring that it would adopt Facebook’s ‘‘real name’’ policy, meaning that its users would be expected to use their real, legal names in their online interactions. Google offered a lot of explanations for this policy – mostly revolving around reducing cruel behavior and spamming – and opponents of the idea offered their own arguments in response. Some pointed out that they were widely known by a name other than the one on their legal documents; others wanted the ability to socialize without making their real identities visible to violent stalkers; refugees from oppressive regimes raised the spectre of retaliation against their in-country relatives if they participated in visible online debates under their real names.

But the most interesting argument about ‘‘real name’’ policies is the realization that there’s no easy single definition of ‘‘real name’’ that doesn’t leave an appreciable slice of people out in the cold. Take my family, for example: our surname was Doctorovitch (or Doctorowicz, or Doctorowitch, or some other transliteration of the Cyrillic characters my grandfather’s family would have used to write down their name in Belarus, if they’d been literate and had had a lot of government forms to fill in). My father was born in Azerbaijan after his parents deserted from the Red Army. They called him Genyek, which is a diminutive form of Evgeny, though it’s not clear whether they ever called him Evgeny. It hardly matters, because they mostly called him Gadalya, a Jewish name with a bit of Yiddish and Hebrew in its origin, until he came to Canada when he was six and was given the ultra-Canadian anglicized name ‘‘Gordon.’’ My dad’s father was born Avram, which was anglicized as ‘‘Abraham’’ (naturally enough), but his first employer called him ‘‘Bill,’’ because that was a more ‘‘Canadian’’ name. It stuck, and his Canadian citizenship papers read ‘‘Abraham William Doctorow,’’ though no one ever called him ‘‘William.’’

Depending on how you count, most of the people in my family have three or four ‘‘real’’ names, and many of those names are on official government documents. Indeed, it’s often the case that the ‘‘real’’ names by which they are known to everyone are the only names that aren’t on their paperwork. They’re hardly alone – the world’s populations have only grown more mobile in the 60-some years since my father was born, and as they move from one place to another, their names are transliterated into strange alphabets and conformed to naming systems that reverse the patronymic and the given names. (Readers of mine may notice some connection between this business of multiple names and my novel Someone Comes to Town, Someone Leaves Town.)

Some programmers have known for a long time that explaining ‘‘real names’’ to a computer is hard. In 2010, Patrick McKenzie posted his now-classic essay, ‘‘Falsehoods Programmers Believe About Names,’’ which includes such gems as, ‘‘I can safely assume that this dictionary of bad words contains no people’s names in it,’’ ‘‘People’s names are assigned at birth,’’ and ‘‘People have names.’’ Despite the heroic efforts of programmers like Patrick McKenzie, ‘‘real name’’ policies crop up all the time, and the input validators on fill-in-the-blanks forms often encode incorrect assumptions about names.

We’ve had bureaucracies and forms for a long time, of course, but human-powered bureaucracies and computerized ones differ in important ways. A bureaucrat can always choose to write a very long name in very, very small letters in order to fit it on an important form, or draw an arrow in the margin and continue it on the other side. But when a programmer instructs a computer to reject, or disregard, all input longer than 64 characters, she effectively makes it impossible for a bureaucrat – however sympathetic – to accommodate a name that’s longer than she’s imagined names might be. With a human bureaucrat, there was always the possibility of wheedling an exception; machines don’t wheedle.
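To make the point concrete, here is a minimal sketch of the kind of validator described above. Everything in it is hypothetical: the function names, the 64-character constant (borrowed from the figure in the paragraph), and the extra ASCII-letters-only rule, which is added purely as an example of a second assumption a programmer might bake in. It is not taken from any real system.

```python
# Hypothetical limit, matching the 64-character figure used in the text.
MAX_NAME_LENGTH = 64


def rigid_validate(name: str) -> bool:
    """Accept only names the programmer imagined.

    Assumptions baked in (both illustrative, both wrong for many people):
    names are at most MAX_NAME_LENGTH characters, and they consist only
    of ASCII letters and spaces.
    """
    letters_only = name.replace(" ", "").isalpha()
    return len(name) <= MAX_NAME_LENGTH and letters_only and name.isascii()


def lenient_validate(name: str) -> bool:
    """Accept any non-empty text and leave the judgment to a human."""
    return bool(name.strip())


if __name__ == "__main__":
    for candidate in ["Gordon Doctorow", "Jean-Luc", "Евгений", "A" * 65]:
        print(repr(candidate[:20]), rigid_validate(candidate), lenient_validate(candidate))
```

Once the rigid check is in place, no clerk sitting at the form can override it; the only way to admit the rejected names is to change the code.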

You can see a microcosm of this in online role-playing games versus the older tabletop games: in a tabletop game, players could think up unlikely courses of action in the face of certain death, and good game-masters could give them a minute chance of succeeding (roll two natural 20s in a row and your trap will kill the ogre), creating the possibility of epic heroic moments. No such player/referee collaboration is possible when the referee is a machine (see Nate Combs’s 2004 Terra Nova essay ‘‘Where are the Heroes?’’ for more).

If names are hard, families are harder. Systems like Netflix and Amazon Kindle try to encode formal definitions of ‘‘family’’ based on assumptions about where you live (someone is in your immediate family if you share a roof), how you’re genetically related (someone is immediate family if you have a close blood-tie), how you’re legally related (someone is in your family if the government recognizes your relationship), or how many of you there are (families have no more than X people in them). All of these limitations are materially incorrect in innumerable situations.

What’s worse, by encoding errors about the true shape of family in software, companies and their programmers often further victimize the already-victimized – for example, by not recognizing the familial relationship between people who have been separated by war, or people whose marriage is discriminated against by the state on the basis of religion or sexual orientation, or people whose families have been torn apart by violence.

The ambiguity that is inherent in our human lives continues to rub up against our computerized need for rigid categories in ways small and large. Facebook wants to collapse our relationships with one another into categories that conform more closely to its corporate strategy than to reality – there’s no way to define your relationship with your boss as ‘‘Not a friend, but I have to pretend he is.’’

Ironically, computerized information systems have given us some really wonderful opportunities to embrace ambiguity. As David Weinberger points out in his seminal 2007 book Everything is Miscellaneous, computerized cataloging systems have freed books from the tyranny of physical shelves, allowing us to ‘‘shelve’’ them in as many places as they need to go – but the ambiguity that we’ve brought to our storehouses of human knowledge is woefully absent from our systems for managing human beings. We’re teaching computers about these relationships at speed, and it’s become clear we don’t understand them at all ourselves.
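The ‘‘shelve it in as many places as it needs to go’’ idea can be sketched in a few lines. This is only an illustration under stated assumptions: the `shelve` and `shelves_for` helpers, the shelf labels, and the `person-1` identifier are all invented for the example, and nothing here describes how any real catalog works. The names in the second half come from the family history earlier in this essay.

```python
from collections import defaultdict

# label -> set of titles; a title may sit under any number of labels,
# which is what a physical shelf cannot offer.
catalog = defaultdict(set)


def shelve(title, *labels):
    """File one title under every label given (hypothetical helper)."""
    for label in labels:
        catalog[label].add(title)


def shelves_for(title):
    """Return, in alphabetical order, every label a title is filed under."""
    return sorted(label for label, titles in catalog.items() if title in titles)


# The labels below are illustrative assumptions, not a real classification.
shelve("Everything is Miscellaneous", "information science", "business", "philosophy")
print(shelves_for("Everything is Miscellaneous"))

# The same structure would let a record for a person hold several names
# at once instead of forcing a single "real" one.
names_by_person = defaultdict(set)
names_by_person["person-1"].update({"Genyek", "Gadalya", "Gordon"})
print(sorted(names_by_person["person-1"]))
```

The design choice is simply a many-to-many mapping in place of a single slot; the hard part is deciding to allow it, not writing it.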


Cory Doctorow is the author of Walkaway, Little Brother, and Information Doesn’t Want to Be Free (among many others); he is the co-owner of Boing Boing, a special consultant to the Electronic Frontier Foundation, a visiting professor of Computer Science at the Open University and an MIT Media Lab Research Affiliate.


From the July 2013 issue of Locus Magazine

One thought on “Cory Doctorow: Teaching Computers Shows Us How Little We Understand About Ourselves”

  • July 6, 2013 at 3:15 pm
    Interesting. I have rubbed up against some of these issues in my own software, which creates seating plans for events. There are even more issues when your software is used internationally – different countries have different sort orders, some parts of the name are ignored for sorting, a capital letter at the start of ‘von’, ‘van’ or ‘da’ can mean you have aristocratic origins (or peasant origins, depending on the country). It is a bit of a cultural minefield. It gets even worse when trying to decide who sits next to who. That is something I leave to the human!
