Sunday, July 6, 2014

Welcome to the World of Technochicracy! Revisiting the FB Debacle


"I have a joly wo, a lusty sorwe" - Chaucer, Troilus and Criseyde

A recent research paper published in PNAS and written by Adam D. I. Kramer - a member of Facebook's Core Data Team - and two information scientists from Cornell, made the news and caused some outcry last week. The authors admitted to having manipulated the news feed of some 700,000 Facebook users back in January 2012; two experiments were conducted in parallel: in one, posts with "positive emotional content" had a lower chance of cropping up in a user's feed, and in the other, many posts with "negative emotional content" were omitted from the feed. 

The theory goes something like this: it is well attested that emotional contagion occurs in face-to-face situations. Without knowing it, we scan other people's faces for clues on their emotional status. Once we've gauged that, mirror neurons in our brain fire away, and we adapt our own facial expressions accordingly. A sort of unconscious mimicry. Negativity breeds negativity, and happiness makes the world go round. Previous studies have focused on nonverbal cues, but what about verbal ones? What about situations where the people are miles apart? If a friend posts a negative status update on Facebook, will I catch the negativity bug and post a negative one myself? Or rather, given a large enough sample ("N=689,003" in this case) is there a statistically significant correlation between the emotions of group members exposed to positive and negative posts?

And the answer, sez Kramer et al., is yes! The connection is tenuous, like "gold to airy thinness beat" (incidentally not a phrase used in the paper), but it is there; about one in a thousand posts exhibited "emotional contagion." Given the scale of Facebook, however, "this would have corresponded to hundreds of thousands of emotion expressions in status updates per day."

Relying on a vague "research" clause in the Facebook User Policy, the authors conducted an exercise in manipulation with hundreds of thousands of users. This makes a mockery of the idea of informed consent - a mockery more egregious than the false pretenses used by Stanley Milgram in his harrowing 1961 experiment on authority and obedience. While Milgram's subjects were in the dark about the real purpose of the experiments, at least they knew that they took part in one. We are veering dangerously close to mind control land here. In fact, in one of the first books written on the psychology of brain washing, William Sargant explains how:
"Various beliefs can be implanted in many people after brain function has been sufficiently disturbed by accidentally or deliberately induced fear, anger, or excitement. Of the results caused by such disturbances, the most common one is temporarily impaired judgement and heightened suggestibility. Its various group manifestations are sometimes classed under the heading of 'herd instinct'" (William Sargant, Battle for the Mind: A Physiology of Conversion and Brain-washing, 1957)
Now, Dr. Sargant was a behaviorist (white coat, stopwatch in hand), and the point is not that his analysis is especially lucid (it isn't). No, what is eerie here is the similarity between his Pavlovian notions of human behavior and the underpinnings of the Facebook study: deliberately induced feelings? Heightened suggestibility? Herd instinct? Do we hear a bell ringing in the distance?

Here is what the study authors have to say about their methodology:
"Posts were determined to be positive or negative if they contained at least one positive or negative word, as defined by Linguistic Inquiry and Word Count Software (LIWC2007) (9) word counting system, which correlates with self-report and physiological measures of well-being, and has been used in prior research on emotional expression. LIWC (7, 8, 10). LIWC was adapted to run on [...] the News Feed Filtering system, such that no text was seen by the researchers. As such, it was consistent with Facebook's Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research."
In the wake of this debacle, most people have focused on the second part - the dilution and slippery slopification of "informed consent"; there are excellent pieces on the responsibility of the social scientist (and less-than-stellar apologies), but what really rubs me the wrong way has more to do with the first part, and what I have previously called technochicracy. It is the lingering suspicion that the stunning "big data" vista (complete with the cloud services floating overhead) is a set-piece propped up - like a Potemkin village in the midst of Silicon Valley - in front of a 1950s or 1960s landscape of dumb terminals and behaviorist labs.

This experiment was framed as a groundbreaking study in emotional contagion; thanks to big data crunching and state-of-the-art software, an effect only previously observed in face-to-face interaction in an artificial milieu could now be studied on a massive scale with humans, so to speak, in their natural habitat. And yet, the underlying methodology is so hopelessly crude as to bring to mind Pavlov's experiments on conditioned reflexes in dogs. In the words of his American disciple B. F. Skinner, "Pavlov's attention was directed mostly to the glandular part of this total response, because it could be measured by measuring the flow of saliva." A stimulus (a bell ringing before food is served, an upbeat status message) triggers a physiological response that can be adequately measured in a beaker or by said Linguistic Inquiry and Word Count Software. Conceptually speaking, there is little difference between measuring the flow of saliva and counting positive/negative words in the flow of big data.
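
To see just how blunt an instrument "at least one positive or negative word" is, here is a minimal sketch, in Python, of that style of classification. The word lists are invented stand-ins (not the actual LIWC2007 dictionaries), and the logic is only my reading of the methodology quoted above, not Facebook's code:

```python
# A minimal sketch of the word-count classification the paper describes: a post
# counts as "positive" if it contains at least one positive word and "negative"
# if it contains at least one negative word. The word lists below are invented
# for illustration; they are NOT the LIWC2007 dictionaries.

POSITIVE_WORDS = {"happy", "great", "love", "wonderful", "joy"}
NEGATIVE_WORDS = {"sad", "awful", "hate", "terrible", "angry"}

def classify_post(text: str) -> dict:
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    return {
        "positive": bool(words & POSITIVE_WORDS),
        "negative": bool(words & NEGATIVE_WORDS),
    }

# Negation, irony and quotation are simply invisible to this kind of counting:
print(classify_post("I am not happy at all"))
# -> {'positive': True, 'negative': False}
print(classify_post('She posted "I hate Mondays" - hilarious'))
# -> {'positive': False, 'negative': True}
```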

In fact, the idea of trigger words in status messages correlating with emotional well-being, at least on the aggregate level, owes more to 19th century positivism than any "cutting edge" science. It makes a mockery of the human condition. Who better to knock down 19th century underpinnings than a 19th century poet? When accused of mixing gravity and levity in Don Juan, Lord Byron sent the following letter to his publisher to answer the critic:
"His metaphor is, that 'we are never scorched and drenched at the same time'. Blessings on his experience! Ask him these questions about 'scorching and drenching'. Did he never play at Cricket, or walk a mile in hot weather? Did he never spill a dish of tea over his testicles in handing the cup to his charmer, to the great shame of his nankeen breeches? [...] Did he never tumble into a river or lake, fishing, and sit in his wet cloathes in the boat, or on the bank, afterwards 'scorched and drenched', like a true sportsman? 'Oh for breath to utter!' - but make him my compliments; he is a clever fellow for all that - a very clever fellow." (Byron’s Letters and Journals VI:207)
If we think of Don Juan as a status update, the reason it is so difficult to parse is (and Byron makes this abundantly clear) that human emotion cannot be reduced to a discrete number on a happiness scale. Even in the extreme, seemingly most clear-cut cases, our sadness or happiness is seldom unalloyed:

"Every cloud has a silver lining..." 

"Ay, in the very temple of Delight / Veil'd Melancholy has her sovran shrine." 

Etc.

But what about the statistically significant (albeit minuscule) effect Kramer et al. observed? There are, in fact, many ways of accounting for it that have nothing to do with emotion whatsoever. I might, for example, tag a friend and quote what s/he says in my status update (the system is far too crude to take quotation into account.) There is also the possibility that a few stray words I have recently read linger in my mind, winding their way into my next status message.

On a more general level, Chomsky's critique of Skinner's language behaviorism bears repeating. One of his points is that the causality between stimulus and response can only be "studied" by multiplying categories or properties of the stimulus-object until it loses all and any pretense of objectivity:
"Consider first Skinner's use of the notions stimulus and response. In Behavior of Organisms he commits himself to the narrow definitions for these terms. [...] Evidently, stimuli and responses, so defined, have not been shown to figure very widely in ordinary human behavior. We can, in the face of presently available evidence, continue to maintain the lawfulness of the relation between stimulus and response only by depriving them of their objective character. A typical example of stimulus control for Skinner would be the response to a piece of music with the utterance Mozart or to a painting with the response Dutch. These responses are asserted to be "under the control of extremely subtle properties" of the physical object or event. Suppose instead of saying Dutch we had said  Clashes with the wallpaper, I thought you liked abstract work, Never saw it before, Tilted, Hanging too low, Beautiful, Hideous, Remember our camping trip last summer? or whatever else might come into our minds when looking at a picture (in Skinnerian translation, whatever other responses exist in sufficient strength). Skinner could only say that each of these responses is under the control of some other stimulus property of the physical object. If we look at a red chair and say red, the response is under the control of the stimulus redness; if we say chair, it is under the control of the collection of properties (for Skinner, the object) chairness, and similarly for any other response. This device is as simple as it is empty." (Noam Chomsky, "A Review of B. F. Skinner's Verbal Behavior", 1967)
An even better example (pace Chomsky) of the "suppose instead" refutation would be Proust's musings on the "petite phrase" he once heard in a sonata:
"When, after that first evening at the Verdurins’, he had had the little phrase played over to him again, and had had sought to disentangle from his confused impressions how it was that, like a perfume or a caress, it swept over and enveloped him, he had observed that it was to the closeness of the intervals between the five notes which composed it and to the constant repetition of two of them that was due that impression of a frigid and withdrawn sweetness; but in reality he know that he was basing this conclusion not upon the phrase itself, but merely upon certain equivalents, substituted (for his mind’s convenience) for the mysterious entity of which he had become aware"
In much the same way as the sensuous effects and enveloping power of the musical phrase are not part and parcel of its "stimulus properties," the emotion behind a status update containing the word "happy" (or a synonym) is not a property intrinsic to the word "happy." Proust's tentative explanation involving "certain equivalents" is a marvelous debunking of the causality assumption. For all its duh!-simplicity, the "lingering" hypothesis (mine and Proust's) is far too complex to be accounted for by Skinnerian behaviorism (whether of the 1950s stamp or dressed-up in technobabble.) Emotional contagion does exist, but it does so in a complex interplay of facial expression, mimicry and thought - both conscious and unconscious. It cannot be reduced to a stimulus-response scheme.

In fact, when we peek through the cool Matrix-curtain of big data, we find a faded Polaroid from the past - Brylcreemed men in white coats studying lab rats or dogs cowering in cages - men whose theories and methodologies were already debunked by the time horn-rimmed glasses went out of fashion... Welcome to the world of Technochicracy!

Tuesday, May 6, 2014

Multiple Choice and the Critic



Recently I wrote a rather heated polemic against standardized testing, but as Messrs. Emerson and Wilde so aptly put it: consistency is "the hobgoblin of little minds" and "the last refuge of the unimaginative." So today I will do a graceful volte-face and acclaim its virtues (well, in one specific context) while sporting a sheepish little smile. 

Some weeks ago I took the GRE Literature Test, required by many graduate programs in English. I am already a PhD student, so there was no pressing need for me to do so. But what with English not being my first language, I wanted to see how I would stack up against the motley pool of test takers (college seniors majoring in English, students fresh out of MA programs in English and "mature" students who graduated years or decades ago and want to attend grad school.) Plus, a good result might be a feather in my beret should I ever feel like applying to adjunct positions in English.

The test consists of around 230 multiple choice questions on the analysis and identification of texts ranging from Beowulf to Elizabeth Bishop, Gower to Ginsberg. "Multiple choice? Literature?" I hear you say. Wouldn't that be a throwback to a prelapsarian time before meanings had started to multiply by mitosis? A time when Oxford was still a city of aquatint and dons in caps and gowns were the guardians of truth? A time, in short, when an exam on Edmund Spenser could look like this:
"In whose reign did he flourish? Repeat Thomson's lines. What is said of his parentage? What does Gibbon say? How did he enter Cambridge? What is a 'sizer,' and why so called? What work did he first publish? What does Campbell say of Raleigh's visit to Spenser?" (A Compendium of English Literature, Charles D. Cleveland, 1852)
Or perhaps (given that you need to be able to recognize an iambic tetrameter or ottava rima) we think of the diluted versions of New Criticism that seeped down to high school students in the 50s, brilliantly captured by the fictitious Understanding Poetry by the equally fictitious J. Evans Pritchard, PhD:
"To fully understand poetry, we must first be fluent with its meter, rhyme, and figures of speech. Then ask two questions: One, how artfully has the objective of the poem been rendered, and two, how important is that objective. Question one rates the poem's perfection, question two rates its importance. And once these questions have been answered, determining a poem's greatness becomes a relatively simple matter."
Ah, 'twas the best of times... A time when the canonical texts were treated with silk gloves and the pages of the textbooks were still firmly glued together. But today, when every single literary edifice has been subjected to the wrecking ball of Derrida and Sons, Deconstruction Company? Sure, you could test the trivial (as in the Victorian exam) but you're only a click away from those facts on your cell phone, so why bother with them in the first place? And while you might be able to scratch the surface with multiple choice, surely you'll never reach the murky, rhizomatic depths of literature?

Before I answer my own rhetorical questions, I need to throw in a caveat. The GRE Literature test is rather silly (but surprisingly fun to take); its "predictive value" is questionable. You can score very high and still be a mediocre critic. But then again, no English Department in the US will ever judge an application solely on the GRE score. In fact, GPAs, writing samples and published articles are infinitely more important. And this is how it should be. I do, however, think that the test says something. It's an indisputable fact that, say, Althusser's notion of interpellation is a form of "coercive address," that Thomas of Hales' "Hwer is Paris and Heleyne" exemplifies the "ubi sunt motif" and that the choice between my and mine in Shakespeare's Sonnet 23 rests on "the same rationale as the Modern English choice between a and an."

Factual recall, rudimentary close reading skills and linguistic inference, all pretty elementary skills, right? What about those pesky rhizomes? Well, consider this item (taken from the Practice Test Booklet):
So what with blod and what with teres
Out of hire yhe and of hir mouth.
He made hire faire face uncouth;
Sche lay swounende unto the deth,
Ther was unethes eny breth:
Bot yit when he hire tunge refte,
A litel part therof belefte,
Bot sche with al no word mai soune,
But chitre and as a brid jargoune.

Which of the following lines make use of the same story?

(a)
Twit twit twit
Jug jug jug jug jug jug
So rudely forc'd
Tereu

(b)
Tu – whit! – Tu – Whoo!
And hark, again! The crowing cock,
How drowsily it crew.

(c)
I do not know which to prefer,
The beauty of inflections
Or the beauty of innuendoes

(d)
A sudden blow: the great wings beating still
Above the staggering girl...

(e)
This I sat engaged in guessing, but no
Syllable expressing
To the fowl whose fiery eyes now burned
Into my bosom's core
In order to make the connection between John Gower's and T. S. Eliot's uses of the Philomela myth, you need to be conversant with English literature from vastly different periods. Factual recall might help you identify the different snippets, but it will only take you so far. In fact, only 19% of all students tested got this one right (i.e. fewer than you would expect if they simply guessed at random). This is hardly surprising; students take classes in Modernist Poetry and (though not very likely) in Middle English Poetry, but they are not taught to make thematic connections. And who can blame them when even tenured professors are bewitched by the siren-song of Foucault with its call for "absolute discontinuity" between the modern and the pre-modern (someone who did have a good supply of organic beeswax was J. G. Merquior, whose takedown ought to be required reading for professors and undergraduates alike).

In fact, this item calls for a modicum of that quaint and curiously old-fashioned thing called erudition. And this is part of what I find appealing about the test. Everyone in grad school is supposed to have the tools necessary to delve deep into their chosen area, but in order to retrace the winding paths that lead from text to text, you also need a broad survey map. When novelists and poets prove to be better read than the academics dealing with them, something is clearly awry. Case in point: postmodern scholars (of some renown, I might add), writing on Chinua Achebe or Yvonne Vera, who are blind to their grappling with and responses to T. S. Eliot, simply because they have never read him. As that much-maligned poet himself put it: "the most individual parts of [a poet's] work may be those in which the dead poets, his ancestors, assert their immortality most vigorously." (from "Tradition and the Individual Talent")

To recap, I would say that the wide-ranging reading required to answer questions spanning a period of 1,000 years is a good foundation for more focused work in grad school. Even when dealing with something highly specific you still need to make connections and discern influences straddling the epochal divides.

You are also tested on the King James Bible and some Greco-Roman literature and mythology, and let's face it, every Western author writing before, say, 1950, writes for a reader well versed in the Bible. How can we feel for Leopold Bloom, and understand what an underdog and outsider he is, if we are as ignorant of the meaning of "I.N.R.I" as he is? Joyce takes for granted that we can appreciate both the ignorance and the sheer beauty of his folk-etymological stroke of genius: "Iron Nails Rushed In."

And to take another example: for all his expertise on military and fortification history, Sterne's Uncle Toby has no idea who Cicero was. If we share his ignorance, he is no longer the exceptional oddball the contemporary audience loved. Such knowledge breeds familiarity; it might help bridge the divide between writings from the past and the contemporary scholar. He or she might actually get it – not on an academic "[T]he 18th century readers, most of whom had studied Cicero in the original Latin..." level, but on a more visceral "Whoa, this guy is amazing!" one.  Or in the words of Cleanth Brooks:
"We tend to say that every poem is an expression of its age; that we must be careful to ask of it only what its own age asked; that we must judge it only by the canons of its age. Any attempt to view it sub specie aeternitatis, we feel, must result in illusion.

Perhaps it must. Yet, if poetry exists as poetry in any meaningful sense, the attempt must be made. Otherwise the poetry of the past becomes significant merely as cultural anthropology, and the poetry of the present, merely as a political, or religious, or moral instrument […] We live in an age in which miracles of all kinds are suspect, including the kind of miracle of which the poet speaks. The positivists have tended to explain the miracle away in a general process of reduction which hardly stops short of reducing the "poem" to the ink itself. But the "miracle of communication," as a student of language terms it in a recent book, remains. We had better not ignore it, or try to "reduce" it to a level that distorts it. We had better begin with it, by making the closest possible examination of what the poem says as a poem." (The Well Wrought Urn)
While a knowledge of history, myth and the Bible, of metres and tropes, and of allusions, echoes and thematic connections (things that can be tested on a multiple choice exam) will not get you there, it might take you some way towards experiencing the miracle of language and literature.



Tuesday, April 15, 2014

Amateur vs. Pro: the Bout of the 19th Century



"Many words that are now unused will be rekindled,
Many fail now well-regarded (If usage wills it so,
To whom the laws, rules, and control of language belongs.)" – Horace, Ars Poetica

Tracing how the "amateur" of the late 18th century – whether armchair artist or gentleman scholar – turned into a laughing-stock some hundred years later is to sketch the fall of the moneyed and leisured class; it is also to see the rise of the "middling" classes, whose members reconstituted themselves as professionals. The prerogatives of birth meant far less as more people could, at least in theory, gain some upward mobility. But this development came at a price; the widening specialization and division of labor, and the subdivision of life into a public and private sphere, were hotbeds for alienation and anomie. With professionalism came educational pragmatism. The seven Latin roads of a liberal education – the trivium and quadrivium – were repaved by professional workmen into a Second Empire boulevard. You trained for your chosen career and did not stray from its path. In today's job market, the metaphor of the one road is ubiquitous. If you aspire to a life in the fast lane, simply follow the road to success. 7 steps is all it takes (job hoppers and amateurs amblers need not apply)!

Throughout its 230-year history, the word "amateur" has both been inflected by, and vocally opposed to, these changes in society, and throughout much of this time it has held diverging meanings and connotations. It is as if the Zeitgeist itself dabbled as a lexicographer for competing dictionaries. This tension, as I will go on to show, is brilliantly captured in several Victorian works of art, from canonical literature to potboilers and pulp fiction, but let us start with its humble beginnings. Let us turn the clock back to a time when the word was newfangled enough to call out for a definition. In his 1803 Cyclopedia, the Rev. Abraham Rees (himself an amateur encyclopedist) provides the following gloss:
"In the arts [...] a foreign term introduced and now passing current amongst us, to denote a person understanding, and loving or practising the polite arts of painting, sculpture, or architecture, without any regard to pecuniary advantage. [...] Amateurs who practise were never perhaps in greater number or of superior excellence, and those who delight in and encourage the arts have been the means of raising them in this country to that eminence to which they are arrived. It is to be regretted, however, that the great works of former ages, collected by amateurs in this kingdom, are not as accessible to our professors as they are in foreign countries"
Derived from the Latin verb for 'to love' ('amare'), the coinage must be viewed against the backdrop of burgeoning professionalism. The disregard of "pecuniary advantage" pits the dabbler against the professional draftsman or artist, but, in Rees's view, the relationship can be a mutually beneficial one rather than the cause of animosity and class resentment. As a member of the gentry or aristocracy, the amateur collector can afford to do all the legwork while traveling the length and breadth of Europe and racking up artworks, which can then be exhibited and used for instruction by the professors. Today we would perhaps talk about the synergistic effects of non-profit crowdsourcing.

Challenging the Amateur
Unfortunately, Rees's call for amateur-professional collaboration went unheeded. The first decades of the century did see an explosion of books addressed to amateurs in the arts, but these were mostly written by professors or professionals who wanted to impart some (limited) knowledge to the armchair art-lover. Sometimes lip service was paid to his or her judgment; in his 1814 pamphlet Short Address to the Amateurs of Ancient Painting, for example, professional artist H. C. Andrews challenged "the world to produce a painting of equal merit" to da Vinci's St. John the Baptist. But while Rees might have believed this to be possible, maybe even likely (NEWSFLASH: Amateur Art Sleuth Discovers Lost Renaissance Masterpiece Gathering Dust in Venetian Palazzo), Andrews' challenge comes with the supercilious smile of someone who does not expect to be challenged.

Someone who did feel that a gauntlet had been thrown down was British architect Joseph Gwilt. Reading the morning paper one day he came across an unsigned review arguing that British architects and professors "afford proof how imperfectly every style of architecture appears to be understood." Enraged by this slight to his profession he decided to set the record straight:
"may it not be asked, whether this sentence passed upon a whole profession by an amateur, who from his writing is but slenderly versed in the art, is not written with an acerbity which shows some latent feeling arising from the want of homage to amateurs on the part of the professors. It would be refreshing to see one of the designs of any of the amateurs and critics, who, like the reviewer, pronounce judgment on a body of men whose lives are passed in the study of the art." (Elements of Architectural Criticism for the use of Students, Amateurs and Reviewers 24)
Unless they produce a blueprint or design worthy of the pros, the amateurs should help themselves to some humble pie (preferably by reading his book). With supreme irony, Gwilt's hope for a "refreshing" amateur design casts the non-professionals as musty and moldy – a class well past its sell-by date.

Heraldry and Whores
Judging from other how-to manuals from the time, amateurs were becoming (or perceived themselves as being) more and more marginalized. In 1828 Harriet Dallaway published A Manual of Heraldry for Amateurs – a "small essay intended chiefly for the use of my sex, or amateurs of heraldry, who may have a taste for such pursuits as connected with history and genealogy." The parallelism is striking. In the eyes of the world, the woman shouldn't abandon house and hearth for bookish learning, and, in much the same way, the amateur should throw his avocations to the winds and embark on a professional career. It wouldn't be reading too much into her caveat to say that the amateur was now, if not as frowned upon, at least somehow comparable to the woman with aspirations beyond her immediate "business."

Some decades after Rees's collaborative ideas, things had certainly changed. The ongoing professionalization spared no sector of society, least of all the underworld. In his newspaper reports on the seedy sides of Victorian London, Henry Mayhew finds himself at a loss to account for the "amateur" prostitute:
"Those women who, for the sake of distinguishing them from the professionals [elsewhere termed "operatives"], I must call amateurs, are generally spoken of as 'Dollymops,' Now many servant-maids, nurse-maids who go with children into the Parks, shop girls and milliners who may be met with at the various 'Dancing Academies,' so called, are 'Dollymops.' We must separate these latter again from the 'Demoiselle de Comptoir,' who is just as much in point of fact a 'Dollymop,' because she prostitutes herself for her own pleasure, a few trifling presents or a little money now and then, and not altogether to maintain herself. But she will not go to the Casinos, or any similar places, to pick up men" (The London Underworld in the Victorian Period 43)
The incredulous "not altogether" (knitted brows, chin in hand) registers the confusion. Now, the point is not that the "Dollymops" were somehow pro bono ambassadors who enjoyed their business. In all likelihood they did not. But by this time it was getting increasingly difficult to wrap your head around the fact that some people had other, perhaps more complicated, motives than those dictated by their profession.

The March of Progress
This process of marginalization, which relegated the amateur to Cabinet of Curiosity fodder (an armchair architect, a female heraldist, a whore who is not quite a whore), did not arise in a vacuum. Writing when the industrial revolution was still in its infancy, Adam Smith extolled the wealth-creating virtues of labour specialization:
"To take an example, therefore, from a very trifling manufacture, but one in which the division of labour has been very often taken notice of, the trade of a pin-maker: a workman not educated to this business (which the division of labour has rendered a distinct trade), nor acquainted with the use of the machinery employed in it [...] could scarce, perhaps, with his utmost industry, make one pin in a day, and certainly could not make twenty. But in the way in which this business is now carried on [...] it is divided into a number of branches, of which the greater part are likewise peculiar trades. One man draws out the wire; another straights it; a third cuts it; a fourth points it; a fifth grinds it at the top for receiving the head; to make the head requires two or three distinct operations; to put it on is a peculiar business; to whiten the pins is another; it is even a trade by itself to put them into the paper; and the important business of making a pin is, in this manner, divided into about eighteen distinct operations [...] ten persons, therefore, could make among them upwards of forty-eight thousand pins in a day."  (An Inquiry into the Nature and Causes of the Wealth of Nations)


For him, the amateur pin-maker was an anachronism, a throwback to a bygone era when a blacksmith or farrier furnished the product all by himself (and, heaven forbid, didn't even stick to pins!) By mid-century the branches of economic theory and social science had converged, and through these "scientific" bifocals the amateur became even more primitive. Herbert Spencer considered the jack-of-all-trades an atavism, an evolutionary cul-de-sac. Differentiation and professionalism were no longer "just" the order of the day (an ideological choice that made sense in terms of production) but the supreme law of civilization:
"The change from the homogeneous to the heterogeneous is displayed in the progress of civilization as a whole, as well as in the progress of every nation; and is still going on with increasing rapidity. As we see in existing barbarous tribes, society in its first and lowest form is a homogeneous aggregation of individuals having like powers and like functions: the only marked difference of function being that which accompanies difference of sex. Every man is warrior, hunter, fisherman, tool-maker, builder; every woman performs the same drudgeries. Very early, however, in the course of social evolution, there arises an incipient differentiation"  ("Progress: Its Law and Cause")

Major-Generals and Detectives
It is true that the amateur still had some lease of life. With the premiere of Gilbert & Sullivan's The Pirates of Penzance in 1879, he entered the stage as antihero. With his classical erudition and breadth of knowledge, the Major-General makes clear that he is the very model of the liberal scholar. Armed with an unquenchable thirst for knowledge (not to mention an impeccably twisted moustache) he had embarked on an intrepid journey through the trivium and quadrivium, quite oblivious to the fact that the only thing that really mattered as the 19th century drew to a close, was marching the one road: from the military academy to decorations and promotions via the battlefield. We root for him and feel for him – much like we do for Sir John Falstaff (in terms of "pluck" and military "experience" surely his great forebear) – precisely because we sense the tragedy looming over the comedy. We know that Hal's drinking buddy will one day prove a liability, and we fear that when the curtain falls, the Major-General will be trampled underfoot by the "march" of progress, turning amateurs, jacks-of-all-trades, dabblers, polymaths and generalists into roadkill.

It is telling that the most enduring character of 19th century fiction is not the moribund Major-General, but his antimatter avatar – someone who, to paraphrase an early review of the opera, "is uninformed on all subjects, except those connected with his profession." Take it away, Dr. Watson!
"Upon my quoting Thomas Carlyle, he inquired in the naivest way who he might be and what he had done. My surprise reached a climax, however, when I found incidentally that he was ignorant of the Copernican Theory and of the composition of the Solar System. That any civilized human being in this nineteenth century should not be aware that the earth travelled round the sun appeared to be to me such an extraordinary fact that I could hardly realize it.
"You appear to be astonished," he said, smiling at my expression of surprise. "Now that I do know it I shall do my best to forget it."
"To forget it!"
"You see," he explained, "I consider that a man's brain originally is like a little empty attic, and you have to stock it with such furniture as you choose. A fool takes in all the lumber of every sort that he comes across, so that the knowledge which might be useful to him gets crowded out, or at best is jumbled up with a lot of other things so that he has a difficulty in laying his hands upon it. Now the skilful workman is very careful indeed as to what he takes into his brain-attic. He will have nothing but the tools which may help him in doing his work, but of these he has a large assortment, and all in the most perfect order. It is a mistake to think that that little room has elastic walls and can distend to any extent. Depend upon it there comes a time when for every addition of knowledge you forget something that you knew before. It is of the highest importance, therefore, not to have useless facts elbowing out the useful ones."
"But the Solar System!" I protested.
"What the deuce is it to me?" he interrupted impatiently; "you say that we go round the sun. If we went round the moon it would not make a pennyworth of difference to me or to my work." (A Study in Scarlet)
I have always wondered whether the culture shock Dr. Watson experienced here was not as much a cause for his future PTSD as that fateful Jezail bullet at Maiwand. Fred Flintstone could not have been more confused and agitated had he crashed into George Jetson's atomic aerocar. While Holmes' rationale is anchored in Victorian science – the phrenological idea of a discretely ordered and finite brain – his desire to achieve a one-to-one correspondence between knowledge and the demands of his profession is a strikingly modern one. Even the metaphor he uses to describe this would later find its way into the 21st century: if all the Business Self-Help books are anything to go by, it is imperative that you develop the specific mental tools and tool kits required by your profession.

In fact, Holmes' methods of observation and deduction are so cutting-edge that it is the professionals who come across as bumbling amateurs:
"Gregson and Lestrade had watched the manoeuvres of their amateur companion with considerable curiosity and some contempt. They evidently failed to appreciate the fact, which I had begun to realize, that Sherlock Holmes' smallest actions were all directed towards some definite and practical end."
It is probably not a coincidence that when he makes his grand reappearance in "The Return of Sherlock Holmes" (both he and Professor Moriarty had previously fallen to their death grappling on top of the Reichenbach Falls, but the reading public would have none of that) he does so by pretending to be a Victorian eccentric before unmasking himself. So it turns out that Holmes was unscathed after all; it was the 19th century amateur who had gone to meet his maker. It's a ham-fisted allegory, but it certainly gets the point across:
"I struck against an elderly, deformed man, who had been behind me, and I knocked down several books which he was carrying. I remember that as I picked them up, I observed the title of one of them, THE ORIGIN OF TREE WORSHIP, and it struck me that the fellow must be some poor bibliophile, who, either as a trade or as a hobby, was a collector of obscure volumes. I endeavoured to apologize for the accident, but it was evident that these books which I had so unfortunately maltreated were very precious objects in the eyes of their owner. With a snarl of contempt he turned upon his heel, and I saw his curved back and white side-whiskers disappear among the throng. [...] I had not been in my study five minutes when the maid entered to say that a person desired to see me. To my astonishment it was none other than my strange old book collector, his sharp, wizened face peering out from a frame of white hair, and his precious volumes, a dozen of them at least, wedged under his right arm. [...]
"Well, sir, if it isn't too great a liberty, I am a neighbour of yours, for you'll find my little bookshop at the corner of Church Street, and very happy to see you, I am sure. Maybe you collect yourself, sir. Here's BRITISH BIRDS, and CATULLUS, and THE HOLY WAR—a bargain, every one of them. With five volumes you could just fill that gap on that second shelf. It looks untidy, does it not, sir?"
I moved my head to look at the cabinet behind me. When I turned again, Sherlock Holmes was standing smiling at me across my study table. I rose to my feet, stared at him for some seconds in utter amazement, and then it appears that I must have fainted for the first and the last time in my life. Certainly a gray mist swirled before my eyes, and when it cleared I found my collar-ends undone and the tingling after-taste of brandy upon my lips. Holmes was bending over my chair, his flask in his hand" ("The Return of Sherlock Holmes")
TBC...

Monday, April 7, 2014

The Computer Illiterati Conspiracy (or "Why the Average Teaching Assistant Makes Six Times as Much as College Presidents")



With a growing college population, and the implementation of the Common Core Standards for K-12 students, Automated Essay Scoring (AES for short) is slated to become one of the most lucrative fields in the education market within a few years. Teachers might be good enough when it comes to assessing their students' writing, but they are painfully slow (a computer algorithm can churn out grades for tens of thousands of essays in a matter of seconds); they are also inconsistent and biased, and – banish the thought! – they want to get paid for their services.

These are the arguments put forward by ed policy makers and supported by one-dimensional (not to say shoddy) research, such as a much-quoted 2012 study from the University of Akron in which the authors compared human readers scoring student essays "drawn from six states that annually administer high-stakes writing assessments" with the performance of nine essay algorithms grading the same essays. They concluded that:
"automated essay scoring was capable of producing scores similar to human scores for extended-response writing items with equal performance for both source-based and traditional writing genre [sic!] Because this study incorporated already existing data (and the limitations associated with them), it is highly likely that the estimate provided represent a floor for what automated essay scoring can do under operational conditions." (2–3)
Between the lines of academic jargon in the last sentence we find a startling claim: if the high correlation between human readers and their silicon counterparts only represents a "floor" of what the programs are capable of, then the implication must surely be that they are, for all intents and purposes, better graders than the teachers. And true enough, the authors go on to deplore the human raters' inconsistency and inability to follow simple instructions:
"The limitation of human scoring as a yardstick for automatic scoring is underscored by the human ratings used for some of the tasks in this study, which displayed strange statistical properties and in some cases were in conflict with documented adjudication procedure." (27)
This is nonsense; nonsense wrapped in academic abstraction, but nonsense nonetheless. When teachers stray from "documented adjudication procedure," this is precisely because they are experienced and creative readers who know full well that an essay might be great even though it does not conform to – and sometimes consciously flouts – rigid evaluation criteria. And as for their grading exhibiting (gah!) "strange statistical properties" it is important to realize that this is not a sign of human fallibility. Quite the contrary. If there is a huge discrepancy between two readers evaluating the same essay, this indicates that at least one of them (possibly both although the one recommending the conservative grade might be wary of repercussions if he or she does not follow the criteria to the letter) has discovered that it is an outstanding essay.

Computer algorithms will always penalize innovation, but surely the students are not supposed to pen Pulitzer-winning essays? Isn't the point of the essays rather to gauge whether they can craft coherent texts according to the K-12 Common Core Standards (the ones listed below are for informative/explanatory essays)?
"Introduce a topic clearly, provide a general observation and focus, and group related information logically; include formatting (e.g., headings), illustrations, and multimedia when useful to aiding comprehension.
Develop the topic with facts, definitions, concrete details, quotations, or other information and examples related to the topic.
Link ideas within and across categories of information using words, phrases, and clauses (e.g., in contrast, especially).
Use precise language and domain-specific vocabulary to inform about or explain the topic.
Provide a concluding statement or section related to the information or explanation presented."
Yes, but even though these criteria are highly mechanical and wouldn't necessarily (if you excuse my anthropomorphizing) recognize a good essay if it bit them in the face, the AES systems still fall woefully short. They can do a word count and a spell check; they can look for run-on sentences and sentence fragments, and compute the ratio of linking words and academic adverbs. The fourth bullet point shouldn't pose much of a problem either, since they have been fed hundreds of texts graded by humans and have extrapolated the "domain-specific" words which correlate with high grades. And what about factual accuracy and logical progression, surely a piece of cake for the silicon cookie monster? Not quite.
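
Before turning to the evidence, here is a toy sketch of the kind of surface-feature scoring just described. Everything in it – the feature list, the word sets, the weights – is invented for illustration; it is not e-Rater's algorithm or any other vendor's, merely the sort of proxy-counting the criteria invite:

```python
# A toy sketch of surface-feature essay scoring. The features, word lists and
# weights below are invented for illustration and are not taken from e-Rater
# or any real AES product.
import re

LINKING_WORDS = {"thus", "moreover", "however", "furthermore", "consequently"}
# In a real system the "domain-specific" vocabulary would be extrapolated from
# human-graded essays on the same prompt; hard-coded here as a stand-in.
DOMAIN_WORDS = {"teaching", "assistants", "accommodations", "capitalism", "tuition"}

def score_essay(text: str) -> float:
    """Return a 0-6 score from surface features only; the essay is never 'read'."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    length = min(len(words) / 400, 1.0)                           # reward sheer length
    linking = min(sum(w in LINKING_WORDS for w in words) / 5, 1.0)
    domain = min(sum(w in DOMAIN_WORDS for w in words) / 5, 1.0)
    avg_sentence_len = len(words) / max(len(sentences), 1)
    fluency = 1.0 if 10 <= avg_sentence_len <= 30 else 0.5        # crude "style" proxy
    return round(6 * (0.4 * length + 0.2 * linking + 0.2 * domain + 0.2 * fluency), 1)

# A long, well-linked stream of domain words scores as highly as a coherent
# argument, because nothing here checks facts or logical progression.
```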

One of the most vocal critics of automated essay assessment, Les Perelman, director of writing at M.I.T., has taken one of the most commonly used automatic scoring systems for a spin. The e-Rater is used not by K-12 schools but by the ETS to grade graduate-level GRE essays (i.e. one of the most high-stakes tests on the market.) So how does it measure up? Let us not even consider creativity, subtlety, style and beauty (all important traits in grad school work), but look at the rudimentary skills outlined in the Common Core Standards. Is the e-Rater able to discriminate factual accuracy from outlandish claims, logical progression from a narrative mess, sense from nonsense? The following essay, written by Perelman, received the highest grade possible – 6/6 (an essay with this score "sustains insightful in-depth analysis of complex ideas"):
Question: "The rising cost of a college education is the fault of students who demand that colleges offer students luxuries unheard of by earlier generations of college students -- single dorm rooms, private bathrooms, gourmet meals, etc." Discuss the extent to which you agree or disagree with this opinion. Support your views with specific reasons and examples from your own experience, observations, or reading. 

In today's society, college is ambiguous. We need it to live, but we also need it to love. Moreover, without college most of the world's learning would be egregious. College, however, has myriad costs. One of the most important issues facing the world is how to reduce college costs. Some have argued that college costs are due to the luxuries students now expect. Others have argued that the costs are a result of athletics. In reality, high college costs are the result of excessive pay for teaching assistants. 

I live in a luxury dorm. In reality, it costs no more than rat infested rooms at a Motel Six. The best minds of my generation were destroyed by madness, starving hysterical naked, and publishing obscene odes on the windows of the skull. Luxury dorms pay for themselves because they generate thousand and thousands of dollars of revenue. In the Middle Ages, the University of Paris grew because it provided comfortable accommodations for each of its students, large rooms with servants and legs of mutton. Although they are expensive, these rooms are necessary to learning. The second reason for the five-paragraph theme is that it makes you focus on a single topic. Some people start writing on the usual topic, like TV commercials, and they wind up all over the place, talking about where TV came from or capitalism or health foods or whatever. But with only five paragraphs and one topic you're not tempted to get beyond your original idea, like commercials are a good source of information about products. You give your three examples, and zap! you're done. This is another way the five-paragraph theme keeps you from thinking too much. 

Teaching assistants are paid an excessive amount of money. The average teaching assistant makes six times as much money as college presidents. In addition, they often receive a plethora of extra benefits such as private jets, vacations in the south seas, a staring roles in motion pictures. Moreover, in the Dickens novel Great Expectation, Pip makes his fortune by being a teaching assistant. It doesn't matter what the subject is, since there are three parts to everything you can think of. If you can't think of more than two, you just have to think harder or come up with something that might fit. An example will often work, like the three causes of the Civil War or abortion or reasons why the ridiculous twenty-one-year-old limit for drinking alcohol should be abolished. A worse problem is when you wind up with more than three subtopics, since sometimes you want to talk about all of them.
Factual accuracy aside, where is the "in-depth analysis" and the logical progression? This hilarious rant has the trappings of an excellent essay – an advanced vocabulary, plenty of academic linking words as well as a good portion of "domain words" used in student essays on the same topic that scored highly ("teaching assistants", "accommodations", "capitalism") – and the machine cannot tell the difference. The algorithm can be easily fooled, something ETS made no secret of in a 2001 paper. But while admitting that utter nonsense can score highly, they also claim that this is of little relevance since students do not set out to trick an algorithm; they write with human beings in mind (there is still a human reader involved in the GRE scoring process), and the overlap between essays deemed good by humans and the algorithms is almost complete. We can illustrate this with a Venn diagram of essays receiving high scores:




It won't be long, however, before the human readers are given the boot. If you plug the high predictive validity, specious though it might be, into a cost-benefit analysis, you would fool many a school board. And here's the rub: with no human reader involved, the green circle is a much more comfortable target to aim for than the blue bull's eye. Chances are that K-12 teachers, pressured to teach the Common Core tests rather than the skills these tests are supposed to measure, will be forced to coach their students how to produce impressive-sounding gibberish, perhaps along the lines of:
"You see, start out with a phrase such as 'In today's society', 'During the Middle Ages', or, why not, 'In stark contrast to'. Then you rephrase the essay prompt and begin the second paragraph. Start with a linking word; "thus" or "firstly" are always a safe bet. And whatever you do, don't forget the advanced content words; if you're supposed to write about whether technology is good for mankind, how about a liberal sprinkling of "interaction", "alienation", "reliance" and "Luddite"... Oh yes, the last word will almost guarantee that you'll get an A! In the thirds paragraph..."
As loath as I am to beat the dystopian drum here, there is a real risk that the focus on discrete metrics (and consequently on uniformity and rote-learning) in the Common Core Standards, rather than promoting transparency and equity, might make us blind to the intrinsic worth and unique skills of each student. No longer human beings, they are now points in a big data matrix, in which their performance is mapped with mathematical precision to the performance of their peers. This breaking down of students (pun very much intended) into metrics will most likely lead to a kind of "lessergy" where total ability bears no relation to the sum of their artificially measured skills. A car made out of papier-mâché parts, which might have the same dimensions and at first glance pass for the real thing, will not perform very well on the road. And in much the same way, a student taught to fool the AES algorithms will hardly have gained any real-life skills in writing or critical thinking.

AES is of course only one facet of the big data-fication of education, but it is one of the most egregious ones. Until the two cultures divide has been bridged, policy makers will be as dumbfounded and seduced when told about "chi-square" correlations of automated essay scoring algorithms, and the "strange statistical properties" of human raters, as Diderot was when (if we are to believe the anecdote) Euler explained that given the equation:

(a + b^n)/n = x

...there is a God.

When I first read Hard Times 12 years ago, I thought it was a clunky, over-the-top satire. Now it seems eerily prophetic (yes, when he wasn't busy earning millions as a high-flying TA, Dickens actually found time to whip up a couple of novels):
"Utilitarian economists, skeletons of schoolmasters, Commissioners of Fact, genteel and used-up infidels, gabblers of many little dog’s-eared creeds [...]  Cultivate in them, while there is yet time, the utmost graces of the fancies and affections, to adorn their lives so much in need of ornament"
Perhaps this is precisely what is needed – a grassroots movement of teachers and educators, writers and poets, students and parents, who can do just that: cultivate some fancies and affections into the Commissioners of Fact, and tell the technocrats and Taylorists that there is more to life than what is dreamt of in their philosophies. Until then, a good way to start would be to sign this petition against Machine Scoring in High-Stakes Essays (with Noam Chomsky as one of the signatories).