Narrative Science

In his novel Timequake, Kurt Vonnegut tells of an architect named Frank who encounters a software program named Palladio. The program promises to enable anyone, regardless of training, to design any kind of architectural structure, in any kind of style, simply by specifying a few basic project parameters. Frank doubts that the program could really replicate the skills and knowledge he has gained and honed over many years, so he decides to put it to the test. He tells Palladio to design a three-story parking garage in the style of Thomas Jefferson’s Monticello. To his amazement, the program doesn’t refuse or crash. Instead, it takes him through menu after menu of project parameters, explaining how local codes would alter this or that aspect of the structure. At the end, the program produces detailed building plans and cost estimates, and it even offers to generate alternative plans in the style of Michael Graves or I. M. Pei. In typical Vonnegut style, Frank is so shocked and filled with despair that he immediately goes home and shoots himself.

I was reminded of this scene in Vonnegut’s novel after reading an article about the company Narrative Science. They have produced a software program that can automatically write news stories, in human-like prose, about sporting events and routine financial reports. They are now branching out into other genres, like in-house managerial reports, restaurant guides, and summaries of gaming tournaments. Last year they generated 400,000 such stories, all without a single human journalist.

Well, not quite. Like all software programs, theirs has to be trained, not only in the rules of a particular domain, but also in how to write appropriate-sounding prose for the target audience. The former is done by statisticians and programmers, but the latter requires seasoned journalists, who provide templates and style guides. Theoretically, however, once those journalists train the program to sound like them, the program could generate millions of stories all on its own.

So far, this program has been used to generate stories about minor sporting events and routine financial reports that normally would not garner the attention of a real reporter. For example, parents can capture play-by-play data about their son’s Little League baseball game and submit it to Narrative Science. In a few minutes, the program can analyze the data and generate a story that highlights pivotal moments in the game as well as the final outcome, all written in the flamboyant style of a veteran sports reporter. By looking at earlier games in the same or previous seasons, the program can also comment on how the team or individual players performed relative to other games and similar match-ups.

Similarly, most corporate earnings reports go unnoticed by journalists, but this program can quickly analyze the various numbers, compare them with other firms in the same industry, and generate a story for stockholders and other interested parties that highlights important changes in the company’s performance.

Narrative Science is proud of the fact that their program has not yet put any journalists out of work, and they believe that it will be used primarily to generate stories that would never have been written in the first place. But when asked how long it would take before one of their computer-generated stories won a Pulitzer Prize, their CTO guessed that it would be within five years.

I’m a bit dubious about that last prediction, but I do find their system very interesting. Narrative Science has essentially picked the low-hanging fruit of professional writing: those routine, boring, and generally formulaic stories that might as well be written by a computer. In one sense, their program is similar to a simple machine tool that is able to construct one particular kind of part over and over again, but in another sense, they have gone far beyond that. By combining data mining techniques with prose generation, they have created a system that can not only find new insights in large datasets, but also communicate them to a wide audience in a style that the audience will recognize and trust.

But before we start worrying about whether their program will soon put all journalists out of work, we need to realize that this kind of program only works in data-rich domains, and the insights it can generate are limited by the quantity and quality of the data it receives. It can find patterns in complex datasets that a human might not notice, but it can’t really understand the irrational and murky depths of human emotions, motivations, and desires. I have a hard time, for example, seeing how it could cover a complex public policy debate, or ask tough questions about how a certain dataset was collected, and whether it might be skewed or biased in some way.

Kurt Vonnegut’s first novel, Player Piano, was written in 1952 after he saw an early automated machine tool quickly make a turbine part that used to take a skilled machinist much longer to produce. In the novel, he imagined a dystopian future where blue-collar workers had nothing left to do, and the entire society was run by managerial technocrats. We now know that things didn’t quite turn out this way (see David Noble’s classic book Forces of Production). Similarly, I don’t think that newsroom management will ever be able to replace human reporters entirely. No doubt, some of the more routine and formulaic reporting will become automated, but the more idiosyncratic stories will still require a reporter who understands the human condition.

2 thoughts on “Narrative Science”

  1. Michael Munnik

    Thanks for bringing this to my attention, Dave. As a reformed journalist, I find the concept both appalling and amusing. How often did my colleagues and I feel, around the newsroom, that a computer programme could bang off the routine stories that we put together every day? (Okay, I worked in radio, so it’s not that easy; we’ve seen the experiments with virtual presenters in sports television news, but as talk-back technology gets more sophisticated, it’s not hard to see a similar intrusion into my old medium.) From your depiction, it sounds like Narrative Science doesn’t pretend to be what it’s not – it is about handling routine news stories in, as you point out, a data-rich environment. It’s not the vehicle for breaking scandal, unearthing blistering stories, or even challenging the “official line”. We still need humans for that.

    But there are two things that immediately leapt to mind as I read this, as benefits of having humans cover the routine news. One is lateral connections: linking one routine story, perhaps about city budget deliberations, with another story, perhaps about crime statistics. Both are fairly routine, but I believe it would take a truly inventive human to link them, placing one inside the other as a sort of trend piece. This is the sort of journalistic manoeuvre that makes otherwise routine reporting sparkle. Perhaps I underestimate the computer’s ability to compile and cross-reference stories, but it still seems like the kind of thing you’d need to run past an editor (“What if we did the B&E stats story through the lens of an increase in budget for city gardening?” “Dunno – sounds sketchy. What are you thinking?”) This is perhaps a poor example, but I hope you get what I’m driving at.

    The other is the benefits of human contact that come from the routine. I’ve been reading plenty of sociological and anthropological ethnographies of newsrooms in the last few months, and one thing is clear: being there still counts. The human contact between reporter and various bureaucrats that comes from doing the rounds – visiting city hall, talking to court officials, chatting on the phone with stock traders – builds a relationship that doesn’t pay the dividends of original story ideas every day, nor even every week necessarily. But when something happens, that bureaucrat is going to pass that on as customised information. If a computer programme is the one gathering that boring data and turning it into a story, no one is going to give it the tip, the lead, the unusual event that seems to fit this particular reporter’s style or this publication’s interests.

    So not only is there a need for human journalists to do the non-routine work, there is also a benefit to having humans do the routine stuff. Because without the workers doing the routine, the non-routine will start to dry up.

    1. David Stearns (post author)

      I thought this story might interest you, Mike! And thanks for these really good points. I certainly agree that lateral, imaginative connections and the benefits of long-term relationships are beyond the abilities of something like Narrative Science’s program.

      In the history of automated machine tools, factory management specifically tried to replace machinists with automation in order to break the unions and deskill the workers. But it turned out that the early machines were not up to the task, and in many cases the machinists were the ones who fixed the automated tools by adjusting their programming. The machine tools took away some jobs, but they created a whole raft of others, some of which were filled by former machinists.

      Of course, journalism is not manufacturing, but the history of automated machine tools points towards a somewhat more symbiotic relationship developing between humans and automated tools.
