Nieman Journalism Lab

A static website is a vending machine. A dynamic website is a restaurant.
What’s New in Digital and Social Media Research: What happens when robot journalists produce stories that are “good enough”
How to build your own Twitter bot army
Hacking in the newsroom? What journalists should know about the Computer Fraud and Abuse Act
Warren Buffett on newspapers: “___________”

A static website is a vending machine. A dynamic website is a restaurant.

Posted: 03 Mar 2014 11:13 AM PST

One major topic of discussion across all four days of last weekend’s NICAR (or five days, depending how hardcore you are) was the advantages of static news apps versus dynamic news apps. The conversation ultimately resulted in a staged debate on Saturday evening.

For NPR and static apps: @jeremybowers and @onyxfish For ProPublica and dynamic apps: @kleinmatic and @thejefflarson

— Tyler Fisher (@tylrfishr) March 1, 2014

But I had a problem:

@laurenrabaino I hope it opens with them explaining what that means.

— Caroline O'Donovan (@ceodonovan) February 27, 2014

Enter Noah Veltman, with a blog post for dummies explaining exactly what the difference really is.

A static website is a vending machine.

A dynamic website is a restaurant.

How so?

When people talk about a static server, they are talking about a server set up like a vending machine. When you ask for a URL, like http://www.nytimes.com/cheetos.html, there is an actual file already on that computer called cheetos.html, and all it has to do is send you a copy of that file. Talking about “flat files” refers to the same thing: having actual files that match the URLs people might request.

You punch E6, you get the Cheetos that are already in the machine, packaged, ready to eat.

Versus:

When people talk about a dynamic server, they are talking about a server set up like a restaurant. The food is made-to-order. The kitchen has ingredients, and the cooks assemble those ingredients into the finished product only AFTER someone orders that dish.

In this version, if you ask for a URL, like http://www.nytimes.com/crabcakes.html, crabcakes.html doesn’t exist yet. There is no such file. It’s just a menu item. The server waits for your request, and when the request comes in, it uses ingredients (e.g. templates, a database, the Twitter API) and a “recipe” to create that page on the spot, just for you.

When people talk about the “back end” or “server side,” they are talking about the kitchen: the stuff that a server does when it gets a new request.

The post goes into further detail about how those core differences effect content delivery and newsroom developer workflow.

What’s New in Digital and Social Media Research: What happens when robot journalists produce stories that are “good enough”

Posted: 03 Mar 2014 10:33 AM PST

Editor’s note: There's a lot of interesting academic research going on in digital media — but who has time to sift through all those journals and papers?

Our friends at Journalist’s Resource, that’s who. JR is a project of the Shorenstein Center on Media, Politics and Public Policy at the Harvard Kennedy School, and they spend their time examining the new academic literature in media, social science, and other fields, summarizing the high points and giving you a point of entry. Here, John Wihbey sums up the top papers in digital media and journalism this month.

Over the past month, the scholarly world has been cranking out new insights — some profound, some obscure, and some useful for newsrooms and media producers of all kinds. Meanwhile, Nicholas Kristof has kicked off yet another round of debate about whether academics are engaged enough (see Ezra Klein for the latest salvo, on gated academic journals and their consequences).

Amid all that, there are indeed some good nuggets coming from the halls of academe. Recent themes: Know thy network. Beware the rise of journo bots. And milk those Twitter users for cash. More on those below, where you’ll find a sampling of recent papers and their findings.

“Mapping Twitter Topic Networks: From Polarized Crowds to Community Clusters”: From the Pew Research Internet Project. By Marc A. Smith, Lee Rainie, Ben Schneiderman and Itai Himelboim.

This important new study, done in collaboration with academic researchers Schneiderman (University of Georgia) and Himelboim (University of Maryland), goes a long way toward making social network analysis and theory intelligible to the general public. In a clean, straightforward way, it lays out the six basic “archetypes” of Twitter conversation, giving precise language to phenomena many of us observe at only an intuitive level (and yet which researchers have observed for some time).

Having analyzed millions of tweets, the researchers conclude that political discussions often show “polarized crowd” characteristics, whereby a liberal and conservative cluster are talking past one another on the same subject, largely relying on different information sources. Of course, you still see old “hub-and-spoke” dynamics, or “broadcast networks,” where mainstream media are still doing the agenda-setting. But there are novel networks, too: “Support” networks that form around customer complaints, which looks like “hub-and-spoke” but also involves more two-way conservation; “tight crowds” involving niche interests, hobbies and professional groups; “brand clusters” around topics of mass interest (celebrities, for example) that primarily feature “isolates,” or people talking about the same subject but not to one another; and “community clusters” that “look like bazaars with multiple centers of activity” and which “can illustrate diverse angles on a subject based on its relevance to different audiences, revealing a diversity of opinion and perspective on a social media topic.”

Related: A new study in the Journal of Communication, “Social Media, Network Heterogeneity, and Opinion Polarization,” by Jae Kook Lee, Jihyang Choi, and Cheonsoo Kim of Indiana University and Yonghwan Kim of the University of Alabama, demonstrates the importance of news-related activities on social networks. Getting news, posting news and talking about politics on Twitter and Facebook seems to be associated with having a more diverse social network. Overall, the “role played by social media in the realm of public opinion is not simply optimistic or pessimistic,” the researchers conclude.

“The battle for ‘Trayvon Martin’: Mapping a media controversy online and off-line”: From the MIT Center for Civic Media, published in First Monday. By Erhardt Graeff, Matt Stempeck, and Ethan Zuckerman.

How a crime becomes political: Trayvon Martin and the way different media co-create the news

The Lab’s Caroline O’Donovan has already published a wonderful explainer on this study — worth checking out if you missed it. The study represents an ambitious effort to map public discourse around a national news topic — its ebb and flow, its catalysts, magnifiers and gatekeepers alike. How exactly do stories move across the wide array of information channels we use? The researchers conclude: “Our analysis finds that gatekeeping power is still deeply rooted in broadcast media…Without the initial coverage on newswires and television, it is unclear that online communities would have known about the Trayvon Martin case and been able to mobilize around it.” Effective public relations by parties involved saved the story from vanishing initially, and social media jumped on the bandwagon only later.

Graeff, Stempeck, and Zuckerman contribute important insights into the networked ecosystem of communication and news. The paper is a direct follow-on to an earlier paper by Internet theorist Yochai Benkler and Co., which suggested new network dynamics at work around the Stop Online Piracy Act (SOPA/PIPA) and related online activism. Both papers leverage the underappreciated Media Cloud project, which is finally getting its due. Graeff, Stempeck, and Zuckerman basically show a kind of counter-example to the Benkler findings. This scholarly back-and-forth is well worth paying close attention to, as MIT and Harvard’s Berkman Center have more papers in the pipeline along these lines. If we are to answer the ultimate digital media question — “How much has the Internet truly changed communication?” — this research will be a vital resource in providing the data.

“Enter the Robot Journalist: Users’ Perceptions of Automated Content”: From Karstad University (Sweden), published in Journalism Practice. By Christer Clerwall.

The study sets out to see how, among a small sample of undergraduates, people might judge differences between news content written by human journalists and by computers. The sample articles focused around National Football League topics. The subjects were basically unable to tell the difference between the two articles, and indeed on average found the computer-generated article more credible. Clerwall concludes: “Perhaps the most interesting result in the study is that there are no (with one exception) significant differences in how the two texts are perceived by the respondents. The lack of difference may be seen as an indicator that the software is doing a good job, or it may indicate that the journalist is doing a poor job — or perhaps both are doing a good (or poor) job?” He asks a provocative and, for many in the media industry, scary question: “If journalistic content produced by a piece of software is not (or is barely) discernible from content produced by a journalist, and/or if it is just a bit more boring and less pleasant to read, then why should news organizations allocate resources to human writers?”

“Inferring the Origin Locations of Tweets with Quantitative Confidence”: From Los Alamos National Laboratory and Illinois Institute of Technology, presented at ACM’s February 2014 Computer Supported Cooperative Work conference. By Reid Priedhorsky, Aron Culotta, and Sara Y. Del Valle.

This paper demonstrates that, although only a tiny fraction of people enable geolocation on their tweets, it is algorithmically possible to figure out where you are tweeting from, working from just proximal cues (particularly the mention of toponyms, or placenames.) The researchers analyze 13 million tweets and figure out the basic thresholds they need to infer location. Priedhorsky, Culotta, and Del Valle note that the findings have implications for privacy: “In particular, they suggest that social Internet users wishing to maximize their location privacy should (a) mention toponyms only at state- or country-scale, or perhaps not at all, (b) not use languages with a small geographic footprint, and, for maximal privacy, (c) mention decoy locations. However, if widely adopted, these measures will reduce the utility of Twitter and other social systems for public-good uses such as disease surveillance and response.”

Related: Also see other interesting ACM conference papers such as “The Language that Gets People to Give: Phrases that Predict Success on Kickstarter” and “Designing for the Deluge: Understanding & Supporting the Distributed, Collaborative Work of Crisis Vounteers. (A special thanks and hat tip to Meredith Ringel Morris of Microsoft Research, who co-chaired the papers committee for the conference.)

“An Empirical Study of Factors that Influence the Willingness to Pay for Online News”: From Universidad Carlos III de Madrid, Spain, published in Journalism Practice. By Manuel Goyanes.

Goyanes analyzes a random sample of 570 survey interviews done by the Pew Research Center to see how demographics and media use relate to paying for news and other online goods and services. Younger people, and those with incomes above $75,000, were more willing to pay for online news. Twitter users showed an increased willingness to pay. Goyanes states that “news organizations [should] consider Twitter not only a mechanism to distribute breaking news quickly and concisely, but also a marketing and interactive platform with which they can convince new customers to pay for their content through innovative marketing and advertising campaign.”

“Facebook ‘friends’: Effects of social networking site intensity, social capital affinity, and flow on reported knowledge-gain”: From San Diego State University, published in The Journal of Social Media in Society. By Valerie Barker, David M. Dozier, Amy Schmitz Weiss, and Diane L. Borden.

The study adds to the growing and voluminous literature on the human motivations behind activity on social media. The researchers set out to assess what makes people learn things on Facebook, and under what conditions they are more likely to acquire knowledge. Which quality is most important? As you might expect, it’s the desire to connect. The study analyzes a subset of data (236 persons) from telephone surveys with Internet users conducted in 2012. Barker, Dozier, Schmitz Weiss, and Borden conclude that it is not the intensity of participation in social networking sites that is the crucial factor in driving them to pick up knowledge. What matters most, it turns out, is a “sense of community and likeness felt for weak ties online” — social capital affinity — in terms of acquiring knowledge both through focused tasks and incidentally.

Photo by Anna Creech used under a Creative Commons license.

How to build your own Twitter bot army

Posted: 03 Mar 2014 09:38 AM PST

I am newly returned from Baltimore and the NICAR conference, where one of the most laugh-out-loud sessions of the weekend involved Brian Abelson, Joe Kokenge and Abraham Epton talking about how and why to build Twitter bots. Stephen Suen, of MIT’s Comparative Media Studies writing program, has a helpful blog post about the conversation.

Kokenge laid out the basics of making a bot. Epton talked about his ILCampaignCash, a Chicago Tribune product that tracks and tweets campaign donations. Abelson offered a long list of bots both humorous (like @FloridaMan or @Haikugrams) and practical (like @TreasuryIO or @YourRepsOnGuns) that suggested the breadth of possibility when it comes to bots. There are also, of course, challenges:

Brian says the logic behind the Twitter bot is strict rather than greedy. He also points to issues faced with Times Haiku. "The challenge is, how are we not going to make a haiku of the Syrian civil war, how are we not going to make a haiku of something that's serious… that's why it's easier to do some of these funny artistic ones rather than something you can put the name of a newsroom on."

Once again, rate limiting is brought up — Abraham says you can write the logic of your bot to avoid having your account get deleted. “Use common sense,” he says. The more you avoid behaviors that make your bot seem like a spam bot, the safer your account will be. Joe and Brian agree — the rate limit is high enough that you can get away with a tweet every 5 minutes without hitting it.

Hacking in the newsroom? What journalists should know about the Computer Fraud and Abuse Act

Posted: 03 Mar 2014 07:30 AM PST

Some people who scrape and publish information from the Internet go to jail. Others produce great journalism. It’s easy to understand why you might want to know which person you are — and whether or not you’re protected from prosecution or not — but that can be a difficult task.

That’s why there was a discussion on the topic at the Computer Assisted Reporting conference in Baltimore last week. ProPublica’s Scott Klein, Scripps Howard’s Isaac Wolf, and defense attorney Tor Ekeland participated in a conversation moderated by The Wall Street Journal’s Jeremy Singer-Vine.

Wolf is a Scripps News reporter who garnered some attention last spring when he reported on a major security breach at a company called TerraCom. In the course of a typical PDF search, Wolf discovered that personal information including Social Security numbers, addresses, and other account information had been left vulnerable. Publishing his findings led Wolf and his colleagues to be branded as “hackers.” Sarah Laskow wrote in CJR that the Scripps case may well be the first time a journalist was threatened under the Computer Fraud and Abuse Act.

The Computer Fraud and Abuse Act is a law that prohibits unauthorized access to information on a protected computer. It’s the statute under which Andrew Auernheimer, better known as weev, was prosecuted and sentenced to 41 months in prison for taking evidence of a security flaw in AT&T that left user email addresses vulnerable to Gawker. (It’s also the law that led to the prosecution of Aaron Swartz.)

One of Aurenheimer’s attorneys was Ekeland, who provided a legal perspective for the journalists at NICAR on issues around the CFAA. “It’s a very dangerous statute, because it’s so poorly written,” Ekeland said, “and they’re about to make it worse.”

.@TorEkelandPC on prosecution of scrapers: Not an objective standard, subjective stand that turns on whim of the website owner. #NICAR14

— Tyler Dukes (@mtdukes) February 27, 2014

Klein and Singer-Vine are both journalists who have worked on or edited stories that involved, in different ways, practices that could fall under the hacking umbrella. For example, ProPublica published MessageMachine, a project that used reverse engineering to figure out why certain people received specific personalized emails from the Obama campaign. Singer-Vine worked on a story about online pricing inequality on the Staples website.

The focus of the panel discussion was around how journalists interested in doing this kind of work can protect themselves and ensure that they’re on the right side of law. Because the law is nonspecific in its language — and widely decried as outmoded — interpretations of what’s legal and what’s not vary wildly. “The press is protected by virtue of the fact of who they are,” Ekeland said. “I don’t see any difference between what my client did and what Isaac did, except my client is an asshole.”

At ProPublica, there are deliberate rules about how a journalist seeking information online should represent themselves. Klein said that reporters there are banned from creating “straw men,” or programs that falsely suggest the existence of an actual person. That’s why, for the MessageMachine project, users were crowdsourced, and their information — information pertaining to real people — was used to analyze the campaign email algorithm. “I don’t feel like it would have been morally wrong to create straw people, but I can see why adopting these moral ethics…makes sense,” Klein said.

(Klein said they ultimately realized that creating fake users wouldn’t have worked anyway, and that the crowdsourced user base has more value and longevity.)

At The Wall Street Journal, Singer-Vine said he had a similar debate over self-representation. Ultimately, his team tracked Staples price differentials by modifying the cookies the system relied on to track users, a technique that they felt was significantly different from creating straw men. Whether a judge would consider that action acceptable under the CFAA or is less clear.

“Go find a journalism ethics book that says when you can find and manipulate a variable in a cookie,” said Klein. “Good luck! We’re working without a net.”

It’s worth noting an argument introduced by Ekeland on this topic. Framing the issue as a journalist lying to a computer, perpetuates the notion that they’re dealing with something other than a computer. In point of fact, machines don’t have a sense of truth — there are only inputs and outputs. “The computer isn’t being deceived, it’s doing what it was programmed to do,” he said. “We want there to be physical, real world analogies, but the computer people don’t do that.” Not all agreed, however:

.@TorEkelandPC says you can't deceive or lie to a computer, but what about SEO black hats … definitely lying to the googlebot #nicar14

— Nick Diakopoulos (@ndiakopoulos) February 27, 2014

Ultimately, the conventional wisdom seems to be that reporters hoping to stay out of court should be very upfront about their intentions, conservative in their judgments, and confident in the value of what they’re doing.

Crowdsourcing campaign spending: What ProPublica learned from Free the Files

Klein, for example, explained how easy it can be to violate the law accidentally. ProPublica was working with a series of FCC filings at one point while developing a story about who pays for campaign TV ads. The stations are required to make this information publicly available, which is how ProPublica acquired the documents, only to discover later that scanned personal checks were included in the PDFs. Luckily, their reporters realized in time, and were able to do a search for the phrase “pay to the order of,” and delete the information from DocumentCloud. Clearly, there’s a need to proceed with caution as journalists continue to gain access to sensitive documents that are publishable on the web in full.

While the ethics of various methodologies were up for debate, and while interpretation of the law remains opaque, the panelists largely agreed on how journalists can best protect themselves right now.

“You want to be able to demonstrate that you’re using this information for a journalistic purpose,” said Wolf. “Assume that you’re going to be challenged. What is your story? You’re going to be prodded by the entity or company. Reporters elsewhere are going to be asking you questions.”

In addition, he recommends keeping track of process, so that a step-by-step narrative of what steps were taken and why can be presented if necessary. Journalists are protected, but ultimately, they’re only safe if it can reasonably be proven that leadership at their organization concurred that the measures taken were in pursuit of the public good — that the information is, in Scott Klein’s words, “not gossip — it’s not prurient.”

Just last month, the Department of Justice communicated its interest in working to narrow the scope of the CFAA. There are multiple cases in appeals court; as rulings come down, and as lawmakers push for reform, the hope is that the law will become less vague. As Wolf pointed out, if journalists want to be a part of shaping a statute that has the potential to curtail their tools for gleaning information, now is the time to get involved.

Image of a gavel by Joe Gratz used under a Creative Commons license.

Warren Buffett on newspapers: “___________”

Posted: 03 Mar 2014 07:00 AM PST

berkshire-hathaway-media-group Warren Buffett’s Berkshire Hathaway has, over the past few years, bought up dozens of newspapers, with 69 papers and other titles currently part of the BH Media Group, including the Richmond Times-Dispatch, Greensboro News & Record, Omaha World-Herald, and Tulsa World. In the 2012 edition of his legendary annual shareholder letter — seriously, their level of clarity is something most journalists can only aspire to — Buffett went on at some length about the purchases:

Newspapers continue to reign supreme, however, in the delivery of local news. If you want to know what's going on in your town — whether the news is about the mayor or taxes or high school football — there is no substitute for a local newspaper that is doing its job. A reader's eyes may glaze over after they take in a couple of paragraphs about Canadian tariffs or political developments in Pakistan; a story about the reader himself or his neighbors will be read to the end. Wherever there is a pervasive sense of community, a paper that serves the special informational needs of that community will remain indispensable to a significant portion of its residents.

On Friday afternoon, the latest edition of Buffett’s shareholder letter was released, and I went to it quickly to see what new thoughts there might be inside from the Oracle of Omaha about the newspaper business.

The answer: nada.

The only reference of newspapers was a pitch for BH’s “third International Newspaper Tossing Challenge.” Buffett’s got a good newspaper-tossing arm:

Understanding the billionaire media gambles

Now, there’s certainly no shame in being left out of a Berkshire Hathaway shareholder letter. The reach of Buffett’s empire is so broad and diverse that expecting a newspaper update every year is a bit like expecting the Roman senate to demand the latest from a small town in Mauretania Caesariensis at every meeting.

But the newspaper business can use all the outside business smarts it can get these days. Berkshire Hathaway is of course famous for letting its component businesses run themselves, but I think this year’s absence of attention might be a tiny, tiny piece of evidence that Martin Langeveld was right when he characterized Buffett’s interest in newspapers this way in our year-end Predictions package:

I think Warren Buffett is really pursuing a mop-up strategy. He says otherwise, of course: "Wherever there is a pervasive sense of community, a paper that serves the special informational needs of that community will remain indispensable to a significant portion of its residents." What else is he going to say? He may actually believe this, and believe that printed newspapers will remain viable for a long time, and may prefer to read news on paper like most people in his generation.

But Buffett's backup strategy is this: He is buying newspaper assets cheap and not investing much into them, in the expectation that even if they lose all value over the next 6 or 8 years, he will have made a decent return on his investment…

Warren Buffett will continue buying newspapers wherever he can do so very cheaply. No grand strategy, no new business models for news will emerge from Omaha. Ultimately, these papers will be closed or sold. It's a mop-up.

YULI AKHMADA

Selasa, 04 Maret 2014

Nieman Journalism Lab

Nieman Journalism Lab