SXSW: Big Data and the Race for the White House

Patrick Ruffini is moderating, suggests another name for the panel could be “Moneyball for Politics.” Excellent — I’m clearly in the right place.

Going to talk about all the uses for big data, how it’s used in politics for prediction, fundraising, more.

Patrick Ruffini — in Republican politics and tech for the past 3 presidential cycles. Now president of Engage, a DC political consulting firm.

Dan Siroker was a product manager at Google, saw Obama speak, and became director of analytics for the ’08 campaign. Now runs Optimizely.

Alex Lundry’s first client was the Mitt Romney gubernatorial campaign. He is VP and director of research at TargetPoint Consulting.

Josh Hendler worked on the Kerry campaign in 2004, then at the DNC, and is now Global CTO at H+K Strategies.

Kristen Soltis, VP at Winston Group.

Ruffini – You can’t manage what you can’t measure. I’m an R myself but my hat goes off to the Obama team for doing such a good job with analytics. We are speculating a lot today b/c the campaign doesn’t release a lot of its strategy. One thing that’s assumed is that they’re doing a lot of mining on unstructured text: tweets, comments from door-to-door campaigners, anything.

The culture of measurement started in 2008. Now passing the baton to Siroker, who had an analytics team in Chicago. Siroker’s showing the splash page from BarackObama.com in 2007. He’s going to do a live multivariate test with us. They tested variations of the media and the button. We see 4 variations of the button, 3 images and 3 videos. Few people in the room picked the “right” answers, but the winning combination improved the signup rate by 40% and added 2.9 million to the email list and $57M to the bank.
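For anyone wondering what a multivariate test like this looks like mechanically, here is a minimal sketch of my own, not anything Siroker showed. It assumes the media slot holds either an image or a video, so 4 buttons and 6 media options give 24 combinations. The labels, the function names and any counts you feed it are hypothetical.

```python
from itertools import product

# Hypothetical labels: the panel showed 4 buttons, 3 images and 3 videos,
# but these names are invented for illustration.
BUTTONS = ["button_a", "button_b", "button_c", "button_d"]
MEDIA = ["image_1", "image_2", "image_3", "video_1", "video_2", "video_3"]


def all_variations(buttons, media):
    """Full-factorial design: every button paired with every media option."""
    return list(product(buttons, media))


def signup_rate(signups, visitors):
    """Fraction of visitors who signed up; 0.0 when nobody saw the variation."""
    return signups / visitors if visitors else 0.0


def best_variation(results):
    """results maps (button, media) -> (signups, visitors); return the top rate."""
    return max(results, key=lambda combo: signup_rate(*results[combo]))


def relative_lift(baseline_rate, new_rate):
    """Relative improvement of new_rate over baseline_rate."""
    return (new_rate - baseline_rate) / baseline_rate


variations = all_variations(BUTTONS, MEDIA)
print(len(variations))  # 4 buttons x 6 media options = 24 combinations
```

A real test would also randomize which visitor sees which variation and check that the differences are statistically meaningful before declaring a winner. This sketch skips both.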

Siroker showing a Facebook app they optimized in similar way for great results.

Lundry’s firm started work offline with a “terrestrial” voter file. They’re using data modeling to try to make informed judgments about whether you’ll vote, who you’ll vote for, etc. Asks the room: if we saw a Prius drive down the street, how many of us think the driver voted Obama? The whole room thinks so. Talks about how we can start to quantify it. Puts together a few variables to figure out how you’re likely to vote. Doesn’t take many variables to get very specific.
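To make the Prius intuition concrete, here is a rough sketch of how a handful of yes/no variables could be combined into a support score. This is my own illustration using a logistic function, not Lundry’s actual model. The variable names and every weight are invented; a real model would be fit to a voter file with known outcomes.

```python
import math

# Invented weights for illustration only. Positive weights push the score
# toward "supports the Democrat", negative weights push it away.
WEIGHTS = {
    "drives_hybrid": 1.2,
    "urban_zip": 0.8,
    "hunting_license": -1.0,
}
INTERCEPT = 0.0  # assumed neutral starting point


def support_probability(voter):
    """Logistic score from whichever flags are set to True on the voter."""
    z = INTERCEPT + sum(w for name, w in WEIGHTS.items() if voter.get(name))
    return 1.0 / (1.0 + math.exp(-z))


prius_driver = {"drives_hybrid": True, "urban_zip": True}
print(round(support_probability(prius_driver), 2))
```

Each extra variable shifts the score a little, which is the point Lundry was making: a few of them together can get very specific.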

What’s new now? Data harmonization — eliminating walled-off data gardens. How do you make the systems interoperable? Push to standardize data across the organization. Lundry says this is primary objective of Obama campaign. We are slowly lowering the wall between online and offline data.
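A toy version of lowering that wall might look like the sketch below: join online signups to offline voter-file records on a normalized key. This is my own simplification. The field names are hypothetical, and real record linkage has to handle nicknames, moves and typos, which exact matching does not.

```python
def normalize_key(first, last, zip_code):
    """Build a crude match key from name and 5-digit ZIP."""
    return (first.strip().lower(), last.strip().lower(), zip_code.strip()[:5])


def harmonize(voter_file, online_signups):
    """Attach online activity to offline voter records that share a key.

    Returns (merged_records, unmatched_signups).
    """
    by_key = {}
    for rec in voter_file:
        key = normalize_key(rec["first"], rec["last"], rec["zip"])
        by_key[key] = {**rec, "email": None, "donated_online": False}

    unmatched = []
    for signup in online_signups:
        key = normalize_key(signup["first"], signup["last"], signup["zip"])
        if key in by_key:
            by_key[key]["email"] = signup["email"]
            by_key[key]["donated_online"] = signup.get("donated", False)
        else:
            unmatched.append(signup)
    return list(by_key.values()), unmatched


# Made-up records, purely for demonstration.
voters = [{"first": "Ann", "last": "Lee", "zip": "43004"}]
signups = [{"first": " ann ", "last": "LEE", "zip": "43004-1234",
            "email": "ann@example.com", "donated": True}]
merged, leftover = harmonize(voters, signups)
print(merged[0]["email"], len(leftover))
```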

Lundry says the real question now is who owns the data? In 2004, GOP had big data advantage, and in 2008 the Ds leapfrogged them. Now Lundry thinks they’re more even, but the data is owned, managed and used differently.

Hendler says the big questions now are related to gatekeeping … earlier this century, campaign staff would get huge boxes of paper shipped to them in the field office, with walk lists on them for door to door campaigning. All data decision-making was centralized, info was printed and used in the field. Now, some people are giving more access to staff and volunteers on the campaign, on the ground.

In 2008, saw wider access – a volunteer could pull voter lists at home and make calls to support the candidate. Data pulled in real time. In 2012, seeing more of this. Hendler says there’s greater possibility for success when you share your data more widely within the campaign.

Also, a trend toward making data accessible to more people. NationBuilder gives anyone access to a voter file. Ohio’s put the voter file on its website, available for download. Lots of organizations collaborating to share voter data now.

Hendler says analysis is really changing. You’ll have terabytes of data, but in the past you needed a really expensive solution, in the hundreds of thousands of dollars, to run ad hoc queries on it. Now, open-source tools like Hadoop and Hive let you do this much more inexpensively.

Another big change is the move from periodic to real time. Before, you’d build a model and maybe refresh it once or twice over the campaign. Now, there’s a shift to real time. Online data is being leveraged, and that’s real time.

Another big change is the kind of models that can be made. Historic models were, are you D or R? How likely to vote? Now, also modeling likelihood to unsub, likelihood to volunteer, best channel for giving, etc. Helps you figure out how best to treat potential voters/donors/volunteers.
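Here is a small sketch of how several model scores like these might drive what a campaign does next with one person. The rules, the threshold and the score names are all my own assumptions for illustration; Hendler did not describe a specific decision procedure.

```python
def choose_treatment(scores, unsub_threshold=0.3):
    """Pick a next action from hypothetical model scores in [0, 1].

    scores has keys "unsub", "volunteer", "donate", plus "channel",
    a dict mapping channel name -> predicted response to a donation ask.
    """
    # Back off first if the person looks likely to unsubscribe.
    if scores["unsub"] >= unsub_threshold:
        return "pause_email"
    # Otherwise ask for whichever action the models rate as more likely.
    if scores["volunteer"] >= scores["donate"]:
        return "volunteer_ask"
    best_channel = max(scores["channel"], key=scores["channel"].get)
    return f"donation_ask_via_{best_channel}"


# Made-up scores for one person.
person = {"unsub": 0.1, "volunteer": 0.2, "donate": 0.6,
          "channel": {"email": 0.2, "mail": 0.7}}
print(choose_treatment(person))
```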

Soltis comes from a more traditional side of the industry. Talking about the Telephone Consumer Protection Act of 1991, which makes it very hard to poll by cell phone, while 27% of households in the US are now cell phone only. Online surveying is improving but has drawbacks. Soltis says, if it is harder to ask, we have to get better at listening.

Challenge: volume of conversation may have no relation to votes.

Twittersentiment.appspot.com: Measuring sentiment. But first result was “Who said it – Newt Gingrich or Buzz Lightyear?” Is that positive??

Shows another example of sentiment analysis not being effective.
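To see why this is hard, here is a deliberately naive word-list sentiment scorer. It is a swapped-in toy technique of my own, not how the tool Soltis showed works, and the word lists are invented. It has nothing to say about the Gingrich/Buzz Lightyear line and gets fooled by sarcasm.

```python
import re

# Tiny invented lexicon; real tools use far larger lists or trained classifiers.
POSITIVE = {"great", "love", "win", "good"}
NEGATIVE = {"bad", "hate", "lose", "terrible"}


def sentiment(text):
    """Count positive minus negative words and label the result."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("Who said it - Newt Gingrich or Buzz Lightyear?"))  # neutral
print(sentiment("Oh great, another robocall"))  # positive, though clearly sarcastic
```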

Survey: Landline bias. Sentiment analysis: Online/activist bias. Soltis says, it’s not always wrong, but it’s different. It’s a variable universe and subject to interpretation. It’s evolving in real time and may be good to ID new trends. Surveys are a contained universe with concrete results — but it’s just a snapshot in time. It’s good for message testing.
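One standard way pollsters correct for landline bias is to reweight responses so each group counts in proportion to its share of the population. Below is a minimal post-stratification sketch. The 27% cell-only share comes from Soltis’s talk (it is a household figure, so treating it as a share of voters is my assumption), and the survey counts are invented.

```python
def raw_estimate(sample):
    """Unweighted support: total supporters over total respondents."""
    total = sum(n for n, _ in sample.values())
    return sum(s for _, s in sample.values()) / total


def poststratified_estimate(sample, population_share):
    """Weight each group's support rate by its share of the population."""
    return sum(
        population_share[group] * (supporters / n)
        for group, (n, supporters) in sample.items()
    )


# Invented survey: group -> (respondents, supporters). Cell-only respondents
# are only 10% of this sample but an assumed 27% of the population.
sample = {"cell_only": (100, 60), "has_landline": (900, 405)}
population_share = {"cell_only": 0.27, "has_landline": 0.73}

print(round(raw_estimate(sample), 4))
print(round(poststratified_estimate(sample, population_share), 4))
```

Weighting only helps if you reached at least some people in the under-covered group. It cannot invent opinions for voters nobody talked to.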

Ruffini: If you had unlimited resources, what would you want to figure out? Lundry wants to figure out how to analyze candidate preferences in a multi-candidate primary. [He’s one of the Rs on the panel…I bet he’d like to figure that out. :) ]

Ruffini asks how Facebook changes the consumer/political data marketplace. Hendler says that Facebook is incredibly powerful. You have to figure out which of the data you can collect on Facebook you can actually use. FB is really sensitive about what data can be pulled from Connect.

Hendler thinks mobile apps will begin to supplant email … ability to communicate via notification with target audience. Soltis thinks that mobile holds some really interesting potential for pollsters.

Question from the audience on data security: Lundry jumps in to say data security is critical, privacy, etc. New question: Is there a different expectation of privacy for politically related data than for your consumer data?

Hendler thinks there is — political organizations are often talking to people who haven’t opted in. But they have to speak to voters.