Archive | Information management

SXSW: Dawn Foster on Hacking RSS

So I’m at Hacking RSS by Dawn Foster. She’s got some more stats about how much data we have in the world. Seems to be a big topic this year here at SXSW. [Incidentally, she’s got some stats on her slides if you’d like to look. I didn’t scribble down the numbers.]

We’re talking about how to manage your RSS feeds. We subscribe to feeds that we mostly like, but not everything in them is interesting. It’s hard to keep up. And there are lots of feeds out there that we don’t subscribe to.

Some tips:

  • Spend the time to categorize.
  • Stuff you really care about at the top.
  • Don’t try to read everything.

How do you find sources you wouldn’t otherwise run across?
The Tweeted Times. Takes links from people you follow, and the people they follow, and displays them in a newspaper format with the most-tweeted stuff at the top. A great way to catch up on Twitter.
Techmeme. Good way to get a global sense of what’s popular.

Foster says the magic is in filtering. She sets up Yahoo! Pipes to filter feeds on keywords she cares about. She also uses PostRank to filter up the blog posts that have lots of comments or mentions.

She’s a big fan of Yahoo! Pipes. You can filter on any kind of data that shows up in the feed. Downsides: there’s a learning curve, and it’s sometimes flaky. And Yahoo! could kill it.

FeedRinse: Easy to use, not as flexible.
FeedDemon: Allows some filtering.
Many of the smaller services have gone out of business.

Foster has a lot of screencasts on her blog about how to use Yahoo! Pipes.

Simple filter: Give Yahoo! Pipes two RSS feeds and a few keywords to filter on, and it will spit out one RSS feed that displays only the posts matching the criteria.
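Pipes is a visual tool, so there’s no code to copy from the slides. The same merge-and-filter idea can be sketched in Python using only the standard library. This is a rough stand-in. It assumes plain RSS 2.0 input and a simple case-insensitive substring match on title and description. The function names are illustrative, not anything Pipes exposes.

```python
import xml.etree.ElementTree as ET


def items_from_rss(rss_text):
    """Return the <item> elements of an RSS 2.0 document."""
    return ET.fromstring(rss_text).findall("./channel/item")


def matches(item, keywords):
    """True if any keyword appears, case-insensitively, in the title or description."""
    text = " ".join(item.findtext(tag) or "" for tag in ("title", "description")).lower()
    return any(kw.lower() in text for kw in keywords)


def filter_feeds(feeds, keywords, title="Filtered feed"):
    """Merge several RSS documents into one feed, keeping only matching items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for feed in feeds:
        for item in items_from_rss(feed):
            if matches(item, keywords):
                channel.append(item)
    return ET.tostring(rss, encoding="unicode")
```

To use it, pass the raw XML of each feed, e.g. `filter_feeds([feed_a, feed_b], ["open data", "health"])`, and publish the returned string as your new feed.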

PostRank takes a feed and ranks the posts based on engagement, and you can get the output as an RSS feed. The PostRank info goes into that feed, so you can use it again in Yahoo! Pipes.
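PostRank’s actual scoring formula and element names weren’t covered in the talk. The sketch below therefore uses a made-up weighted count of comments and mentions and a hypothetical `urn:example:engagement` namespace. What it does illustrate is the useful trick: write the score back into each item as an extra element, so a later filtering step can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the score element; not PostRank's real schema.
ENG_NS = "urn:example:engagement"
ET.register_namespace("eng", ENG_NS)


def engagement_score(comments, mentions, comment_weight=2.0, mention_weight=1.0):
    """Toy engagement metric: a weighted sum of comments and mentions (weights are arbitrary)."""
    return comment_weight * comments + mention_weight * mentions


def annotate_feed(rss_text, counts):
    """Add an <eng:score> element to each item.

    counts maps an item's <link> to a (comments, mentions) tuple;
    items missing from counts get a score of 0.
    """
    root = ET.fromstring(rss_text)
    for item in root.findall("./channel/item"):
        comments, mentions = counts.get(item.findtext("link"), (0, 0))
        score = ET.SubElement(item, f"{{{ENG_NS}}}score")
        score.text = str(engagement_score(comments, mentions))
    return ET.tostring(root, encoding="unicode")
```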

RSS feeds should include title, author, dates, links and content, but many also include things like latitude, longitude and lots more, which your RSS reader likely isn’t displaying. Also, many APIs include even more data and can be turned into RSS.
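To see those hidden extras yourself, you can dump every child element of an item. This Python sketch assumes the feed uses the W3C Basic Geo vocabulary (`geo:lat` and `geo:long`) for coordinates. GeoRSS and other conventions exist too, so check what your feeds actually contain.

```python
import xml.etree.ElementTree as ET

# W3C Basic Geo vocabulary: one common (but not the only) way feeds carry coordinates.
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def item_fields(item):
    """Return every child element of an item as {tag: text}, including
    namespaced extras most readers never display. Repeated tags keep the last value."""
    return {child.tag: (child.text or "").strip() for child in item}


def item_location(item):
    """Return (lat, lon) as floats if the item carries geo:lat/geo:long, else None."""
    lat = item.findtext(f"{{{GEO_NS}}}lat")
    lon = item.findtext(f"{{{GEO_NS}}}long")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)
```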

So she runs 3 RSS feeds through PostRank, then takes those best-post feeds, runs them through Yahoo! Pipes and sorts by the best posts overall on a topic she’s interested in.
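The pooling-and-sorting step might look like this in Python. It assumes each ranked item already carries a numeric score element. The `{urn:example:engagement}score` tag is a hypothetical placeholder, not PostRank’s real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical score element; a real ranking service will use its own name.
SCORE_TAG = "{urn:example:engagement}score"


def best_overall(ranked_feeds, limit=10):
    """Pool items from several ranked RSS feeds and return the top `limit`
    items as a new RSS document, highest score first. Items without a score count as 0."""
    pooled = []
    for feed in ranked_feeds:
        for item in ET.fromstring(feed).findall("./channel/item"):
            pooled.append((float(item.findtext(SCORE_TAG) or 0), item))
    # Sort on the score only; the sort is stable, so ties keep their feed order.
    pooled.sort(key=lambda pair: pair[0], reverse=True)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Best posts overall"
    for _, item in pooled[:limit]:
        channel.append(item)
    return ET.tostring(rss, encoding="unicode")
```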

She also uses Yahoo! Pipes to modify the format of RSS feeds. This is cool. You’ll have to pull her slides, but she has one that shows you exactly how she’s reformatting feeds to make them all look and feel the way she wants.
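Her exact Pipes reformatting is on the slides. As a generic illustration of the idea, here’s a small Python sketch that gives every item a consistent look. It prefixes titles with a source label and trims long descriptions. The label format and the 200-character default are arbitrary choices.

```python
import xml.etree.ElementTree as ET


def restyle(rss_text, source_label, max_chars=200):
    """Give every item a uniform look: prefix the title with a source label
    and trim the description to at most max_chars characters plus an ellipsis."""
    root = ET.fromstring(rss_text)
    for item in root.findall("./channel/item"):
        title = item.find("title")
        if title is not None:
            title.text = f"[{source_label}] {title.text or ''}".rstrip()
        desc = item.find("description")
        if desc is not None and desc.text and len(desc.text) > max_chars:
            desc.text = desc.text[:max_chars].rstrip() + "…"
    return ET.tostring(root, encoding="unicode")
```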

Now Foster’s talking about using APIs. She’s going to show us what she does with BackTweets. This service tells you who’s sharing links, regardless of the shortening service they use. They used to offer info in a feed, but don’t anymore. So now you can use their API to get it. She’s building a feed that will show her who’s talking about her posts on Twitter. It involves the BackTweets API, the Twitter API and Yahoo! Pipes. Get the slides to see all the details.
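I don’t have the BackTweets response format in my notes, so the field names below (`user`, `text`, `tweet_url`) are hypothetical. The sketch only shows the last step of a flow like hers: turning a JSON list of mentions, however you fetched it, into an RSS feed that a reader or Pipes can consume.

```python
import json
import xml.etree.ElementTree as ET


def mentions_to_rss(api_json, post_url):
    """Turn a JSON list of mentions into an RSS 2.0 feed.

    The keys user, text and tweet_url are hypothetical stand-ins for
    whatever the real API returns; map them to your API's actual fields.
    """
    mentions = json.loads(api_json)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Who is sharing {post_url}"
    ET.SubElement(channel, "link").text = post_url
    ET.SubElement(channel, "description").text = "Mentions of a post, rebuilt as RSS"
    for m in mentions:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"@{m['user']}: {m['text']}"
        ET.SubElement(item, "link").text = m["tweet_url"]
    return ET.tostring(rss, encoding="unicode")
```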

She also has a nice flow for doing a vanity search using Yahoo! Pipes. I’m going to check this slide out later.

Foster’s caveats:
Don’t ever use this in a production environment. Instead, write it in a real programming language with cached results and error-checking. You can’t build your business on Yahoo! Pipes.
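As a rough illustration of what “cached results and error-checking” can mean in practice, here’s a minimal Python fetcher. It reuses a fresh copy for a configurable number of seconds and falls back to the last good copy when the network fails. The 15-minute default, the UTF-8 assumption and the class name are my own choices.

```python
import time
import urllib.request


class CachedFetcher:
    """Fetch URLs through a time-based cache; on a network error, serve the
    last good copy instead of failing, and only raise if there is none."""

    def __init__(self, ttl_seconds=900, fetch=None, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.fetch = fetch or self._http_get  # injectable, e.g. for testing
        self._cache = {}  # url -> (fetched_at, body)

    @staticmethod
    def _http_get(url):
        # Assumes the feed is UTF-8 encoded.
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8")

    def get(self, url):
        now = self.clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: skip the network entirely
        try:
            body = self.fetch(url)
        except OSError:  # URLError and socket timeouts are OSError subclasses
            if cached is not None:
                return cached[1]  # stale, but better than an empty feed
            raise
        self._cache[url] = (now, body)
        return body
```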

Oh, good audience question: Can you manipulate audio and video files included in RSS feeds via Yahoo! Pipes? Foster says you can definitely pass them through to your output feed, but she doesn’t know of any way to evaluate that info in Pipes.

Also, a question from the audience prompts Foster to clarify that when you use RSS, you still have to abide by the licensing and copyright restrictions set by the original content creator.


Recommendation Engines: Going Beyond the Social Graph

Hunter Walk. Leads product team at YouTube.
Tom Conrad. Product engineering at Pandora.
Garrett Camp. Cofounder of StumbleUpon.
Lior Ron from Google’s Hotpot project. Works on local recommendation engine.
Liz Gannes, journalist, moderating.

Conrad: Pandora has 8 billion thumbs-up/thumbs-down data points, completely contextualized.

Walk says that YouTube knows a lot about where its videos are embedded. They could review videos by hand or use algorithms to analyze them, but they also look at what the top blogs and sites are pulling from YouTube to understand which videos are popular with whom.

Walk: About 50% of searches on YouTube are “broad,” meaning the person is looking for an experience, not a particular video. Google has to figure out which videos best help someone understand or experience a topic. That’s very different from trying to answer a question, as we think of it in traditional search.

Camp: We want to get away from 10 blue links. We want to be surprised, have serendipitous experience.

Conrad: A couple of years ago, they were looking at the most common starting points for stations. One of the top ones was called Christmas, and it had been seeded with an indie rock band called Christmas. Oops. So they started playing the station to see what happened, and it was playing all holiday music. The crowd had very quickly weeded out the data error by thumbing down the band on the holiday station.

Walk shares the story of a teenager who told them she wanted to know what her friends hadn’t watched yet on YouTube, so she’d know what to share. It’s a hard problem, but they want to figure out what’s not yet spreading, but will.

Camp: StumbleUpon tests new, non-socially-recommended stuff in streams to figure out this kind of question. When you’re just looking at the social graph, you’re in a closed loop.

Ron: Social is really important in recommendations because of the trust factor. A recommendation from a friend still beats the site telling you, “Hey, other people like you like this.”

Walk: Early on, they just trusted what the uploader said about what the video is. Now, they use a lot of technology to understand a piece of content. What does perfect metadata look like? What’s everything I COULD know about it? And then the challenge is, you take all that data to try to create an experience, not just spit out data.

Walk: One of the biggest changes in perceived search relevance came when they started showing the context for recommendations. First, people immediately thought the recommendations were more relevant. Second, if a recommendation was wrong, they blamed themselves, not YouTube.

Conrad: Pandora has broken out of the PC into mobile and now car implementations. Going from listening at work on headphones to listening in the car, with the whole family and lots of people making the music decisions, is a very complex change in environment.

Camp: StumbleUpon is often a free-time application, so their new mobile app [6mo old] is doing well.

Ron: Interesting patterns in how people are following people for recommendations. Some people follow only celebrities, for instance.

Camp: For analytics, they look at thumbs up/thumbs down, length of time on resource, comparing time to type of resource. SU has an 80-85% thumbs up rate.
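The panel didn’t spell out any formulas, but the two measures Camp mentions can be sketched simply. This Python example assumes votes are booleans. It compares time on a resource against the median for that resource type. It’s illustrative only, not StumbleUpon’s actual method.

```python
from collections import defaultdict
from statistics import median


def thumbs_up_rate(votes):
    """Share of votes that are thumbs up; votes are booleans (True = up)."""
    votes = list(votes)
    return sum(votes) / len(votes) if votes else 0.0


def relative_dwell(views):
    """Time on a resource relative to the median for its type.

    views is a list of (resource_type, seconds). A ratio above 1.0 means longer
    than usual for that kind of resource, so a 30-second video view and a
    30-second article view are judged against different baselines.
    """
    by_type = defaultdict(list)
    for rtype, secs in views:
        by_type[rtype].append(secs)
    medians = {rtype: median(secs) for rtype, secs in by_type.items()}
    return [secs / medians[rtype] if medians[rtype] else 0.0 for rtype, secs in views]
```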

Walk: You have to be careful with analytics. You don’t want to introduce features that push up your positive stats to the detriment of user experience.

Conrad: They religiously test all changes to the algorithms now, after making several changes in early days that “everyone” agreed would be great, that instead tanked numbers.

Ron: Recommendation is very vertical-oriented. The required data is so specific that it’s hard to have a general recommendation engine.

Camp: Also, the UI strongly affects what kind of data you get, which is part of why people build their engines themselves.

Ron: We’re not living in a world yet where we’re bombarded by awesome recommendations and we have to tune them. Part of the problem right now is getting coverage for everywhere.

Camp: We do a combination of social and similarity in your recommendation list.
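A blended list like that can be sketched as a weighted sum of two scores. In this Python example, the 0.5 default weight, the [0, 1] score range and the tie-break on item id are all assumptions for illustration. None of it describes how StumbleUpon actually does it.

```python
def blend(social, similarity, social_weight=0.5, limit=5):
    """Rank items by a weighted mix of a social score and a similarity score.

    social and similarity map item ids to scores in [0, 1]; an item missing
    from one map counts as 0 there. social_weight is a made-up tuning knob.
    """
    items = set(social) | set(similarity)

    def score(item):
        return (social_weight * social.get(item, 0.0)
                + (1 - social_weight) * similarity.get(item, 0.0))

    # Highest blended score first; ties broken alphabetically for stable output.
    return sorted(items, key=lambda i: (-score(i), i))[:limit]
```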


SXSW: Todd Park from HHS on the Power of Open Health Data

Interesting…Todd Park introduces himself as the CTO and “entrepreneur in residence” at the U.S. Dept. of Health and Human Services. I think his point is that his background is tech entrepreneurship. He says, “That may lead you to ask what the hell I’m doing working for the federal government.”

My notes below are paraphrases [my best efforts] of Park’s talk and my comments in italics.

So he is supposed to work with the government to figure out how to harness the power of data to improve public health in America. He’s going to describe several things they’re doing at HHS. He says there’s never been a better time to be an entrepreneur at the intersection of health care and IT. Amen to that.

There are new incentives + information freedom that add up to rocket fuel for innovation.

Starts with “meaningful use,” the new Medicare/Medicaid incentives that reward meaningful use — improving outcomes — of electronic health records [EHR]. Government is trying to send a signal to the industry of what appropriate, meaningful use of EHR is.

Meaningful use is the appetizer when it comes to incentive change.

The big enchilada is payment reform.

Obamacare [OK, he uses the formal name of the bill, the Affordable Care Act] is designed to shift from pay for services to pay for health and value.

The Center for Medicare and Medicaid Innovation is funded to identify ways that payment reform is already working, by finding experiments that work. Oh, I remember this. This is the part of the bill I was scoffing at, thinking it was too small-scale to make a difference. Park says that if Medicare/Medicaid identifies a working reform, implementing it takes a regulation change, not a law change. That’s the secret sauce in this bill, it seems. The $10B to fund the innovation center has been “appropriated.” Hmm. I have to check on that. I thought that’s what the GOP is trying to de-fund.

Reforming payment systems
He’s got a really massive slide with a lot of actually useful terminology on it, but it’s a lot to capture and explain here. Gist of it is, there are a number of ways that we can improve access and reform payment to achieve savings and better health. Good. He’s posting his slides somewhere. More on that later. Updated 3/19/2011: Park’s slides from SXSW are now on Slideshare.

Information Liberation
Park enjoys saying liberacion in Spanish with great flair.
The Direct Project
This is a collaborative project to enable simple, secure transmission of health care data over the Internet. Sixty vendors are now implementing it, following a government standard released in 6/10.

Blue Button
This allows any veteran or Medicare beneficiary to get an electronic copy of their own health information. Launched in October 2010 and more than 200,000 downloads so far.

So another initiative is trying to make the market more transparent. is part of this. It has a comprehensive list of all open insurance plans in the U.S., including pricing. They will release APIs for this data later in the year.

Next: Want to morph HHS into the NOAA of Health Data. Now this is kind of cool. They have already begun publishing data on the CHDI website. There’s some interesting looking stuff there. I will have to dive in further later. From a work session they had last year…Park says Tim O’Reilly said, you can’t make people find the data. The data has to find them. Now, Bing is using the CHDI data to show patient satisfaction data in search results when you search for a hospital. National Association of Counties helps counties set up public sites using this data, showing health information for their communities. Healthy Communities Dashboard. Community Clash is another site built on this health data. [Disclosure: Community Clash is a Healthways site, which is a client of mine, but I haven’t worked on that site.]

Asthmapolis lets you geographically track where you’re having asthma attacks, using a GPS device connected to your inhaler. Soon they will be anonymizing the data so we can have asthma maps to find hotspots.

This guy is rapid-fire, machine-gun spewing health data projects at us. There is a LOT going on: community health data, downloadable and available via API; APIs to compare hospitals, nursing homes, home health and dialysis; and Physician Compare coming soon.
They are going to make Medicare claims files available to qualified people [i.e., people who can handle the privacy requirements, I guess] to do quality analysis.

MedLinePlus – Can send patient education materials in response to EHR queries via its API.

All this stuff is mentioned/linked on Oh, also includes a link to other sites offering free health data.

Mentions a brand new funding org for health apps: Rock Health.

For those who think health data, or any data, is a snore, please see Todd Park speak as soon as possible. I have never seen so much energy about data.

Now he’s talking about entrepreneurship and startups. “If you get the best people, you win.” Wants to get superstar talent focusing on health data. He wants you to contact him to do a health data camp or if you need help with health data: or @todd_park. Is now begging people to email him. Please contact this man about health data.

Park is getting a real accolade from a guy behind me, who tells the room about the industry suffering through a decade of empty promises and now the past year of fabulous, growing access to government health data. Well said.


IA vs. UX vs. content strategy vs. your name here

There’s an interesting editorial in the fall 2010 issue of the Journal of IA, which I do like reading. Eric Reiss spends some time trying to place information architecture, user experience and content strategy in terms of each other. I don’t think it’s an entirely worthless endeavor, but in my opinion, he’s bitten off a ginormous challenge. We’re the people who like to organize, categorize and name things. So no wonder we don’t all agree here. Reiss has certainly put his finger on an ongoing point of contention.

A much more recent post by Erin Kissane tackles the same topic from a different angle, making content strategy more of the umbrella.

I’d draw a bigger picture though. I’d put the business strategy umbrella over the top of the project as a whole. It’s got to define your work, no matter your discipline. To my mind, then, systems, development, UX, IA and content strategy all need a seat at the table to get from strategy through to executed product. There are a number of ways to make the process work — even how to define your business strategy. And depending on which process you use, one discipline or another may take a more prominent role.

In the end, I think the argument is largely academic. The critical thing is that the disciplines of content strategy, IA and UX all seem to get more respect now. When I started working on the web, there was design. And HTML. And then content, but in the “words-go-here” variety. Things have improved a lot since then — consumers have gotten much more sophisticated in what we demand from our web applications, and those of us in the web industry have responded to that. There are still people trying to execute web projects and applications without content strategy or IA or UX, of course. But if you want your work done effectively and well, you need all three.


I need a sharable calendar with tags

Dear programmers of the world,

Here’s a problem that seems to need solving*. Please, my friends and I beg of you, please help.

My employees and I use Google Calendar. It’s great, because it integrates with and iCal and our phones and everything else we want to use to view calendars. Each of us has her own calendar, so we can see when someone’s out of the office, who’s in a meeting, whatever.

My family and I also use Google Calendar. I have a personal calendar category that’s separate from my work calendar, and my husband has a calendar, and my kids do. We have extra calendars for things like family birthdays that we all want to subscribe to.

About 75% of the time, this is a great solution for us. But frequently, I run into events that need to be in two places — a personal event happening during the work day, so my employees need to know I’ll be out of pocket over lunch — or a work event that runs into the evening, and my husband needs a reminder that I won’t be picking up the kids.

My taxonomic nature wants to throw a tag on that — anything I tag with “AshbyView,” say, should show up on my husband’s version of my calendar, or if I tag it “WorkView” then my employees ought to be able to see my personal event.
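For any programmer who takes me up on this, here’s a minimal Python sketch of the logic I have in mind. Each event lives on one calendar but can carry tags. A “view” is everything on a given calendar plus anything tagged for that audience. Tags go into the standard iCalendar CATEGORIES property, so any client that reads iCalendar would at least see them. The class and function names are hypothetical. A real service would also need time zones and line folding.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    uid: str
    summary: str
    start: datetime
    end: datetime
    calendar: str  # the one calendar the event lives on, e.g. "work"
    tags: set = field(default_factory=set)


def view(events, calendar, tag):
    """Everything on `calendar`, plus events elsewhere tagged with `tag`
    (e.g. my personal calendar plus work events tagged "AshbyView")."""
    return [e for e in events if e.calendar == calendar or tag in e.tags]


def _escape(text):
    """Escape TEXT values per RFC 5545 (backslash, semicolon, comma, newline)."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
            .replace(",", "\\,").replace("\n", "\\n"))


def to_ics(events):
    """Serialize events as a minimal iCalendar feed (floating times, no line folding)."""
    fmt = "%Y%m%dT%H%M%S"
    stamp = datetime.now(timezone.utc).strftime(fmt) + "Z"
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//tag-views//EN"]
    for e in events:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{e.uid}",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{e.start.strftime(fmt)}",
            f"DTEND:{e.end.strftime(fmt)}",
            f"SUMMARY:{_escape(e.summary)}",
        ]
        if e.tags:
            lines.append("CATEGORIES:" + ",".join(_escape(t) for t in sorted(e.tags)))
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"
```

Each audience would then subscribe to its own generated feed, e.g. `to_ics(view(events, "work", "WorkView"))` for my employees.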

I cannot find that this is possible. And yet it seems SO possible.

So, dear programming friends, either tell me that I’m an idiot and to get XYZ Calendar Solution, or take this brilliant idea and let me know when it’s ready. There are families out there that would pay for this kind of information flexibility and accessibility.

*Should I be mistaken about this point, and you know of a cheap, web-based calendar that is easily sharable, exportable, uses the iCalendar format and supports tags, please let me know!


Quora: Lessons in community development

My trial-by-fire in online community management came just over 10 years ago, when we launched I’ve thought about that experience a lot in the past few weeks, as Quora has exploded among the tech community. [Find me on Quora here.]

When we were preparing for the real launch of, we spent weeks scouring web forums for people who were talking about small business online. There were lots and lots of them…but there weren’t easy ways to find, organize or communicate with them. So we found most of the people we invited to beta test our site by hand, searching one by one through web forums.

Long before CAN-SPAM or marketing protocols, we were very careful about how we approached prospective testers, because we were long-time web users, even then. We knew that all communities have their own etiquette and customs, and to violate those is to risk death.

At the same time, we spent a lot of time and effort figuring out how to seed the community with content so there’d be SOMETHING there when the first beta testers logged in. It was important to us to be transparent — to be real about who we were, but still to provide value to our soon-to-be customers.

And so it is difficult in some ways for me to watch the birth of Quora, because I feel a lot of it in a personal way. The people complaining on Twitter about getting dozens of emails in an hour when all their friends join. [Oh, for there to have been Twitter when we launched!] The people wandering around saying, “What’s the point of all this? I can’t find anything on there!” And the endless critiques of its design and usability.

Welcome to website launching, 2011-style.

The really interesting part to me is the question-and-answer format, though, because was also a question-and-answer site. It worked a little differently from Quora, but the similarities are many.  [It’s no longer the same format — lives on today as a great wiki for small business owners, managed by “head helper” Rex Hammock.]

What I wonder now, and I don’t remember wondering 10 years ago, is whether the literal format of Q-and-A is really the best way to answer questions.

That may sound silly on the face of it, but much of what many of us are doing online is trying to both ask and answer questions. And now, after almost 20 years of working on the format of information, we’re still offering the questions and answers literally.

I’m cheered by the increasing discussions about structured content and the semantic web. But the depth of structure truly available remains small compared to what we need. Don’t get me wrong: I’m not sneezing at the vast amounts of location information we’re using now, for instance. Or XML. But most of the value on the web is really still locked in text.

“Locked in” sounds funny when you consider how much you can find with your favorite search engine. But what you can’t do with much of the information is re-purpose it easily.

Quora doesn’t advance us down that road, but I’ll be curious to see how it fares, both as an information source and as a community.


Context Is Always Critical

Got into an interesting back-channel discussion today in the South by Southwest session called “Beyond Algorithms: Search and the Semantic Web.”

I did write another post on the panel, so I won’t go into the details here, except to say that I found the backchannel more thought-provoking than the panel itself.

So when I got into the session, I realized I had left my power cord in the hotel room and I was running on reserve power. I sent a tweet to ask if anyone in the rather large ballroom had a Mac power cord I could borrow.

I quickly heard back from Tim Bentley, who was generous enough to share his power cord with me for the session. And so it was coincidental, certainly, when I noticed he’d come from Aardvark, a social search engine.

I think it was during the part of the panel where they were discussing how standard search engines don’t really know if they’ve answered your question, and Bentley tweeted to say this:

#beyondalgorithms panel is basically talking about how to do algorithmically what Aardvark is doing now socially

So a few minutes later, I started wondering about Bentley’s perspective on Wolfram|Alpha, which bills itself as a “computational knowledge engine” and promotes the fact that its information is curated by experts. I have a long-standing bias against people who purport to be “experts” — it’s a knee-jerk sort of reaction and I can acknowledge that.

On the panel, a tangential discussion cropped up about how much context matters in search. It was the sort of conversation that I was far more interested in than the topics they actually intended to discuss. So it got me thinking that it’s not expert curation or knowledge that I dispute — it’s so-called expert knowledge applied without regard to context.

There are so few questions in this world with a black and white answer. Once you go beyond 2+2=4, you need to know the context to answer. And then most expert opinion can sound downright asinine when it ignores context.

So that’s the kind of question I’d like to see explored deeply: How do we apply context to computer inputs [searches, using the computer, applications, whatever] in order to more accurately and efficiently reach solutions for users?


Beyond Algorithms: Search and the Semantic Web

Wow. There are a lot of speakers here, and they aren’t all listed in the program….and there’s no way I’ll get them all straight. I’ll see what I can do.

Gil Elbaz, founder/CEO of Factual. They simplify access to clean, reliable data for publishers. Structure and clean data.

Danny Sullivan, Search Engine Land

Carla Thompson, Guidewire Group. Search and semantic analyst.

Dag Kittlaus, Siri

Barak Berkowitz, has been at Wolfram Alpha for 10 days.

Will Hunsinger, CEO of Evri and Twine

Nova Spivack, founder of Twine, now at LiveMatrix.

Barney Pell, Microsoft Bing team.

Haha, first real question is, what does semantics mean? We’re going to discuss the semantics of semantics.

Someone [Pell, I think] says, it’s about meaning, figuring out which words match with other words. Also about the abstractions that tie words together. It’s a middle layer that connects the underlying layer to the higher intent.

So Google and Bing are already semantic search engines? Yes.

Thompson says, no that doesn’t clear it up. You lost the consumer after the word abstraction. I think we should get rid of the term.

Pell: I think it’s not a consumer term. It’s a technology term.

Kittlaus says, I’ve been in the Valley less than 3 years, and I’m amazed at how little creativity there is in the search field. People argue about who has the biggest database, not about how to solve users’ problems.

Panel is arguing about whether or not today’s search results are adequate or should be replaced with something yet-to-be-conceived. Total geek amusement is all you can say about this.

Good point: Panelist says we have a scalability issue. There’s so much accessible data today, that a solution that could handle a million pieces of data isn’t the best solution for a trillion pieces of data.

Right now, search is good at answering a single question. When you need to handle a complex task, you may have to do several searches. Search needs to understand the user better to handle that complexity.

Spivack: OK are we all just debating Google’s next feature? Or is there room for others?

Pell contends that many search engines [albeit not Google and Bing] are already working together.

Some discussion about the importance/desirability of including social and context info in search results — no discussion of privacy. All about how much better it will make search results.

Spivack comments on Wolfram|Alpha using expert curation instead of community curation. I’d love to hear more discussion on that point.

Now a discussion of how the engine knows whether it has answered you. Point made that many searches are refined over time…you search for info on getting a mortgage, you ask different things over time, and two months later you buy a house. At what point was your question “answered”?

The backchannel on this panel is pretty negative. I think it’s because there are too many people on the panel. And perhaps could have used a little more planning.

