June 25, 2009

Upcoming Changes to the API

In the next month or so, we will be making some significant changes to NPR.org. Some of these changes are visual, while others are architectural. As a result, there will likely be an impact on the API. That said, we have put a lot of effort into making the system as backward compatible as possible so that API users are minimally affected.

I will post again to this blog soon with more details on the changes. In the meantime, here are some high-level descriptions of what to expect:

1. There will be changes to our topic structure resulting in some topics being eliminated, others being added, and others being renamed. For any changes to existing topics, there will be redirects to corresponding topics that will be maintained for a reasonable period of time to ensure backward compatibility. Our goal is to ensure that any applications that are dependent on specific topics existing will continue to work.

2. There will be some nodes and parameters added to the API output for NPRML. These will largely be to support the new features on NPR.org. They should not break any applications dependent on NPRML unless those applications require that these additional elements do not exist.

3. There will be new products and extensions added to the API. These will not adversely affect any current API calls.

As I mentioned earlier, I will publish to this blog again with more details on the changes as we draw closer. In the meantime, please provide any feedback or concerns about this in the comments section for this post.
--Daniel Jacobson

June 8, 2009

NPR's API Rights Management

One of the things that I am most commonly asked about regarding the NPR API is rights management. Because we are distributing content to unknown destinations, it is critical to make sure the API itself can control what gets offered and to whom. To handle these kinds of issues, we built a robust permissions and rights management system into the API. But that is not enough. Rights management starts with contracts and ensuring that the content is tagged appropriately. Without these steps, the rights management system cannot accurately withhold the content that is not allowed to be distributed. So, here is a breakdown of the steps we went through and the systems we built to handle rights in our API.

Contracts
Before launching the API, we spent a lot of time with our legal team reviewing existing contracts and our rights tagging system. Based on this review, we determined that only a few changes needed to be made to the rights tagging system, but that there were quite a few restrictions on what could be offered through the API. One interesting example is Fresh Air. Fresh Air is a program produced by WHYY and distributed on the radio by NPR. NPR is also responsible for displaying the content on NPR.org and is allowed to distribute Fresh Air content through limited outlets, like RSS, based on the terms of the contract. At the time of launch, however, NPR was not permitted to offer Fresh Air content through the API using the richer output formats. By the December 2008 upgrade to the API, the contract had been renegotiated to include distribution through the API.

This highlights two points. First, at launch, we needed to incorporate a rights management system in the API that could identify specific types of content and then restrict that content from being distributed for certain types of users. The second key point is that NPR has been shifting our contract strategy to enable more content that we pick up to be distributable anywhere NPR content appears, including through the API.

Rights Tagging System
Our system for tagging assets not produced by NPR is critical for the success of rights management. That said, a sizable portion of this system involves manual effort. After all, it is the editorial process that chooses stories from external sources (e.g., AP, Reuters, etc.), images, videos and other assets. Upon selection of these assets, editorial staff enter them into our content management system, which contains appropriate fields for tagging the owner of the content.

Of course, we do have scripts that pull in some materials, like the AP Business feeds on our site. Those stories and assets that get pulled in through automated systems also get tagged by the scripts.

Finally, we also have scripts to remove content from our system based on contractual obligations. For example, if we have the rights to present an image for only 30 days, these scripts will purge the system of that image and its metadata at the appropriate time.

Rights Management System
After we determined what we are allowed to do based on the contracts, and after appropriately tagging the content itself, we were able to create a flexible and powerful system for managing the distribution of the content through the API. This system has four aspects: query-level filtering, story-level filtering, asset-level filtering and user permissions.

Query-level filtering enables the system to remove any story or list (i.e., topic, program, series, etc.) from the query based on the user's permissions. It does this in two ways. First, the system will analyze the API query for any IDs that the user does not have permission to access. If, for example, the user does not have the rights to view content from This I Believe and the user has included id=4538138 in their API query, the query-level filter will remove the ID from the query and will proceed to execute the query without it.

Once a valid query passes through the system and figures out what stories to return, the story-level filter gets applied. This filter determines which individual stories need to be removed before returning the feed to the user. This is done by applying the list of IDs in the filter, for the user's access level, as exclusions in the query to the API. The list of IDs in the filter includes list IDs (e.g., topics, programs, series, etc.), so the same rule applies to any stories that belong to any of these lists. For example, we have already established that my API key does not give me permission to see stories that belong to This I Believe. If I request the top 10 stories that belong to the Opinion topic, and if the third story is a This I Believe story, then the system will eliminate the third story and will add the eleventh to the results to accommodate my request for 10 stories.
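To make the two filters above concrete, here is a minimal sketch in Python. It is only an illustration, not our actual code: the data model, the function names and the Opinion topic ID are assumptions made up for the example, and it filters an in-memory list rather than applying the exclusions inside the query as the real system does.

```python
THIS_I_BELIEVE = 4538138   # list ID quoted in the post
OPINION = 9001             # hypothetical topic ID, for illustration only


def filter_query_ids(requested_ids, restricted_ids):
    """Query-level filter: drop any requested ID the key may not access."""
    return [i for i in requested_ids if i not in restricted_ids]


def filter_stories(ranked_stories, restricted_ids, num_results):
    """Story-level filter: skip restricted stories, backfilling from later ones."""
    allowed = []
    for story in ranked_stories:
        if len(allowed) == num_results:
            break
        # A story is removed if it belongs to any restricted list.
        if restricted_ids.isdisjoint(story["list_ids"]):
            allowed.append(story)
    return allowed


restricted = {THIS_I_BELIEVE}

# Twelve Opinion stories in rank order; the third also belongs to This I Believe.
stories = [{"rank": n, "list_ids": [OPINION]} for n in range(1, 13)]
stories[2]["list_ids"].append(THIS_I_BELIEVE)

query_ids = filter_query_ids([OPINION, THIS_I_BELIEVE], restricted)
top_ten = filter_stories(stories, restricted, num_results=10)
print(query_ids)
print([s["rank"] for s in top_ten])
```

In this sketch the restricted ID is dropped from the query, and the result still contains ten stories because the eleventh is pulled in to replace the third.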

Asset-level filtering is less stringent than story-level filtering in that it does not remove the story completely (as in the example above). Rather, it will display the story, but will only return those assets that the user has the rights to see. For example, if I request the top 10 stories from the People & Places topic, that result set may include a story from Fresh Air and one from This I Believe. In this case, let's say story number three is still a This I Believe story and story number seven is a Fresh Air story. We have already established that my API key does not allow me to see This I Believe, so the story-level filter will remove the third story and will include the eleventh in my results. Meanwhile, my API key allows me to see Fresh Air stories, just not all of their assets (this restriction no longer applies, but when we first launched the API, Fresh Air was only available through RSS). As a result, the seventh story will get through the story-level filter, but the asset-level filter will remove all assets other than the RSS information. We have other asset-level filters for audio, images, video, full text, etc.

The final element of this system, which has been mentioned throughout, is permissions. Our permission levels include Public, Partner, Station, NPR.org and Master, with increasing levels of access in that order. For each level, there is a distinct list of IDs associated with each filter type (although the query and story filter lists are always the same). As a result, the same story in our system can theoretically be removed for the Public user, only have RSS content for Partner users, have everything but images for Stations, and be fully available to the NPR.org users. Meanwhile, a different story can theoretically have a completely different permission scheme, giving NPR.org users no access to it while Public users can see it all.
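As a rough illustration of how per-level asset filters could be represented, here is a small Python sketch. The level names come from the list above; everything else (the Fresh Air ID, the asset names, the table layout and the function) is hypothetical and only meant to show the idea.

```python
from enum import Enum


class Level(Enum):
    PUBLIC = "public"
    PARTNER = "partner"
    STATION = "station"
    NPR_ORG = "npr.org"
    MASTER = "master"


FRESH_AIR = 9002  # hypothetical list ID, for illustration only

# Hypothetical asset filter table: per level, list ID -> asset types still allowed.
ASSET_FILTERS = {
    Level.PUBLIC: {FRESH_AIR: {"rss"}},
    Level.STATION: {FRESH_AIR: {"rss", "audio", "text"}},  # everything but images
}


def filter_assets(story, level):
    """Asset-level filter: keep the story, but return only permitted assets."""
    rules = ASSET_FILTERS.get(level, {})
    allowed = None
    for list_id in story["list_ids"]:
        if list_id in rules:
            # If several lists restrict the story, keep only what all of them allow.
            allowed = rules[list_id] if allowed is None else allowed & rules[list_id]
    if allowed is None:
        return story  # no asset rule applies at this level
    kept = {name: value for name, value in story["assets"].items() if name in allowed}
    return {**story, "assets": kept}


story = {
    "list_ids": [FRESH_AIR],
    "assets": {"rss": "...", "audio": "...", "image": "...", "text": "..."},
}
public_view = filter_assets(story, Level.PUBLIC)
station_view = filter_assets(story, Level.STATION)
master_view = filter_assets(story, Level.MASTER)
```

Because each level has its own table, nothing forces a higher level to see more than a lower one for a given story, which matches the last case described above.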

To see how this filtering layer sits on top of our system, here is an architectural diagram:




Ongoing Challenges
Although this system handles our cases for the most part, rights filtering is and will always be a challenge. There are certainly cases that could sneak through the system. These cases could be a result of the editorial process, the tagging tools or the code in the API. We also encounter new scenarios that sometimes require us to quickly modify the API to handle them. Despite these challenges, we have been pretty happy with this system so far.

--Daniel Jacobson

April 6, 2009

New API Feature : Create Your Own XML Output

Today, we added two updates to the API, as follows:

XML Field Remap
This new functionality allows you to rename our NPRML elements to whatever you want, so your API requests can fit your existing applications without you having to change your code. The remap function allows any node or any attribute to be renamed, and it can apply to any number of elements in the document. Note that this only applies to the NPRML output. To see how it works, go to the API Input Reference. In the meantime, here are some examples of how to modify the API query string to implement the remap:

- To change the list element, use "remap=list:newList", which will rename the list node to "newList".

- To change a sub-element of list, use "remap=list.title:newListTitle", which will rename the title node under list to "newListTitle".

- To change the story element, use "remap=list.story:newStory", which will rename the story node to "newStory".

- To change a sub-element of story, use "remap=list.story.title:newStoryTitle", which will rename the title node under story to "newStoryTitle".

- To change an attribute for any element in the NPRML output (even if the node itself was changed), use "remap=story~id:newStoryId", which will rename the id attribute for the story node to "newStoryId".

- To apply many of these changes in a single query, use the comma to separate the remap commands, as follows: "remap=list:newList,story:newStory,story.teaser:newTeaser,story~id:newStoryId,list.story.text.paragraph:textParagraph".
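If you build your query strings in code, the remap value can be assembled from a list of renames. The Python sketch below is just one way to do it; the helper name is made up for this example, and the exact escaping the API accepts should be checked against the API Input Reference.

```python
from urllib.parse import urlencode


def build_remap(renames):
    """Join (path, new_name) pairs into a single remap value.

    Use "." for sub-elements and "~" for attributes, as in the examples above.
    """
    return ",".join(f"{path}:{new_name}" for path, new_name in renames)


remap_value = build_remap([
    ("list", "newList"),
    ("story", "newStory"),
    ("story.teaser", "newTeaser"),
    ("story~id", "newStoryId"),
])

# Keep ":" and "," readable instead of percent-encoding them.
query_string = urlencode({"remap": remap_value}, safe=":,")
print(query_string)
```

The resulting string can be appended to the rest of your API query.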

Most Emailed Feed
We also opened up the Most Emailed list through the API. Previously, it was only available as an RSS feed, but now, it can be accessed through the API, including access to full text, audio, images, and other assets that NPR has the rights to redistribute. There are a few limitations in the feed, however, that are not present in any of our other existing options in the API. For example, this feed cannot be mashed-up with any other feeds from the API, it cannot be sorted, and the queries cannot be restricted by date or search term. As a result of these limitations, the Most Emailed feed is also not present in the Query Generator.

To access the Most Emailed feed, add "id=100" to the API query string.
--Daniel Jacobson

March 19, 2009

API Update : "Day to Day" and "News & Notes" To Be Retired

NPR's programs "Day to Day" and "News & Notes" will be broadcasting their final shows on Friday, March 20, 2009. Although the programs will no longer be producing new shows, the entire archive for both of these shows will still be available on NPR.org and through the API. Eventually, we will likely remove these programs from the API Query Generator, although their IDs will still be valid in the API and can be found in the API Mapping Index.

API queries that use these program IDs will not return any new content after this Friday. The IDs, however, will remain valid, so your applications should continue to work as expected.

Finally, to access the full archives of these programs in the API, you can use the functions available in the Control tab of the Query Generator. These functions include searches based on search terms and date ranges and allow for the ability to paginate through the results.

Please let us know if anything unexpected happens as a result of this change.
--Daniel Jacobson

March 11, 2009

Seeking Feedback for SXSW API Session

As mentioned in Zach's previous post, I will be part of a panel at SXSW. The panel discussion will be on APIs, is called "Get Me Rewrite! Developing APIs and the Changing Face of News", and is on Sunday at 3:30pm. For more information on the panel, go to the SXSW page for this panel.

The panel moderator is Jacob Harris, from The New York Times. Joining the discussion will be Brad Stenger from Wired, and John Donovan from Daylife.

We will have substantial time set aside for Q&A, although prior to the Q&A we will be addressing many of the challenges in producing and maintaining APIs. That said, there are myriad things we can focus on when discussing APIs...

So, please let us know what is most on your mind. What kinds of questions do you want this panel to answer? Are you interested in technical background, business goals, legal issues, getting corporate buy-in, the marketplace for APIs, etc.? We will be using this feedback to refine our topics accordingly as we finish preparing for the session.

--Daniel Jacobson

January 7, 2009

Tips and Tricks for Mix Your Own Podcast

We've had a positive reception to the Mix Your Own Podcast tool launched December 18. Here are a few tips to help you get more out of this new feature.

Every Story is an Episode

Our traditional podcasts, launched in August 2005, often combine multiple stories in a single podcast episode. For example, the Economy podcast has episodes that typically contain 4 stories, delivered on Tuesday and Friday. With Mix Your Own Podcast, each story appears as its own episode. Here is a Mix Your Own version of the Economy podcast. This allows you to download the stories as soon as the audio is available on NPR.org, and it gives you more control over what you want to listen to.

However, if you set up a podcast on a popular topic, you may get several episodes per day, so you may want to adjust your podcast software to keep more episodes available. In iTunes, this is done by selecting the Podcast Tab and then clicking the Settings button on the lower left. You may also want to set your software to download episodes more frequently so that you get timely news as soon as it is available. Here are some suggested settings.


 

Refined Search

Mix Your Own Podcast finds stories relevant to your interests in one of two ways. First, NPR categorizes stories in many different ways: the program on which the story was aired/published, topics associated with the story, the reporters of the story, musical artists featured in the story, and so on. You can use any of these pre-existing categories to build your podcast. In the Mix Your Own Podcast tool, pre-existing categories will appear as you type in the keyword field. You can select these categories by clicking on them.

Mix Your Own Podcast drop down

 

Second, your podcast can be based on free text searches of the content of stories. Originally, this search was done on any text content found on the web page for the story as well as the audio transcripts for the stories (if available). While comprehensive, this can find stories that are only tangentially related to your keywords. For example, if you entered "Cat" as your keyword, your podcast could include stories where a reporter used the phrase "Let the cat out of the bag." So, we have changed the way text search is used in Mix Your Own Podcast; now, we will only search the title and the summary of the story. This should provide more relevant stories for your podcast. This change took place automatically, so you don't have to make any changes to your podcast to take advantage of it. However, if you liked the full text search, see the next tip.

Mix Tool for Power Users

You can still use the full text version of search to build your podcast via the API Query Generator. Mix Your Own Podcast is built on top of the NPR API. Using the Query Generator, you can fine tune the criteria used to pick stories for your podcast. To use the Query Generator, you will need to sign up for a free API Key. Then, in the Query Generator, go to the "Fields" tab and select "Podcast" as your "Output Format". You can then use the other tabs to customize your podcast to your heart's content.


For example, if you preferred the full text search option for building your podcast, go to the "Control" tab, type in your search terms, and select "Full Content of Story" as the "Search Type".

Another example of what you can do with the Query Generator is controlling how your selection criteria are combined. In the Mix Your Own Podcast tool, we return stories that match any of your specified criteria. If you enter several categories, the podcast will contain stories that match at least one of the criteria. In technical terms, we call this a "Boolean Or" API query. Perhaps, though, you want to combine your criteria to get a more focused podcast that contains only the stories that match all of the category selections you have made. For example, if I wanted a podcast that contained only stories that were about both Technology and Politics, I would go to the Query Generator "Topics" tab, check both the "Technology" and "Politics" options, and then go to the "Control" tab and select "And" for the "Boolean for IDs" option.


The end result is my Technology and Politics custom podcast.
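For those who think in code, the difference between the "Or" and "And" settings can be sketched in a few lines of Python. This only illustrates the matching rule, not how the API implements it, and the topic IDs and story data are made up.

```python
TECHNOLOGY = 9003  # hypothetical topic IDs, for illustration only
POLITICS = 9004


def select_stories(stories, wanted_ids, boolean="or"):
    """Return stories matching any ("or") or all ("and") of the wanted IDs."""
    wanted = set(wanted_ids)
    if boolean == "and":
        return [s for s in stories if wanted <= set(s["list_ids"])]
    return [s for s in stories if wanted & set(s["list_ids"])]


stories = [
    {"title": "Chip shortage", "list_ids": [TECHNOLOGY]},
    {"title": "Campaign trail", "list_ids": [POLITICS]},
    {"title": "E-voting machines", "list_ids": [TECHNOLOGY, POLITICS]},
]

any_match = select_stories(stories, [TECHNOLOGY, POLITICS], boolean="or")
all_match = select_stories(stories, [TECHNOLOGY, POLITICS], boolean="and")
```

Here the "or" selection returns all three stories, while the "and" selection returns only the one story tagged with both topics.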

We would like to hear how you are using the Mix Your Own Podcast tool. If you have created an interesting custom podcast, please post the URL in the comments section of this post.

--Harold Neal

December 18, 2008

API Upgrade : Mix Your Own Podcast and Other New Features/Content

Today we have some exciting new API enhancements to share with you, including Mix Your Own Podcast, a new extension that offers users an infinite number of ways to customize NPR podcasts. Here are more details about Mix Your Own Podcast as well as some of the other features and content that we launched:

Mix Your Own Podcast
Prior to this release, the API offered only streaming formats of our audio content, including Windows Media, Real Audio, and progressive download MP3. These formats were supported by a Terms of Use that required API users to stream the audio from our servers, preventing them from downloading the audio. With today's launch, however, the API now allows users to slice through the NPR.org archive to create custom podcast feeds based on virtually any aggregation (or combination of aggregations) in the API. To learn more about this, go to the NPR Podcast Directory.
Due to various current constraints, the only real exception here is that users will not be allowed to create full-show podcasts of Morning Edition, All Things Considered, Weekend Edition Saturday or Weekend Edition Sunday. However, all stories from these and other programs will be available to create any other podcast mashup in the system.

Station Finder API. With this release, we are also offering access to our Station Finder API. This API will allow users to pass in zip codes, city/state, station call letters or latitude/longitude information, and we will return a list of stations that can be heard in that location. The station results also include key information about the stations, including links to their home page, schedule page, audio streams, RSS feeds, podcasts, station logo and more. Because the system also has station stories from some of these stations, you will be able to, for example, search for a zip code, identify the stations in that zip code, then find all of the stories from all of the stations returned. Over the coming months, more station content will be made available through the API.

New Content: Fresh Air and StoryCorps. With this release, we are also making available the full archive of Fresh Air and StoryCorps. For Fresh Air, we will be exposing over 10,000 stories (and counting) dating back to 1993. The StoryCorps offering will include about 200 stories (and counting) dating back to 2005.

Query By Asset Type
Now you can query the API to get stories that contain a particular type of asset. For example, you can filter your query to only get stories that contain images (useful if you are building a slideshow application, for example), or stories with audio, or stories with long-form text. To use this new feature, append &requiredAssets=image to your query string and you will get only stories with images. The other allowed values for this parameter are audio and text. You can combine these filters with a comma-delimited string (&requiredAssets=image,text,audio). This new feature will be added to the documentation and the Query Generator in the next week or so. This feature does not work yet with API queries based on free-text search.
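Here is a small Python sketch of how an application might build this parameter. The parameter name and its three allowed values come from the description above; the helper function is hypothetical.

```python
ALLOWED_ASSET_TYPES = ("image", "audio", "text")


def required_assets_param(*asset_types):
    """Build the requiredAssets parameter from one or more allowed asset types."""
    unknown = [t for t in asset_types if t not in ALLOWED_ASSET_TYPES]
    if not asset_types or unknown:
        raise ValueError(f"expected one or more of {ALLOWED_ASSET_TYPES}, got {asset_types}")
    return "requiredAssets=" + ",".join(asset_types)


images_only = required_assets_param("image")
all_three = required_assets_param("image", "text", "audio")
```

Append the result to an existing query string with "&", keeping in mind that it does not yet work with free-text search queries.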

We are excited about this new release and view it as the next step in our continued effort to open up our content to the world.
--Daniel Jacobson

December 8, 2008

API Usage

As mentioned in my previous post about metrics, we have identified quite a few different usages of the API. These implementations range from incorporating NPR stories on member stations' web sites to widgets created by developers in the public. Below are some of the more interesting or comprehensive uses that we have found.

NPR Member Station Implementations

Minnesota Public Radio Program Archives

North Country Public Radio

Oregon Public Broadcasting

KGOU

SouthEast Public Radio

WAMC

Hearing Voices Widget

KJZZ - NPR Simile Timeline


Public User Websites, Widgets, and Applications

Reverbiage Widget

Axiom Stack iPhone Site

KDE Desktop NPR Audio Player

NPR Backstory Twitter Mashup

RubyNPR - A code wrapper in Ruby

All Tweets Considered

NPR Song of the Day Widget for Mac OSX Dashboard

NPR Audio Search Box FireFox Plug-In

If you have created something using the API and it is not included in this list, please let us know about it by adding it in the comments of this post.
--Daniel Jacobson

November 24, 2008

API Decisions : Metrics

When we launched the API back in July, we had some ideas as to how to gauge success from a metrics perspective. Some of those success measures were around adoption by member stations, others were based on total number of registrants, and others were based on number of requests. That said, having one of the first comprehensive content APIs, it was hard to determine what the actual numbers meant. In our first few weeks, we had over 300 registrants. Was that good? We think so, but it is hard to know. We know that many of those registrants were member stations, many were developers in the public, and some percentage were people who registered simply to take a look at what they just read about in an article somewhere. After one month, we exceeded 1,000,000 requests to the API itself. We were pretty confident that number was a good one, but again, we had no real basis of comparison.

Despite the challenges in figuring out what our numbers mean, we do believe that our usage and registration numbers (published most recently two weeks ago in my last post) are a strong indication of success for the API.

Another challenge is how to actually get our metrics. While our goal is to encourage the re-use of our content, we obviously want some way to measure success. There are several key methods that we have baked into the system to allow us to see how the API is being used. Keep in mind that there is no way to know with 100% certainty how many eyes are seeing the content, only how people are implementing it, and in some cases, on which websites, blogs or applications people are seeing the content that came from the API. The primary methods are as follows:

* Since all audio must be served from NPR servers (based on our Terms of Use), we are able to tag the audio accordingly, indicating that the request originated from the API.

* All requests to the API require an access key. This helps us identify trends in usage of the API at the key level, in addition to at much higher levels.

* For each request in the system, we will be outputting a log to our servers that includes the request, the API key used in the request, and the stories/assets that were returned. Over time, we will be able to see trends of use, most popular requests, most commonly distributed stories, etc.

* For any rich-content request to the API (i.e., text elements that contain HTML), we have included a 1x1 pixel image that is served from NPR servers (which is an industry standard approach for capturing metrics online) and passes information back to our logs. This will help us identify some of the places where NPR content is appearing when it has been cached by the website, blog or application.
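To give a sense of the kind of trends a per-request log makes visible, here is a minimal Python sketch. The record fields and sample values are assumptions made up for the example; they are not our actual log format.

```python
from collections import Counter

# Hypothetical log records: the request, the key used, and the stories returned.
log_records = [
    {"api_key": "key-A", "request": "id=1001", "story_ids": [11, 12]},
    {"api_key": "key-B", "request": "id=1001", "story_ids": [11, 12]},
    {"api_key": "key-A", "request": "id=1002", "story_ids": [12, 13]},
]


def summarize(records):
    """Count requests per key, per query, and how often each story was returned."""
    per_key = Counter(r["api_key"] for r in records)
    per_request = Counter(r["request"] for r in records)
    per_story = Counter(sid for r in records for sid in r["story_ids"])
    return per_key, per_request, per_story


per_key, per_request, per_story = summarize(log_records)
print(per_key.most_common(1))
print(per_story.most_common(1))
```

With real logs, the same three counters would answer questions like which keys are most active, which queries are most popular, and which stories are distributed most often.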

Like I said, this is not the complete picture, but these approaches result in metrics that do give us a good indication as to how the API is getting used and by whom. With that in mind, these numbers only have weight if they translate into real-world consumption of the content. In my next post I will highlight some of the more interesting implementations and usages that we have heard about in the marketplace.
-- Daniel Jacobson

November 10, 2008

NPR's Open Content Strategy

It has been several weeks since my last post on the goals and challenges of launching NPR's API. I still intend to fill out the story in the coming weeks/months.

I will start up again by talking about my recent presentation at Mashery's API Conference last week. The conference itself was primarily focused on the business of APIs. In my presentation, I mainly discussed NPR's goals for opening up an API along with some of the challenges we faced leading up to the launch.

As NPR reviewed the landscape of content syndication, we found that there were quite a few APIs already in the marketplace. Most of them, however, belong to content aggregators (e.g., Google, Yahoo!, etc.), user-generated content sites (e.g., Flickr, Wikipedia, etc.), and some e-commerce sites (e.g., eBay, Amazon, etc.). There were surprisingly few comprehensive APIs from major media organizations. Some organizations, like DayLife, CBS and BBC, offered APIs, but these were limited in a variety of ways.

Mostly, these major media organizations were syndicating their content through RSS or extended RSS, such as Podcasts or MediaRSS. This approach has been surprisingly effective - what I call "Really Successful Syndication". It is successful because RSS is simple, widely adopted in the marketplace, and succeeds in driving traffic back to the site. The major problems with RSS are the same things that make it really successful. That is, in the current marketplace, RSS now stands for "Really Stingy Syndication" because it does not contain very much real content. Instead, it provides enough content to drive traffic back to the source, embracing the "lock-down" model of content.

The marketplace is changing dramatically, though, and people have destinations to which they are attached. They go to Facebook, MySpace, etc. and expect to find content there. Content providers will have to put their content on these sites through widgets and other means of distribution. If the users of Facebook, for example, find the content they want on Facebook, then they are less likely to leave Facebook to get more content (unless the user has a keen interest in a specific content provider). As a result, the richer the content is on Facebook, the more likely the user identifies your brand as a trusted news source. So, RSS is ok only if no other providers offer richer content. But it is only a matter of time before the richer content is there...

Because of these changes in the marketplace, NPR decided to release a comprehensive API of all of our content that we have rights to redistribute. If our content is truly open, it will enable users to mash it up, keep it relevant to them, and share it with new audiences in places where those people are. Although NPR.org is still critical to our strategy, we can no longer rely exclusively on the site as a way to reach people.

There were two other major factors in our decision. First, it is critically important for NPR to provide content and services to our Member stations. The API will enable stations to get NPR content on their sites. We also plan to offer local station content through the API, which will provide a local/national view of content to the users. The second major influence in our decision was NPR's Mission to "create a more informed public". By offering both local and national content in our API, enabling users to mash it up and use it in ways that we have not thought of or don't have the resources to execute, we hope to reach and inform new audiences.

Once we decided to release an API, there were several questions that we needed to answer. First and foremost, we needed to establish what our target audiences for the API would be. They are as follows:

  • End-users and other web developers (These users can post content to blogs as well as create innovative ways of using NPR content)
  • NPR's Digital Media team (NPR Product and Project Managers can improve their products using the API without a lot of effort from NPR Developers)
  • NPR Member Stations
  • Content aggregators and NPR's business partners

Serving each of these audiences through the API enables us to seamlessly integrate with them in such a way that it requires very little involvement from NPR's development staff.

In the slides (attached below) from the conference, I have provided some examples of how these audiences are using the API.


We will be discussing more of our challenges in later posts.
-- Daniel Jacobson

November 3, 2008

NPR Roadshow

While we have been pretty busy building tools for our Election Night reporting, we continue working on the API. The feedback so far has been fantastic. Along with encouragement and congratulations, we have received lots of great suggestions. We have been very excited by the adoption of this technology and the general embracing of this "Brand and Release" strategy. We hope to have some significant and exciting new features in place by early next year.

But what if you want to hear more...?

Well, if you missed our presentation at OSCON 08, there will be other opportunities to hear us discuss first hand what we have done and where we are going with the API.

Here are several of the upcoming events we plan to be at:

Today (11/03) at 5:15pm PST Daniel Jacobson will be discussing our efforts on the API at The Business of APIs Conference. If you are attending please stop by.

For those in the Public Broadcasting family, we will be at IMA Public Media 09 in Atlanta Feb 19-21. This is definitely a must attend for those in public broadcasting who see their future world meshing traditional and new media experiences.

We are also very excited to be a finalist for the We Media Game Changer award. Out of 150 nominees, we are one of 35 finalists. Additionally, we could be chosen as keynote speaker based on community votes.

And finally, we recently got word from the folks at O'Reilly that we have been invited to present at the Web 2.0 Expo, Mar. 31st-Apr. 3rd.

Hope to see you soon.

-- Zach Brand

September 18, 2008

API Decisions : Why Did We Create It?

As promised, I wanted to give some history about how we ended up creating the NPR API. The first major decision that we were faced with was whether or not we should open up our API. The decision was not whether or not to build it, as we'd already done that. Back in November, 2007, we built the foundation of the API to launch with NPR Music. This is basically an XML file repository (essentially in an extended NPRML format) that contains all data needed to build pages on NPR.org. In addition to the XML repository, it includes a PHP framework used to render the XML files to the appropriate presentation layer (these layers include NPR.org as well as RSS feeds, podcast feeds, mobile sites and other outputs that we serve). Here is a diagram of the architecture which includes all of the caching layers as well, some of which were incorporated with the actual release of the public API:

[Architecture diagram]

There are several reasons for this architectural approach:

1. PERFORMANCE : Requests will first go through the Memcache and file cache layers, which will always be the most efficient. If the requested document is not in Memcache, we have PHP render the output using the XML files. If the XML file cannot be obtained, PHP will access the database for the data. If PHP hits the database, however, a version of the request will be stored back in Memcache to speed up the delivery of the next request. This ultimately takes strain off of the database, where queries are the most expensive operation in serving documents.

2. ABSTRACTION : Creating a separate layer between the various presentations and the actual database allows the presentation layers to be agnostic with respect to the data repository. Currently, our database is Oracle, but if we want to move to MySQL, then the presentation layers don't really care because they are served primarily off of the XML repository (although the final fail-over to the database would require changes).

3. SIMPLIFICATION : The database itself is a complicated relational system. The schema is largely normalized for scalability and efficiency in our write operations. Building pages, as a result, requires expensive table joins across very tall tables. These queries, although tuned, add up when you consider how many queries there are throughout a story page, for example. Executing these queries once and storing the data in a flatter file system enables the pages to be built more efficiently (both because of the flatter model as well as not having to access the database).

4. SCALABILITY : Because of the rendering framework, we are able to easily add new transformation and presentation layers without having to write a lot of extra code or customized database queries. The rendering engine knows how to handle the XML files in a cohesive way because they are relatively flat, so the transformation layers really aren't that different from each other. The framework also allows for reuse of code in the presentation layers because most of the presentations are dealing with the same content and are displaying that content in similar ways. New presentations for NPR.org are the hardest because of all of the design nuances, but adding Atom and MediaRSS is pretty quick and painless. The difficult part is figuring out how to map our fields to those structures, not the coding itself.
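To make the request path in point 1 concrete, here is a minimal sketch in Python (our framework itself is PHP). It is an illustration only: the class, its method names and the in-memory dictionaries standing in for Memcache, the XML repository and the database are all hypothetical, and caching the XML-rendered output as well as the database result is an assumption.

```python
from typing import Callable, Dict, Optional


class TieredDocumentStore:
    """Hypothetical read path: memory cache -> XML repository -> database."""

    def __init__(
        self,
        xml_repository: Dict[str, str],
        query_database: Callable[[str], Optional[str]],
    ) -> None:
        self.memory_cache: Dict[str, str] = {}  # stand-in for Memcache
        self.xml_repository = xml_repository    # stand-in for the flat XML files
        self.query_database = query_database    # stand-in for the expensive joins
        self.database_hits = 0                  # counts how often we fall through

    @staticmethod
    def render(source: str) -> str:
        # Placeholder for the transformation to a presentation layer.
        return f"<rendered>{source}</rendered>"

    def fetch(self, doc_id: str) -> Optional[str]:
        # 1. Cheapest path: the document was rendered recently.
        cached = self.memory_cache.get(doc_id)
        if cached is not None:
            return cached

        # 2. Render from the flat XML file when one exists.
        source = self.xml_repository.get(doc_id)
        if source is None:
            # 3. Last resort: ask the database.
            self.database_hits += 1
            source = self.query_database(doc_id)
            if source is None:
                return None

        # Store the result so the next request is served from memory.
        # (Assumption: XML-rendered output is cached here too.)
        rendered = self.render(source)
        self.memory_cache[doc_id] = rendered
        return rendered


store = TieredDocumentStore(
    xml_repository={"story-1": "<story id='1'/>"},
    query_database=lambda doc_id: "<story id='2'/>" if doc_id == "story-2" else None,
)
store.fetch("story-2")  # falls through to the database
store.fetch("story-2")  # served from the memory cache
print(store.database_hits)
```

In this sketch the second request for the same document never reaches the database, which is the effect described above.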

So, the system was largely in place almost a year ago, alleviating many of the technical hurdles in building an API. We knew that if we wanted to open the API up to the world we would still have some technical challenges left, including filtering engines, the registration engine, the query generator, etc. Before getting to those tasks, however, we needed to determine if the public API fits with the overall NPR strategy.

-- Daniel Jacobson


 
September 12, 2008

API Decisions : Introduction

Over the coming weeks, my colleagues and I will blog about the various decisions that we made while developing the API. The posts will discuss the following topics:

* Output formats
* OpenID
* Query generator
* Caching layer and performance
* Number of requests per user per day
* Audio stream vs. Download
* Amount and type of content offered
* Terms of use
* Rights
* Metrics
* Station content
* The archive and the deep NPR archive

I am sure that during the course of this series other topics will be added, but these capture some of the more prominent issues that were discussed. As you can see, these topics involve technical issues as well as legal and business ones.

Before we can get to any of the above topics, though, we have to address the single most important decision that we made: Should we open up the API? That will be the first post in this series.

The purpose of this series is to continue to be as transparent as we can be and to be an active, engaging part of the technical community. We hope that some of these decisions that we dealt with will help others successfully pursue creating APIs as well. We also hope that this blog will act as a forum to continue the discussion and will help us continue to better deliver useful tools and services.

I am looking forward to the discussion!
--Daniel Jacobson


 
August 28, 2008

JSON and the Argot-nauts

This is the first of a series of posts that will discuss decisions we made in the design, architecture, and implementation of the API. We hope that our experiences will be useful to you when working with APIs and similar software projects. We also want to hear from you--what you like, what you think should be changed--so we can make course corrections as the API evolves. So put your software geek hats on and let's talk code.

My favorite way to consume the API is using JSON. With just a few lines of code, I get a data object that I can use with JavaScript--no messy parsing of XML or the DOM necessary. The structure of this JSON data object strongly resembles the structure of the NPRML XML output document. In fact, to create the JSON output, we first generate the NPRML document, and then do some transformations to create the JSON output.

However, XML does not map to JSON seamlessly. The XML in NPRML has element nodes that contain either other element nodes or textual content. The element nodes may also have attributes. It is common practice to map element nodes to objects in JSON, with each sub-element becoming a nested object. However, we had to decide on how to treat textual content and attributes.

It makes sense to make the textual content a property of the object that contains it, but we need a name for that property. We looked at other APIs for a standard naming convention, but there doesn't appear to be one at this time. For example, the Google Data APIs put textual content in a property named $t. The Flickr API uses a property named _content. In the NPR API, we use a property named $text.

Some APIs take a different approach, treating text nodes as string properties of the object, which means the name of the property is the element node name. Yahoo! Shopping Web Services take this approach. This makes the JSON more readable and simpler, but it doesn't work if nodes with textual content also have attributes.

We map element attributes to object properties. This approach is used by many APIs, although some (such as Yahoo! Shopping) create a specially named nested object to hold all of the attribute values. With our approach, this NPRML fragment:

<show>
    <program id="2" code="ATC">All Things Considered</program>
    <showDate>Fri, 22 Aug 2008 16:00:00 -0400</showDate>
    <segNum>12</segNum>
</show>

gets mapped to this JSON:

"show": [{
    "program": {
        "id": "2",
        "code": "ATC",
        "$text": "All Things Considered"
    },
    "showDate": {
        "$text": "Fri, 22 Aug 2008 16:00:00 -0400"
    },
    "segNum": {
        "$text": "12"
    }
}]

Note that the show property contains an array. It is possible that a story was used in multiple shows. We use arrays for properties that could have more than one value. This is done even when a given story has only one value for the property.
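For readers who want to experiment with these mapping rules, here is a minimal Python sketch of them. It is not our production transformation: the function names are made up for illustration, and the REPEATABLE_TAGS set is a hypothetical stand-in for whatever knowledge of the schema decides which properties become arrays.

```python
import json
import xml.etree.ElementTree as ET

TEXT_KEY = "$text"          # property name used for textual content
REPEATABLE_TAGS = {"show"}  # hypothetical: tags that may occur more than once


def add_property(target: dict, tag: str, value: dict) -> None:
    """Attach value under tag, using an array for repeatable tags."""
    if tag in REPEATABLE_TAGS:
        target.setdefault(tag, []).append(value)
    else:
        target[tag] = value


def element_to_object(element: ET.Element) -> dict:
    """Map one element: attributes and text become properties."""
    obj = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        obj[TEXT_KEY] = text
    for child in element:
        add_property(obj, child.tag, element_to_object(child))
    return obj


def fragment_to_dict(xml_fragment: str) -> dict:
    """Parse a fragment and wrap the root element under its own tag name."""
    root = ET.fromstring(xml_fragment.strip())
    result: dict = {}
    add_property(result, root.tag, element_to_object(root))
    return result


NPRML_FRAGMENT = """
<show>
    <program id="2" code="ATC">All Things Considered</program>
    <showDate>Fri, 22 Aug 2008 16:00:00 -0400</showDate>
    <segNum>12</segNum>
</show>
"""

converted = fragment_to_dict(NPRML_FRAGMENT)
print(json.dumps(converted, indent=4))

# Because show is always an array, consumers index into it first.
print(converted["show"][0]["program"][TEXT_KEY])
```

Always emitting an array for repeatable properties means client code can use the same access pattern whether a story appeared in one show or several.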

We are interested in hearing what you think is the best approach to JSON. Have you seen other approaches that work better? Is JSON important to you? Let us know in the comments.

--Harold Neal


 
August 20, 2008

OSCON Presentation on the NPR API

Shortly after the launch of the API, Harold Neal and I presented it at O'Reilly's Open Source Convention (OSCON) on July 24th. Here is a copy of that presentation (requires Adobe Acrobat). This version of the presentation has been slightly modified to reflect more current data (particularly around usage of the API) as well as some other changes that will help the presentation live as a standalone document. I have also added screen shots of the Query Generator to represent the live demo of the API that we did during the presentation.

Sharing this presentation in this forum is the first step to making our process, architecture and decisions around the API more transparent and open to our users. There will be other documents and blog posts to follow with more information. Let us know if you have specific questions about our process so we can try to address them in these future posts.



Click here to view the presentation (requires Adobe Acrobat)


Continue reading "OSCON Presentation on the NPR API" »


 
August 11, 2008

Suggestions for the Next Version of NPR's API?

It has been almost a month since we launched our API and we are now preparing requirements for our second release. What would you most like to see in the next version? Are there specific fields or standard formats that you would like us to output? Are there topics or other ways of slicing the data that you would like represented?
- Daniel Jacobson


 
July 21, 2008

Proposing Questions for an API FAQ

First, thanks to everybody for their API-related comments here and numerous other places. We are a bit overdue, but are working on putting up an FAQ for the API. As we have started to compile a list of questions, a common answer is emerging: We didn't want to hold the API back until everything possible was perfect. We do think the API today is very extensive and fills a void, but we also think that it will evolve as time allows, and as we respond to requests and new opportunities. As with everything else, we like to treat all our online efforts as an ongoing work-in-progress, with opportunities to get even better. But for the moment, we're very excited to see what ideas folks implement with it.

I've started a list of questions below. Please chime in with comments on what other questions you'd like to see included in the API FAQ.

Continue reading "Proposing Questions for an API FAQ" »


 
July 17, 2008

API Rights and NPRML

There have been quite a few comments and posts around the Web about our API and I would like to clarify a few points about the offering. I also plan to engage in some of the discussions in other forums but I wanted to address them first in our own blog. To see some of the more prominent discussions, you can see the articles on TechCrunch and on Mashable.com.

A common discussion point on the API so far has been our exclusions. Below are the reasons for the exclusions referenced in both of the above blogs as well as some other details that I want to explain:

  • NPR programs and series, including Fresh Air, This I Believe and StoryCorps, are excluded due to rights restrictions. We obviously would like to include these in the API and are looking into making it happen. That said, we did not want to hold up the launch of the API while we researched the rights.
  • NPR programs, including Radiolab, Car Talk and The Diane Rehm Show, are distributed by NPR but their web content is not. As a result, these programs are currently not available on NPR.org or through the API.
  • Other radio programs, including Marketplace, This American Life and A Prairie Home Companion, are not NPR programs -- they are produced and distributed by other public radio entities such as American Public Media or Public Radio International. NPR does not have the access or the rights to distribute the content from those programs.
  • Currently, we are not providing any of our video content in the API, although it is in our future plans. Our goal was to launch with our primary asset, audio, well defined. There are still a few details that we need to work out before extending the API to offer our video content, but we hope to open that up soon.
  • Our online database goes back to 1995, including over 250,000 stories spanning 13 years. We are actively working to get more of the archival content, dating back to 1970, into the system and available through the API.
  • NPRML is the XML structure that is native to our entire system and it is the structure that drives all content for NPR.org, the API and beyond. We decided to open it up just to be transparent with as much content as possible. This structure is not meant to be a new proposed standard or to replace our goals to expand our output formats. We do intend to include other, more comprehensive formats such as NewsML in the future.

Although we believe that our API is an extensive offering, it will only continue to grow with time. We really appreciate the feedback we have been getting and look forward to getting more in the future. Knowing that there is a desire for video, for example, will help us prioritize accordingly to better serve the API community. Please check back to this blog for more information about our API and our future plans.

-- Daniel Jacobson


 
July 16, 2008

NPR API is Live on NPR.org

As referenced in yesterday's post, we launched our new API today. To find the API, you can either go directly to http://www.npr.org/api/ or you can follow the new link called "Tools / API" on the NPR.org left nav under the Services section.

In order to use the API, you will need to register using our new registration engine that Zach mentioned in a previous post. Once registered, you will need to generate an apiKey by clicking the Generate Key button on the API tab of your account profile. The apiKey is used to authenticate all requests to the API. After you get your apiKey, you can read our documentation or just go straight to the Query Generator, which is a comprehensive tool that allows you to easily create your API requests and see what your results would look like.
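As a rough illustration of how the apiKey travels with a request, here is a small Python sketch that builds a request URL. Only the apiKey parameter name comes from this post; the endpoint address, the id and output parameters and the values shown are assumptions for the example, so please rely on the documentation and the Query Generator for the real request format.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; see the API documentation.
API_ENDPOINT = "http://api.npr.org/query"


def build_query_url(api_key: str, **params: str) -> str:
    """Return a request URL that carries the caller's apiKey."""
    if not api_key:
        raise ValueError("an apiKey is required for every request")
    query = dict(params)       # hypothetical filters such as id or output
    query["apiKey"] = api_key  # the key authenticates the request
    return f"{API_ENDPOINT}?{urlencode(query)}"


# "1007" and "JSON" are placeholder values, not documented ones.
url = build_query_url("YOUR_API_KEY", id="1007", output="JSON")
print(url)
```

Keeping the key in one helper like this makes it harder to send a request without it by accident.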

There were quite a few questions that we addressed when developing the API, but one thing that was not really in question was the need to open as much of our content as possible. As a result, almost everything that you can find on NPR.org that we have the rights to redistribute is available through the API. This includes audio, images, full text, etc. That said, there are elements, series and programs that we could not offer due to rights restrictions.

We also discussed in depth which output formats we would support. For launch, we are supporting RSS, MediaRSS, Atom, JSON, JavaScript Widgets, HTML Widgets and our custom tagging structure called NPRML. We would like feedback on what other formats we should support, although as of now we are planning to extend it to include NewsML. Which of the existing formats are you most likely to use from our API?

There were a ton of contributors to this new API, with Harold Neal as the primary technical architect. Other major contributors include Joanne Garlow, Jason Grosman, Tony Yan, Ivan Lazarte, Stephanie Oura, Ben Hands, Shain Miley, Lindsay Mangum, Sugirtha Solai, Todd Welstein, Vida Logan and others.

Finally, we would really like to get as much feedback as possible from the community on the API, particularly on what you think you will use and what is missing from the offering. We will continue to post here with more thoughts and questions.

-- Daniel Jacobson


 
July 15, 2008

Coming Soon: Our New API

In the next couple of days, NPR.org will be launching our new API, which will be an open and extensive way for our users to share and mash-up our content. Once live, we will be adding a new link on the NPR.org left nav in the Services section called "Tools / API". We are very excited about this new tool and are looking forward to the inventive ways that you will use our content! After all, there are only a few of us but millions of you...

As part of the launch, we will also be showcasing several widgets and applications that were built using the API. All of these will be found on our upcoming widgets page, which will launch with the API. Among them is a widget that maps NPR stories based on Geoff Gaudreault's Reverbiage site, and an iPhone site built by our friends at Axiom Stack.


I will post again on the day of the launch to let you all know when it is live. We will also continue to post to this blog to solicit feedback on the API.

-- Daniel Jacobson


 


   
   
   


 

About Us

Ever wanted to peer under the hood and learn about the inner workings of the NPR website? Have we got a blog for you, then. Here at Inside NPR.org, the NPR Digital Media team will keep you up-to-date on digital products and services we're developing, including social networking tools and our media player. For more info, please see our FAQ and our discussion rules.

 
 
