PHP: Combined RSS and Atom Feed Reader
This is a long-overdue upgrade to our original scripts for parsing RSS and Atom feeds from websites. While the old versions relied on the PHP XML library for parsing, our new FeedParser PHP class takes advantage of SimpleXML to traverse the document elements and namespaces of the feed.
RSS/Atom Feed Reader Demonstration
Use the form below to select a feed and run it through our parser. Note that most feeds are free to access for personal use, but if you want to use them for a commercial site or application you might have to pay the provider.
RSS/Atom Feed Content
When an RSS or Atom feed has been loaded using the form above, its contents will appear in this section. If nothing appears, or an error is displayed, there may be a problem with the feed URL or the XML formatting.
Only the first five items from each RSS feed are being displayed.
PHP Source Code
Here is the complete source code for \Chirp\FeedParser:
<?PHP
namespace Chirp;
class FeedParser
{
// Original PHP code by Chirp Internet: www.chirpinternet.eu
// Please acknowledge use of this code by including this header.
private $channel = [];
private $items = [];
public function __construct($file)
{
libxml_use_internal_errors(TRUE);
$xml = simplexml_load_file($file, NULL, LIBXML_NOCDATA);
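if(FALSE === $xml) {
// capture the first libxml error and clear the buffer before throwing,
// since nothing after a throw statement is executed
$errors = libxml_get_errors();
libxml_clear_errors();
throw new \Exception($errors ? trim($errors[0]->message) : "Unable to parse XML");
}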
$namespaces = $xml->getNameSpaces(TRUE);
if($xml->channel) {
$channel = $xml->channel;
} else {
$channel = $xml;
}
if($xml->channel->item) {
$items = $xml->channel->item;
} elseif($xml->item) {
$items = $xml->item;
} elseif($xml->entry) {
$items = $xml->entry;
}
foreach($channel->children() as $key => $val) {
switch($key)
{
case "item":
case "entry":
// these will be parsed below as items
continue 2;
case "link":
if($val->attributes()) {
if("self" == $val->attributes()->rel) {
// link rel="self" identified
$this->channel[$key] = (string) $val->attributes()->href;
}
} else {
$this->channel[$key] = (string) $val;
}
break;
default:
if(count($val) > 1) {
$this->channel[$key] = (array) $val;
} elseif($val) {
$this->channel[$key] = (string) $val;
}
}
}
if(isset($items) && $items) {
foreach($items as $item) {
$item_data = [];
foreach($item->children() as $key => $val) {
switch($key)
{
case "link":
if($val->attributes()) {
if("alternate" == $val->attributes()->rel) {
// link rel="alternate" identified
$item_data[$key] = (string) $val->attributes()->href;
}
} else {
$item_data[$key] = (string) $val;
}
break;
case "enclosure":
if($val->attributes()) {
foreach($val->attributes() as $key2 => $val2) {
$item_data[$key][$key2] = (string) $val2;
}
}
break;
default:
if($val->attributes()['type'] && ("xhtml" == $val->attributes()['type'])) {
$val = $val->children()->asXML();
}
if(count($val) > 1) {
$item_data[$key] = (array) $val;
} else {
$item_data[$key] = (string) $val;
}
}
}
if($namespaces) {
foreach($namespaces as $ns => $url) {
if(!$ns) continue;
foreach($item->attributes($ns, TRUE) as $key => $val) {
$item_data["{$ns}:{$key}"] = (string) $val;
}
foreach($item->children($url) as $key => $val) {
if($item->children($url)->{$key}->attributes()) {
foreach($item->children($url)->{$key}->attributes() as $key2 => $val2) {
$item_data["{$ns}:{$key}"][$key2] = (string) $val2;
}
}
if((string) $val) {
$item_data["{$ns}:{$key}"] = (string) $val;
}
}
}
} // for each namespace
$this->items[] = $item_data;
} // for each item in feed
}
}
private function finesseDate($str)
{
if(strtotime($str)) {
return date('j F Y', strtotime($str));
}
return $str;
}
// display a single image as HTML
private function display_image($arr, $caption = NULL)
{
$retval = "";
if(!isset($arr['url']) || !$arr['url']) {
return $retval;
}
if(isset($arr['link']) && $arr['link']) {
$retval .= "<a href=\"{$arr['link']}\" target=\"_blank\">";
}
$retval .= "<img class=\"feature-image\" src=\"{$arr['url']}\"";
if(isset($arr['width'], $arr['height']) && $arr['width'] && $arr['height']) {
$retval .= " width=\"{$arr['width']}\" height=\"{$arr['height']}\"";
}
$retval .= " alt=\"" . htmlspecialchars($caption ?? $arr['title'] ?? "") . "\">";
if(isset($arr['link']) && $arr['link']) {
$retval .= "</a>";
}
return $retval;
}
// display a single channel as HTML
public function display_channel()
{
$retval = "";
$data = $this->channel;
if(!isset($data['title']) || !$data['title']) {
return $retval;
}
$retval .= "<h1>";
if(isset($data['link']) && $data['link']) {
$retval .= "<a href=\"" . htmlspecialchars($data['link']) . "\" target=\"_blank\">";
}
$retval .= stripslashes($data['title']);
if(isset($data['link']) && $data['link']) {
$retval .= "</a>";
}
if(isset($data['subtitle']) && $data['subtitle']) {
$retval .= "<br>\n<small>" . stripslashes($data['subtitle']) . "</small>";
}
$retval .= "</h1>\n";
if(isset($data['image']) && is_array($data['image']) && $data['image']) {
$retval .= "<p class=\"image\">" . $this->display_image($data['image']) . "</p>\n";
}
if(isset($data['description']) && $data['description']) {
$retval .= "<p>" . stripslashes($data['description']) . "</p>\n\n";
}
$tmp = [];
if(isset($data['updated']) && $data['updated']) {
$updated = $this->finesseDate($data['updated']);
$tmp[] = "Updated: {$updated}";
}
if(isset($data['copyright']) && $data['copyright']) {
$tmp[] = "Copyright: {$data['copyright']}";
}
if(isset($data['author']) && $data['author']) {
if(isset($data['author']['name']) && $data['author']['name']) {
$author_out = $data['author']['name'];
if(isset($data['author']['uri']) && $data['author']['uri']) {
$author_out = "<a href=\"{$data['author']['uri']}\">{$author_out}</a>";
}
$tmp[] = "Author: {$author_out}";
}
}
if($tmp) {
$retval .= "<p><small>" . implode("<br>\n", $tmp) . "</small></p>\n\n";
}
unset($tmp);
$retval .= "<div class=\"divider\"><!-- --></div>\n\n";
return $retval;
}
// display a single item as HTML
private function display_item($idx)
{
$retval = "";
if(!isset($this->items[$idx])) {
return $retval;
}
$item = $this->items[$idx];
if(!isset($item['link'])) {
if(isset($item['guid']) && $item['guid']) {
$item['link'] = $item['guid'];
} elseif(isset($item["rdf:about"])) {
$item['link'] = $item["rdf:about"];
}
}
if(!isset($item['updated'])) {
if(isset($item['pubDate']) && $item['pubDate']) {
$item['updated'] = $item['pubDate'];
} elseif(isset($item["dc:date"])) {
$item['updated'] = $item["dc:date"];
}
}
if(!isset($item['content'])) {
if(isset($item['content:encoded']) && $item['content:encoded']) {
$item['content'] = $item['content:encoded'];
} elseif(isset($item['description']) && $item['description']) {
$item['content'] = $item['description'];
}
}
$retval .= "<div class=\"title\">\n";
if(isset($item['media:thumbnail']) && is_array($item['media:thumbnail']) && $item['media:thumbnail']) {
$retval .= "<div class=\"thumb\">" . $this->display_image($item['media:thumbnail']) . "</div>\n";
}
$retval .= "<h3>";
if(isset($item['link']) && $item['link']) {
$retval .= "<a href=\"{$item['link']}\" target=\"_blank\">";
}
$retval .= stripslashes($item['title'] ?? "");
if(isset($item['link']) && $item['link']) {
$retval .= "</a>";
}
$retval .= "</h3>\n";
if(isset($item['updated']) && $item['updated']) {
$item['updated'] = $this->finesseDate($item['updated']);
$retval .= " <span class=\"updated\">{$item['updated']}</span>";
}
$retval .= "</div>\n";
if(isset($item['media:content']) && is_array($item['media:content']) && $item['media:content']) {
if(!isset($item['media:content']['type']) || ("image" == $item['media:content']['type'])) {
$retval .= "<p>" . $this->display_image($item['media:content'], $item['media:description'] ?? "") . "</p>\n";
}
}
if(isset($item['enclosure']) && $item['enclosure']) {
$retval .= "<p class=\"enclosure\"><strong>Media:</strong> <a href=\"{$item['enclosure']['url']}\">";
$retval .= $item['enclosure']['type'];
$retval .= "</a>";
if(isset($item['enclosure']['length'])) {
$retval .= " (" . number_format($item['enclosure']['length'] / 1024, 1) . " kb)";
}
$retval .= "</p>\n\n";
}
if(isset($item['content']) && $item['content']) {
$retval .= "<p>" . stripslashes($item['content']) . "</p>\n\n";
}
return $retval;
}
// display $num items from the feed
public function display_items($num = 5)
{
$retval = "";
for($idx=0; $idx < $num; $idx++) {
$retval .= $this->display_item($idx);
}
return $retval;
}
public function get_channel()
{
return $this->channel;
}
public function get_items($num = 5, $offset = 0)
{
return array_slice($this->items, $offset, $num);
}
}
As you can see, a lot of the heavy lifting is done in the constructor, which parses the file using SimpleXML and populates the private channel and items properties.
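The first decision the constructor makes is where the feed metadata and the items live, since RSS 2.0, RSS 1.0 and Atom each arrange them differently. The following is a minimal standalone sketch of that step, not the class itself: the two inline feeds are made up for illustration and locate_feed_parts is a hypothetical helper name.

```php
<?php
// Two tiny inline feeds, invented for this example: one RSS 2.0, one Atom.
$rss = '<rss version="2.0"><channel><title>RSS demo</title>
<item><title>First</title><link>https://example.com/1</link></item>
</channel></rss>';

$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Atom demo</title>
<entry><title>First</title><link rel="alternate" href="https://example.com/1"/></entry>
</feed>';

// Hypothetical helper: returns [node holding the feed metadata, list of items].
function locate_feed_parts(SimpleXMLElement $xml): array
{
    // RSS wraps its metadata in <channel>; Atom keeps it on the root <feed>.
    $channel = isset($xml->channel) ? $xml->channel : $xml;

    if (isset($xml->channel->item)) {
        $items = $xml->channel->item;   // RSS 2.0: items inside <channel>
    } elseif (isset($xml->item)) {
        $items = $xml->item;            // RSS 1.0: items are siblings of <channel>
    } else {
        $items = $xml->entry;           // Atom: <entry> elements on the root
    }
    return [$channel, $items];
}

foreach ([$rss, $atom] as $source) {
    [$channel, $items] = locate_feed_parts(simplexml_load_string($source));
    echo (string) $channel->title, ": ", count($items), " item(s)\n";
}
```

Using isset rather than a plain truthiness test is a choice made for this sketch; it avoids reading a child of an element that does not exist.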
The various display_* methods then convert the stored array values into HTML which is returned for display. The two public methods used here other than the constructor are display_channel and display_items.
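If you would rather build your own markup from the arrays returned by get_items, the same pattern applies. The sketch below is our own illustration and not part of the class: render_item_heading is a hypothetical function, and unlike most of the display methods it escapes both the title and the link, which may be preferable when the feed comes from a source you do not control.

```php
<?php
// Hypothetical helper: render one item array (the shape the parser stores) as an HTML heading.
function render_item_heading(array $item): string
{
    // Escape text and attribute values so feed content cannot break the markup.
    $title = htmlspecialchars($item['title'] ?? 'Untitled', ENT_QUOTES, 'UTF-8');

    if (!empty($item['link'])) {
        $href = htmlspecialchars($item['link'], ENT_QUOTES, 'UTF-8');
        return "<h3><a href=\"{$href}\" target=\"_blank\">{$title}</a></h3>\n";
    }
    return "<h3>{$title}</h3>\n";
}

// Sample item invented for the example.
echo render_item_heading([
    'title' => 'Tom & Jerry',
    'link'  => 'https://example.com/?a=1&b=2',
]);
```

Note that escaping the title means any HTML a feed places in its titles will be shown literally rather than rendered.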
The main improvements over the previous code are:
- no longer reliant on the eval function;
- a single script to parse RSS, RSS 2.0 and Atom feeds;
- generic handling of namespace elements and attributes; and
- graceful handling of XML parse errors.
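The last point can be illustrated on its own. This is a minimal sketch of the libxml error-handling pattern, assuming a string source rather than a file; parse_xml_or_fail is a hypothetical helper and the broken XML is made up for the example.

```php
<?php
// Hypothetical helper: parse an XML string, throwing the first libxml message on failure.
function parse_xml_or_fail(string $source): SimpleXMLElement
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $xml = simplexml_load_string($source, SimpleXMLElement::class, LIBXML_NOCDATA);

    $errors = libxml_get_errors();
    libxml_clear_errors();                     // empty the buffer whether or not parsing worked
    libxml_use_internal_errors($previous);     // restore the caller's setting

    if (false === $xml) {
        $message = $errors ? trim($errors[0]->message) : 'Unknown XML error';
        throw new RuntimeException($message);
    }
    return $xml;
}

try {
    // <title> is never closed, so the parser should reject this document.
    parse_xml_or_fail('<rss><channel><title>Broken</channel></rss>');
} catch (RuntimeException $e) {
    echo "XML parse error: " . $e->getMessage() . "\n";
}
```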
Sample Usage
Assuming you have an RSS or Atom XML file stored locally, you can parse and display the contents as HTML as follows:
<?PHP
try {
$parser = new \Chirp\FeedParser("/path/to/xmlfile.xml");
echo $parser->display_channel();
echo $parser->display_items(5);
} catch(\Exception $e) {
die("XML parse error: " . $e->getMessage());
}
?>
Some additional coding will be necessary if you have to first fetch and cache a remote file before parsing.
Depending on your PHP settings you may be able to just supply a URL to be fetched, but often this functionality has been disabled for security reasons.
If that is the case you will need something like our http_get_contents function.
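As a rough idea of what that additional coding might look like, here is a sketch of fetching a feed with cURL and keeping a local copy for an hour. It is not our http_get_contents function: http_fetch and cached_feed_path are hypothetical names, the one-hour lifetime is an arbitrary choice, and the cURL extension is assumed to be available. The PHP setting that usually governs direct URL loading is allow_url_fopen.

```php
<?php
// Hypothetical helper: download $url with cURL and return the body, or throw on failure.
function http_fetch(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_TIMEOUT        => 10,     // seconds
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);

    if (false === $body || $status >= 400) {
        throw new RuntimeException("Could not fetch {$url}");
    }
    return $body;
}

// Hypothetical helper: return the path of a local copy of $url, refreshing the
// copy when it is missing or older than $ttl seconds. $fetch can be replaced
// with any callable that takes a URL and returns the body.
function cached_feed_path(string $url, string $cacheDir, int $ttl = 3600, ?callable $fetch = null): string
{
    $path = rtrim($cacheDir, '/') . '/' . sha1($url) . '.xml';

    if (!is_file($path) || (time() - filemtime($path)) > $ttl) {
        $body = ($fetch ?? 'http_fetch')($url);
        file_put_contents($path, $body, LOCK_EX);
    }
    return $path;
}

// Usage (commented out because it needs network access and the FeedParser class):
// $path   = cached_feed_path('https://example.com/feed.xml', sys_get_temp_dir());
// $parser = new \Chirp\FeedParser($path);
```

Caching to a file keeps the parser unchanged, since it still receives a local path.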
Namespaces
One of the most painful aspects of XML is dealing with namespaces. In the case of RSS feeds you will find all kinds of prefixes embedded in the XML.
Our feed reader currently recognises a few tags in the rdf (RDF/XML), dc (Dublin Core) and media (Yahoo!) namespaces. Other elements and attributes are loaded by the parser, just not used by the display functions.
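To show how prefixed elements end up as keys such as dc:date and media:thumbnail, here is a small standalone sketch of the same approach. The feed is invented for the example, and the loop is a simplified version of what the constructor does rather than a copy of it.

```php
<?php
// A made-up item that uses the Dublin Core and Media RSS namespaces.
$source = '<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:media="http://search.yahoo.com/mrss/">
<channel><title>Demo</title>
<item>
  <title>Namespaced item</title>
  <dc:date>2024-01-15T09:30:00Z</dc:date>
  <media:thumbnail url="https://example.com/t.jpg" width="120" height="90"/>
</item>
</channel></rss>';

$xml  = simplexml_load_string($source);
$item = $xml->channel->item[0];
$data = [];

// getNamespaces(true) maps every prefix used in the document to its URI.
foreach ($xml->getNamespaces(true) as $prefix => $uri) {
    if ('' === $prefix) {
        continue;   // unprefixed elements are reached through plain children()
    }
    // children($uri) returns only the child elements belonging to that namespace.
    foreach ($item->children($uri) as $name => $node) {
        $key = "{$prefix}:{$name}";
        // Unprefixed attributes are collected into an array under the element's key.
        foreach ($node->attributes() as $attr => $value) {
            $data[$key][$attr] = (string) $value;
        }
        // Text content, when present, replaces the attribute array.
        if ('' !== trim((string) $node)) {
            $data[$key] = (string) $node;
        }
    }
}

print_r($data);
```

Here dc:date is stored as a string and media:thumbnail as an array of its attributes, which is the shape the display methods expect.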
You can find some handy resources under References below.
References
- Stack Overflow: How to handle <![CDATA[ with SimpleXMLElement?
- Stack Overflow: How to get attribute of node with namespace using SimpleXML?
- Stack Overflow: Weird stuff when outputing XHTML using SimpleXML
- Get data from JournalTOCs
- Stack Overflow: Accessing date as XML node in PHP