PHP: RSS Feed Reader: Source Code
This page presents a simple class with a constructor and two public functions: getOutput returns an HTML-formatted version of the RSS feed, while getRawOutput returns all the attributes in a single multi-level array.
<?PHP
// where is the feed located?
$url = "https://www.the-art-of-web.com/rss.xml";
// create object to hold data and display output
$rss_parser = new \Chirp\RSSParser($url);
$output = $rss_parser->getOutput(); // returns string containing HTML
echo $output;
?>
Yes, it really can be that simple.
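If you need custom markup instead, the array returned by getRawOutput can be walked directly. The sketch below uses a hand-built sample array in place of a real feed. Its shape (RSS, then CHANNEL, then ITEM, with upper-case keys and numbered entries) is inferred from how getOutput reads the data, and the titles and links are made up for illustration.

```php
<?php
// Hand-built stand-in for $rss_parser->getRawOutput(); all values are made up.
$raw = [
    "RSS" => [
        "CHANNEL" => [
            [
                "TITLE" => "Example feed",
                "ITEM" => [
                    ["TITLE" => "First post", "LINK" => "https://www.example.com/1"],
                    ["TITLE" => "Second post", "LINK" => "https://www.example.com/2"],
                ],
            ],
        ],
    ],
];

// Build one list entry per item, escaping values before they reach the page.
$lines = [];
foreach ($raw["RSS"]["CHANNEL"] as $channel) {
    foreach ($channel["ITEM"] ?? [] as $item) {
        $lines[] = sprintf(
            '<li><a href="%s">%s</a></li>',
            htmlspecialchars($item["LINK"] ?? "#"),
            htmlspecialchars($item["TITLE"] ?? "")
        );
    }
}

echo "<ul>\n" . implode("\n", $lines) . "\n</ul>\n";
```

Escaping with htmlspecialchars is a choice made in this sketch, not something the class does for you.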
Source code of rssparser.php
This class is by no means the be-all and end-all of RSS parsing. It's designed to be simple, functional and easily customisable. It appears to work for all RSS formats, and can be extended to handle new formats - or perhaps further to handle general XML parsing.
File: rssparser.php
<?PHP
namespace Chirp;
// Original PHP code by Chirp Internet: www.chirpinternet.eu
// Please acknowledge use of this code by including this header.
class RSSParser
{
// keeps track of current and preceding elements
var $tags = [];
// array containing all feed data
var $output = [];
// return value for display functions
var $retval = "";
var $errorlevel = 0;
// constructor for new object
public function __construct($file)
{
$errorlevel = error_reporting();
error_reporting($errorlevel & ~E_NOTICE);
// instantiate xml-parser and assign event handlers
$xml_parser = xml_parser_create("");
xml_set_object($xml_parser, $this);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "parseData");
$curl_opts = [
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_COOKIEFILE => "/tmp/cookies-file.tmp",
];
// open file for reading and send data to xml-parser
$data = preg_match("/^http/", $file) ? CurlTools::http_get_contents($file, $curl_opts) : file_get_contents($file);
xml_parse($xml_parser, $data) or die(
sprintf(__CLASS__ . ": Error <b>%s</b> at line <b>%d</b><br>",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser))
);
// dismiss xml parser
xml_parser_free($xml_parser);
error_reporting($errorlevel);
}
private function startElement($parser, $tagname, $attrs = [])
{
// RSS 2.0 - ENCLOSURE
if($tagname == "ENCLOSURE" && $attrs) {
$this->startElement($parser, "ENCLOSURE");
foreach($attrs as $attr => $attrval) {
$this->startElement($parser, $attr);
$this->parseData($parser, $attrval);
$this->endElement($parser, $attr);
}
$this->endElement($parser, "ENCLOSURE");
}
// Yahoo! Media RSS - images
if(($tagname == "MEDIA:CONTENT") && isset($attrs['URL'], $attrs['MEDIUM']) && $attrs['URL'] && ($attrs['MEDIUM'] == "image")) {
$this->startElement($parser, "IMAGE");
$this->parseData($parser, $attrs['URL']);
$this->endElement($parser, "IMAGE");
}
// check if this element can contain others - list may be edited
if(preg_match("/^(RDF|RSS|CHANNEL|IMAGE|ITEM)/", $tagname)) {
if($this->tags) {
$depth = count($this->tags);
if(is_array($tmp = end($this->tags))) {
$parent = key($tmp);
$num = current($tmp);
if($parent) {
$this->tags[$depth-1][$parent][$tagname] = ($this->tags[$depth-1][$parent][$tagname] ?? 0) + 1;
}
}
}
array_push($this->tags, [$tagname => []]);
} else {
if(!preg_match("/^(A|B|I)$/", $tagname)) {
// add tag to tags array
array_push($this->tags, $tagname);
}
}
}
private function endElement($parser, $tagname)
{
if(!preg_match("/^(A|B|I)$/", $tagname)) {
// remove tag from tags array
array_pop($this->tags);
}
}
private function parseData($parser, $data)
{
// return if data contains no text
if(!trim($data)) return;
$evalcode = "\$this->output";
foreach($this->tags as $tag) {
if(is_array($tag)) {
$tagname = key($tag);
$indexes = current($tag);
$evalcode .= "[\"$tagname\"]";
if(isset(${$tagname}) && ${$tagname}) {
$evalcode .= "[" . (${$tagname} - 1) . "]";
}
if($indexes) {
extract($indexes);
}
} else {
if(preg_match("/^([A-Z]+):([A-Z]+)$/", $tag, $matches)) {
$evalcode .= "[\"$matches[1]\"][\"$matches[2]\"]";
} else {
$evalcode .= "[\"$tag\"]";
}
}
}
try {
@eval("$evalcode = $evalcode . '" . addslashes($data) . "';");
} catch(ParseError $e) {
error_log($e->getMessage());
}
}
// display a single channel as HTML
private function display_channel($data, $limit)
{
extract($data);
if(isset($IMAGE) && $IMAGE) {
// display channel image(s)
foreach($IMAGE as $image) {
$this->display_image($image);
}
}
if($TITLE) {
// display channel information
$this->retval .= "<h1>";
if($LINK) {
$this->retval .= "<a href=\"$LINK\" target=\"_blank\">";
}
$this->retval .= stripslashes($TITLE);
if($LINK) {
$this->retval .= "</a>";
}
$this->retval .= "</h1>\n";
if(isset($DESCRIPTION) && $DESCRIPTION) {
$this->retval .= "<p>$DESCRIPTION</p>\n\n";
}
$tmp = [];
if(isset($PUBDATE) && $PUBDATE) {
$tmp[] = "<small>Published: $PUBDATE</small>";
}
if(isset($COPYRIGHT) && $COPYRIGHT) {
$tmp[] = "<small>Copyright: $COPYRIGHT</small>";
}
if($tmp) {
$this->retval .= "<p>" . implode("<br>\n", $tmp) . "</p>\n\n";
}
unset($tmp);
$this->retval .= "<div class=\"divider\"><!-- --></div>\n\n";
}
if(isset($ITEM) && $ITEM) {
// display channel item(s)
foreach($ITEM as $item) {
$this->display_item($item, "CHANNEL");
if(is_int($limit) && --$limit <= 0) break;
}
}
}
// display a single image as HTML
private function display_image($data, $parent = "")
{
extract($data);
if(!$URL) return;
$this->retval .= "<p>";
if($LINK) {
$this->retval .= "<a href=\"$LINK\" target=\"_blank\">";
}
$this->retval .= "<img src=\"$URL\"";
if(isset($WIDTH, $HEIGHT) && $WIDTH && $HEIGHT) {
$this->retval .= " width=\"$WIDTH\" height=\"$HEIGHT\"";
}
$this->retval .= " border=\"0\" alt=\"$TITLE\">";
if($LINK) {
$this->retval .= "</a>";
}
$this->retval .= "</p>\n\n";
}
// display a single item as HTML
private function display_item($data, $parent)
{
extract($data);
if(!$TITLE) return;
$this->retval .= "<p><b>";
if($LINK) {
$this->retval .= "<a href=\"$LINK\" target=\"_blank\">";
}
$this->retval .= stripslashes($TITLE);
if($LINK) {
$this->retval .= "</a>";
}
$this->retval .= "</b>";
if(!isset($PUBDATE) && isset($DC['DATE']) && $DC['DATE']) {
$PUBDATE = $DC['DATE'];
}
if(isset($PUBDATE) && $PUBDATE) {
$this->retval .= " <small>($PUBDATE)</small>";
}
$this->retval .= "</p>\n";
// use feed-formatted HTML if provided
if(isset($CONTENT['ENCODED']) && $CONTENT['ENCODED']) {
$this->retval .= "<p>" . stripslashes($CONTENT['ENCODED']) . "</p>\n";
} elseif(isset($DESCRIPTION) && $DESCRIPTION) {
if(isset($IMAGE) && $IMAGE) {
foreach($IMAGE as $IMG) {
$this->retval .= "<img src=\"$IMG\" alt=\"\">\n";
}
}
$this->retval .= "<p>" . stripslashes($DESCRIPTION) . "</p>\n\n";
}
// RSS 2.0 - ENCLOSURE
if(isset($ENCLOSURE) && $ENCLOSURE) {
$this->retval .= "<p><small><b>Media:</b> <a href=\"{$ENCLOSURE['URL']}\">";
$this->retval .= $ENCLOSURE['TYPE'];
$this->retval .= "</a>";
if(isset($ENCLOSURE['LENGTH'])) {
$this->retval .= " ({$ENCLOSURE['LENGTH']} bytes)";
}
$this->retval .= "</small></p>\n\n";
}
if(isset($COMMENTS) && $COMMENTS) {
$this->retval .= "<p style=\"text-align: right;\"><small>";
$this->retval .= "<a href=\"$COMMENTS\">Comments</a>";
$this->retval .= "</small></p>\n\n";
}
}
private function fixEncoding(&$input, $key, $output_encoding)
{
if(!function_exists('mb_detect_encoding')) return $input;
$encoding = mb_detect_encoding($input);
switch($encoding)
{
case 'ASCII':
case $output_encoding:
break;
case '':
$input = mb_convert_encoding($input, $output_encoding);
break;
default:
$input = mb_convert_encoding($input, $output_encoding, $encoding);
}
}
// display entire feed as HTML
public function getOutput($limit = FALSE, $output_encoding = 'UTF-8')
{
$this->retval = "";
$start_tag = key($this->output);
switch($start_tag)
{
case "RSS":
// new format - channel contains all
foreach($this->output[$start_tag]['CHANNEL'] as $channel) {
$this->display_channel($channel, $limit);
}
break;
case "RDF:RDF":
// old format - channel and items are separate
if(isset($this->output[$start_tag]['IMAGE'])) {
foreach($this->output[$start_tag]['IMAGE'] as $image) {
$this->display_image($image);
}
}
foreach($this->output[$start_tag]['CHANNEL'] as $channel) {
$this->display_channel($channel, $limit);
}
foreach($this->output[$start_tag]['ITEM'] as $item) {
$this->display_item($item, $start_tag);
}
break;
case "HTML":
die("Error: cannot parse HTML document as RSS");
default:
die("Error: unrecognized start tag '$start_tag' in getOutput()");
}
if($this->retval && is_array($this->retval)) {
array_walk_recursive($this->retval, [$this, 'fixEncoding'], $output_encoding);
}
return $this->retval;
}
// return raw data as array
public function getRawOutput($output_encoding = 'UTF-8')
{
array_walk_recursive($this->output, [$this, 'fixEncoding'], $output_encoding);
return $this->output;
}
}
?>
The parsing of the RSS feed into a PHP array is done by the RSSParser class using the startElement, endElement and parseData functions. The remaining functions are used only for displaying the data or accessing the raw data.
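To see the event-driven approach in isolation, here is a minimal sketch that uses the same XML Parser extension to collect item titles from an inline feed. It is not taken from the class: the sample XML, the variable names and the use of closures in place of xml_set_object are all assumptions made for illustration.

```php
<?php
// Made-up sample feed; a real feed would be fetched from a file or URL.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
  </channel>
</rss>
XML;

$stack = [];   // currently open elements, outermost first
$titles = [];  // text of every <title> found directly inside an <item>

$parser = xml_parser_create();

// Element names arrive in upper case because case folding is on by default.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$stack, &$titles) {
        $stack[] = $name;
        if (array_slice($stack, -2) === ['ITEM', 'TITLE']) {
            $titles[] = '';  // start a new title
        }
    },
    function ($parser, $name) use (&$stack) {
        array_pop($stack);
    }
);

// Character data can arrive in several chunks, so append rather than assign.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$stack, &$titles) {
        if (array_slice($stack, -2) === ['ITEM', 'TITLE']) {
            $titles[count($titles) - 1] .= $data;
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    echo "XML error: " . xml_error_string(xml_get_error_code($parser)) . "\n";
}

print_r($titles);
```

The stack plays the role of the $tags array in the class: the start handler pushes, the end handler pops, and the data handler looks at the stack to decide where the text belongs.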
Fields Supported by Default
This script supports the following attributes (fields) by default but can easily be extended. See the Feed Reader Demonstration for examples of parsed RSS (and Atom) feeds.
Channel (RSS or RDF:RDF)
- Image: URL (required), Width, Height
- Title
- Link
- Description
- Pubdate
- Copyright
Item
- Title (required)
- Link
- Pubdate or DC.Date
- Content->Encoded or Description
- Enclosure: URL, Type, Length (for multimedia attachments)
- Comments
If you think it's worth adding support for other RSS attributes, please let us know using the Feedback link below.
Multibyte String Function support
If your PHP install doesn't include Multibyte String Function support then you will see some errors. You can get around that by jettisoning the fixEncoding function.
In other words, replacing:
return $this->fixEncoding($this->retval, $output_encoding);
with just:
return $this->retval;
The feed will then be displayed using its original character encoding, which may or may not match the encoding of your HTML page, but other than that shouldn't be a problem.
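As an alternative to removing the conversion altogether, a helper can fall back to returning the text unchanged when the extension is missing. The following is a sketch only: the function name and the list of candidate encodings are assumptions, not part of rssparser.php.

```php
<?php
// Hypothetical helper, not part of rssparser.php.
function to_utf8_sketch(string $input, string $target = 'UTF-8'): string
{
    // Without the mbstring extension, return the text as it came in.
    if (!function_exists('mb_detect_encoding')) {
        return $input;
    }

    // Strict detection against a short candidate list (an assumption;
    // adjust the list to the feeds you actually read).
    $encoding = mb_detect_encoding($input, ['UTF-8', 'ISO-8859-1'], true);

    if ($encoding === false || $encoding === $target) {
        return $input;  // nothing to convert
    }

    return mb_convert_encoding($input, $target, $encoding);
}

$latin1 = "caf\xE9";  // "café" encoded as ISO-8859-1
echo bin2hex(to_utf8_sketch($latin1)), "\n";
```

Detection from byte patterns is a guess at best, so where the feed declares its encoding that declaration is the more reliable source.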
User Comments
Most recent 20 of 27 comments:
Ranjula 14 January, 2016
I tried to use your code in my WordPress site on a local server and got the error message 'Call to undefined function http_get_contents()....'. Please help me proceed further.
The function you're after is here.
Michael Joens 21 December, 2014
Any update on the deprecated http_get_contents($file) function? Would love to use this code, but getting an error. Same issue @mike_root. Thank you so much!
You can find a basic version of the http_get_contents function here.
mike root 28 October, 2014
Error message that your function in "RSS Feed Reader: Source Code" article, the code "http_get_contents(" doesn't exist. Has it been deprecated (I'm using Apache 2.4, PHP 5.4)?
On the contrary it's something we've just written - to get the file contents over http using cURL. Only I haven't had time to write it up yet. Stay tuned.
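For readers hitting this error in the meantime, a cURL-based fetch might look something like the sketch below. It is a hypothetical stand-in, not the http_get_contents function referred to above: the function name, the default options and the user agent string are all assumptions, and it needs the cURL extension.

```php
<?php
// Hypothetical stand-in for the missing helper; requires the cURL extension.
function fetch_url_sketch(string $url, array $extra_opts = [])
{
    $ch = curl_init($url);

    // Caller-supplied options take precedence over these defaults.
    curl_setopt_array($ch, $extra_opts + [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 10,    // give up after ten seconds
        CURLOPT_USERAGENT      => 'FeedReaderSketch/1.0',  // made-up user agent
    ]);

    // Returns the response body as a string, or false on failure.
    return curl_exec($ch);
}
```

It would be called as $data = fetch_url_sketch($url); and the result checked for false before being handed to xml_parse.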
Giuseppe Stumpo 13 October, 2014
Thank you for this awesome class.
I just want to post a possible fix for
htmlspecialchars(stripslashes($itemdata['DESCRIPTION']))
and TITLE too.
I have replaced it with
html_entity_decode(utf8_decode (stripslashes($itemdata['TITLE'])))
because I had a problem with some special characters.
Regards.
(sorry for my bad english =D )
Gary Bergen 13 October, 2014
I like your scripting for the PHP: RSS Feed Reader. Do you have a version that uses the PHP command CURL instead of FOPEN?
Thanks!
Gary
woutie 1 June, 2014
hi,
thanks for the code.
Really like it.
especially this:
$output = $rss_parser->getOutput(1);
I just wanted 1 item in my list.
great work.
greetings from the netherlands.
Bill Koerner 7 April, 2013
I had developed some code that basically works, but this looks more complete. One surprise I ran into... If the content included in the <description> tag contains <img...>, I would like to be able to scale the size if the width is greater than my current content pane... Any ideas on how to scale images that are included inside the <description> section?
Have you tried some generic CSS such as:
img { max-width: 100%; }
img { height: auto; }
insel 30 May, 2012
The first of eight tested that works without errors,
absolutely top code, the best
Thanks to a huge mountain
from germany, berlin
Alexej Savčin 9 October, 2011
Hi there. I should say this parser was the perfect solution to my problem, but I have one issue: I can't set the number of feed items shown. Can you help me with this?
What you're looking for is just:
$output = $rss_parser->getOutput(3);
This will limit the display to the first 3 items in the feed.
Jonathan Wheat 23 September, 2011
Love the parser.
I had something like this in my feed
and added
$tagname = ereg_replace(":","",$tagname);
inside startElement, then could reference the variable as MCSTARTTIME
I would use str_replace instead of the ereg function, which is deprecated (and was removed in PHP 7), but yes, that's a good way to extract other variables.
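A quick illustration of the str_replace version follows. The tag name is hypothetical, since the original feed snippet is not shown here.

```php
<?php
// "MC:STARTTIME" is a hypothetical namespaced tag name used for illustration.
$tagname = "MC:STARTTIME";

// Drop the colon so the name can be used as a plain variable or array key.
$tagname = str_replace(":", "", $tagname);

echo $tagname, "\n";  // prints MCSTARTTIME
```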
shadmego 4 September, 2011
Is there a way to alter the class so it doesn't fail when encountering "undefined entities"?
The feed I am displaying apparently has some characters that the script doesn't understand and it is causing my page to fail with the error: "myRSSParser: Error undefined entity at line 385"
Line 385 would be the line in the cache file being read by the script.
The RSS feed reader class relies on the XML Parser extension included with PHP. That is where the error is being thrown rather than from our code (ref: php.net/xml_parse).
To avoid XML errors you need to make sure that the input is valid, or maybe just tweak the character encoding or use utf8_encode if that's the problem.
For your particular case you can insert the following patch:
while($data = fread($fp, 4096)) {
if(!in_array(mb_detect_encoding($data), array("UTF-8", "ASCII"))) {
$data = preg_replace('/[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S', '?', $data);
}
xml_parse(...) or die(
Alex 21 June, 2011
Thanks for your great post. I have one question:
How can I use this code to put rss from two source on one page ?
You only need to include the PHP class one time. The code that follows can then be repeated as many times as you want on the page, though it's probably a good idea to use caching.
ryan 13 April, 2011
this solution is fantastic. one question though: if i was to include a truncate function to truncate each entry to a certain number of words, where would i put that in the code? i was reading your other page on truncating and couldn't figure out how to merge the two.
thanks a bunch.
You can truncate the DESCRIPTION field in the display_item function just before it's added to the return string:
e.g.
$DESCRIPTION = myTruncate($DESCRIPTION, 200);
$this->retval .= "<p>" . stripslashes($DESCRIPTION) . "</p>\n\n";
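If you want to cut by words rather than characters, a helper along these lines could be used in place of myTruncate. This is a hypothetical sketch: the function name, its parameters and the decision to strip HTML tags first are all assumptions, and myTruncate itself may work differently.

```php
<?php
// Hypothetical helper; the myTruncate function mentioned above may differ.
function truncate_words_sketch(string $text, int $max_words, string $pad = "..."): string
{
    // Strip tags first so a cut never lands inside an HTML element,
    // then split on any run of whitespace.
    $words = preg_split('/\s+/', trim(strip_tags($text)), -1, PREG_SPLIT_NO_EMPTY);

    if (count($words) <= $max_words) {
        return implode(" ", $words);  // short enough: no padding added
    }

    return implode(" ", array_slice($words, 0, $max_words)) . $pad;
}

echo truncate_words_sketch("<p>The quick brown fox jumps</p>", 3), "\n";  // The quick brown...
```

Note that the text comes back without its HTML tags even when it is not shortened.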
anthony 10 January, 2011
I got an error.
Error: unrecognized start tag 'FEED' in getOutput()
can you please explain
See comments above.
Jeff Quiros 21 September, 2009
Your class.myrssparser.php has been extremely helpful to me in understanding creating/displaying RSS feeds, but with the code as copied onto my server I get a huge string of error messages. The first few are as follows:
Notice: Undefined index: CHANNEL in C:InetpubVTRADERRfactorclass.myrssparser.php on line 42
Notice: Undefined variable: RSS in ...
Notice: Undefined index: RSS in ...
Notice: Undefined index: LINK in ...
The errors you're seeing are really "Notices" saying that a variable (array index) is being referenced without previously being created/initialised. You can suppress these messages by setting your error_reporting level in PHP to "E_ALL ^ E_NOTICE" so it displays only actual errors and warnings and not notices.
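A short sketch of both approaches, hiding notices and avoiding them at the source, is shown below. The sample item is made up. On PHP 8 and later these particular messages are reported as warnings rather than notices, so the second approach is likely to be the more durable one.

```php
<?php
// Hide notices while keeping errors and warnings visible.
$previous = error_reporting(E_ALL & ~E_NOTICE);

// Alternatively, avoid the message at its source with isset() or ??.
$item = ["TITLE" => "Example"];  // made-up item without a LINK field
$link = $item["LINK"] ?? "";     // empty string instead of an undefined-index message

// Restore the earlier reporting level when done.
error_reporting($previous);
```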
Esteban 21 July, 2009
First of all, your RSS Feed Reader class is great. Thanks for sharing it.
I've been using without major problems; although, I find one little issue I could not resolve yet: I would like to change the date format that comes within the item->pubdate tag to something more friendly. Could you guys give any ideas?
A few people have asked about this. I suggest something like the following:
if($PUBDATE) {
$PUBDATE = date('l, jS F Y', strtotime($PUBDATE));
$tmp[] = "<small>Published: $PUBDATE</small>";
}
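Here is the same idea as a standalone sketch with a made-up date. It adds a check for strtotime failing, and uses gmdate so the result does not depend on the server's timezone setting; both are choices made for this example.

```php
<?php
$PUBDATE = "Tue, 21 Jul 2009 10:00:00 +0000";  // made-up sample value

// strtotime returns false for text it cannot parse; keep the original then.
$timestamp = strtotime($PUBDATE);
if ($timestamp !== false) {
    $PUBDATE = gmdate('l, jS F Y', $timestamp);
}

echo $PUBDATE, "\n";  // Tuesday, 21st July 2009
```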
Ben 11 June, 2009
Using blogger's atom.xml, the > and < and some / used in <br /> are not being parsed out, and are appearing in the html. Any ideas?
If you send me the feed URL I can check it out
Keith Chadwick 28 March, 2009
I have no display whatsoever!!!!
the only error I receive is
myRSSParser: Could not open www.example.net/rss.xml for input.
I just cannot work out what the problem is - It is not just this example but at least two other reader example also. Any ideas?
Hi Keith, it sounds like your webserver is denying access to the request from PHP. That can happen for example if you have a firewall or filtering rules (mod_rewrite) that deny access when there is no HTTP_USER_AGENT. Check your server logs for a 403 error.
joe w 19 March, 2009
this is an excellent tutorial. i searched high and low for an rss tutorial and this one is miles ahead of the others. thank you very much for it. i would like to ask, how do you limit the results per page?
Hi Joe, you just need to pass the number of items you want to display as the first argument to the getOutput() function.
eviriyanti 15 September, 2008
Thanks for this article, it really helped me.
(^_^)