PHP: cURL alternative to file_get_contents over HTTP
On many PHP installations you will find that fetching remote files using fopen or file_get_contents has been disabled in the name of security, typically by turning off the allow_url_fopen setting.
Here we present a function, http_get_contents, that uses the Client URL Library (a.k.a. cURL) and can serve as a workaround.
The http_get_contents function
This is currently a work in progress with some enhancements in the pipeline. It is, however, already referenced from our RSS and Atom Feed Reader code so needs to be presented.
function http_get_contents($url)
{
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_TIMEOUT, 5); // give up after five seconds
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return the response instead of printing it
  if(FALSE === ($retval = curl_exec($ch))) {
    error_log(curl_error($ch));
  }
  return $retval; // the response body, or FALSE on failure
}
If the cURL extension has not been enabled in PHP, calling curl_init will fail with a fatal "Call to undefined function curl_init()" error.
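One way to fail more gracefully is to test for the extension before calling curl_init. The following is only a minimal sketch: the helper name curl_available is our own invention for illustration and is not part of PHP or of the script presented here.

```php
<?php
// Hypothetical helper (not part of the original script): reports whether
// the cURL extension is loaded and curl_init() can actually be called.
function curl_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (!curl_available()) {
    // Log a readable message instead of hitting a fatal error later on.
    error_log('The cURL extension is not enabled in this PHP installation.');
}
```

A caller could use the same check to fall back to file_get_contents where that is still permitted.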
Future versions will include better error handling and parsing of HTTP response headers, to detect broken links and follow redirects for example.
Sample Usage
The following code block checks whether the file address starts with http and calls either http_get_contents or file_get_contents accordingly:
$file = "https://www.the-art-of-web.com/rss.xml";
$contents = preg_match("/^http/", $file) ? http_get_contents($file) : file_get_contents($file);
The cURL approach can also be used for FTP and other protocols.
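One caveat with the sample above: the pattern /^http/ matches any string that begins with "http", including a local file named something like http_notes.txt. As a sketch of a stricter test, assuming you only want to route http and https addresses through cURL, the scheme can be inspected with parse_url. The function name is_http_url is hypothetical.

```php
<?php
// Hypothetical helper: TRUE only for addresses whose scheme is http or https.
function is_http_url(string $file): bool
{
    // parse_url() returns NULL when there is no scheme and FALSE on
    // seriously malformed input, so check for a string first.
    $scheme = parse_url($file, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], TRUE);
}
```

It would take the place of the preg_match call in the ternary expression. To send FTP addresses through cURL as well, the list of accepted schemes could be extended.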
Improved Functionality
After putting this function through its paces we came up with a couple of improvements.
Firstly, you can now pass an associative array ($opts) of extra cURL options to be included in the request. Secondly, a default User Agent string is set (CURLOPT_USERAGENT) using the name of the calling domain, unless you supply your own.
<?PHP
// Original PHP code by Chirp Internet: www.chirpinternet.eu
// Please acknowledge use of this code by including this header.
function http_get_contents($url, array $opts = [])
{
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  if(is_array($opts) && $opts) {
    foreach($opts as $key => $val) {
      curl_setopt($ch, $key, $val);
    }
  }
  if(!isset($opts[CURLOPT_USERAGENT])) {
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['SERVER_NAME']);
  }
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
  if(FALSE === ($retval = curl_exec($ch))) {
    error_log(curl_error($ch));
  }
  return $retval;
}
Passing a non-blank User Agent string is good practice, and sometimes required to avoid your requests being blocked. See our article on parsing robots.txt for some examples.
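As an illustration of the $opts parameter, the sketch below supplies a custom User Agent, restores the five-second timeout from the first version, and asks cURL to follow a limited number of redirects. The URL and agent string are placeholders of our own, not values from the original script.

```php
<?php
// Placeholder values for illustration only.
$url = "https://www.example.com/feed.xml";
$opts = [
    CURLOPT_USERAGENT      => "ExampleFeedReader/1.0 (+https://www.example.com/)",
    CURLOPT_TIMEOUT        => 5,     // give up after five seconds
    CURLOPT_FOLLOWLOCATION => TRUE,  // follow HTTP redirects
    CURLOPT_MAXREDIRS      => 3,     // but no more than three of them
];

// Because CURLOPT_USERAGENT is present, the function's default is skipped.
// The call is guarded so the snippet also runs without the function loaded.
if (function_exists('http_get_contents')) {
    $contents = http_get_contents($url, $opts);
}
```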
Final version
The final addition to our script is an optional parameter through which we can pass an array to be populated by cURL with information about the operation. This is useful if you're interested in the HTTP status of the request, for example.
<?PHP
namespace Chirp;
// Original PHP code by Chirp Internet: www.chirpinternet.eu
// Please acknowledge use of this code by including this header.
function http_get_contents($url, array $opts = [], ?array &$getinfo = NULL)
{
  $ch = curl_init();
  if($getinfo !== NULL) {
    // also record the request headers that were sent
    curl_setopt($ch, CURLINFO_HEADER_OUT, TRUE);
  }
  curl_setopt($ch, CURLOPT_URL, $url);
  if(is_array($opts) && $opts) {
    foreach($opts as $key => $val) {
      curl_setopt($ch, $key, $val);
    }
  }
  if(!isset($opts[CURLOPT_USERAGENT])) {
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['SERVER_NAME']);
  }
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
  if(FALSE === ($retval = curl_exec($ch))) {
    error_log(curl_error($ch) . " {$url}");
  }
  if($getinfo !== NULL) {
    $getinfo = curl_getinfo($ch);
  }
  return $retval;
}
Here is an example using the new functionality:
$curlinfo = [];
$page_contents = http_get_contents($url, $curl_opts, $curlinfo);
echo "<p>({$curlinfo['http_code']}) {$curlinfo['url']}</p>\n";
This will output the HTTP status code in parentheses, followed by the URL that was fetched.
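Beyond printing it, the status code can drive simple decisions such as flagging broken links. The following sketch assumes the array has been populated as in the example above; the function name describe_http_status and its labels are our own. An http_code of 0 means cURL received no HTTP response at all.

```php
<?php
// Hypothetical helper: turns the 'http_code' entry from curl_getinfo()
// into a rough category. The labels are our own choice.
function describe_http_status(array $curlinfo): string
{
    $code = (int) ($curlinfo['http_code'] ?? 0);
    if ($code === 0) {
        return 'no response';   // e.g. DNS failure or timeout
    } elseif ($code >= 200 && $code < 300) {
        return 'ok';
    } elseif ($code >= 300 && $code < 400) {
        return 'redirect';
    } elseif ($code === 404 || $code === 410) {
        return 'broken link';   // not found, or permanently gone
    } elseif ($code >= 400) {
        return 'error';         // any other client or server error
    }
    return 'other';             // informational (1xx) responses
}
```

After a fetch, describe_http_status($curlinfo) could be logged alongside the URL.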
Some Examples
In this example we're getting around Facebook's buggy IPv6 interface by forcing the connection to take place over IPv4.
<?PHP
$endpoint = "https://graph.facebook.com/?id=" . urlencode($uri);
$curlopts = [
  CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4
];
$retval = http_get_contents($endpoint, $curlopts);
?>
Fetching and parsing the response headers of an HTTP request without downloading the body:
<?PHP
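function read_header($ch, $string)
{
  // function to receive and process the response headers;
  // cURL calls it once for each header line
  return strlen($string); // cURL expects the number of bytes handled
}
$tmp = http_get_contents($url, [
  CURLOPT_HEADERFUNCTION => __NAMESPACE__ . '\read_header',
  CURLOPT_NOBODY => TRUE
]);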
?>
Note that in all cases the CURLOPT_* constants are not to be quoted. They are not strings, but 'long' integers defined by the Client URL Library.
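Returning to the read_header callback in the last example, which is only a stub: one possible way to fill it in is to collect each "Name: value" line into an array. This is a sketch under our own assumptions. The names $response_headers and $collect_header are hypothetical, and a closure is used in place of a named function so that the array can be captured by reference.

```php
<?php
$response_headers = [];

// Hypothetical callback: cURL calls it once per response header line.
$collect_header = function ($ch, string $line) use (&$response_headers): int {
    $parts = explode(':', $line, 2);
    if (count($parts) === 2) {
        // Header names are case-insensitive, so normalise to lower case.
        // A repeated header (e.g. Set-Cookie) overwrites the earlier value.
        $response_headers[strtolower(trim($parts[0]))] = trim($parts[1]);
    }
    // The status line and the blank separator line contain no colon and
    // are skipped, but every call must return the number of bytes handled.
    return strlen($line);
};
```

The closure can be passed directly as the value of CURLOPT_HEADERFUNCTION, after which $response_headers holds the parsed headers.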