I am now using PHP cURL to get headers and other information for Tech Scraper.
Hopefully it works a little better.
I am, however, getting an HTTP 301 response through cURL on some pages, so I added a check for it:
the script still falls back to file_get_contents($url) if cURL returns a 301.
I also split all output into arrays so I can keep track of it and print a message when no information could be returned.
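The fallback could look roughly like the sketch below. It is only an illustration, not the actual Tech Scraper code: the helper names needs_fallback() and fetch_page() are made up here, and it assumes the 301 shows up in the http_code field of curl_getinfo().

```php
<?php
// Hypothetical helper: true when the cURL info array reports a 301 status.
function needs_fallback(array $info): bool
{
    return isset($info['http_code']) && (int) $info['http_code'] === 301;
}

// Hypothetical helper: try cURL first, fall back to file_get_contents() on a 301.
function fetch_page(string $url)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ));
    $result = curl_exec($ch);   // headers + body, or false on failure
    $info   = curl_getinfo($ch);
    curl_close($ch);

    if (needs_fallback($info)) {
        // file_get_contents() returns the body only, or false on failure.
        return @file_get_contents($url);
    }
    return $result;
}
```

Keep in mind that the two paths return different things: the cURL result includes the response headers (because of CURLOPT_HEADER), while file_get_contents() returns only the body.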
http://www.x24d.com/tech_scrape/
Changes:
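$ch = curl_init();
$options = array(
    CURLOPT_URL            => $url,
    CURLOPT_HEADER         => true,   // include response headers in the output
    CURLOPT_RETURNTRANSFER => true,   // return the response instead of printing it
    // The user agent must be a single line; line breaks inside it corrupt the request header.
    CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11",
    CURLOPT_FOLLOWLOCATION => true    // follow Location: redirects
);
curl_setopt_array($ch, $options);
$result = curl_exec($ch);        // headers + body as one string, or false on failure
$AR_info = curl_getinfo($ch);    // transfer details such as http_code and header_size
curl_close($ch);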
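Since CURLOPT_HEADER puts the headers and the body into one string, splitting the output into arrays might be done along these lines. This is a minimal sketch, not the actual Tech Scraper code: split_response() is a hypothetical helper, and the sample response at the bottom is invented for the demonstration.

```php
<?php
// Hypothetical helper: split a raw cURL response (headers + body) into arrays.
function split_response(string $raw, int $headerSize): array
{
    $headerText = substr($raw, 0, $headerSize);
    $body       = (string) substr($raw, $headerSize);

    $headers = array();
    foreach (preg_split('/\r\n|\n/', trim($headerText)) as $line) {
        // Lines without a colon (status lines, blank separators) are skipped.
        if (strpos($line, ':') === false) {
            continue;
        }
        list($name, $value) = explode(':', $line, 2);
        $headers[trim($name)] = trim($value);
    }
    return array('headers' => $headers, 'body' => $body);
}

// Invented sample response, only for demonstration.
$body  = '<html></html>';
$raw   = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nServer: Apache\r\n\r\n" . $body;
$parts = split_response($raw, strlen($raw) - strlen($body));
```

With the snippet above it would be called as split_response($result, $AR_info['header_size']). When redirects are followed, the header block can contain several responses; in this sketch a later header simply overwrites an earlier one with the same name.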