Simple Screen Scraping – Our Staff Page Example

You may have heard the term screen scraping. It sounds fancy, but it just means reading the contents of a remote page and reformatting it. It is a terrific way to make the content of a non-mobile page mobile-friendly without having to update the content in two places.

Our all-staff contacts page is a great example. The college maintains the directory of all staff, and that directory can be searched by department. On our main site we just link to that staff search page for the list of all library staff. But that page is not mobile-friendly, so our script fetches it, grabs just the section with the contact data, weeds out the fields we don’t want, and shows the list. We added the CSS classes the directory already uses in that list to our own CSS to format it.
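As a minimal sketch of the "grab one section" step, the snippet below runs the same kind of pattern against a made-up page. The sample HTML and the helper name `extract_contact_table` are assumptions for illustration only; the real directory page's markup will differ.

```php
<?php
// Hypothetical helper: returns the markup between the "criteria</p>" marker
// and the last </table> on the page, or null if the marker is not found.
function extract_contact_table(string $html): ?string
{
    // The /s flag lets "." match newlines, so the capture can span lines.
    if (preg_match('/criteria<\/p>(.*)<\/table>/s', $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Made-up sample page standing in for the remote directory page.
$sample = "<p>Search criteria</p>\n"
    . "<table class=\"staff\">\n"
    . "<tr><td>Jane Doe</td></tr>\n"
    . "</table>\n"
    . "<p>Footer</p>";

$section = extract_contact_table($sample);

// The pattern consumes the closing </table>, so it is added back on output.
echo $section === null ? "No match\n" : $section . "</table>\n";
```

Returning null when nothing matches lets the caller show an error message instead of working with a missing value.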

The script also takes the plain-text phone numbers and turns them into telephone links using a regular expression replacement.
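The phone-link idea on its own can be sketched as follows. The helper name `link_phone_numbers` and the sample number are made up for illustration, and the pattern assumes numbers always appear as "(845) nnn-nnnn".

```php
<?php
// Hypothetical helper: wraps "(845) 555-0123" style numbers in tel: links.
function link_phone_numbers(string $text): string
{
    // $1 is the three-digit exchange and $2 the four-digit line number.
    // The area code is fixed at 845 in this sketch.
    return preg_replace(
        '/\(845\) ([0-9]{3})-([0-9]{4})/',
        '<a href="tel:845$1$2">(845) $1-$2</a>',
        $text
    );
}

echo link_phone_numbers('Phone: (845) 555-0123'), "\n";
// prints: Phone: <a href="tel:8455550123">(845) 555-0123</a>
```

Text without a matching number passes through unchanged, so the helper is safe to run over the whole table.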

PHP script that reads a remote web page and formats it for the mobile web


<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');
//
// Sets the URL of the page with our contact information
$url = 'http://my.organization.edu/directory/display.cfm?department_id=123456';
//
// Fetches the contents of a URL with cURL and returns the page as a string
function get_url_contents($url){
$crl = curl_init();
$timeout = 5;
curl_setopt ($crl, CURLOPT_URL,$url);
curl_setopt ($crl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
$ret = curl_exec($crl);
curl_close($crl);
return $ret;
}
$wholepage = get_url_contents($url);
//
echo "\n\n";
//
// Grabs the section of the page with the table of contacts
// or outputs an error
//
$reg = '/criteria<\/p>(.*)<\/table>/s';
preg_match($reg,$wholepage,$matches);
if (isset($matches[1])) {
$directions = $matches[1];
} else {
echo '<p class=indent>Error: could not retrieve staff list.</p>';
// Nothing matched, so continue with an empty list instead of an undefined index
$directions = '';
}
//
// finds plain text phone numbers in the table code and makes telephone links
// At the same time it is removing a section with department and room number
$find = '/<br \/>Department:.*\(845\) ([0-9]+)-([0-9]+) /Uis';
$replacewith = '</p><a href="tel:845$1$2">$1-$2</a>';
$done = preg_replace($find, $replacewith, $directions);
//
// Removes the "E-mail:" label and puts the e-mail on the same line as the phone
$findlinks = '<br />E-mail:';
$replacewithnot = ' ';
$looselinks = str_replace( $findlinks, $replacewithnot, $done);
//
echo $looselinks;
//
echo "</table>";
?>

About mobilelibraryinacan

Library tech person 4 20 years
This entry was posted in Mobile HTML and CSS, PHP Scripting, The Fun Stuff.