Using simple_html_dom makes this problem very simple.
The complete code is as follows:
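<?php
require_once('simple_html_dom.php');

class grabber
{
    private $url, $html;

    public function __construct($url = null)
    {
        $this->url = $url;
    }

    public function setUrl($url)
    {
        $this->url = $url;
    }

    // Fetch the page and parse it into a DOM; file_get_html() returns false on failure.
    public function getDOM()
    {
        $this->html = file_get_html($this->url);
        if (!$this->html) {
            throw new RuntimeException('Could not load ' . $this->url);
        }
    }

    // Return the first element matching $selector, or null if nothing matches.
    // The find() result is stored in a variable first because array_shift()
    // takes its argument by reference.
    private function first($selector)
    {
        $matches = $this->html->find($selector);
        return array_shift($matches);
    }

    public function retrieveTitle()
    {
        $title = $this->first('title');
        return $title ? $title->innertext : null;
    }

    public function retrieveDescription()
    {
        $meta = $this->first('meta[name="description"]');
        return $meta ? $meta->content : null;
    }

    // Collect the src of every img tag that has one.
    public function retrieveImgs()
    {
        $images = array();
        foreach ($this->html->find('img[src]') as $img) {
            $images[] = $img->src;
        }
        return $images;
    }
}

$grabber = new grabber('http://movies.yahoo.com/news/movies.ap.org/merry-xmas-hollywood-boxoffice-record-falls-ap');
$grabber->getDOM();
var_dump($grabber->retrieveTitle());
var_dump($grabber->retrieveDescription());
var_dump($grabber->retrieveImgs());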
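Some notes:
file_get_html() fetches the HTML at the given URL and parses it into a DOM.
array_shift() removes and returns the first element of an array.
img[src] selects only img tags that have a src attribute.
Anyone familiar with jQuery selectors can write this almost immediately.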
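
If you cannot install simple_html_dom, roughly the same extraction can be sketched with PHP's built-in DOMDocument and DOMXPath. This is a different technique from the one above: it uses XPath instead of jQuery-style selectors, and the $sampleHtml string is made-up markup standing in for a fetched page so the snippet runs on its own.

```php
<?php
// Hypothetical sample markup standing in for a downloaded page.
$sampleHtml = '<html><head><title>Sample page</title>'
    . '<meta name="description" content="A short description">'
    . '</head><body><img src="a.png"><img alt="no source"><img src="b.png"></body></html>';

// Collect libxml warnings internally instead of printing them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// string(...) evaluates to '' when nothing matches, so no null check is needed.
$title = $xpath->evaluate('string(//title)');
$description = $xpath->evaluate('string(//meta[@name="description"]/@content)');

// //img[@src] plays the role of the img[src] selector: only images with a src attribute.
$images = array();
foreach ($xpath->query('//img[@src]') as $img) {
    $images[] = $img->getAttribute('src');
}

var_dump($title, $description, $images);
```

For a real page you would load the downloaded HTML string into loadHTML() instead of $sampleHtml. The XPath expressions are more verbose than the selectors, which is the main reason simple_html_dom feels easier if you already know jQuery.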