Scraping JSON with PHP

I've done a lot of HTML scraping with XPath, but now I have to scrape some JSON and I don't know how. The source I want to scrape is:
{
"ASIN" : "B00DR4LYHY",
"FeatureName" : "price_feature_div",
"Type" : "JSON",
"Value" :
{
"content" :
{"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t \n\t\t \n\t\t\t\t \n\t\t \n\t\t\t\t \n\n\n\n\n\n\t\n<tr>\n <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">Price:<\/td>\n <td class=\"a-span12\">\n <span id=\"priceblock_ourprice\"
class=\"a-size-medium a-color-price\">$37.60<\/span>\n \n\n\n\n \n\n\n\n\n\n\n \n\n <span id=\"ourprice_shippingmessage\">\t\n \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n \n\n\t \n\t\t\n\t\t\n \n <span class=\"a-size-base a-color-base\">& <b>FREE Shipping<\/b><\/span>\n \n \n \n\n\n\n <\/span>\n \n \n \n \n <\/td>\n<\/tr>\n\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n \n \n\t\n<\/table>\n<\/div>"}
}
}
I get it from this URL:
$URL = 'http://www.amazon.com/gp/twister/ajaxv2?sid=188-4344403-7969026&ptd=OUTERWEAR&json=1&dpxAjaxFlag=1&sCac=1&isUDPFlag=1&twisterView=glance&ee=2&pgid=apparel_display_on_website&sr=1-3&nodeID=1036592&rid=0Q05FXGQJSA20X44DJVG&parentAsin=B00DR4LUQY&enPre=1&qid=1413775191&dStr=size_name%2Ccolor_name&auiAjax=1&storeID=apparel&psc=1&asinList=B00DR4LYHY&isFlushing=2&id=B00DR4LYHY&prefetchParam=0&mType=full&dpEnvironment=softlines';
What I need to get is the price ($37.60).
The code I'm using, provided by Venkata, is:
$URL = 'http://www.amazon.com/gp/twister/ajaxv2?sid=188-4344403-7969026&ptd=OUTERWEAR&json=1&dpxAjaxFlag=1&sCac=1&isUDPFlag=1&twisterView=glance&ee=2&pgid=apparel_display_on_website&sr=1-3&nodeID=1036592&rid=0Q05FXGQJSA20X44DJVG&parentAsin=B00DR4LUQY&enPre=1&qid=1413775191&dStr=size_name%2Ccolor_name&auiAjax=1&storeID=apparel&psc=1&asinList=B00DR4LYHY&isFlushing=2&id=B00DR4LYHY&prefetchParam=0&mType=full&dpEnvironment=softlines';
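$page = file_get_contents($URL);
$decoded = json_decode($page);
$html = $decoded->Value->content->price_feature_div;
$dom = new DOMDocument();
// The string is only an HTML fragment, so collect libxml parse warnings instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// From the DOM method: getElementById() returns a single element or null, not a list
$element = $dom->getElementById("priceblock_ourprice");
// OR extract it with XPath as in the line below; query() returns a DOMNodeList, so take item(0)
$priceNode = $xpath->query("//*[@id='priceblock_ourprice']")->item(0);
if (!is_null($priceNode)) {
    // A node cannot be echoed directly; read its text content instead
    $ourPrice = trim($priceNode->textContent);
    echo $ourPrice;
}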
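I think the best approach would be to use a regex, but what should the expression look like?
Decode the JSON, extract the HTML, then feed it into DOM as usual. And no, "best" would **not** be a regex. – 2014-10-20 17:11:04
@MarcB Thanks, but could you explain how to do that? – Emilios1995 2014-10-20 17:21:57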
http://php.net/json_decode – 2014-10-20 17:31:53
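
The steps from the comments (decode the JSON, pull out the HTML string, load it into DOM as usual) could be sketched roughly as below. This is only a minimal sketch under a few assumptions: the helper name `extractPrice` is made up for illustration, the inline `$sample` is a shortened stand-in for the real response shown in the question, and the response is assumed to keep the `Value->content->price_feature_div` shape.

```php
<?php
// Hypothetical helper: the name extractPrice is an assumption, not from the thread.
function extractPrice(string $json): ?string
{
    // Decode the JSON; json_decode() returns null when the payload is not valid JSON.
    $decoded = json_decode($json);
    if ($decoded === null || !isset($decoded->Value->content->price_feature_div)) {
        return null;
    }

    // The HTML fragment is stored as a plain string inside the JSON.
    $html = $decoded->Value->content->price_feature_div;
    if (!is_string($html) || $html === '') {
        return null;
    }

    // Parse the fragment; libxml errors are collected instead of emitted as warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // query() returns a DOMNodeList; item(0) is the first match or null.
    $xpath = new DOMXPath($dom);
    $node = $xpath->query("//*[@id='priceblock_ourprice']")->item(0);

    return $node === null ? null : trim($node->textContent);
}

// Shortened sample in the same shape as the response in the question.
$sample = '{"ASIN":"B00DR4LYHY","Value":{"content":{"price_feature_div":'
    . '"<div id=\"price\"><span id=\"priceblock_ourprice\">$37.60<\/span><\/div>"}}}';

echo extractPrice($sample), PHP_EOL;
```

With a live request, the result of `file_get_contents($URL)` would be passed in instead of `$sample`. Returning `null` for an invalid payload or a missing node lets the caller tell "no price found" apart from an empty string.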