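PHP cURL - multiple requests from the same 'user'

I want to scrape content from a website that sits behind a standard login form (over HTTPS on both my site and the target site, if that matters).
I can log in to the page successfully by making a POST request, like this: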
include("inc/simple_html_dom.php");
$url = "https://account.tfl.gov.uk/Login";
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$cookie = 'cookies.txt';
$timeout = 60;
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "UserName=USER&Password=PASSWORD&AppId=00000000-0000-0000-0000-000000000000&ReturnUrl=");
$result = curl_exec($ch);
I then want to scrape the user's journey history, which is available at https://oyster.tfl.gov.uk/oyster/journeyHistoryThrottle.do?_qs=_qv=[SESSION CODE] once logged in. To get the session code I use SimpleHTMLDom:
$html = str_get_html($result);
$codeRaw = $html->getElementById('Oyster')->childNodes(1);
$code1 = explode("?_o=",$codeRaw);
$code2 = explode('"',$code1[1]);
$codeReal = $code2[0];
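As an aside, the two `explode()` calls above raise a notice and leave `$codeReal` empty if the marker is missing from the page. A minimal sketch of a more defensive extraction is shown here, assuming the code appears as the value of a `_o` query parameter inside a link. The helper name `extractSessionCode` and the sample markup are hypothetical, not part of the TfL site or of SimpleHTMLDom.

```php
<?php
// Hypothetical helper: return the value of a query parameter found in raw HTML,
// or null when the parameter is not present.
function extractSessionCode(string $html, string $param = '_o'): ?string
{
    // Match "?_o=" or "&_o=" and capture up to the next quote, ampersand or whitespace.
    $pattern = '/[?&]' . preg_quote($param, '/') . '=([^"\'&\s]+)/';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Made-up markup for illustration only.
$sample = '<a href="/example/page.do?_o=abc123XYZ">Oyster</a>';
$code = extractSessionCode($sample);
if ($code === null) {
    echo "No session code found; the login probably failed\n";
} else {
    echo $code, "\n";
}
```

Checking for `null` before building the next URL makes a failed login visible immediately instead of producing a request with an empty code.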
I then try to make another cURL request to access that page:
$url = "https://oyster.tfl.gov.uk/oyster/journeyHistoryThrottle.do?_qs=_qv=".$codeReal;
echo $url;
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$cookie = 'cookies.txt';
$timeout = 60;
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
$result = str_replace('"/','"https://oyster.tfl.gov.uk/',curl_exec($ch));
curl_close($ch);
echo $result;
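But all I get back is a login page. I suspect this is because the two cURL requests generate different "sessions" on the TfL site?
Is there a way to force cURL to reuse the previous session? In case it is relevant, I will probably also need to make further requests when paging through the history.
Or is there any other way to achieve this? (TfL does not provide an API for this.)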
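You don't have to set the cookies again when you make the second cURL request; just clear the POST and change the URL – Faxsy
How do I unset the POST? Which cookie-related lines need to be removed from the second cURL request? –
You got it, @miken32 – Faxsy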
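The suggestion in the comments (keep the same handle, clear the POST, change the URL) could be sketched roughly as follows. This is an illustration under assumptions, not a verified fix for the TfL site: the helper names `makeSessionHandle`, `postForm` and `getPage` are hypothetical, and the timeout values are swapped relative to the question so that the connect timeout is shorter than the overall timeout. `CURLOPT_HTTPGET` is the standard option for resetting a reused handle to GET.

```php
<?php
// Hypothetical helper: create one handle that is reused for every request,
// so cookies set by the login response are sent with later requests.
function makeSessionHandle(string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10,          // time allowed to connect
        CURLOPT_TIMEOUT        => 60,          // time allowed for the whole request
        CURLOPT_COOKIEJAR      => $cookieFile, // written when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // enables the in-memory cookie engine
    ]);
    return $ch;
}

// Hypothetical helper: send a form-encoded POST on the shared handle.
function postForm($ch, string $url, array $fields)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    return curl_exec($ch);
}

// Hypothetical helper: switch the shared handle back to GET before fetching,
// so the login fields are not re-sent to the next URL.
function getPage($ch, string $url)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    return curl_exec($ch);
}

// Usage sketch (placeholders as in the question; not executed here):
// $ch = makeSessionHandle(__DIR__ . '/cookies.txt');
// $loginHtml = postForm($ch, 'https://account.tfl.gov.uk/Login', [
//     'UserName' => 'USER', 'Password' => 'PASSWORD',
//     'AppId' => '00000000-0000-0000-0000-000000000000', 'ReturnUrl' => '',
// ]);
// $history = getPage($ch, 'https://oyster.tfl.gov.uk/oyster/journeyHistoryThrottle.do?_qs=_qv=' . $codeReal);
// curl_close($ch);
```

With this layout none of the cookie lines need to be repeated for the second request; they are set once on the handle and stay in effect until it is closed. Later pagination requests would go through `getPage` on the same handle as well.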