2013-08-19 94 views
7

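Getting all images from a Pinterest web address

This question sounds easy, but it is not as simple as it sounds.

Brief summary of what's wrong

Take this board as an example: http://pinterest.com/dodo/web-designui-and-mobile/

Examining the HTML of the board itself (inside the div with class GridItems) at the top of the page yields: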

<div class="variableHeightLayout padItems GridItems Module centeredWithinWrapper" style=".."> 
    <!-- First div with a displayed board image --> 
    <div class="item" style="top: 0px; left: 0px; visibility: visible;">..</div> 
    ... 
    <!-- Last div with a displayed board image --> 
    <div class="item" style="top: 3343px; left: 1000px; visibility: visible;">..</div> 
</div> 

Yet at the bottom of the page, after triggering the infinite scroll a couple of times, we get this HTML:

<div class="variableHeightLayout padItems GridItems Module centeredWithinWrapper" style=".."> 
    <!-- First div with a displayed board image --> 
    <div class="item" style="top: 12431px; left: 750px; visibility: visible;">..</div> 
    ... 
    <!-- Last div with a displayed board image --> 
    <div class="item" style="top: 19944px; left: 750px; visibility: visible;">..</div> 
</div> 

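As you can see, some of the image containers higher up the page have disappeared, and not all of the image containers are loaded when the page first loads.

What I want to do

I want to be able to create a C# script (or any server-side language at the moment) that can download the page's full HTML (i.e. retrieve every image on the page), after which the images will be downloaded from their URLs. Downloading the web page and using an appropriate XPath is easy, but the real challenge is downloading the full HTML for every image.

Is there a way to emulate scrolling to the bottom of the page, or is there an even easier way to retrieve every image? I imagine Pinterest uses AJAX to change the HTML; is there a way to programmatically trigger the events so that I receive all of the HTML? Thank you in advance for any suggestions and solutions, and kudos for even reading this very long question if you have none!

Pseudo code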

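using System; 
using System.Collections.Generic; 
using HtmlAgilityPack; 

class Program 
{ 
    static void Main() 
    { 
        string pinterestUrl = "http://www.pinterest.com/..."; 
        string xPath = ".../img"; 
        var imageLinks = new List<string>(); 

        // HtmlWeb fetches a URL; HtmlDocument.Load only reads local files and streams. 
        // Currently only downloads the first 25 images. 
        HtmlDocument doc = new HtmlWeb().Load(pinterestUrl); 

        // SelectNodes returns null when nothing matches the XPath. 
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xPath); 
        if (nodes == null) return; 

        foreach (HtmlNode link in nodes) 
        { 
            imageLinks.Add(link.GetAttributeValue("src", "")); 
            // Use image links 
        } 
    } 
} 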
+0

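It only loads 25 because the rest are loaded via AJAX when you scroll to the bottom, i.e. "infinite scroll". I think you'd have to emulate scrolling. Or, if they'd pulled their finger out, they would have released their API by now. – mattytommo

+0

Is there no way I can work out exactly what happens when the AJAX event is called? It's a real shame about the API –

+0

Hmm, I don't think so. You might be better off trying this with JavaScript/jQuery: that way you can grab all the links, emulate a scroll to the end, and repeat until the scrolling is finished; then you can send an array of strings to the server. – mattytommo

Answers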

2

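OK, so I think this may be (with a few alterations) what you need.

Caveats:

  1. This is PHP, not C# (but you said you were interested in any server-side language).
  2. This code hooks into (unofficial) Pinterest search endpoints. You'll need to change $data and $search_res to reflect the appropriate endpoints (e.g. BoardFeedResource) for your task. Note: at least for search, Pinterest currently uses two endpoints, one for the initial page load and another for the infinite scroll actions. Each has its own expected parameter structure.
  3. Pinterest has no official public API; expect this to break whenever they change anything, and without warning.
  4. You may find pinterestapi.co.uk easier to implement and acceptable for what you're doing.
  5. There is some demo/debug code beneath the class that shouldn't be there once you're getting the data you want, and a default page fetch limit that you may want to change.

Points of interest:

  1. The underscore _ parameter takes a timestamp in JavaScript format, i.e. like Unix time but with milliseconds added. It is not actually used for pagination.
  2. Pagination uses the bookmarks property. You make the first request to the "new" endpoint, which doesn't require it, then take the bookmarks from the result and use it in the request for the next "page" of results, take the bookmarks from those results to fetch the page after that, and so on until you run out of results or reach your preset limit (or hit the server's maximum script execution time). I'd be curious to know exactly what the bookmarks field encodes. I'd like to think there is some fun secret sauce in there beyond just a pin ID or some other page marker.
  3. I skip the HTML and deal with the JSON instead, as it's easier (for me) than using a DOM manipulation solution or a bunch of regular expressions.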
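The bookmark-driven paging described in point 2 can be sketched independently of any endpoint. This is a minimal Python illustration, not the answer's PHP: `fetch_page` is a hypothetical callable that a real client would implement against the unofficial endpoints, and treating "-end-" as an end marker is an assumption borrowed from the bash script elsewhere in this thread.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical fetcher type: takes the previous bookmark (None for the first
# request) and returns (pins, next_bookmark) extracted from the JSON response.
Fetcher = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def js_timestamp() -> int:
    """Unix time in milliseconds, the format described for the _ parameter."""
    return int(time.time() * 1000)


def fetch_all_pins(fetch_page: Fetcher, max_pages: int = 2) -> List[dict]:
    """Follow bookmarks page by page until results or the page limit run out."""
    pins: List[dict] = []
    bookmark: Optional[str] = None  # the first request carries no bookmark
    for _ in range(max_pages):
        page, bookmark = fetch_page(bookmark)
        pins.extend(page)
        # Assumption: an empty page, a missing bookmark or "-end-" means done.
        if not page or bookmark is None or bookmark == "-end-":
            break
    return pins
```

A loop is used here instead of recursion so that the last page is kept even when no further bookmark is returned.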
<?php 

if(!class_exists('Skrivener_Pins')) { 

    class Skrivener_Pins { 

    /** 
    * Constructor 
    */ 
    public function __construct() { 
    } 

    /** 
    * Pinterest search function. Uses Pinterest's "internal" page APIs, so likely to break if they change. 
    * @author [@skrivener] Philip Tillsley 
    * @param $search_str  The string used to search for matching pins. 
    * @param $limit   Max number of pages to get, defaults to 1 to avoid excessively large queries. Use care when passing in a value. 
    * @param $bookmarks_str Used internally for recursive fetches. 
    * @param $page   Used internally to limit recursion. 
    * @return array|false List of pins, each with int['id'], obj['image'], str['pin_link'], str['orig_link'], bool['video_flag']; false if nothing was found. 
    */ 
    public function get_tagged_pins($search_str, $limit = 1, $bookmarks_str = null, $page = 1) { 

     // limit depth of recursion, ie. number of pages of 25 returned, otherwise we can hang on huge queries 
     if($page > $limit) return false; 

     // are we getting a next page of pins or not 
     $next_page = false; 
     if(isset($bookmarks_str)) $next_page = true; 

     // build url components 
     if(!$next_page) { 

     // 1st time 
     $search_res = 'BaseSearchResource'; // end point 
     $path = '&module_path=' . urlencode('SearchInfoBar(query=' . $search_str . ', scope=boards)'); 
     // build the payload with json_encode so that spaces and quotes in the search string survive 
     $data = json_encode(array( 
      "options" => array( 
       "scope" => "pins", 
       "show_scope_selector" => true, 
       "query" => $search_str 
      ), 
      "context" => array( 
       "app_version" => "2f83a7e" 
      ), 
      "module" => array( 
       "name" => "SearchPage", 
       "options" => array( 
        "scope" => "pins", 
        "query" => $search_str 
       ) 
      ), 
      "append" => false, 
      "error_strategy" => 0 
     )); 
     } else { 

     // this is a fetch for 'scrolling', what changes is the bookmarks reference, 
     // so pass the previous bookmarks value to this function and it is included 
     // in query 
     $search_res = 'SearchResource'; // different end point from 1st time search 
     $path = ''; 
     // build the payload with json_encode so that spaces and quotes in the search string survive 
     $data = json_encode(array( 
      "options" => array( 
       "query" => $search_str, 
       "bookmarks" => array($bookmarks_str), 
       "show_scope_selector" => null, 
       "scope" => "pins" 
      ), 
      "context" => array( 
       "app_version" => "2f83a7e" 
      ), 
      "module" => array( 
       "name" => "GridItems", 
       "options" => array( 
        "scrollable" => true, 
        "show_grid_footer" => true, 
        "centered" => true, 
        "reflow_all" => true, 
        "virtualize" => true, 
        "item_options" => array( 
         "show_pinner" => true, 
         "show_pinned_from" => false, 
         "show_board" => true 
        ), 
        "layout" => "variable_height" 
       ) 
      ), 
      "append" => true, 
      "error_strategy" => 2 
     )); 
     } 
     $data = urlencode($data); 
     $timestamp = time() * 1000; // unix time but in JS format (ie. has ms vs normal server time in secs), * 1000 to add ms (ie. 0ms) 

     // build url (source_url is itself a URL, so it is encoded as a parameter value) 
     $url = 'http://pinterest.com/resource/' . $search_res . '/get/?source_url=' . urlencode('/search/pins/?q=' . rawurlencode($search_str)) 
      . '&data=' . $data 
      . $path 
      . '&_=' . $timestamp; 

     // setup curl 
     $ch = curl_init(); 
     curl_setopt($ch, CURLOPT_URL, $url); 
     curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
     curl_setopt($ch, CURLOPT_HTTPHEADER, array("X-Requested-With: XMLHttpRequest")); 

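     // get result 
     $curl_result = curl_exec ($ch); // returns the body as a string because CURLOPT_RETURNTRANSFER is set 
     curl_close ($ch); 
     $curl_result = json_decode($curl_result); 
     if(!is_object($curl_result)) return false; // request failed or the response was not JSON 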

     // clear html to make var_dumps easier to see when debugging 
     // $curl_result->module->html = ''; 

     // isolate the pin data, different end points have different data structures 
     if(!$next_page) $pin_array = $curl_result->module->tree->children[1]->children[0]->children[0]->children; 
     else $pin_array = $curl_result->module->tree->children; 

     // map the pin data into desired format 
     $pin_data_array = array(); 
     $bookmarks = null; 
     if(is_array($pin_array)) { 
     if(count($pin_array)) { 

      foreach ($pin_array as $pin) { 

      //setup data 
      $image_id = $pin->options->pin_id; 
      $image_data = (isset($pin->data->images->originals)) ? $pin->data->images->originals : $pin->data->images->orig; 
      $pin_url = 'http://pinterest.com/pin/' . $image_id . '/'; 
      $original_url = $pin->data->link; 
      $video = $pin->data->is_video; 

      array_push($pin_data_array, array(
       "id"   => $image_id, 
       "image"  => $image_data, 
       "pin_link" => $pin_url, 
       "orig_link" => $original_url, 
       "video_flag" => $video, 
      )); 
      } 
      $bookmarks = reset($curl_result->module->tree->resource->options->bookmarks); 

     } else { 
      $pin_data_array = false; 
     } 
     } 

     // recurse until we're done 
     if(!($pin_data_array === false) && !is_null($bookmarks)) { 

     // more pins to get 
     $more_pins = $this->get_tagged_pins($search_str, $limit, $bookmarks, ++$page); 
     if(!($more_pins === false)) $pin_data_array = array_merge($pin_data_array, $more_pins); 
     return $pin_data_array; 
     } 

     // end of recursion: keep any pins found on this last page 
     return empty($pin_data_array) ? false : $pin_data_array; 
    } 

    } // end class Skrivener_Pins 
} // end if 



/** 
* Debug/Demo Code 
* delete or comment this section for production 
*/ 

// output headers to control how the content displays 
// header("Content-Type: application/json"); 
header("Content-Type: text/plain"); 
// header("Content-Type: text/html"); 

// define search term 
// $tag = "vader"; 
$tag = "haemolytic"; 
// $tag = "qjkjgjerbjjkrekhjk"; 

if(class_exists('Skrivener_Pins')) { 

    // instantiate the class 
    $pin_handler = new Skrivener_Pins(); 

    // get pins, pinterest returns 25 per batch, function pages through this recursively, pass in limit to 
    // override default limit on number of pages to retrieve, avoid high limits (eg. limit of 20 * 25 pins/page = 500 pins to pull 
    // and 20 separate calls to Pinterest) 
    $pins1 = $pin_handler->get_tagged_pins($tag, 2); 

    // display the pins for demo purposes 
    echo '<h1>Images on Pinterest mentioning "' . $tag . '"</h1>' . "\n"; 
    if($pins1 != false) { 
    echo '<p><em>' . count($pins1) . ' images found.</em></p>' . "\n"; 
    skrivener_dump_images($pins1, 5); 
    } else { 
    echo '<p><em>No images found.</em></p>' . "\n"; 
    } 
} 

// demo function, dumps images in array to html img tags, can pass limit to only display part of array 
function skrivener_dump_images($pin_array, $limit = false) { 
    if(is_array($pin_array)) { 
    if($limit) $pin_array = array_slice($pin_array, -($limit)); 
    foreach ($pin_array as $pin) { 
     echo '<img src="' . $pin['image']->url . '" width="' . $pin['image']->width . '" height="' . $pin['image']->height . '" >' . "\n"; 
    } 
    } 
} 

?> 

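Let me know if you run into problems adapting this to your particular endpoints. Apologies for any sloppiness in the code; it never made it to production originally.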

+1

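Well, this question took a while to get an answer, but you've basically solved it. Luckily, I've almost cracked the elusive "bookmarks". For example, putting some bookmark strings through a base64 decoder gives: ->18788523419059400:25|77a8c15de91998d843301116b0345928753478fa9ac0b7da855a8eeccb9c1f84 and ->18788523419039267:49|3686b33864aa96a215b28dd5e442afc06e6c76615a8adaae9f6f526432d47d12 which follow the format ->{pinID}:{itemNumber}|{random base16 string of 64 characters}. Help me crack this last part and I think we'll have it! –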
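The base64 observation in this comment can be checked with a few lines of Python. This is a minimal sketch: the token is the bookmarks value from the BoardFeedResource request captured in another answer in this thread, and the field names `pin_id`, `offset` and `digest` are labels chosen here for illustration, not Pinterest's own terms.

```python
import base64
import re

# Bookmark value copied from the captured BoardFeedResource request in this thread.
BOOKMARK = (
    "LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJh"
    "YTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw=="
)


def parse_bookmark(token: str) -> dict:
    """Decode a bookmark and split it as ->{pinID}:{itemNumber}|{hex string}."""
    decoded = base64.b64decode(token).decode("ascii")
    match = re.fullmatch(r"->(\d+):(\d+)\|([0-9a-f]+)", decoded)
    if match is None:
        # The layout is only an observation from a few samples, so fail loudly.
        raise ValueError(f"unexpected bookmark layout: {decoded!r}")
    pin_id, offset, digest = match.groups()
    return {"pin_id": pin_id, "offset": int(offset), "digest": digest}


info = parse_bookmark(BOOKMARK)
print(info["pin_id"], info["offset"], len(info["digest"]))
```

The item number in this sample is 25, which matches the batch size of 25 pins mentioned elsewhere in the thread.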

+0

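Nice! I won't get a chance to dig further until I have a break between projects at the end of October. My gut instinct suggests some variation on a time/date stamp, or possibly a hash of part of the data used for error checking, but those are stabs in the dark. Will revisit when I get a moment :) – Skrivener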

+0

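I appreciate the help, whenever you're able to give it. There aren't many hashing schemes that produce a 64-character hex digest. I've tried SHA-256 on ->{pinID}:{item#}, {pinID}:{item#} and {pinID}, but that wasn't fruitful. The PHP you provided works fine anyway, but it would be nice if this could be made fully programmatic! Thanks again for your continued help :) –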
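The SHA-256 experiment described in this comment could be reproduced roughly as follows. It is only a sketch of the three inputs the commenter lists; the helper names are made up here, and nothing in the thread suggests that any of these inputs actually matches.

```python
import hashlib
from typing import Dict, List


def sha256_candidates(pin_id: str, item_number: int) -> Dict[str, str]:
    """Hash the three input formats tried in the comment above."""
    candidates = [
        f"->{pin_id}:{item_number}",
        f"{pin_id}:{item_number}",
        pin_id,
    ]
    return {
        text: hashlib.sha256(text.encode("ascii")).hexdigest()
        for text in candidates
    }


def matching_inputs(pin_id: str, item_number: int, observed: str) -> List[str]:
    """Return the candidate inputs whose digest equals the observed hex string."""
    digests = sha256_candidates(pin_id, item_number)
    return [text for text, digest in digests.items() if digest == observed.lower()]


# Values taken from the decoded bookmark quoted in the earlier comment.
print(matching_inputs(
    "18788523419059400",
    25,
    "77a8c15de91998d843301116b0345928753478fa9ac0b7da855a8eeccb9c1f84",
))
```

An empty list means none of the three guesses reproduces the trailing string, which is what the commenter reports.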

1

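A couple of people have suggested using JavaScript to emulate scrolling.

I don't think you need to emulate scrolling at all. I think you just need to work out the format of the URIs called via AJAX whenever scrolling occurs; then you can fetch each "page" of results sequentially. A little reverse engineering is required.

Using the Network tab of the Chrome inspector, I can see that once I get a certain distance down the page, this URI is called: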

http://pinterest.com/resource/BoardFeedResource/get/?source_url=%2Fdodo%2Fweb-designui-and-mobile%2F&data=%7B%22options%22%3A%7B%22board_id%22%3A%22158400180582875562%22%2C%22access%22%3A%5B%5D%2C%22bookmarks%22%3A%5B%22LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw%3D%3D%22%5D%7D%2C%22context%22%3A%7B%22app_version%22%3A%22fb43cdb%22%7D%2C%22module%22%3A%7B%22name%22%3A%22GridItems%22%2C%22options%22%3A%7B%22scrollable%22%3Atrue%2C%22show_grid_footer%22%3Atrue%2C%22centered%22%3Atrue%2C%22reflow_all%22%3Atrue%2C%22virtualize%22%3Atrue%2C%22item_options%22%3A%7B%22show_rich_title%22%3Afalse%2C%22squish_giraffe_pins%22%3Afalse%2C%22show_board%22%3Afalse%2C%22show_via%22%3Afalse%2C%22show_pinner%22%3Afalse%2C%22show_pinned_from%22%3Atrue%7D%2C%22layout%22%3A%22variable_height%22%7D%7D%2C%22append%22%3Atrue%2C%22error_strategy%22%3A1%7D&_=1377092055381

If we decode that, we see that it is mostly JSON:

http://pinterest.com/resource/BoardFeedResource/get/?source_url=/dodo/web-designui-and-mobile/&data= 
{ 
"options": { 
    "board_id": "158400180582875562", 
    "access": [], 
    "bookmarks": [ 
     "LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw==" 
    ] 
}, 
"context": { 
    "app_version": "fb43cdb" 
}, 
"module": { 
    "name": "GridItems", 
    "options": { 
     "scrollable": true, 
     "show_grid_footer": true, 
     "centered": true, 
     "reflow_all": true, 
     "virtualize": true, 
     "item_options": { 
      "show_rich_title": false, 
      "squish_giraffe_pins": false, 
      "show_board": false, 
      "show_via": false, 
      "show_pinner": false, 
      "show_pinned_from": true 
     }, 
     "layout": "variable_height" 
    } 
}, 
"append": true, 
"error_strategy": 1 
} 
&_=1377091719636 
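The decoding step can be done with the Python standard library rather than by hand. This is a small sketch: `decode_resource_url` is a name chosen here, and the sample URL is a shortened version of the captured request that keeps only a few of its fields.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_resource_url(url: str) -> dict:
    """Split a resource URL into its source_url, JSON payload and _ value."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also undoes the %-encoding
    return {
        "source_url": query["source_url"][0],
        "data": json.loads(query["data"][0]),
        "timestamp": int(query["_"][0]),
    }


# Shortened sample: same shape as the captured request, fewer payload fields.
SAMPLE_URL = (
    "http://pinterest.com/resource/BoardFeedResource/get/"
    "?source_url=%2Fdodo%2Fweb-designui-and-mobile%2F"
    "&data=%7B%22options%22%3A%7B%22board_id%22%3A%22158400180582875562%22"
    "%2C%22access%22%3A%5B%5D%7D%2C%22append%22%3Atrue"
    "%2C%22error_strategy%22%3A1%7D"
    "&_=1377092055381"
)

decoded = decode_resource_url(SAMPLE_URL)
print(json.dumps(decoded["data"], indent=4))
```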

Scroll down until we get a second request, and we see this:

http://pinterest.com/resource/BoardFeedResource/get/?source_url=/dodo/web-designui-and-mobile/&data= 
{ 
    "options": { 
     "board_id": "158400180582875562", 
     "access": [], 
     "bookmarks": [ 
      "LT4xNTg0MDAxMTE4NjcwNTk1ODQ6NDl8ODFlMDUwYzVlYWQxNzVmYzdkMzI0YTJiOWJkYzUwOWFhZGFkM2M1MzhiNzA0ZDliZDIzYzE3NjkzNTg1ZTEyOQ==" 
     ] 
    }, 
    "context": { 
     "app_version": "fb43cdb" 
    }, 
    "module": { 
     "name": "GridItems", 
     "options": { 
      "scrollable": true, 
      "show_grid_footer": true, 
      "centered": true, 
      "reflow_all": true, 
      "virtualize": true, 
      "item_options": { 
       "show_rich_title": false, 
       "squish_giraffe_pins": false, 
       "show_board": false, 
       "show_via": false, 
       "show_pinner": false, 
       "show_pinned_from": true 
      }, 
      "layout": "variable_height" 
     } 
    }, 
    "append": true, 
    "error_strategy": 2 
} 
&_=1377092231234 

As you can see, not much has changed. The board_id is the same, error_strategy is now 2, and the &_ at the end is different.
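Comparing the two payloads mechanically is less error-prone than reading them side by side. The sketch below flattens two nested dictionaries and lists the keys whose values differ; the payloads are trimmed to a few fields from the two captured requests, and both helper names are illustrative.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts into {"a.b": leaf}; lists are kept as leaves."""
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
        return flat
    return {prefix.rstrip("."): value}


def changed_keys(first: dict, second: dict) -> list:
    """Return the sorted dotted keys whose values differ between two payloads."""
    a, b = flatten(first), flatten(second)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


first_request = {
    "options": {
        "board_id": "158400180582875562",
        "bookmarks": ["LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw=="],
    },
    "error_strategy": 1,
}
second_request = {
    "options": {
        "board_id": "158400180582875562",
        "bookmarks": ["LT4xNTg0MDAxMTE4NjcwNTk1ODQ6NDl8ODFlMDUwYzVlYWQxNzVmYzdkMzI0YTJiOWJkYzUwOWFhZGFkM2M1MzhiNzA0ZDliZDIzYzE3NjkzNTg1ZTEyOQ=="],
    },
    "error_strategy": 2,
}

print(changed_keys(first_request, second_request))
```

Run on these two payloads, the comparison reports the bookmarks entry as well as error_strategy.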

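The &_ parameter is key here. I would bet that it tells the page where to begin the next set of photos. I can't find a reference to it in either of the responses or in the original page HTML, but it has to be in there somewhere, or else be generated by JavaScript on the client side. Either way, the page/browser has to know what to ask for next, so this information is something you should be able to get at.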
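One quick way to probe this guess is to read the value as milliseconds since the Unix epoch, which is how another answer in this thread describes the parameter. A minimal sketch (the function name is illustrative):

```python
from datetime import datetime, timedelta, timezone


def js_timestamp_to_utc(ms: int) -> datetime:
    """Interpret a value as milliseconds since 1970-01-01 UTC."""
    seconds, millis = divmod(ms, 1000)
    # Integer arithmetic avoids float rounding in the millisecond part.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# The _ value from the captured BoardFeedResource URL above.
print(js_timestamp_to_utc(1377092055381).isoformat())
```

The value decodes to 21 August 2013, two days after the question was posted, which fits a request timestamp better than a position in the board.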

+0

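Thank you very much for the answer. It is informative, but unfortunately not quite enough. I've tried hard and I'm stumped, because I had also come across this JSON and wondered what was going on. Also note the change in the "bookmarks" value, which is another mystery. I've put up a 50-rep bounty for an answer that can tell me exactly which part of the JSON causes these updates and how to trigger it. I also believe the JSON information alone should let me request these URLs, get the HTML back and identify the images, then URL-encode the JSON again for the next request, until the end of the board. –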

0

You can trigger the JSON endpoint by making a request with this header: X-Requested-With: XMLHttpRequest

Try this command in a console:

curl -H "X-Requested-With:XMLHttpRequest" "http://pinterest.com/resource/CategoryFeedResource/get/?source_url=%2Fall%2Fgeek%2F&data=%7B%22options%22%3A%7B%22feed%22%3A%22geek%22%2C%22scope%22%3Anull%2C%22bookmarks%22%3A%5B%22Pz8xMzc3NjU4MjEyLjc0Xy0xfDE1ZjczYzc4YzNlNDg3M2YyNDQ4NGU1ZTczMmM0ZTQyYzBjMWFiMWNhYjRhMDRhYjg2MTYwMGVkNWQ0ZDg1MTY%3D%22%5D%2C%22is_category_feed%22%3Atrue%7D%2C%22context%22%3A%7B%22app_version%22%3A%22addc92b%22%7D%2C%22module%22%3A%7B%22name%22%3A%22GridItems%22%2C%22options%22%3A%7B%22scrollable%22%3Atrue%2C%22show_grid_footer%22%3Atrue%2C%22centered%22%3Atrue%2C%22reflow_all%22%3Atrue%2C%22virtualize%22%3Atrue%2C%22item_options%22%3A%7B%22show_pinner%22%3Atrue%2C%22show_pinned_from%22%3Afalse%2C%22show_board%22%3Atrue%2C%22show_via%22%3Afalse%7D%2C%22layout%22%3A%22variable_height%22%7D%7D%2C%22append%22%3Atrue%2C%22error_strategy%22%3A2%7D&module_path=App()%3EHeader()%3EDropdownButton()%3EDropdown()%3ECategoriesMenu(resource%3D%5Bobject+Object%5D%2C+name%3DCategoriesMenu%2C+resource%3DCategoriesResource(browsable%3Dtrue))&_=1377658213300" | python -mjson.tool 

You'll see the pin data in the JSON output. You should be able to parse it and grab the next images you need.
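The same request can be made from Python with the standard library. This is a sketch under the assumption that the endpoint still answers as it did for the curl command; `build_ajax_request` and `fetch_json` are illustrative names, and only the X-Requested-With header comes from this answer.

```python
import json
import urllib.request


def build_ajax_request(url: str) -> urllib.request.Request:
    """Create a GET request carrying the header that selects the JSON response."""
    return urllib.request.Request(
        url, headers={"X-Requested-With": "XMLHttpRequest"}
    )


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Send the request and parse the body as JSON (performs network I/O)."""
    with urllib.request.urlopen(build_ajax_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json` with the full URL from the curl command would be the equivalent of piping its output through python -mjson.tool.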

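As for this bit: &_=1377658213300. I surmise that this is the ID of the last pin from the previous list. You should be able to replace it on each call with the last pin from the previous response.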

0
#!/usr/bin/env bash 
## 
## File: getpins.bsh 
## 
## Copyrighted by +A.M.Danischewski 2016+ (c) 
## This program may be reutilized without limits, provided this 
## notice remain intact. 

## If this breaks one day, then just fire up firefox Developer Tools and check the network traffic to 
## capture "copy as curl" of the calls to the search page (filter with BaseSearchResource), then the 
## call to feed more data (filter with SearchResource). 
## 
## Do a search on whatever you want remove the cookie header, and add -o ret2.html -D h2.txt -c c1.txt, 
## then search replace the search terms as SEARCHTOKEN1 and SEARCHTOKEN2. 
## 
## Description this script facilitates alternate browsers, by caching images/pins 
## from pinterest. This script is hardwired for two search terms. First create a directory 
## to where you want the images to go, then cd there. 
## Usage: 
## $> cd /big/drive/auto_gyros 
## $> getpins.bsh "sleek autogyros" 
## 
## Expect around 900 images to land wherever you select, so make sure you have space! =) 
## 

declare -r ORIG_IMGS="pin_orig_imgs.txt" 
declare -r TMP_IMGS="pin_imgs.txt" 
declare -r UA_HEADER="User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.$(($RANDOM%10))) Gecko/20100101 Firefox/19.0" 

## Say Hello to the main page and get a cookie. 
declare PINCMD1=$(cat << EOF 
curl -o ret1.html -D h1.txt -c c1.txt -H 'Host: www.pinterest.com' -H '${UA_HEADER}' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Connection: keep-alive' 'https://www.pinterest.com/' 
EOF 
) 
## Start a search for our dear search terms. 
declare PINCMD2=$(cat << EOF 
curl -H 'X-APP-VERSION: ea7a93a' -o ret2.html -D h2.txt -c c1.txt -H 'Host: www.pinterest.com' -H '${UA_HEADER}' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'X-Pinterest-AppState: active' -H 'X-NEW-APP: 1' -H 'X-Requested-With: XMLHttpRequest' -H 'Referer: https://www.pinterest.com' -H 'Connection: keep-alive' 'https://www.pinterest.com/resource/BaseSearchResource/get/?source_url=%2Fsearch%2Fpins%2F%3Fq%3DSEARCHTOKEN1%2520SEARCHTOKEN2%26rs%3Dtyped%260%3DSEARCHTOKEN1%257Ctyped%261%3DSEARCHTOKEN2%257Ctyped&data=%7B%22options%22%3A%7B%22restrict%22%3Anull%2C%22scope%22%3A%22pins%22%2C%22constraint_string%22%3Anull%2C%22show_scope_selector%22%3Atrue%2C%22query%22%3A%22SEARCHTOKEN1+SEARCHTOKEN2%22%7D%2C%22context%22%3A%7B%7D%2C%22module%22%3A%7B%22name%22%3A%22SearchPage%22%2C%22options%22%3A%7B%22restrict%22%3Anull%2C%22scope%22%3A%22pins%22%2C%22constraint_string%22%3Anull%2C%22show_scope_selector%22%3Atrue%2C%22query%22%3A%22SEARCHTOKEN1+SEARCHTOKEN2%22%7D%7D%2C%22render_type%22%3A1%2C%22error_strategy%22%3A0%7D&module_path=App%3EHeader%3ESearchForm%3ETypeaheadField(support_guided_search%3Dtrue%2C+resource_name%3DAdvancedTypeaheadResource%2C+tags%3Dautocomplete%2C+class_name%3DbuttonOnRight%2C+prefetch_on_focus%3Dtrue%2C+support_advanced_typeahead%3Dnull%2C+hide_tokens_on_focus%3Dundefined%2C+search_on_focus%3Dtrue%2C+placeholder%3DSearch%2C+show_remove_all%3Dtrue%2C+enable_recent_queries%3Dtrue%2C+name%3Dq%2C+view_type%3Dguided%2C+value%3D%22%22%2C+input_log_element_type%3D227%2C+populate_on_result_highlight%3Dtrue%2C+search_delay%3D0%2C+is_multiobject_search%3Dtrue%2C+type%3Dtokenized%2C+enable_overlay%3Dtrue)&_=1454779874891' 
EOF 
) 
## Load further images. 
declare PINCMD3=$(cat << EOF 
curl -H 'X-APP-VERSION: ea7a93a' -D h3.txt -c c1.txt -H 'Host: www.pinterest.com' -H '${UA_HEADER}' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'X-Pinterest-AppState: active' -H 'X-NEW-APP: 1' -H 'X-Requested-With: XMLHttpRequest' -H 'Referer: https://www.pinterest.com' -H 'Connection: keep-alive' 'https://www.pinterest.com/resource/SearchResource/get/?source_url=%2Fsearch%2Fpins%2F%3Fq%3DSEARCHTOKEN1%2520SEARCHTOKEN2%26rs%3Dtyped%260%3DSEARCHTOKEN1%257Ctyped%261%3DSEARCHTOKEN2%257Ctyped&data=%7B%22options%22%3A%7B%22layout%22%3Anull%2C%22places%22%3Afalse%2C%22constraint_string%22%3Anull%2C%22show_scope_selector%22%3Atrue%2C%22query%22%3A%22SEARCHTOKEN1+SEARCHTOKEN2%22%2C%22scope%22%3A%22pins%22%2C%22bookmarks%22%3A%5B%22_NEW_BOOK_MARK_%22%5D%7D%2C%22context%22%3A%7B%7D%7D&module_path=App%3EHeader%3ESearchForm%3ETypeaheadField(support_guided_search%3Dtrue%2C+resource_name%3DAdvancedTypeaheadResource%2C+tags%3Dautocomplete%2C+class_name%3DbuttonOnRight%2C+prefetch_on_focus%3Dtrue%2C+support_advanced_typeahead%3Dnull%2C+hide_tokens_on_focus%3Dundefined%2C+search_on_focus%3Dtrue%2C+placeholder%3DSearch%2C+show_remove_all%3Dtrue%2C+enable_recent_queries%3Dtrue%2C+name%3Dq%2C+view_type%3Dguided%2C+value%3D%22%22%2C+input_log_element_type%3D227%2C+populate_on_result_highlight%3Dtrue%2C+search_delay%3D0%2C+is_multiobject_search%3Dtrue%2C+type%3Dtokenized%2C+enable_overlay%3Dtrue)&_=1454779874911' 
EOF 
) 
## Exactly 2 search terms in a single string are expected, you can hack it up if 
## you want something else. 
declare SEARCHTOKEN1=$(echo "${1}" | cut -d " " -f1) 
declare SEARCHTOKEN2=$(echo "${1}" | cut -d " " -f2) 

PINCMD3=$(sed "s/SEARCHTOKEN1/${SEARCHTOKEN1}/g" <<< "${PINCMD3}") 
PINCMD3=$(sed "s/SEARCHTOKEN2/${SEARCHTOKEN2}/g" <<< "${PINCMD3}") 
PINCMD2=$(sed "s/SEARCHTOKEN1/${SEARCHTOKEN1}/g" <<< "${PINCMD2}") 
PINCMD2=$(sed "s/SEARCHTOKEN2/${SEARCHTOKEN2}/g" <<< "${PINCMD2}") 

function lspinimgs() { grep -o "\"url\": \"http[s]*://[^\"]*.pinimg.com[^\"]*.jpg\"" "${1}" | cut -d " " -f2 | tr -d "\""; } 
function mkpinorig() { sed "s#\(^http.*\)\(com/\)\([^/]*\)\(/.*jpg\$\)#\1\2originals\4#g" "${1}" > "${2}"; }  
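## Pass the pattern with -e, otherwise grep parses "-end-" as a group of options. 
function getpinbm() { grep -o "bookmarks\": [^ ]* " "${1}" | sed "s/^book.*\[\"//g;s/\"\].*\$//g" | sort | uniq | grep -v -e "-end-"; } 
## Use # as the sed delimiter because base64 bookmarks may contain "/". 
function changepinbm() { PINCMD3=$(sed "s#\(^.*\)\(bookmarks%22%3A%5B%22\)\(.*\)\(%22%5D.*\$\)#\1\2${1}\4#g" <<< "${PINCMD3}"); } 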
function cleanup() { rm ret*html c1.txt "${TMP_IMGS}" h{1..3}.txt "${ORIG_IMGS}"; } 

function main() { 
eval "${PINCMD1}" 
eval "${PINCMD2}" 
for ((i=3,lasti=2; i<10000; i++,lasti++)); do 
pinbm=$(getpinbm "ret${lasti}.html") 
[[ -z "${pinbm}" ]] && break 
changepinbm "${pinbm}" 
eval "${PINCMD3}" > "ret${i}.html" 
done 
for a in *.html; do lspinimgs "${a}" >> "${TMP_IMGS}"; done 
mkpinorig "${TMP_IMGS}" "${ORIG_IMGS}" 
IFS=$(echo -en "\n\b") && for a in $(sort "${ORIG_IMGS}" | uniq); do 
wget --tries=3 -E -e robots=off -nc --random-wait --content-disposition --no-check-certificate -p --restrict-file-names=windows,lowercase,ascii --header "${UA_HEADER}" -nd "$a" 
done 
cleanup 
} 

main 
exit 0
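
The URL handling done by lspinimgs, mkpinorig and sort | uniq in the script above can also be sketched in Python. This is an illustration only: the sample URLs are made up to match the shape the script's sed expression expects, and `to_original` is a name chosen here.

```python
import re
from typing import Iterable, List

# host ending in pinimg.com / one size segment (e.g. 236x) / rest of the path
_PIN_IMAGE = re.compile(r"^(https?://[^/]*pinimg\.com/)[^/]+(/.*\.jpg)$")


def to_original(url: str) -> str:
    """Swap the size segment of a pinimg URL for 'originals'."""
    return _PIN_IMAGE.sub(r"\1originals\2", url)


def original_urls(urls: Iterable[str]) -> List[str]:
    """Rewrite every URL and drop duplicates, like mkpinorig plus sort | uniq."""
    return sorted({to_original(url) for url in urls})


# Hypothetical thumbnail URLs in the shape the script greps for.
thumbnails = [
    "https://s-media-cache-ak0.pinimg.com/236x/ab/cd/ef/example.jpg",
    "https://s-media-cache-ak0.pinimg.com/736x/ab/cd/ef/example.jpg",
]
print(original_urls(thumbnails))
```

URLs that do not match the pattern are returned unchanged, so unexpected input is passed through rather than mangled.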