Python - make a script loop until a condition is met, using a different proxy address for each loop

I am the definition of a noob. I know almost nothing about Python and am looking for help. I can read code well enough to change variables to suit my needs, but when I try to do something the original code wasn't made for... I get lost.

So here's the deal: I found a craigslist (CL) flagging script that originally searched every CL site and flagged posts containing a specific keyword (it was written to flag all posts that mention Scientology).

I changed it to search only the CL sites in my general area (15 sites instead of 437), and changed the specific keyword it looks for. I want to automatically flag someone who keeps spamming and is hard to filter out, because I do a lot of business on CL through email.

I would like the script to loop until it can no longer find posts that meet the criteria, changing the proxy server after each loop. Also, where inside the script do the proxy/IP addresses go?

I look forward to your replies.

Here is the changed code I have:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib
from twill.commands import * # gives us go()

areas = ['sfbay', 'chico', 'fresno', 'goldcountry', 'humboldt', 'mendocino', 'modesto', 'monterey', 'redding', 'reno', 'sacramento', 'siskiyou', 'stockton', 'yubasutter', 'reno']

def expunge(url, area):
    page = urllib.urlopen(url).read() # <-- and v and vv gets you urls of ind. postings
    page = page[page.index('<hr>'):].split('\n')[0]
    page = [i[:i.index('">')] for i in page.split('href="')[1:-1] if '<font size="-1">' in i]

    for u in page:
        num = u[u.rfind('/')+1:u.index('.html')] # the number of the posting (like 34235235252)
        spam = 'https://post.craigslist.org/flag?flagCode=15&postingID='+num # url for flagging as spam
        go(spam) # flag it

print 'Checking ' + str(len(areas)) + ' areas...'

for area in ['http://' + a + '.craigslist.org/' for a in areas]:
    ujam = area + 'search/?query=james+"916+821+0590"+&catAbb=hhh'
    udre = area + 'search/?query="DRE+%23+01902542+"&catAbb=hhh'
    try:
        jam = urllib.urlopen(ujam).read()
        dre = urllib.urlopen(udre).read()
    except:
        print 'tl;dr error for ' + area

    if 'Found: ' in jam:
        print 'Found results for "James 916 821 0590" in ' + area
        expunge(ujam, area)
        print 'All "James 916 821 0590" listings marked as spam for area'

    if 'Found: ' in dre:
        print 'Found results for "DRE # 01902542" in ' + area
        expunge(udre, area)
        print 'All "DRE # 01902542" listings marked as spam for area'
If you only use 'go', import only 'go': 'from twill.commands import go' – askewchan 2013-02-19 21:17:19

ImportError: No module named go – 2013-02-19 22:18:10

Strange: http://twill.idyll.org/python-api.html says: 'from twill.commands import go' – askewchan 2013-02-19 22:24:07

Answers


You can create a continuous loop like this:

while True: 
    if condition : 
     break 

Itertools has tricks for repetition: http://docs.python.org/2/library/itertools.html

In particular, check out itertools.count and itertools.cycle.

(These are just pointers in the right direction. You could work out a solution from either one, or even both.)
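For instance, a minimal sketch combining the two ideas (the proxy addresses and the check_and_flag helper are placeholders here, not part of the original script):

import itertools

proxies = itertools.cycle(['108.60.219.136:8080',
                           '198.144.186.98:3128',
                           '66.55.153.226:8080'])

while True:
    proxy = next(proxies)          # a different proxy on every pass
    found = check_and_flag(proxy)  # hypothetical helper: True if anything was flagged
    if not found:
        break                      # stop once a full sweep finds nothing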

Sorry, I don't understand it.. I tried adding repeat() into the code, but I keep getting Traceback (most recent call last): File "/home/quonundrum/Desktop/CL.py", line 43, in <module> repeat('spam', 4) NameError: name 'repeat' is not defined – 2013-02-19 21:57:40

'import itertools as it' then call 'it.repeat()' – askewchan 2013-02-19 22:27:31

I've tried it.repeat('go', 4), it.repeat('go(spam)', 4), it.repeat('expunge'), it.repeat('ujam').. and a whole bunch of others... it doesn't repeat, but it doesn't give any errors either – 2013-02-19 22:54:55
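(For what it's worth, itertools.repeat never calls anything; it just yields the same value over and over, which would explain why nothing appeared to happen. A tiny illustration, with go and spam as in the script above:)

import itertools as it

list(it.repeat('go', 4))     # ['go', 'go', 'go', 'go'] -- the string repeated, no call made
for _ in it.repeat(None, 4):
    go(spam)                 # to repeat an action, loop and call it yourself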


I made some changes to your code. As far as I can tell, the function expunge already loops through all the results on a page, so I'm not sure what you need to loop over, but at the end there is an example of how to check whether results were found (there is no loop to break out of, though).

No idea how to change the proxy/IP, sorry.

By the way, you had 'reno' twice.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib
from twill.commands import go

areas = ['sfbay', 'chico', 'fresno', 'goldcountry', 'humboldt',
         'mendocino', 'modesto', 'monterey', 'redding', 'reno',
         'sacramento', 'siskiyou', 'stockton', 'yubasutter']
queries = ['james+"916+821+0590"','"DRE+%23+01902542"']

def expunge(url, area):
    page = urllib.urlopen(url).read() # <-- and v and vv gets you urls of ind. postings
    page = page[page.index('<hr>'):].split('\n')[0]
    page = [i[:i.index('">')] for i in page.split('href="')[1:-1] if '<font size="-1">' in i]

    for u in page:
        num = u[u.rfind('/')+1:u.index('.html')] # the number of the posting (like 34235235252)
        spam = 'https://post.craigslist.org/flag?flagCode=15&postingID='+num # url for flagging as spam
        go(spam) # flag it

print 'Checking ' + str(len(areas)) + ' areas...'

for area in areas:
    for query in queries:
        qurl = 'http://' + area + '.craigslist.org/search/?query=' + query + '+&catAbb=hhh'
        try:
            q = urllib.urlopen(qurl).read()
        except:
            print 'tl;dr error for {} in {}'.format(query, area)
            break

        if 'Found: ' in q:
            print 'Found results for {} in {}'.format(query, area)
            expunge(qurl, area)
            print 'All {} listings marked as spam for area'.format(query)
        elif 'Nothing found for that search' in q:
            print 'No results for {} in {}'.format(query, area)
            break
        else:
            break
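If you want the sweep itself to repeat until nothing is found, one sketch (untested, reusing the names above):

any_found = True
while any_found:                # repeat the whole sweep until one pass flags nothing
    any_found = False
    for area in areas:
        for query in queries:
            qurl = 'http://' + area + '.craigslist.org/search/?query=' + query + '+&catAbb=hhh'
            try:
                q = urllib.urlopen(qurl).read()
            except:
                continue
            if 'Found: ' in q:
                any_found = True
                expunge(qurl, area)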
Cool, that looks a lot better. Is there a way to make it keep running until it doesn't get any more results? – 2013-02-19 23:57:53

You mean you expect the results pages to change while the program is running? – askewchan 2013-02-20 00:02:55

It shows what it found/flagged in the shell, so I was wondering if the script could keep running until there are no more results for the keywords being searched (i.e. all results keep getting re-flagged until they are removed). – 2013-02-20 00:27:52


I made some changes... not sure how well they work, but I'm not getting any errors. Please let me know if you spot anything wrong/missing. - Thanks

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib, urllib2

# NOTE: each install_opener() call replaces the global opener, so after
# these five calls only the proxy5 opener is actually used by urllib2.
proxy = urllib2.ProxyHandler({'https': '108.60.219.136:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
proxy2 = urllib2.ProxyHandler({'https': '198.144.186.98:3128'})
opener2 = urllib2.build_opener(proxy2)
urllib2.install_opener(opener2)
proxy3 = urllib2.ProxyHandler({'https': '66.55.153.226:8080'})
opener3 = urllib2.build_opener(proxy3)
urllib2.install_opener(opener3)
proxy4 = urllib2.ProxyHandler({'https': '173.213.113.111:8080'})
opener4 = urllib2.build_opener(proxy4)
urllib2.install_opener(opener4)
proxy5 = urllib2.ProxyHandler({'https': '198.154.114.118:3128'})
opener5 = urllib2.build_opener(proxy5)
urllib2.install_opener(opener5)

areas = ['sfbay', 'chico', 'fresno', 'goldcountry', 'humboldt',
         'mendocino', 'modesto', 'monterey', 'redding', 'reno',
         'sacramento', 'siskiyou', 'stockton', 'yubasutter']
queries = ['james+"916+821+0590"','"DRE+%23+01902542"']

def expunge(url, area):
    page = urllib.urlopen(url).read() # <-- and v and vv gets you urls of ind. postings
    page = page[page.index('<hr>'):].split('\n')[0]
    page = [i[:i.index('">')] for i in page.split('href="')[1:-1] if '<font size="-1">' in i]

    for u in page:
        num = u[u.rfind('/')+1:u.index('.html')] # the number of the posting (like 34235235252)
        # urllib2.urlopen() already performs the request through the installed
        # opener, so the original go() calls -- which passed the response
        # object to twill instead of a URL -- are dropped here.
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=15&postingID='+num) # flag it
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=28&postingID='+num) # flag it
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=16&postingID='+num) # flag it

print 'Checking ' + str(len(areas)) + ' areas...'

for area in areas:
    for query in queries:
        qurl = 'http://' + area + '.craigslist.org/search/?query=' + query + '+&catAbb=hhh'
        try:
            q = urllib.urlopen(qurl).read()
        except:
            print 'tl;dr error for {} in {}'.format(query, area)
            break

        if 'Found: ' in q:
            print 'Found results for {} in {}'.format(query, area)
            expunge(qurl, area)
            print 'All {} listings marked as spam for {}'.format(query, area)
            print ''
            print ''
        elif 'Nothing found for that search' in q:
            print 'No results for {} in {}'.format(query, area)
            print ''
            print ''
            break
        else:
            break
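Since install_opener() keeps only the most recent opener, one alternative (a sketch, not from the original post) is to build the openers once and pick one per request, without installing any of them globally:

import random

proxy_list = ['108.60.219.136:8080', '198.144.186.98:3128',
              '66.55.153.226:8080', '173.213.113.111:8080',
              '198.154.114.118:3128']
openers = [urllib2.build_opener(urllib2.ProxyHandler({'https': p}))
           for p in proxy_list]

opener = random.choice(openers)  # or itertools.cycle(openers) for round-robin
page = opener.open(qurl).read()  # this one request goes through the chosen proxy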
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib, urllib2

# NOTE: as above, only the last install_opener() call (proxy5) takes effect.
proxy = urllib2.ProxyHandler({'https': '108.60.219.136:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
proxy2 = urllib2.ProxyHandler({'https': '198.144.186.98:3128'})
opener2 = urllib2.build_opener(proxy2)
urllib2.install_opener(opener2)
proxy3 = urllib2.ProxyHandler({'https': '66.55.153.226:8080'})
opener3 = urllib2.build_opener(proxy3)
urllib2.install_opener(opener3)
proxy4 = urllib2.ProxyHandler({'https': '173.213.113.111:8080'})
opener4 = urllib2.build_opener(proxy4)
urllib2.install_opener(opener4)
proxy5 = urllib2.ProxyHandler({'https': '198.154.114.118:3128'})
opener5 = urllib2.build_opener(proxy5)
urllib2.install_opener(opener5)

areas = ['capecod']
queries = ['rent','rental','home','year','falmouth','lease','credit','tenant','apartment','bedroom','bed','bath']

def expunge(url, area):
    page = urllib.urlopen(url).read() # <-- and v and vv gets you urls of ind. postings
    page = page[page.index('<hr>'):].split('\n')[0]
    page = [i[:i.index('">')] for i in page.split('href="')[1:-1] if '<font size="-1">' in i]

    for u in page:
        num = u[u.rfind('/')+1:u.index('.html')] # the number of the posting (like 34235235252)
        # urllib2.urlopen() performs the flag request itself; the broken
        # go(response) calls are dropped here as in the previous version.
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=15&postingID='+num) # flag it
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=28&postingID='+num) # flag it
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=16&postingID='+num) # flag it

print 'Checking ' + str(len(areas)) + ' areas...'

for area in areas:
    for query in queries:
        qurl = 'http://' + area + '.craigslist.org/search/?query=' + query + '+&catAbb=hhh'
        try:
            q = urllib.urlopen(qurl).read()
        except:
            print 'tl;dr error for {} in {}'.format(query, area)
            break

        if 'Found: ' in q:
            print 'Found results for {} in {}'.format(query, area)
            expunge(qurl, area)
            print 'All {} listings marked as spam for {}'.format(query, area)
            print ''
            print ''
        elif 'Nothing found for that search' in q:
            print 'No results for {} in {}'.format(query, area)
            print ''
            print ''
            break
        else:
            break