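Python - validate, write and append to a .txt file
I want to check the links my crawler retrieves against the links already stored in my .txt file. After the crawler retrieves a link from the web, it appends ('a') the entry to my .txt file. However, if the link already exists in my .txt file, I want to update ('w') it instead. Any ideas on how I can do this?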
def spider(targetname, DOMAIN, g_data):
    # crawledfile (the output path) is assumed to be defined at module level.
    for item in g_data:
        try:
            name = item.find_all("strong", {"class": "fullname show-popup-with-id "})[0].text
            username = item.find_all("span", {"class": "username u-dir"})[0].text
            post = item.find_all("p", {"class": "TweetTextSize TweetTextSize--normal js-tweet-text tweet-text"})[0].text
            replies = item.find_all("span", {"class": "u-hiddenVisually"})[3].text
            retweets = item.find_all("span", {"class": "u-hiddenVisually"})[4].text
            likes = item.find_all("span", {"class": "u-hiddenVisually"})[5].text
            retweetby = item.find_all("a", {"href": "/"+targetname})[0].text
            datas = item.find_all('a', {'class':'tweet-timestamp js-permalink js-nav js-tooltip'})
            for data in datas:
                link = DOMAIN + data['href']
                date = data['title']
                append_to_file(crawledfile, name, username, post, link, replies, retweets, likes, retweetby, date)
        except:
            # Any item missing one of the expected elements is skipped silently.
            pass
def append_to_file(path, name, username, post, link, replies, retweets, likes, retweetby, date):
    # Mode 'a' always adds a new entry at the end of the file.
    with open(path, 'a') as file:
        try:
            file.write("Name: "+ name + '\n')
        except:
            print("Name: --Currently unavailable--" + '\n')
        try:
            file.write("Username: "+ username + '\n')
        except:
            print("Username: --Currently unavailable--" + '\n')
        try:
            file.write("Post: "+ post + '\n')
        except:
            print("Post: --Currently unavailable--" + '\n')
        try:
            file.write("post's link: "+ link.strip() + '\n')
        except:
            print("post's link: --Currently unavailable--" + '\n')
        try:
            file.write("Replies: "+ replies.strip() + '\n')
        except:
            print("Replies: --Currently unavailable--" + '\n')
        try:
            file.write("Retweet: "+ retweets.strip() + '\n')
        except:
            print("Retweet: --Currently unavailable--" + '\n')
        try:
            file.write("Likes: "+ likes.strip() + '\n')
        except:
            print("Likes: --Currently unavailable--" + '\n')
        try:
            # targetname is not a parameter here; it is assumed to be a module-level name.
            if(username != "@" + targetname):
                file.write("Retweeted By: " + retweetby.strip() + '\n')
        except:
            file.write("Retweeted By: --Currently unavailable--" + '\n')
        try:
            file.write("Date: " + date + '\n')
        except:
            file.write("Date: --Currently unavailable--" + '\n')
        # A blank line separates one entry from the next.
        file.write("" + '\n')
Name: Donald J. Trump
Username: @realDonaldTrump
Post: I look forward to paying my respects to our brave men and women on this Memorial Day at Arlington National Cemetery later this morning.
post's link: https://twitter.com/realDonaldTrump/status/869170615881793536
Replies: 14,333 replies
Retweet: 13,492 retweets
Likes: 74,645 likes
Date: 5:36 AM - 29 May 2017

Name: Donald J. Trump
Username: @realDonaldTrump
Post: Today we remember the men and women who made the ultimate sacrifice in serving. Thank you, God bless your families & God bless the USA!
post's link: https://twitter.com/realDonaldTrump/status/869170351049240576
Replies: 8,827 replies
Retweet: 33,541 retweets
Likes: 123,112 likes
Date: 5:35 AM - 29 May 2017
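
Since append_to_file writes each link on its own line after the label "post's link: ", one way to test whether a link is already stored is to collect those lines into a set and compare whole links. This is only a sketch: load_stored_links and is_stored are hypothetical helper names that do not appear in the code above, and the UTF-8 encoding is an assumption.

```python
import os

# Label that append_to_file writes in front of every stored link.
LINK_PREFIX = "post's link: "


def load_stored_links(path):
    """Return the set of links already saved in the file at `path`."""
    links = set()
    if not os.path.exists(path):
        return links  # nothing has been crawled yet
    with open(path, encoding="utf-8") as f:  # encoding is an assumption
        for line in f:
            if line.startswith(LINK_PREFIX):
                links.add(line[len(LINK_PREFIX):].strip())
    return links


def is_stored(path, link):
    """True if `link` equals a stored link (whole match, not a substring)."""
    return link.strip() in load_stored_links(path)
```

Inside spider, a call such as is_stored(crawledfile, link) before append_to_file could then decide whether the entry is new or already present. Comparing whole links avoids a false match when one status URL is a prefix of another.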
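I'd suggest you show your 'append_to_file' method so that I can give a better answer. However, if you only need to check whether the retrieved link already exists in your text file, you can do something like this: `if link in open('your_file.txt').read(): # link present else: # link not present` –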
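I'm really confused. "However, if the link already exists in my .txt file, I want to update ('w') it". Update what? Do you mean updating the fields "Name", "Username", "Post", etc. of the entry that has that link? – EyuelDK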
@AdeelAhmad I've posted my append_to_file method – NewbieCoder
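
If "update" is read the way EyuelDK suggests (replace the stored fields of the entry whose link matches), note that opening a file with 'w' truncates it, so a single entry cannot be overwritten in place; the whole file has to be written back. A rough sketch under that reading follows. upsert_record is a hypothetical name, the entry is passed in as one pre-formatted string, and the code assumes entries are separated by one blank line and that post text contains no blank lines.

```python
import os

# Label that append_to_file writes in front of every stored link.
LINK_PREFIX = "post's link: "


def upsert_record(path, link, record_text):
    """Replace the entry containing `link`, or append it if absent.

    `record_text` is the full multi-line entry without the trailing
    blank line. Returns "updated" or "appended".
    """
    link = link.strip()
    record_text = record_text.strip("\n")

    records = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:  # encoding is an assumption
            # Entries are assumed to be separated by exactly one blank line.
            records = [r.strip("\n") for r in f.read().split("\n\n") if r.strip()]

    target = LINK_PREFIX + link
    for i, rec in enumerate(records):
        if target in rec.splitlines():
            records[i] = record_text
            # 'w' truncates the file, so every entry is written back.
            with open(path, "w", encoding="utf-8") as f:
                f.write("".join(r + "\n\n" for r in records))
            return "updated"

    # Link not seen before: 'a' adds the new entry at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write(record_text + "\n\n")
    return "appended"
```

Rewriting the file on every update is simple but reads and writes all entries each time; for a large crawl, loading the entries once into a dict keyed by link and writing the file at the end would likely be cheaper.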