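I'm having some trouble handling URLs as strings. I need to check whether a URL belongs to one of the domains in a whitelist, but the check fails. I'd like to know why, and whether my code is missing something.
Checking the URLs (strings):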
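whitelist = []
whitelist_file = open(whitelist_file, 'r')
# Reads the first line; the loop below then continues from the second line.
url = whitelist_file.readline()
for url in whitelist_file:
    # Each line is appended as read, including its trailing newline.
    whitelist = whitelist + [str(url)]
whitelist_file.close()
test_file = open(test_file, 'r')
url_to_check = test_file.readlines()
for url in url_to_check:
    for word in whitelist:
        # Substring test: is the whitelist entry contained in the URL?
        print(str(word), str(url), word in url)
        print("-----")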
This is the output of the print above (so you have a sample of the strings being checked). You can see that it fails for a2a.eu:
a2a.eu
https://www.medgadget.com/2017/10/adenosine-a2a-receptor-antagonist-pipeline-insights-2017.html
False
-----
ansa.it
https://www.medgadget.com/2017/10/adenosine-a2a-receptor-antagonist-pipeline-insights-2017.html
False
-----
atlantia.it
https://www.medgadget.com/2017/10/adenosine-a2a-receptor-antagonist-pipeline-insights-2017.html
False
-----
azimut-group.com
https://www.medgadget.com/2017/10/adenosine-a2a-receptor-antagonist-pipeline-insights-2017.html
False
-----
a2a.eu
https://www.a2a.eu/en/2017-financial-calendar-a2a-spa
False
-----
ansa.it
https://www.a2a.eu/en/2017-financial-calendar-a2a-spa
False
-----
atlantia.it
https://www.a2a.eu/en/2017-financial-calendar-a2a-spa
False
-----
azimut-group.com
https://www.a2a.eu/en/2017-financial-calendar-a2a-spa
False
-----
a2a.eu
http://www.a2a.eu/en
False
-----
ansa.it
http://www.a2a.eu/en
False
-----
atlantia.it
http://www.a2a.eu/en
False
-----
azimut-group.com
http://www.a2a.eu/en
False
Thanks.
The code you show doesn't seem to produce the output in your question.
You should use the urllib.parse module to extract the domain from each URL. Then you can check each domain against your whitelist.
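A minimal sketch of that suggestion, assuming the whitelist holds bare domains such as a2a.eu and that subdomains like www.a2a.eu should count as matches. The helper name is_whitelisted is hypothetical and not from the question; the sample entries are taken from the printed output.

```python
from urllib.parse import urlparse


def is_whitelisted(url, whitelist):
    """Return True if the URL's host equals a whitelisted domain or is a subdomain of one.

    Hypothetical helper for illustration; the name is not from the question.
    """
    # strip() drops the trailing newline that lines read from a file carry.
    # hostname is lower-cased by urlparse and is None when the URL has no host.
    host = urlparse(url.strip()).hostname or ""
    return any(host == domain or host.endswith("." + domain) for domain in whitelist)


# Whitelist entries as they appear in the printed output of the question.
whitelist = ["a2a.eu", "ansa.it", "atlantia.it", "azimut-group.com"]

print(is_whitelisted("http://www.a2a.eu/en", whitelist))
print(is_whitelisted(
    "https://www.medgadget.com/2017/10/adenosine-a2a-receptor-antagonist-pipeline-insights-2017.html",
    whitelist,
))
```

Comparing the host rather than testing `word in url` also avoids accepting a URL that merely mentions a whitelisted domain in its path or query string.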
The check is the last one: print(..., word in url) – Fulviooo
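One likely reason that check prints False, consistent with each whitelist entry appearing on its own line in the output: lines obtained by iterating over a file keep their trailing newline, so the entry is really 'a2a.eu\n', which is not a substring of the URL. The sketch below uses io.StringIO to stand in for the whitelist file; load_lines is a hypothetical helper, not part of the question's code.

```python
import io


def load_lines(handle):
    """Return the non-empty lines of a file-like object without surrounding whitespace.

    Hypothetical helper for illustration.
    """
    return [line.strip() for line in handle if line.strip()]


url = "http://www.a2a.eu/en"

# Iterating a file yields each line with its newline still attached.
raw = list(io.StringIO("a2a.eu\nansa.it\n"))
print(repr(raw[0]), raw[0] in url)          # 'a2a.eu\n' False

# After stripping, the same substring test succeeds.
cleaned = load_lines(io.StringIO("a2a.eu\nansa.it\n"))
print(repr(cleaned[0]), cleaned[0] in url)  # 'a2a.eu' True
```

Printing with repr() makes the hidden newline visible, which is a quick way to confirm whether this is what happens with the real files.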