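Changing HTML and saving the HTML document
I've been working on this for a while, but maybe I'm just searching for the wrong terms to find the answer I need.
I have a dictionary whose keys are specific words I want to find in a web page. I then want to highlight those words and save the resulting HTML to a local file.
EDIT: It occurred to me afterwards that people like to run the code themselves. This link includes the word dictionaries and the HTML of the page I am using to test my code, since it should have the most matches of any page I am scanning. Alternatively, you can use the actual website. The link would replace rl[0] in the code.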
import http.client
import re
import socket
import urllib.error
import urllib.request
from bs4 import BeautifulSoup

# rl, headers, proxy_support, cj, wdict1..wdict4, c and numb are defined earlier in the script.
try:
    # rl[0] refers to a specific URL pulled from a list in another file.
    req = urllib.request.Request(rl[0], None, headers)
    opener = urllib.request.build_opener(proxy_support, urllib.request.HTTPCookieProcessor(cj))
    resp = opener.open(req)
    soup = BeautifulSoup(resp.read(), 'html.parser')
    resp.close()
except urllib.error.HTTPError:
    # HTTPError is a subclass of URLError, so it has to be caught first.
    print("HTTP error when opening " + rl[0])
except urllib.error.URLError:
    print("URL error when opening " + rl[0])
except http.client.HTTPException as err:
    print(err, "HTTP exception error when opening " + rl[0])
except socket.timeout:
    print("connection timed out accessing " + rl[0])
    soup = None
else:
    for l in [wdict1, wdict2, wdict3, wdict4]:
        for i in l:
            foundvocab = soup.find_all(text=re.compile(i))
            for term in foundvocab:
                # c is the highlight color, determined earlier in the script based on which dictionary the word came from.
                # numb is a term I defined earlier to use as a reference to another document this script creates.
                fixed = term.replace(i, '<mark background-color="' + c + '">' + i + '<sup>' + numb + '</sup></mark>')
                term.replace_with(fixed)
    # Write the modified document to a local file.
    with open('path/local.html', 'w', encoding='utf-8') as out:
        print(soup, file=out)
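The problem I'm having is that when the soup is printed, it prints the whole paragraph for each word it finds, and nothing is highlighted. Alternatively, I can write:
foundvocab = soup.find_all(text=i)
and the resulting HTML file is blank.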
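A possible explanation, offered as a sketch rather than a confirmed diagnosis: when replace_with is given a plain Python string, Beautiful Soup inserts it as text, so the angle brackets are escaped to &lt; and &gt; on output and the browser shows the markup literally instead of highlighting. Passing a plain string to find_all matches only text nodes whose entire content equals that string, which would explain why the second variant finds nothing to change. Also, background-color is a CSS property rather than an HTML attribute, so it belongs inside style. The version below builds real tags with soup.new_tag instead. The names highlight and save_html and the sample HTML are hypothetical, made up for illustration.

```python
import re

from bs4 import BeautifulSoup
from bs4.element import NavigableString


def highlight(soup, word, color, ref):
    """Wrap every occurrence of `word` in the document's text in a <mark> tag.

    `color` is a CSS color and `ref` is the superscript reference text
    (playing the roles of `c` and `numb` in the question's code).
    """
    pattern = re.compile(re.escape(word))
    for node in soup.find_all(string=pattern):
        # Leave script and style contents alone.
        if node.parent.name in ("script", "style"):
            continue
        text = str(node)
        pieces = []
        pos = 0
        for m in pattern.finditer(text):
            if m.start() > pos:
                # Plain text before this match stays plain text.
                pieces.append(NavigableString(text[pos:m.start()]))
            # Build <mark style="..."">word<sup>ref</sup></mark> as real tags.
            mark = soup.new_tag("mark", style="background-color: " + color)
            mark.string = m.group(0)
            sup = soup.new_tag("sup")
            sup.string = ref
            mark.append(sup)
            pieces.append(mark)
            pos = m.end()
        if pos < len(text):
            pieces.append(NavigableString(text[pos:]))
        # Swap the original text node for the first piece, then chain the rest.
        node.replace_with(pieces[0])
        prev = pieces[0]
        for piece in pieces[1:]:
            prev.insert_after(piece)
            prev = piece


def save_html(soup, path):
    """Write the modified document to `path` as UTF-8."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(str(soup))


# Small demonstration on an in-memory document.
soup = BeautifulSoup("<p>The cat sat on the mat.</p>", "html.parser")
highlight(soup, "cat", "yellow", "1")
print(soup)
```

In the original loop, highlight(soup, i, c, numb) would take the place of the find_all and replace_with lines, and save_html(soup, 'path/local.html') would replace the final print. One caveat: if one dictionary word is a substring of another, later passes may add marks inside text that is already marked.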