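Python - UnboundLocalError: local variable referenced before assignment - Regex / if else
I am trying to get my function to feed the results of each regex findall() call into the HTML output, as shown in the code snippet below. The problem is that I get
UnboundLocalError: local variable referenced before assignment
for headingList or imageList, depending on how I change my code. I think this is because execution does not get past the first if statement, since it is true. I tried using
if heading and image and description and storyLink and date:
and creating all of the variables inside a single for loop, but then nothing happens when I run the program. I think the cause is the structure of my code, or possibly the regex for the image variable, although I doubt it. Any help would be greatly appreciated :)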
Edit: HTML snippet being parsed by the regexes
def extractNews():
    selection = listbox.curselection()
    if selection == (0,):
        # Read the webpage:
        response = urlopen("file:///E:/University/IFB104/InternetArchive/Archives/Sun,%20October%201st,%202017.html")
        html = response.read()
        #regex
        heading = findall((r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>'), str(html))
        image = findall((r'<span data-omni-sm-delegate="(.*)">(\n|\r)\s+<a href="(.*)></a>(\n|\r)\s+</span>'), str(html)) #<span data-omni-sm-delegate="(.*)">(\n|\r)\s+<a href="(.*)></a>(\n|\r)\s+</span>
        description = findall((r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>'), str(html))
        storyLink = findall((r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>'), str(html))
        date = findall((r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>'), str(html))
        if heading:
            headingList = []
            for link, title in heading:
                headingVariable = "%s" % (title)
                headingList.append(headingVariable)
        if image:
            imageList = []
            for link, title in image:
                imageVariable = "%s" % (title)
                imageList.append(imageVariable)
        if description:
            descriptionList = []
            for link, title in description:
                descriptionVariable = "%s" % (title)
                descriptionList.append(descriptionVariable)
        if storyLink:
            storyLinkList = []
            for link, title in storyLink:
                storyLinkVariable = "%s" % (title)
                storyLinkList.append(storyLinkVariable)
        if date:
            dateList = []
            for link, title in date:
                dateVariable = "%s" % (title)
                dateList.append(dateVariable)
        html_str = ('<!DOCTYPE html>\n'
                    '<html>\n'
                    '<head>\n'
                    '<title>TechCrunch Archive - Sun, October 1st, 2017</title>\n'
                    '</head>\n'
                    '<body>\n'
                    '<h1>' + headingList[0] + '</h1>\n'
                    '<a href="'+ imageList[0]+'></a>\n'
                    '<p>description goes here</p>\n'
                    '<p>full story link goes here</p>\n'
                    '<p>date goes here</p>\n'
                    '<br><br>\n'
                    '<h1>' + headingList[1] + '</h1>\n'
                    'image goes here\n'
                    '<p>description goes here</p>\n'
                    '<p>full story link goes here</p>\n'
                    '<p>date goes here</p>\n'
                    '<br><br>\n'
                    '<h1>' + headingList[2] + '</h1>\n'
                    'image goes here\n'
                    '<p>description goes here</p>\n'
                    '<p>full story link goes here</p>\n'
                    '<p>date goes here</p>\n'
                    '<br><br>\n'
                    '</body>\n'
                    '</html>)')
        Html_file = open("ExtractedContent/Sun, October 1st, 2017 - Extracted.html", "w")
        Html_file.write(html_str)
        Html_file.close()
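One possible way to keep every list name bound, whether or not its regex matched, is to build each list unconditionally. This is only a minimal sketch: it reuses the heading pattern from the code above, while the titles_from helper, the sample HTML string, and the <img> pattern are hypothetical and exist purely for illustration.

```python
from re import findall

# Heading pattern copied from the question; it has two capture groups.
HEADING_PATTERN = r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>'


def titles_from(pattern, html_text):
    """Return the second capture group of every match, or [] if nothing matches."""
    # A comprehension over an empty findall() result is simply [],
    # so the returned list always exists.
    return [title for _link, title in findall(pattern, html_text)]


# Hypothetical sample input, not the archived page from the question.
sample = '<h2 class="post-title"><a href="https://example.com/a" rel="x">First story</a></h2>'

heading_list = titles_from(HEADING_PATTERN, sample)
# Hypothetical two-group pattern that matches nothing in the sample.
image_list = titles_from(r'<img src="(.*?)" alt="(.*?)">', sample)
```

With this shape an unmatched pattern gives an empty list instead of an unbound name, so the code that builds the page can check the list length before indexing into it.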
You have checked that the values of 'heading' and 'image' are what you expect, right?
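If I use the same regex for image as for heading, the problem does not occur and everything imports fine. However, when I use the image regex, it returns the error. I thought that might be the cause, but I did not think a wrong regex would produce this error. I expected it would just import the wrong content. Do you know why this happens? – mattappdev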
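A likely explanation for why a non-matching regex raises an error rather than importing wrong content: findall() returns an empty list when nothing matches, an empty list is falsy, and so the body of the if statement that creates the list never runs. The sketch below reproduces that in isolation; the build function and its <img> pattern are hypothetical, not taken from the question.

```python
from re import findall


def build(html_text):
    # Hypothetical pattern, used only to produce a match or a non-match.
    image = findall(r'<img src="(.*?)">', html_text)
    if image:
        image_list = list(image)
    # When the pattern matched nothing, image is [] and image_list
    # was never assigned, so the next line raises UnboundLocalError.
    return image_list


try:
    build("<p>no images here</p>")
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)
```

The same function returns normally once the pattern matches, which fits the observation that reusing the heading regex for image makes the error disappear.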
What values do you get for 'heading' and 'image', and do they differ from what you expected?
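One thing worth checking while inspecting those values, assuming Python 3: response.read() returns bytes, and calling str() on bytes produces the repr text (b'...'), in which each line break appears as the two characters backslash and n. A pattern containing \n then has no newline to match. The sketch below uses a hypothetical bytes literal and a simplified pattern in place of the real page and regex, and assumes the page is UTF-8 encoded.

```python
from re import findall

# Hypothetical stand-in for the bytes returned by response.read().
raw = b'<span data-x="1">\n  <a href="pic.png"></a>\n</span>'

as_repr = str(raw)             # repr text: newline shown as backslash + n
as_text = raw.decode("utf-8")  # real text: newline characters preserved (UTF-8 assumed)

# Simplified pattern with a literal newline between the two tags.
pattern = r'<span data-x="(.*?)">\n\s+<a href="(.*?)"></a>'

matches_repr = findall(pattern, as_repr)
matches_text = findall(pattern, as_text)

print(matches_repr)
print(matches_text)
```

Printing the findall() results like this, right after they are computed, shows directly whether a given pattern matched anything.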