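Crawling web pages with PyQt4 and Beautiful Soup

I am trying to extract some data from multiple pages of a website whose content is generated with JavaScript, so I am using PyQt4 and Beautiful Soup to loop over the pages and extract a few data fields.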
    import sys
    from bs4 import BeautifulSoup
    from PyQt4.QtGui import QApplication
    from PyQt4.QtCore import QUrl
    from PyQt4.QtWebKit import QWebPage

    class Client(QWebPage):
        def __init__(self, url):
            # A new QApplication is created for every Client instance
            self.app = QApplication(sys.argv)
            QWebPage.__init__(self)
            self.loadFinished.connect(self.on_page_load)
            self.mainFrame().load(QUrl(url))
            # Blocks until on_page_load() quits the event loop
            self.app.exec_()

        def on_page_load(self):
            self.app.quit()

    products_titles = []
    urls = ['url1', 'url2', 'url3']
    for url in urls:
        print "Parsing URL: " + url + '\n'
        client_response = Client(url)
        source = client_response.mainFrame().toHtml()
        soup = BeautifulSoup(source, "html.parser")
        # get_product_category is a helper of mine, not shown here
        print get_product_category(soup)
But when I run it, it crashes and gives this error:
QObject::connect: Cannot connect (null)::configurationAdded(QNetworkConfiguration) to QNetworkConfigurationManager::configurationAdded(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationRemoved(QNetworkConfiguration) to QNetworkConfigurationManager::configurationRemoved(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationChanged(QNetworkConfiguration) to QNetworkConfigurationManager::configurationChanged(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::onlineStateChanged(bool) to QNetworkConfigurationManager::onlineStateChanged(bool)
QObject::connect: Cannot connect (null)::configurationUpdateComplete() to QNetworkConfigurationManager::updateCompleted()
[1] 14809 segmentation fault python products.py
I don't know what I'm doing wrong. If you know what the problem is, please help.
Thanks, it works great, and it's faster than my solution! – melhirech