2017-03-31 98 views

I want to use Beautiful Soup to get the HTML source of a web page. I'm having trouble getting the page source with Beautiful Soup.

import bs4 as bs 
import requests 
import urllib.request 
sourceUrl='https://www.pakwheels.com/forums/t/planing-a-trip-from-karachi-to-lahore-by-road-in-feb-2017/414115/2.html' 
source=urllib.request.urlopen(sourceUrl).read() 
soup=bs.BeautifulSoup(source,'html.parser') 
print(soup) 

I want the HTML source of the page. This is what I'm getting now:

'ps.store("siteSettings", {"title":"PakWheels Forums","contact_email":"[email protected]","contact_url":"https://www.pakwheels.com/main/contact_us","logo_url":"https://www.pakwheels.com/assets/logo.png","logo_small_url":"/images/d-logo-sketch-small.png","mobile_logo_url":"' 

If you want the raw source, you don't need `BeautifulSoup`.


I need the HTML source, not the raw source.


https://docs.python.org/3/howto/urllib2.html

Answers


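Take a look at this code:

from urllib import request
from bs4 import BeautifulSoup

# Fetch the page, then parse it. Naming the parser explicitly
# avoids bs4's "no parser was explicitly specified" warning.
url_1 = "http://www.google.com"
page = request.urlopen(url_1)
soup = BeautifulSoup(page, "html.parser")
print(soup.prettify())

You need to import everything correctly. Read this.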
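As one of the comments points out, Beautiful Soup is only needed for parsing; the markup itself is already in the response body. Below is a minimal sketch of getting the HTML as text with `urllib` alone. The helper names `decode_body` and `fetch_html`, the `User-Agent` value, the timeout, and the UTF-8 fallback are all assumptions made for illustration and are not taken from the question.

```python
from typing import Optional
from urllib import request


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Turn the raw response bytes into text.

    Falls back to UTF-8 (an assumption) when the server declares no
    charset, and replaces undecodable bytes instead of raising.
    """
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML source as a string."""
    # Some sites reject urllib's default User-Agent, so send a
    # browser-like one (hypothetical value, adjust as needed).
    req = request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with request.urlopen(req, timeout=timeout) as resp:
        # Charset from the Content-Type header, or None if absent.
        charset = resp.headers.get_content_charset()
        return decode_body(resp.read(), charset)
```

Calling `print(fetch_html(sourceUrl))` would then print the markup as the server sent it. The same string can be passed to `BeautifulSoup(html, "html.parser")` if parsed or prettified output is wanted. Content that the page builds with JavaScript after loading will not appear in either form.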