2017-04-22

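I'm trying to scrape the phone number from the contact details on a web page, but when I run my script it only grabs the first part of each category and ignores the rest because of some <br> tags. For example, from the contact details category it only grabs the company name, not the phone number or fax. I hope someone can give me an idea of how to get those. Here is what I've tried: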

Sub RestData() 
Dim http As New MSXML2.XMLHTTP60 
Dim html As New HTMLDocument 
Dim ele As Object, post As Object 

With CreateObject("MSXML2.serverXMLHTTP") 
    .Open "GET", "http://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736", False 
    .send 
    html.body.innerHTML = .responseText 
End With 
Set ele = html.getElementsByClassName("contact-details block dark")(0).getElementsByTagName("p") 
    For Each post In ele 
     x = x + 1 
     Cells(x, 1) = post.innerText 
    Next post 

Set html = Nothing: Set ele = Nothing: Set docs = Nothing 
End Sub 

HTML element:

<p>Company Name: Vaucraft Braford Stud<br>Phone: +61 7 4942 4859<br>Fax: +61 7 4942 0618<br>Email: <a href="mailto:[email protected]">[email protected]</a><br>Web: <a target="_blank" href="http://www.vaucraftbrafords.com.au">http://www.vaucraftbrafords.com.au</a></p> 

Answer


You can try something like this...

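Sub RestData()
    ' Requires a reference to "Microsoft HTML Object Library"
    Dim html As New HTMLDocument
    Dim ele As Object
    Dim TypeDetails() As String
    Dim TypeDetail() As String
    Dim i As Long, r As Long

    With CreateObject("MSXML2.serverXMLHTTP")
        .Open "GET", "http://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736", False
        .send
        html.body.innerHTML = .responseText
    End With

    ' The third <p> in the "contact-details block dark" container holds the <br>-separated details
    Set ele = html.getElementsByClassName("contact-details block dark")(0).getElementsByTagName("p")(2)

    ' innerText renders each <br> as a line break, so splitting on Chr(10) gives one entry per line
    TypeDetails() = Split(ele.innerText, Chr(10))

    r = 2
    For i = 0 To UBound(TypeDetails)
        ' Drop any leftover CR, then split on the FIRST colon only so "Web: http://..." keeps the whole URL
        TypeDetail() = Split(Replace(TypeDetails(i), Chr(13), ""), ":", 2)
        ' Skip empty lines or lines without a "Label: value" pair
        If UBound(TypeDetail) >= 1 Then
            Cells(r, 1) = VBA.Trim(TypeDetail(0))
            Cells(r, 2) = VBA.Trim(TypeDetail(1))
            r = r + 1
        End If
    Next i

    Set html = Nothing: Set ele = Nothing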
End Sub 
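To check the splitting logic without requesting the page, the parsing step can be pulled out into a pure function. This is a minimal sketch, not part of the original answer: ParseContactLines is a hypothetical helper name, and it assumes innerText renders each <br> as a line break (CRLF or LF). It splits each line on the first colon only, so the "Web:" line keeps its full http:// URL.

```vba
Option Explicit

' Hypothetical helper (not from the answer above): turns the innerText of the
' contact-details <p> into an n x 2 array of (label, value) pairs.
Public Function ParseContactLines(ByVal text As String) As Variant
    Dim lines() As String, parts() As String
    Dim result() As String
    Dim i As Long, n As Long

    ' Normalise CRLF and lone CR to LF so a single delimiter covers every line break
    text = Replace(Replace(text, vbCrLf, vbLf), vbCr, vbLf)
    lines = Split(text, vbLf)

    ' First pass: count the lines that contain a "Label: value" pair
    For i = 0 To UBound(lines)
        If InStr(lines(i), ":") > 0 Then n = n + 1
    Next i
    If n = 0 Then
        ParseContactLines = Empty
        Exit Function
    End If

    ' Second pass: split on the first colon only (limit = 2) and trim both halves
    ReDim result(1 To n, 1 To 2)
    n = 0
    For i = 0 To UBound(lines)
        If InStr(lines(i), ":") > 0 Then
            parts = Split(lines(i), ":", 2)
            n = n + 1
            result(n, 1) = Trim$(parts(0))
            result(n, 2) = Trim$(parts(1))
        End If
    Next i
    ParseContactLines = result
End Function
```

In RestData the loop could then become arr = ParseContactLines(ele.innerText) followed by Range("A2").Resize(UBound(arr, 1), 2).Value = arr, after checking that arr is not Empty. Returning a 2D array lets the whole block be written to the sheet in one assignment instead of cell by cell.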

Oh my god, you're a gem of a man. Thank you sir, such a powerful and wonderful solution. The style you used here is very new to me. Thanks again. – SIM


You're welcome! Glad it worked. Thanks for the compliment. :) – sktneer


@SMth80 Two things to note: you can call 'createDocumentFromUrl()' to get an 'HTMLDocument' directly from the URL (see this question: http://stackoverflow.com/questions/9995257) and get rid of all the MSXML2.serverXmlHttp code. You can also use '.querySelectorAll(".contact-details.block.dark p")' to simplify the DOM traversal. – Tomalak
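The querySelectorAll idea from the comment above can be sketched as follows. This is an illustration under assumptions, not code from the thread: ContactParagraphTexts is a hypothetical function, it needs a reference to "Microsoft HTML Object Library", and it assumes MSHTML exposes querySelectorAll for a document built with New HTMLDocument. Because contact-details, block and dark are classes on the same element, the selector chains them without spaces; ".contact-details .block .dark p" would instead look for three nested elements.

```vba
Option Explicit

' Hypothetical sketch: returns the innerText of every <p> inside the
' "contact-details block dark" element, using one CSS selector.
Public Function ContactParagraphTexts(ByVal markup As String) As Collection
    Dim html As New HTMLDocument
    Dim nodes As Object
    Dim texts As New Collection
    Dim i As Long

    html.body.innerHTML = markup

    ' Compound class selector: all three classes sit on the same element
    Set nodes = html.querySelectorAll(".contact-details.block.dark p")

    ' The NodeList is 0-based; the returned Collection is 1-based
    For i = 0 To nodes.Length - 1
        texts.Add nodes.Item(i).innerText
    Next i

    Set ContactParagraphTexts = texts
End Function
```

Used with the request code from the answer, ContactParagraphTexts(.responseText)(3) would give the text of the third paragraph, the same one the answer reads with getElementsByTagName("p")(2).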