2014-10-29

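Ignoring elements inside certain tags when getting elements with VBA

I have a VBA module that extracts all the links on a page, but I want to ignore every link inside certain tags, such as <header> and <footer> (including all of their descendants). Can anyone tell me how to do this?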

Sub Fetch_click()
    Dim IE As Object
    Dim LinkArr As Object
    Dim LinkObj As Object
    Dim i As Long

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate Cells(1, 1).Text
    ' wait until the page has finished loading (4 = READYSTATE_COMPLETE)
    Do While IE.Busy Or IE.ReadyState <> 4
        DoEvents
    Loop

    i = 3
    Set LinkArr = IE.Document.getElementsByTagName("a")
    For Each LinkObj In LinkArr
        Cells(i, 1).Value = LinkObj.href
        i = i + 1
    Next LinkObj
End Sub

Thanks

This is untested, but inside your loop you could check the parent tag of each <a> with something like eachA.ParentNode.NodeName = "header"? – BobbitWormJoe 2014-10-29 08:22:07

I could, but there is a lot of nesting. – aadithyapk 2014-10-31 11:51:26

Answer

I'd rather use objects from the Microsoft HTML Object Library and the Microsoft Internet Controls library (add references to both!), for example:

Sub StartTest() 
Dim Browser As SHDocVw.InternetExplorer 
Dim HTMLDoc As MSHTML.HTMLDocument 

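    ' start browser
    Set Browser = New SHDocVw.InternetExplorer
    Browser.Visible = True
    Browser.navigate "www.dauda.at"
    ' wait until the page has finished loading before touching the DOM
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set HTMLDoc = Browser.document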

Dim ECol As MSHTML.IHTMLElementCollection 
Dim IFld As MSHTML.IHTMLElement 

    ' search all <a> tags 
    Set ECol = HTMLDoc.getElementsByTagName("a") 
    For Each IFld In ECol 

     ' etc ... 

    Next IFld 

    ' clean up 
    Set IFld = Nothing 
    Set ECol = Nothing 
    Set HTMLDoc = Nothing 
    Browser.Quit 
    Set Browser = Nothing 
End Sub 

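Checking which tag your <a> sits in can be as simple as reading IFld.ParentNode.nodeName, which gives the tag name of the enclosing parent (for HTML elements it is returned in upper case, e.g. "HEADER").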

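If it is unclear how deeply the <a> is nested, you can use a recursive function that keeps asking for the next higher parent, all the way up to the document root ("#document") or the "HTML" element, for example: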

Function BadParentRec(TestFld As MSHTML.IHTMLElement) As Boolean 
Dim MyTag As String, MyTempResult As Boolean 

    BadParentRec = False 
    MyTag = TestFld.ParentNode.nodeName 
    ' Debug.Print MyTag 

    If MyTag = "#document" Then
     MyTempResult = False        ' reached the document root: no bad ancestor found
    ElseIf MyTag = "HEADER" Or MyTag = "FOOTER" Then        ' your own criteria for bad tags go here (nodeName is upper case)
     MyTempResult = True         ' send "bad" back up the recursion chain
    Else
     MyTempResult = BadParentRec(TestFld.parentElement) ' next level up
    End If

    BadParentRec = MyTempResult 

End Function 
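The same test can also be written as a plain loop instead of recursion: keep following parentElement until it returns Nothing (the <html> element has no parent element) and stop as soon as an excluded tag turns up. This is a minimal sketch assuming the same Microsoft HTML Object Library reference as above; the function name HasExcludedAncestor and its ExcludedTags parameter are hypothetical, not part of MSHTML.

```vba
' Hypothetical helper (not an MSHTML API): returns True if any ancestor of El
' has a tag name listed in ExcludedTags, e.g. Array("HEADER", "FOOTER").
' Iterative version of the recursive check; requires a reference to
' "Microsoft HTML Object Library".
Function HasExcludedAncestor(ByVal El As MSHTML.IHTMLElement, _
                             ByVal ExcludedTags As Variant) As Boolean
    Dim Anc As MSHTML.IHTMLElement
    Dim T As Variant

    Set Anc = El.parentElement              ' start with the direct parent
    Do Until Anc Is Nothing                 ' <html> has no parentElement
        For Each T In ExcludedTags
            ' compare case-insensitively against each excluded tag name
            If UCase$(Anc.tagName) = UCase$(T) Then
                HasExcludedAncestor = True
                Exit Function
            End If
        Next T
        Set Anc = Anc.parentElement         ' move one level up
    Loop
    HasExcludedAncestor = False
End Function
```

Inside the For Each loop you would then test something like If Not HasExcludedAncestor(IFld, Array("HEADER", "FOOTER")) Then ... so the list of tags to skip is passed in rather than hard-coded.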

...so inside the For Each loop you would write:

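    If Not BadParentRec(IFld) Then
        ' IHTMLElement has no early-bound href member, so read the attribute
        Debug.Print IFld.getAttribute("href")    ' you may also want to skip empty hrefs here
    End If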
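Applied to the question's own macro, the ancestor walk could look like the sketch below. It keeps the question's late-bound CreateObject style, so no library references are needed, reads the URL from A1 and writes the kept links from A3 downwards. The macro name FetchLinksSkippingHeaderFooter and the HEADER/FOOTER list are assumptions to adapt.

```vba
' Sketch: the question's Fetch_click, changed to skip links that sit anywhere
' inside a <header> or <footer>. Late-bound, so no references are required.
' Assumes the URL is in A1 of the active sheet and output starts at A3.
Sub FetchLinksSkippingHeaderFooter()
    Dim IE As Object, LinkObj As Object, Anc As Object
    Dim i As Long
    Dim Skip As Boolean

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate Cells(1, 1).Text
    Do While IE.Busy Or IE.ReadyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop

    i = 3
    For Each LinkObj In IE.Document.getElementsByTagName("a")
        ' climb the ancestors; stop as soon as a <header> or <footer> is found
        Skip = False
        Set Anc = LinkObj.parentElement
        Do Until Anc Is Nothing
            Select Case UCase$(Anc.tagName)
                Case "HEADER", "FOOTER"
                    Skip = True
                    Exit Do
            End Select
            Set Anc = Anc.parentElement
        Loop

        ' only links with no excluded ancestor are written to the sheet
        If Not Skip Then
            Cells(i, 1).Value = LinkObj.href
            i = i + 1
        End If
    Next LinkObj

    ' close the browser and release the object
    IE.Quit
    Set IE = Nothing
End Sub
```

Because the walk starts at each link and only goes upwards, how deeply the link is nested no longer matters, which addresses the nesting concern raised in the comments.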