2017-05-29 137 views

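Scraping innerHTML from a site using VBA

I want to declare an array of nodes (that part is not a problem) and then scrape the innerHTML of two child nodes inside each element of that array. Taking SE as an example (using the IE object approach): suppose I am trying to extract the title and the question excerpt for each question on the home page. There is an array of nodes (class name: "question-summary").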

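Each of those nodes then has two child nodes (the title, class name "question-hyperlink", and the excerpt, class name "excerpt"). The code I am using is as follows: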

Sub Scraper() 
Dim ie As Object 
Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object 
Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String 

Set ie = CreateObject("internetexplorer.application") 
sURL = "https://stackoverflow.com/questions/tagged/excel-formula" 

QuestionShell = "question-summary" 
QuestionTitle = "question-hyperlink" 
Question = "excerpt" 

With ie 
    .Visible = False 
    .Navigate sURL 
End With 

Set doc = ie.Document 'Stepping through so doc is getting assigned (READY_STATE = 4) 

Set oQuestionShells = doc.getElementsByClassName(QuestionShell) 

For Each oElement In oQuestionShells 
    Set oQuestionTitle = oElement.getElementByClassName(QuestionTitle) 'Assigning this object causes an "Object doesn't support this property or method" 
    Set oQuestion = oElement.getElementByClassName(Question) 'Assigning this object causes an "Object doesn't support this property or method" 
    Debug.Print oQuestionTitle.innerHTML 
    Debug.Print oQuestion.innerHTML 
Next 

End Sub 

Answer


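getElementByClassName is not a method.

You can only use getElementsByClassName (note the plural in the method name), which returns an IHTMLElementCollection.

Declaring the variable as Object instead of IHTMLElementCollection is fine, but you still need to supply an index to access a specific element in the collection.

Let's assume that each oElement (a question-summary node) contains exactly one element with the question-hyperlink class and one with the excerpt class. You can then call getElementsByClassName and append (0) to pull out the first element of the returned collection.

So, your code corrected: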

Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0) 
Set oQuestion = oElement.getElementsByClassName(Question)(0) 

Full working code (with a few updates, namely using Option Explicit and waiting for the page to load):

Option Explicit 

Sub Scraper() 

    Dim ie As Object 
    Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object 
    Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String 

    Set ie = CreateObject("internetexplorer.application") 
    sURL = "https://stackoverflow.com/questions/tagged/excel-formula" 

    QuestionShell = "question-summary" 
    QuestionTitle = "question-hyperlink" 
    Question = "excerpt" 

    With ie
        .Visible = True
        .Navigate sURL
        Do
            DoEvents
        Loop While .ReadyState < 4 Or .Busy
    End With

    Set doc = ie.Document 

    Set oQuestionShells = doc.getElementsByClassName(QuestionShell) 

    For Each oElement In oQuestionShells
        'Debug.Print TypeName(oElement)

        Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
        Set oQuestion = oElement.getElementsByClassName(Question)(0)

        Debug.Print oQuestionTitle.innerHTML
        Debug.Print oQuestion.innerHTML
    Next

    ie.Quit 

End Sub 
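
The wait loop in the code above keeps spinning if the page never finishes loading. One possible variation is to bound the wait, sketched below. WaitForPage and its 30-second default are assumptions for illustration, not part of the original answer.

```vba
Option Explicit

' Hypothetical helper (an assumption, not from the original answer):
' waits until the browser reports the page as loaded, or gives up after
' timeoutSeconds. Returns True when loaded, False on timeout.
Function WaitForPage(ie As Object, Optional timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    Dim elapsed As Double

    startTime = Timer   ' seconds since midnight

    Do While ie.ReadyState < 4 Or ie.Busy
        DoEvents

        elapsed = Timer - startTime
        ' Timer resets at midnight, so correct a negative difference
        If elapsed < 0 Then elapsed = elapsed + 86400

        ' Leaving here keeps the function result at its default, False
        If elapsed > timeoutSeconds Then Exit Function
    Loop

    WaitForPage = True
End Function
```

In Scraper, the Do ... Loop could then be replaced by a check such as `If Not WaitForPage(ie) Then ie.Quit: Exit Sub`, so that the hidden browser instance is closed when the load times out.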

I'm an idiot! Thank you :) – Jeremy