2016-11-25 122 views
4

Reading and manipulating HTML with Excel VBA

Say I have a page like the one below, saved as C:\TEMP\html_page.html:

<html> 
    <head> 
     <link rel="stylesheet" href="styles.css"> 
    </head> 
    <body> 
     <div id="xxx1"> 
     <img src="test.png"> 
     </div> 
    </body> 
</html> 

I want to change the img src attribute programmatically, based on Excel data and VBA. Basically: find the div via XPath, then adjust the (single) img tag it contains.

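I found an example of manipulating XML from VBA through the XML library here, but I've been struggling to get the same thing working with the HTML object library; I can't find any examples and/or documentation.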

Dim XDoc As Object, root As Object 

Set XDoc = CreateObject("MSXML2.DOMDocument") 
XDoc.async = False: XDoc.validateOnParse = False 

If XDoc.Load(html_path) Then 
    Debug.Print "Document loaded" 
Else 
    Dim strErrText As String 
    ' Early-bound type: requires a reference to Microsoft XML 
    Dim xPE As MSXML2.IXMLDOMParseError 
    ' Obtain the ParseError object 
    Set xPE = XDoc.parseError 
    With xPE 
     strErrText = "Your XML Document failed to load " & _ 
     "due to the following error." & vbCrLf & _ 
     "Error #: " & .ErrorCode & ": " & .reason & _ 
     "Line #: " & .Line & vbCrLf & _ 
     "Line Position: " & .linepos & vbCrLf & _ 
     "Position In File: " & .filepos & vbCrLf & _ 
     "Source Text: " & .srcText & vbCrLf & _ 
     "Document URL: " & .URL 
    End With 
    MsgBox strErrText, vbExclamation 
End If 

All I want to do is:

'... 
Set outer_div = XDoc.selectSingleNode("//div[@id='xxx1']") 
' ... edit the img attribute 

But I can't load the HTML page, because it isn't well-formed XML (the img tag is not closed).

Any help is much appreciated. Oh, and I can't use another language such as Python, bummer.

Answers

3

This isn't exactly what you asked for, but it may be close enough. Instead of using the XML library, use the HTML library:

Sub changeImg() 

    Dim dom As Object 
    Dim img As Object 
    Dim src As String 

    ' Late-bound MSHTML document 
    Set dom = CreateObject("htmlFile") 

    ' Read the whole file, line breaks included, into src 
    Open "C:\temp\test.html" For Input As #1 
     src = Input$(LOF(1), 1) 
    Close #1 

    dom.body.innerHTML = src 

    ' First img element in the document 
    Set img = dom.getElementsByTagName("img")(0) 

    img.src = "..." 

    ' Overwrite the file with the serialized document 
    Open "C:\temp\test.html" For Output As #1 
     Print #1, dom.DocumentElement.outerHTML 
    Close #1 

End Sub 

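The problem is that the resulting file will have a Head node added, and the tag names will be uppercase. If you can live with that, this solution will work for you.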

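As an aside, if you want to do anything more in depth, with better selectors, consider early binding. The HTML interface exposed under early binding differs from the late-bound one and supports more properties. You'll need to add a reference to the HTML Object Library: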

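Sub changeImg() 

    ' Early binding: requires a reference to Microsoft HTML Object Library 
    Dim dom As HTMLDocument 
    Dim img As Object 
    Dim src As String 

    Set dom = CreateObject("htmlFile") 

    ' Read the whole file, line breaks included, into src 
    Open "C:\temp\test.html" For Input As #1 
     src = Input$(LOF(1), 1) 
    Close #1 

    dom.body.innerHTML = src 

    ' First img element in the document 
    Set img = dom.getElementsByTagName("img")(0) 

    img.src = "..." 

    ' Overwrite the file with the serialized document 
    Open "C:\temp\test.html" For Output As #1 
     Print #1, dom.DocumentElement.outerHTML 
    Close #1 

End Sub 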
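
The code above changes the first img in the whole document, whereas the question asks for the img inside the div with id xxx1. A minimal late-bound sketch of that narrower lookup follows. The function name SetImgSrcInDiv and its parameters are hypothetical, not part of any library, and it assumes (as in the answer above) that the markup can be assigned to body.innerHTML of an "htmlfile" document. How MSHTML serializes the new URL may vary with its document mode.

```vba
Option Explicit

' Hypothetical helper: returns the markup with the src of the first <img>
' inside the element whose id is divId set to newSrc.
' If no such element exists, the input is returned unchanged.
Function SetImgSrcInDiv(ByVal html As String, ByVal divId As String, _
                        ByVal newSrc As String) As String
    Dim dom As Object
    Dim div As Object
    Dim imgs As Object

    Set dom = CreateObject("htmlfile")      ' late-bound MSHTML document
    dom.body.innerHTML = html

    Set div = dom.getElementById(divId)
    If div Is Nothing Then
        SetImgSrcInDiv = html
        Exit Function
    End If

    ' Search only inside the div, not the whole document
    Set imgs = div.getElementsByTagName("img")
    If imgs.Length > 0 Then imgs(0).setAttribute "src", newSrc

    SetImgSrcInDiv = dom.documentElement.outerHTML
End Function
```

It could be called with the src string read from the file in changeImg, for example SetImgSrcInDiv(src, "xxx1", "new.png"), and the result written back the same way.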
+0

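Thanks a lot! It seems I'm almost there: my question wasn't 100% accurate. I'm looking for a solution that works for multi-line HTML files. I tried to work out how to adjust the code, but haven't succeeded yet. Would you mind adding this to the answer? – MattV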

+0

@MattV, sorry, I must be missing something. Why doesn't this work for multi-line files? Let me know and I'll update. – SWa

0

For this purpose you can use doc.querySelector("div[id='xxx1'] img"). To change the src attribute, use img.setAttribute "src", "new.png". HTH

Option Explicit 

' Add reference to Microsoft Internet Controls (SHDocVw) 
' Add reference to Microsoft HTML Object Library 

Sub Demo() 
    Dim ie As SHDocVw.InternetExplorer 
    Dim doc As MSHTML.HTMLDocument 
    Dim url As String 

    url = "file:///C:/Temp/StackOverflow/html/html_page.html" 
    Set ie = New SHDocVw.InternetExplorer 
    ie.Visible = True 
    ie.navigate url 
    While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE: DoEvents: Wend 
    Set doc = ie.document 

    Dim img As HTMLImg 
    Set img = doc.querySelector("div[id='xxx1'] img") 
    If Not img Is Nothing Then 
     img.setAttribute "src", "new.png" 
    End If 
    ie.Quit 
End Sub
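
Demo changes the attribute only in the in-memory document and does not write anything back to disk. If the modified page needs to be saved, one possible sketch is a pair of small file helpers. SaveText and ReadText are hypothetical names introduced here for illustration, and they assume plain ANSI text, since that is what Print # and Input$ handle.

```vba
Option Explicit

' Hypothetical helper: overwrite the file at path with text.
Sub SaveText(ByVal path As String, ByVal text As String)
    Dim f As Integer
    f = FreeFile                      ' next free file number instead of a hard-coded #1
    Open path For Output As #f
    Print #f, text;                   ' trailing semicolon suppresses the extra line break
    Close #f
End Sub

' Hypothetical helper: read the whole file, line breaks included.
Function ReadText(ByVal path As String) As String
    Dim f As Integer
    f = FreeFile
    Open path For Input As #f
    ReadText = Input$(LOF(f), f)
    Close #f
End Function
```

Inside Demo, a call such as SaveText targetPath, doc.documentElement.outerHTML placed before ie.Quit would write out the serialized page, where targetPath is a placeholder for the local file path.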