2017-08-03

Grabbing all fields when unmarshalling nested XML

Below is a sample of the XML I am trying to load into Go. The actual XML file is over 500 MB.

<artists> 
    <artist> 
     <id>1</id> 
     <name>The Persuader</name> 
     <realname>Jesper Dahlbäck</realname> 
     <profile /> 
    </artist> 
    <artist> 
     <id>22</id> 
     <name>DATacide</name> 
     <profile>Datacide began recording together in 1993, after Tetsu Inoue met Uwe Schmidt while vacationing near Frankfurt. 
     </profile> 
     <members> 
      <id>25</id> 
      <name>Tetsu Inoue</name> 
      <id>519207</id> 
      <name>Uwe Schmidt</name> 
     </members> 
    </artist> 
</artists> 

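Below is the Go code. I want to get all the <id> fields in the <members> section, which may contain zero, one, or many IDs, but my code only grabs the last one. How do I get all the IDs in the <members> section into a Members array?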

package main 

import (
    "encoding/xml" 
    "fmt" 
    "io/ioutil" 
    "os" 
) 

type Artists struct { 
    XMLName xml.Name `xml:"artists"` 
    Artist []Artist `xml:"artist"` 
} 

type Artist struct { 
    XMLName xml.Name `xml:"artist"` 
    ArtistID uint32 `xml:"id,omitempty"` 
    ArtistName string `xml:"name,omitempty"` 
    Profile string `xml:"profile,omitempty"` 
    RealName string `xml:"realname,omitempty"` 
    Members MembersID `xml:"members,omitempty"` 
} 

type MembersID struct { 
    MemberID uint32 `xml:"id,omitempty"` 
} 

func main() { 
    xmlFile, err := os.Open("short_artists.xml") 
    if err != nil { 
        fmt.Println(err) 
        return // nothing to read if the file could not be opened 
    } 
    defer xmlFile.Close() 
    fmt.Println("Successfully opened artists file") 

    // Read the whole file into memory before decoding it. 
    byteValue, err := ioutil.ReadAll(xmlFile) 
    if err != nil { 
        fmt.Println(err) 
        return 
    } 

    var artists Artists 
    if err := xml.Unmarshal(byteValue, &artists); err != nil { 
        fmt.Println(err) 
        return 
    } 

    for _, a := range artists.Artist { 
        fmt.Println("ArtistID: " + fmt.Sprint(a.ArtistID)) 
        fmt.Println("Name: " + a.ArtistName) 
        fmt.Println("Real Name: " + a.RealName) 
        fmt.Println("Profile: " + a.Profile) 
        fmt.Println("") 
        fmt.Printf("%v\n", a.Members) 
        fmt.Println("") 
    } 
} 

All of my Google and DuckDuckGo search results are purple (already visited). Thanks for your help.


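jeevatkm has the correct solution, but I wanted to add one more note: if you're reading a 500 MB XML file, you might consider using [`xml.Decoder`](https://golang.org/pkg/encoding/xml/#Decoder) for streaming decoding, rather than reading the entire 500 MB file into memory and then decoding it. – Adrian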

Answers


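The problem is the MembersID struct definition. MemberID is a single uint32, so each repeated <id> element overwrites the previous one and only the last value survives. You have to use a slice, which Unmarshal appends to for every matching element.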

type MembersID struct { 
    MemberID []uint32 `xml:"id,omitempty"` 
} 

Playground link: https://play.golang.org/p/h4qTmSQoRg

Output:

ArtistID: 1 
Name: The Persuader 
Real Name: Jesper Dahlbäck 
Profile: 

Members: [] 

ArtistID: 22 
Name: DATacide 
Real Name: 
Profile: Datacide began recording together in 1993, after Tetsu Inoue met Uwe Schmidt while vacationing near Frankfurt. 


Members: [25 519207] 
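For reference, here is a self-contained sketch of the slice fix that decodes a shortened copy of the question's XML from a string. The `parseArtists` helper, the `Members` type name, and the extra `Names` field are assumptions added for illustration and are not part of the original answer; `Names` simply shows that the same slice technique should also apply to the repeated `<name>` elements.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Artists is the root <artists> element.
type Artists struct {
	Artist []Artist `xml:"artist"`
}

// Artist holds the fields of one <artist> element used in this sketch.
type Artist struct {
	ArtistID   uint32  `xml:"id"`
	ArtistName string  `xml:"name"`
	Members    Members `xml:"members"`
}

// Members collects every repeated child of <members>. Because both
// fields are slices, Unmarshal appends each matching element instead
// of overwriting the previous one.
type Members struct {
	IDs   []uint32 `xml:"id"`
	Names []string `xml:"name"` // assumption: not in the original answer
}

// parseArtists (hypothetical helper) unmarshals an XML document
// held in a string.
func parseArtists(data string) (Artists, error) {
	var a Artists
	err := xml.Unmarshal([]byte(data), &a)
	return a, err
}

// sample is a shortened copy of the XML from the question.
const sample = `<artists>
  <artist>
    <id>1</id>
    <name>The Persuader</name>
  </artist>
  <artist>
    <id>22</id>
    <name>DATacide</name>
    <members>
      <id>25</id>
      <name>Tetsu Inoue</name>
      <id>519207</id>
      <name>Uwe Schmidt</name>
    </members>
  </artist>
</artists>`

func main() {
	artists, err := parseArtists(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, a := range artists.Artist {
		fmt.Printf("ArtistID: %d\n", a.ArtistID)
		fmt.Printf("Members: %v %q\n\n", a.Members.IDs, a.Members.Names)
	}
}
```

An artist without a `<members>` element ends up with nil slices, so ranging over `IDs` is safe without a separate presence check.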

Bonus tip:

You can read XML path values selectively if needed. For example, to get all the IDs at the XML path artist>members>id:

type MemberID struct { 
    IDs []uint32 `xml:"artist>members>id"` 
} 

Playground link: https://play.golang.org/p/sj7XPisgl7

Output:

[25 519207] 
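A runnable sketch of the path-tag variant above, again using a shortened copy of the question's XML. The `allMemberIDs` helper is a hypothetical name introduced for this example. Note that the struct is unmarshalled from the root `<artists>` element, so the IDs of every artist's members land in one flat slice and the link back to the owning artist is lost.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// MemberID picks out only the <id> elements found at the path
// artist>members>id, relative to the root element being decoded.
type MemberID struct {
	IDs []uint32 `xml:"artist>members>id"`
}

// allMemberIDs (hypothetical helper) returns the member IDs of all
// artists in the document as a single flat slice.
func allMemberIDs(data string) ([]uint32, error) {
	var m MemberID
	if err := xml.Unmarshal([]byte(data), &m); err != nil {
		return nil, err
	}
	return m.IDs, nil
}

// sample is a shortened copy of the XML from the question.
const sample = `<artists>
  <artist>
    <id>1</id>
    <name>The Persuader</name>
  </artist>
  <artist>
    <id>22</id>
    <name>DATacide</name>
    <members>
      <id>25</id>
      <name>Tetsu Inoue</name>
      <id>519207</id>
      <name>Uwe Schmidt</name>
    </members>
  </artist>
</artists>`

func main() {
	ids, err := allMemberIDs(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	// The artists' own <id> elements (1 and 22) are not on the
	// artist>members>id path, so they are skipped.
	fmt.Println(ids)
}
```

If the IDs need to stay grouped per artist, a `members>id` tag on a slice field of the per-artist struct keeps that grouping.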

I knew I had to be missing something that simple. Thanks for the answer. – ericbrow


@ericbrow You're welcome. You can accept the answer. – jeevatkm