2011-11-26 41 views

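Importing TCX into R using the XML package

I am trying to import GPS running data from a TCX file into R using the XML package. Here is a small sample of the data (only 3 trackpoints rather than ~900):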

<?xml version="1.0" encoding="UTF-8"?> 
<TrainingCenterDatabase xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"> 
<Activities> 
    <Activity Sport="Running"> 
     <Id>2011-10-30T16:05:48Z</Id> 
     <Lap StartTime="2011-10-30T16:05:48Z"> 
      <TotalTimeSeconds>3855.99</TotalTimeSeconds> 
      <DistanceMeters>12498.8115</DistanceMeters> 
      <MaximumSpeed>4.45662498</MaximumSpeed> 
      <Calories>1011</Calories> 
      <Intensity>Active</Intensity> 
      <TriggerMethod>Manual</TriggerMethod> 
      <Track> 
       <Trackpoint> 
        <Time>2011-10-30T16:05:48Z</Time> 
        <Position> 
         <LatitudeDegrees>52.33613318</LatitudeDegrees> 
         <LongitudeDegrees>-1.58814317</LongitudeDegrees> 
        </Position> 
        <AltitudeMeters>77.5234375</AltitudeMeters> 
        <DistanceMeters>0.00000000</DistanceMeters> 
       </Trackpoint> 
       <Trackpoint> 
        <Time>2011-10-30T16:05:49Z</Time> 
        <Position> 
         <LatitudeDegrees>52.33614810</LatitudeDegrees> 
         <LongitudeDegrees>-1.58814283</LongitudeDegrees> 
        </Position> 
        <AltitudeMeters>77.5234375</AltitudeMeters> 
        <DistanceMeters>1.77584004</DistanceMeters> 
       </Trackpoint> 
       <Trackpoint> 
        <Time>2011-10-30T16:05:54Z</Time> 
        <Position> 
         <LatitudeDegrees>52.33627098</LatitudeDegrees> 
         <LongitudeDegrees>-1.58818323</LongitudeDegrees> 
        </Position> 
        <AltitudeMeters>76.0814209</AltitudeMeters> 
        <DistanceMeters>15.7694969</DistanceMeters> 
       </Trackpoint> 
      </Track> 
     </Lap> 
    </Activity> 
</Activities> 
</TrainingCenterDatabase> 

I tried to import it with:

library(XML)
doc <- xmlParse("filetest.tcx")
xmlToDataFrame(nodes = getNodeSet(doc, "//Trackpoint"))

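However, this fails to read the trackpoints, and the result is an empty data frame. I found that if I remove the following from the TrainingCenterDatabase tag at the start of the file: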

xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" 

then the import works as expected, i.e. using the following data:

<?xml version="1.0" encoding="UTF-8"?> 
<TrainingCenterDatabase> 
    <Activities> 
     <Activity Sport="Running"> 
      <Id>2011-10-30T16:05:48Z</Id> 
      <Lap StartTime="2011-10-30T16:05:48Z"> 
       <TotalTimeSeconds>3855.99</TotalTimeSeconds> 
       <DistanceMeters>12498.8115</DistanceMeters> 
       <MaximumSpeed>4.45662498</MaximumSpeed> 
       <Calories>1011</Calories> 
       <Intensity>Active</Intensity> 
       <TriggerMethod>Manual</TriggerMethod> 
       <Track> 
        <Trackpoint> 
         <Time>2011-10-30T16:05:48Z</Time> 
         <Position> 
          <LatitudeDegrees>52.33613318</LatitudeDegrees> 
          <LongitudeDegrees>-1.58814317</LongitudeDegrees> 
         </Position> 
         <AltitudeMeters>77.5234375</AltitudeMeters> 
         <DistanceMeters>0.00000000</DistanceMeters> 
        </Trackpoint> 
        <Trackpoint> 
         <Time>2011-10-30T16:05:49Z</Time> 
         <Position> 
          <LatitudeDegrees>52.33614810</LatitudeDegrees> 
          <LongitudeDegrees>-1.58814283</LongitudeDegrees> 
         </Position> 
         <AltitudeMeters>77.5234375</AltitudeMeters> 
         <DistanceMeters>1.77584004</DistanceMeters> 
        </Trackpoint> 
        <Trackpoint> 
         <Time>2011-10-30T16:05:54Z</Time> 
         <Position> 
          <LatitudeDegrees>52.33627098</LatitudeDegrees> 
          <LongitudeDegrees>-1.58818323</LongitudeDegrees> 
         </Position> 
         <AltitudeMeters>76.0814209</AltitudeMeters> 
         <DistanceMeters>15.7694969</DistanceMeters> 
        </Trackpoint> 
       </Track> 
      </Lap> 
     </Activity> 
    </Activities> 
</TrainingCenterDatabase> 

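I get the data frame I want (except that Position is not split into latitude and longitude, but I expect I can deal with that, unless someone can suggest a simple way to do it directly with XPath):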

> xmlToDataFrame(nodes = getNodeSet(doc, "//Trackpoint")) 
        Time    Position AltitudeMeters DistanceMeters 
1 2011-10-30T16:05:48Z 52.33613318-1.58814317  77.5234375  0.00000000 
2 2011-10-30T16:05:49Z 52.33614810-1.58814283  77.5234375  1.77584004 
3 2011-10-30T16:05:54Z 52.33627098-1.58818323  76.0814209  15.7694969 

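Obviously, I don't want to have to remove this by hand from every file I want to import. Is there something I am doing wrong (in the XPath, perhaps?) that stops this from working, or is there a workaround for stripping that part out of the XML data?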

Many thanks

Answers


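This is a namespace issue. The root element declares a default namespace, so an unprefixed XPath name such as Trackpoint matches nothing. Give the default namespace a prefix and use it in the query: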

xmlToDataFrame(nodes = getNodeSet(doc, "//ns:Trackpoint", "ns"))
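To make the namespace behaviour concrete, here is a minimal sketch on a cut-down inline document (the string `tcx` and the variable names are made up for illustration; only the namespace URI comes from the sample file). It compares an unprefixed query, a query with an explicitly bound prefix, and a namespace-agnostic query using the XPath `local-name()` function, which may be handy if you would rather not deal with prefixes at all.

```r
library(XML)

# Cut-down TCX document with the same default namespace as the sample file.
tcx <- '<?xml version="1.0" encoding="UTF-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
 <Activities><Activity Sport="Running"><Lap><Track>
  <Trackpoint><Time>2011-10-30T16:05:48Z</Time></Trackpoint>
  <Trackpoint><Time>2011-10-30T16:05:49Z</Time></Trackpoint>
 </Track></Lap></Activity></Activities>
</TrainingCenterDatabase>'
doc <- xmlParse(tcx, asText = TRUE)

# In XPath 1.0 an unprefixed name only matches elements in no namespace,
# so this query finds nothing in a document with a default namespace.
no_prefix <- getNodeSet(doc, "//Trackpoint")

# Bind a prefix of our own choosing ("ns") to the default namespace URI.
tcx_ns <- c(ns = "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2")
with_prefix <- getNodeSet(doc, "//ns:Trackpoint", namespaces = tcx_ns)

# Alternative: ignore namespaces and match on the local element name only.
by_local_name <- getNodeSet(doc, "//*[local-name() = 'Trackpoint']")

length(no_prefix)
length(with_prefix)
length(by_local_name)
```

Passing the single string "ns" as in the line above is a shorthand for binding that prefix to the document's default namespace; the named vector just spells the binding out.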

To get the position split directly into latitude and longitude, you can do the following:

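nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns")
# Convert each Trackpoint node to a list, flatten it to a one-row data frame, then row-bind.
mydf <- plyr::ldply(nodes, function(node) as.data.frame(xmlToList(node)))
setNames(mydf, c('time', 'lat', 'long', 'alt', 'distance'))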

which now gives

    time   lat  long  alt distance 
1 2011-10-30T16:05:48Z 52.33613318 -1.58814317 77.5234375 0.00000000 
2 2011-10-30T16:05:49Z 52.33614810 -1.58814283 77.5234375 1.77584004 
3 2011-10-30T16:05:54Z 52.33627098 -1.58818323 76.0814209 15.7694969 
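Since the question also asks about doing the split directly with XPath, here is one possible sketch that pulls each field with its own XPath expression and converts the types on the way. The helper `tp_values` and the inline string `tcx` are hypothetical names used only for this example. It assumes every Trackpoint has all four fields; if some trackpoints lack a Position (for instance during a GPS dropout), the vectors would differ in length and this approach would need adjusting.

```r
library(XML)

# Two trackpoints taken from the sample data, with the default namespace intact.
tcx <- '<?xml version="1.0" encoding="UTF-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
 <Activities><Activity Sport="Running"><Lap><Track>
  <Trackpoint>
   <Time>2011-10-30T16:05:48Z</Time>
   <Position>
    <LatitudeDegrees>52.33613318</LatitudeDegrees>
    <LongitudeDegrees>-1.58814317</LongitudeDegrees>
   </Position>
   <AltitudeMeters>77.5234375</AltitudeMeters>
   <DistanceMeters>0.00000000</DistanceMeters>
  </Trackpoint>
  <Trackpoint>
   <Time>2011-10-30T16:05:49Z</Time>
   <Position>
    <LatitudeDegrees>52.33614810</LatitudeDegrees>
    <LongitudeDegrees>-1.58814283</LongitudeDegrees>
   </Position>
   <AltitudeMeters>77.5234375</AltitudeMeters>
   <DistanceMeters>1.77584004</DistanceMeters>
  </Trackpoint>
 </Track></Lap></Activity></Activities>
</TrainingCenterDatabase>'
doc <- xmlParse(tcx, asText = TRUE)
tcx_ns <- c(ns = "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2")

# Hypothetical helper: text content of one child path for every Trackpoint.
tp_values <- function(path) {
  xpathSApply(doc, paste0("//ns:Trackpoint/", path), xmlValue,
              namespaces = tcx_ns)
}

# Build the data frame column by column, converting from character as we go.
track <- data.frame(
  time     = as.POSIXct(tp_values("ns:Time"),
                        format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"),
  lat      = as.numeric(tp_values("ns:Position/ns:LatitudeDegrees")),
  long     = as.numeric(tp_values("ns:Position/ns:LongitudeDegrees")),
  alt      = as.numeric(tp_values("ns:AltitudeMeters")),
  distance = as.numeric(tp_values("ns:DistanceMeters"))
)
str(track)
```

Compared with the xmlToList approach, this yields numeric and date-time columns directly rather than text, at the cost of one XPath query per column.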

Works, thanks!