2011-09-23

I have some loosely structured XHTML data that I need to convert into better-structured XML. It is proving to be a tricky XSLT transformation.

Here is an example:

<tbody> 
<tr> 
    <td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td> 
</tr> 
<tr> 
    <td>Green</td> 
    <td>Round shaped</td> 
    <td>Tasty</td> 
</tr> 
<tr> 
    <td>Red</td> 
    <td>Round shaped</td> 
    <td>Bitter</td> 
</tr> 
<tr> 
    <td>Pink</td> 
    <td>Round shaped</td> 
    <td>Tasty</td> 
</tr> 
<tr> 
    <td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td> 
</tr> 
<tr> 
    <td>Red</td> 
    <td>Heart shaped</td> 
    <td>Super tasty</td> 
</tr> 
<tr> 
    <td class="header"><img src="http://www.abc.com/images/icon_bananas.gif"/><img src="http://www.abc.com/images/flag/congo.gif" alt="Congo"/> Third Grade</td> 
</tr> 
<tr> 
    <td>Yellow</td> 
    <td>Smile shaped</td> 
    <td>Fairly tasty</td> 
</tr> 
<tr> 
    <td>Brown</td> 
    <td>Smile shaped</td> 
    <td>Too sweet</td> 
</tr>
</tbody>

I would like to achieve the following structure:

<data> 
    <entry> 
     <type>Apples</type> 
     <country>Portugal</country> 
     <rank>First Grade</rank> 
     <color>Green</color> 
     <shape>Round shaped</shape> 
     <taste>Tasty</taste> 
    </entry> 
    <entry> 
     <type>Apples</type> 
     <country>Portugal</country> 
     <rank>First Grade</rank> 
     <color>Red</color> 
     <shape>Round shaped</shape> 
     <taste>Bitter</taste> 
    </entry> 
    <entry> 
     <type>Apples</type> 
     <country>Portugal</country> 
     <rank>First Grade</rank> 
     <color>Pink</color> 
     <shape>Round shaped</shape> 
     <taste>Tasty</taste> 
    </entry> 
    <entry> 
     <type>Strawberries</type> 
     <country>USA</country> 
     <rank>Fifth Grade</rank> 
     <color>Red</color> 
     <shape>Heart shaped</shape> 
     <taste>Super tasty</taste>
    </entry> 
    <entry> 
     <type>Bananas</type> 
     <country>Congo</country> 
     <rank>Third Grade</rank> 
     <color>Yellow</color> 
     <shape>Smile shaped</shape> 
     <taste>Fairly tasty</taste> 
    </entry> 
    <entry> 
     <type>Bananas</type> 
     <country>Congo</country> 
     <rank>Third Grade</rank> 
     <color>Brown</color> 
     <shape>Smile shaped</shape> 
     <taste>Too sweet</taste> 
    </entry> 
</data> 

First, I need to extract the fruit type from tbody/tr/td/img[1]/@src, then the country from the tbody/tr/td/img[2]/@alt attribute, and finally the rank from the text of tbody/tr/td itself.

Next, I need to emit every entry under each category, repeating those three values in each one (as shown above).

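But... as you can see, the data I was given is very loosely structured. A category is just a td, followed by all the items in that category. To make matters worse, the number of items under each category in my dataset varies between 1 and 100...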

I have tried several approaches but can't seem to get it right. Any help is greatly appreciated. I know XSLT 2.0 introduced xsl:for-each-group, but I am limited to XSLT 1.0.

Answer

In this case you are not really grouping elements. It is more like ungrouping them.

One way to do this is to use an xsl:key to look up the "header" row for each detail row.

<xsl:key name="fruity" 
    match="tr[not(td[@class='header'])]" 
    use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/> 

That is, each detail row is indexed by the generated ID of its nearest preceding header row.
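The same preceding-sibling lookup can also be used on its own, without a key. The following is only a sketch of that alternative, not the approach this answer builds. The variable name header is introduced here for illustration, and the input is assumed to be a well-formed document whose root element is tbody.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/tbody">
        <data>
            <!-- Visit only the detail rows; header rows are looked up from them -->
            <xsl:apply-templates select="tr[not(td[@class='header'])]"/>
        </data>
    </xsl:template>

    <xsl:template match="tr">
        <!-- Hypothetical variable: the header cell of the nearest preceding header row -->
        <xsl:variable name="header"
            select="preceding-sibling::tr[td[@class='header']][1]/td"/>
        <entry>
            <type>
                <xsl:value-of select="substring-before(substring-after($header/img[1]/@src, 'images/icon_'), '.gif')"/>
            </type>
            <country><xsl:value-of select="$header/img[2]/@alt"/></country>
            <rank><xsl:value-of select="normalize-space($header)"/></rank>
            <color><xsl:value-of select="td[1]"/></color>
            <shape><xsl:value-of select="td[2]"/></shape>
            <taste><xsl:value-of select="td[3]"/></taste>
        </entry>
    </xsl:template>
</xsl:stylesheet>
```

Because preceding-sibling is a reverse axis, the [1] predicate picks the closest header row above the current detail row. The rest of this answer continues with the key-based approach.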

Next, you can select all the header cells like so:

<xsl:apply-templates select="tr/td[@class='header']"/> 

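In the matching template you can then extract the type, country, and rank. Getting the associated detail rows is then simply a matter of looking up the parent row in the key: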

<xsl:apply-templates select="key('fruity', generate-id(..))">
    <!-- xsl:with-param elements for type, country and rank go here -->
</xsl:apply-templates>
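To check what the key actually collects, it can help to run it in isolation first. This is a minimal diagnostic sketch, assuming the input is a well-formed document whose root element is tbody. It reuses the fruity key and prints each header's rank text with the number of detail rows found for it.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Same key as above: detail rows indexed by the ID of their nearest preceding header row -->
    <xsl:key name="fruity"
        match="tr[not(td[@class='header'])]"
        use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

    <xsl:template match="/tbody">
        <!-- The context node inside the loop is the header row itself -->
        <xsl:for-each select="tr[td[@class='header']]">
            <xsl:value-of select="normalize-space(td)"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="count(key('fruity', generate-id()))"/>
            <xsl:text> detail row(s)&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

For the sample input in the question this should report 3 detail rows for First Grade, 1 for Fifth Grade, and 2 for Third Grade.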

Here is the whole XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:key name="fruity"
        match="tr[not(td[@class='header'])]"
        use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

    <xsl:template match="/tbody">
        <data>
            <!-- Match header rows -->
            <xsl:apply-templates select="tr/td[@class='header']"/>
        </data>
    </xsl:template>

    <xsl:template match="td">
        <!-- Match associated detail rows -->
        <xsl:apply-templates select="key('fruity', generate-id(..))">
            <!-- Extract relevant parameters from the td cell -->
            <xsl:with-param name="type" select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
            <xsl:with-param name="country" select="img[2]/@alt"/>
            <xsl:with-param name="rank" select="normalize-space(text())"/>
        </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="tr">
        <xsl:param name="type"/>
        <xsl:param name="country"/>
        <xsl:param name="rank"/>
        <entry>
            <type>
                <xsl:value-of select="$type"/>
            </type>
            <country>
                <xsl:value-of select="$country"/>
            </country>
            <rank>
                <xsl:value-of select="$rank"/>
            </rank>
            <color>
                <xsl:value-of select="td[1]"/>
            </color>
            <shape>
                <xsl:value-of select="td[2]"/>
            </shape>
            <taste>
                <xsl:value-of select="td[3]"/>
            </taste>
        </entry>
    </xsl:template>
</xsl:stylesheet>

When applied to your input document, it produces the following output:

<data> 
    <entry> 
     <type>apples</type> 
     <country>Portugal</country> 
     <rank>First Grade</rank> 
     <color>Green</color> 
     <shape>Round shaped</shape> 
     <taste>Tasty</taste> 
    </entry> 
    <entry> 
     <type>apples</type> 
     <country>Portugal</country> 
     <rank>First Grade</rank> 
     <color>Red</color> 
     <shape>Round shaped</shape> 
     <taste>Bitter</taste> 
    </entry> 
    <entry> 
     <type>apples</type> 
     <country>Portugal</country> 
     <rank>First Grade</rank> 
     <color>Pink</color> 
     <shape>Round shaped</shape> 
     <taste>Tasty</taste> 
    </entry> 
    <entry> 
     <type>strawberries</type> 
     <country>USA</country> 
     <rank>Fifth Grade</rank> 
     <color>Red</color> 
     <shape>Heart shaped</shape> 
     <taste>Super tasty</taste> 
    </entry> 
    <entry> 
     <type>bananas</type> 
     <country>Congo</country> 
     <rank>Third Grade</rank> 
     <color>Yellow</color> 
     <shape>Smile shaped</shape> 
     <taste>Fairly tasty</taste> 
    </entry> 
    <entry> 
     <type>bananas</type> 
     <country>Congo</country> 
     <rank>Third Grade</rank> 
     <color>Brown</color> 
     <shape>Smile shaped</shape> 
     <taste>Too sweet</taste> 
    </entry> 
</data> 
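The type values in this output are lowercase (apples), whereas the requested structure uses Apples. XSLT 1.0 has no upper-case() function, so one common workaround is translate() on the first character. The sketch below assumes plain ASCII names; the template name capitalize and the variables lower and upper are hypothetical helpers, not part of the answer's stylesheet.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Hypothetical lookup strings for ASCII case mapping -->
    <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
    <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

    <!-- Hypothetical helper: upper-cases only the first character of $text -->
    <xsl:template name="capitalize">
        <xsl:param name="text"/>
        <xsl:value-of select="concat(translate(substring($text, 1, 1), $lower, $upper), substring($text, 2))"/>
    </xsl:template>

    <!-- Demonstration on a literal value; the input document is not inspected -->
    <xsl:template match="/">
        <xsl:call-template name="capitalize">
            <xsl:with-param name="text" select="'apples'"/>
        </xsl:call-template>
    </xsl:template>
</xsl:stylesheet>
```

Run against any input, this emits Apples. In the stylesheet above, the same concat(translate(...), substring(...)) expression could be applied to the value passed as the type parameter, provided the two variables are declared at the top level.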

+1 for a great answer.