Using a Schematron Quick Fix to tag words in mixed-content elements

I have an XML file that looks something like this (simplified):

<defs> 
    <def>Pure text</def> 
    <def>Mixed content, cuz there is also another: <element>element inside</element> and more.</def> 
    <def><element>Text nodes within elements other than def are ok.</element></def> 
</defs>

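I want to write a Schematron rule with a quick fix that takes every word in a def with mixed content and wraps it in its own <w> element, and wraps punctuation marks in <pc> elements. In other words, after applying the quick fix I would get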

<defs> 
    <def>Pure text</def> 
    <def><w>Mixed</w> <w>content</w><pc>,</pc> <w>cuz</w> <w>there</w> <w>is</w> <w>also</w> <w>another</w><pc>:</pc> <element>element inside</element> <w>and</w> <w>more</w><pc>.</pc></def> 
    <def><element>Text nodes within elements other than def are ok.</element></def> 
</defs>

Spaces between the <w>s and <pc>s are fine.

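Identifying mixed content is easy (I think I got that part right). The problem is that I don't know how to tokenize the strings in Schematron and then apply the fix to each token. This is how far I got: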

<sch:pattern id="mixed"> 
    <sch:rule context="def[child::text()][child::*]"> 
     <sch:report test="tokenize(child::text(), '\s+')" sqf:fix="mix_in_def"> 
      Element has mixed content 
      <!-- the above gives me the error: a sequence of more than one item is not allowed as the first argument of tokenize() -->
     </sch:report> 
     <sqf:fix id="mix_in_def"> 
      <sqf:description> 
       <sqf:title>Wrap words in w</sqf:title> 
       <sqf:p>Fixes the mixed content in def by treating each non-tagged string as w.</sqf:p> 
      </sqf:description> 
      <sqf:replace match="." node-type="element" target="w"> 
       <!--how do i represent the content of the matched token?--> 
      </sqf:replace> 
      <!-- also do i create an altogether separate rule for punctuation?--> 
     </sqf:fix> 
    </sch:rule> 
</sch:pattern>

Any hints would be greatly appreciated.

2015-07-28 Tench

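Did my answer help you? – sergioFC

I'm still waiting for some feedback; please let me know whether my answer was useful. – sergioFC

Absolutely. I'm really sorry I didn't acknowledge your answer. My bad. – Tench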

You can use XSLT inside the fix. Have a look at the following example (it is explained in the comments in the code):

<sch:pattern id="mixed"> 
    <!-- Your context is now def => this makes it easier to add new reports about def -->
    <sch:rule context="def"> 

     <!-- So now you report every def that has text and elements --> 
     <sch:report test="child::text() and child::*" sqf:fix="mix_in_def"> 
      Element has mixed content 
      <!-- What you were doing before was causing an error because you were passing a sequence of text nodes to tokenize() (it expects a single string) -->
     </sch:report> 

     <sqf:fix id="mix_in_def"> 
      <sqf:description> 
       <sqf:title>Wrap words in w</sqf:title> 
       <sqf:p>Fixes the mixed content in def by treating each non-tagged string as w.</sqf:p> 
      </sqf:description> 

      <!-- Replace every mixed text node of this def (this is called for every matched node) --> 
      <sqf:replace match="child::text()"> 
        <!-- Tokenize this text node => for each token choose... --> 
        <xsl:for-each select="tokenize(., '\s+')"> 
         <!-- For this token choose --> 
         <xsl:choose> 
          <!-- If the token is one of these punctuation marks (, . :). Note that tokens are separated by \s+, so a comma is only a separate token if it is surrounded by spaces -->
          <xsl:when test=". = (',', '.', ':')">
           <pc><xsl:value-of select="."/></pc> 
          </xsl:when> 
          <!-- Otherwise wrap it in <w> --> 
          <xsl:otherwise> 
           <w><xsl:value-of select="."/></w> 
          </xsl:otherwise> 
         </xsl:choose> 
        </xsl:for-each> 
      </sqf:replace> 

     </sqf:fix> 
    </sch:rule> 
</sch:pattern>

You'll have to adapt this to your specific problem, but I think it will help you.
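
One possible refinement, sketched under assumptions: with `tokenize(., '\s+')` punctuation stays attached to the preceding word (`content,` is a single token), the spaces between words are dropped, and a text node that starts with whitespace, such as ` and more.`, yields an empty first token and therefore an empty `<w/>`. The sketch below instead uses XSLT 2.0 `xsl:analyze-string` to split each non-blank text node into runs of word characters and single punctuation characters, and copies the whitespace between them unchanged. It assumes `queryBinding="xslt2"`, that the `sch`, `sqf` and `xsl` prefixes are declared on the schema, and that the SQF implementation evaluates XSLT instructions inside `sqf:replace` the same way it does for the `xsl:for-each` above. The report test also ignores whitespace-only text nodes, so a def that has already been fixed is not reported again.

```xml
<sch:pattern id="mixed">
  <sch:rule context="def">
    <!-- Only non-blank text counts as mixed content, so an already-fixed def
         (whose remaining text nodes are just spaces) is not reported again -->
    <sch:report test="child::text()[normalize-space()] and child::*" sqf:fix="mix_in_def">
      Element has mixed content
    </sch:report>

    <sqf:fix id="mix_in_def">
      <sqf:description>
        <sqf:title>Wrap words in w and punctuation in pc</sqf:title>
      </sqf:description>
      <!-- Replace each non-blank text node that is a direct child of def -->
      <sqf:replace match="child::text()[normalize-space()]">
        <!-- regex is an attribute value template, so the braces of \p{P} are doubled -->
        <xsl:analyze-string select="." regex="\w+|\p{{P}}">
          <xsl:matching-substring>
            <xsl:choose>
              <!-- A run of word characters becomes a word -->
              <xsl:when test="matches(., '^\w+$')">
                <w><xsl:value-of select="."/></w>
              </xsl:when>
              <!-- Otherwise the match is a single punctuation character -->
              <xsl:otherwise>
                <pc><xsl:value-of select="."/></pc>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:matching-substring>
          <!-- Whitespace (and any other separator characters) between tokens is copied as plain text -->
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </sqf:replace>
    </sqf:fix>
  </sch:rule>
</sch:pattern>
```

Note that in XPath regular expressions `\w` also matches digits and symbols such as `$`, so those end up inside `<w>`; narrow the regex if that is not wanted. The `<w>` and `<pc>` literal result elements are created in whatever default namespace is in scope in the schema.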

2015-07-29 22:35:36 sergioFC

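Interesting suggestion @sergioFC! Do you happen to know where to find more information on mixing XSLT with Schematron rules? The most specific pointer I could find is in the "Query language binding for XSLT 2" section of the [ISO Schematron document](http://standards.iso.org/ittf/PubliclyAvailableStandards/c055982_ISO_IEC_19757-3_2016.zip). There, a few XSLT elements are mentioned as allowed content before certain Schematron elements, but so far I haven't found anything about XSLT instructions _inside_ patterns. It is working, so it must be allowed. – rvdb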

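Thanks. I'm afraid I can't point you to documentation. As I understand it, the Schematron specification says you can define your own custom `xsl:function` elements in order to reuse them later. See this [xsl:function](http://pastebin.com/Ue5mgRCy) in a Schematron example. The XSLT code in my answer is part of the Schematron QuickFix specification, not of the Schematron specification itself. [Schematron QuickFix](http://www.schematron-quickfix.com/quickFix/guide.html) was created several years after Schematron to provide XML fixes for the problems that Schematron reports, but it is an extension of Schematron, not part of it. – sergioFC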
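
For reference, a custom function declared at schema level might look like the following minimal sketch. It assumes an XSLT 2.0 query binding and an implementation that accepts `xsl:function` as a child of `sch:schema`; the `my` prefix, its namespace URI and the function name `my:is-mixed` are hypothetical.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:xs="http://www.w3.org/2001/XMLSchema"
            xmlns:my="urn:example:my-functions"
            queryBinding="xslt2">
  <!-- Prefixes used in Schematron XPath expressions are declared with sch:ns -->
  <sch:ns prefix="my" uri="urn:example:my-functions"/>

  <!-- Hypothetical helper: true if the element mixes non-blank text with child elements -->
  <xsl:function name="my:is-mixed" as="xs:boolean">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="exists($e/text()[normalize-space()]) and exists($e/*)"/>
  </xsl:function>

  <sch:pattern id="mixed">
    <sch:rule context="def">
      <sch:report test="my:is-mixed(.)">Element has mixed content</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Keeping the test in a function lets several rules share it without repeating the XPath expression.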

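OK, thanks for clarifying the SQF context. However, I tried a trimmed-down version (see the example file in [this gist](https://gist.github.com/rvdb/ad2dc77bf5fc33b4e02cb3e415d19888)), and it _does_ seem to work in the Schematron namespace as well: various XSLT instructions are happily executed inside Schematron patterns. For me this is great, but I wonder whether I'm excited about a feature or a bug (in the specification or the implementation). I just wanted to point this out to you, but I'll try to find a more appropriate forum to pursue it. – rvdb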
