2012-03-09 70 views
1

美好的一天!XSLT2不区分大小写的正则表达式

这是我第一次在这里发布一个问题,所以请耐心等待。

我有在如何使匹配不敏感的情况下,一个问题,我想补充的标志,但不知道这是正确的地方插旗子,因为我仍然代码似乎不区分大小写。

预先感谢您分享您的想法,我可以如何解决这个问题。

--elmer

我做编辑这个帖子添加更多的代码我做了和任何一个源XML文件的样品要进行测试。索引条目列表上

基地我需要搜索从源XML每一个项目进行搜索,并且必须是不区分大小写。例如,如果搜索单词“缩写”,所有单词“缩写词或缩写词”必须在缩写或缩写词旁边有锚元素。

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl" xmlns:saxon="http://saxon.sf.net/" 
xmlns:ati="http://www.asiatype.com/Functions" exclude-result-prefixes="#all"> 

<xsl:output method="xml" encoding="UTF-8" indent="no"/> 

<xsl:template match="@*|node()"> 
    <xsl:copy> 
     <xsl:apply-templates select="@*|node()"/> 
    </xsl:copy> 
</xsl:template> 

<!-- create variable for EALL_Index--> 
<xsl:variable name="index" as="element() +"> 
    <xsl:for-each select="for $doc in document('EALL_Index_test.xml') return $doc/descendant::item"> 
     <xsl:sort select="string-length(normalize-space(text()[1]))" order="descending"/> 
     <list> 
      <entry> 
       <xsl:variable name="itemString"> 
        <xsl:analyze-string select="text()[1]" regex="(\[)|(\])|(\()|(\))"> 
         <xsl:matching-substring> 
          <xsl:value-of select="replace(regex-group(1),'\[','\\[')"/> 
          <xsl:value-of select="replace(regex-group(2),'\]','\\]')"/> 
          <xsl:value-of select="replace(regex-group(3),'\(','\\(')"/> 
          <xsl:value-of select="replace(regex-group(4),'\)','\\)')"/> 
         </xsl:matching-substring> 
         <xsl:non-matching-substring> 
          <xsl:value-of select="."/> 
         </xsl:non-matching-substring> 
        </xsl:analyze-string> 
       </xsl:variable> 
       <xsl:value-of select="normalize-space($itemString)"/> 
      </entry> 
      <xsl:for-each select="link"> 
       <link target="{@targets}" id="{@n}"> 
       </link> 
      </xsl:for-each> 
     </list> 
    </xsl:for-each> 
</xsl:variable> 

<xsl:template match="text()[ancestor::*[self::simplearticle or self::complexarticle]]"> 
    <xsl:variable name="id" as="xs:string" select="current()/ancestor::*[self::simplearticle or self::complexarticle]/@id"/> 
    <xsl:variable name="srchString" as="xs:string" select="."/> 
    <xsl:choose> 
     <xsl:when test="some $term in $index[link/@target=$id]/entry satisfies matches($srchString, concat('(^|\W)(',$term, '[s]?)($|\W)'))"> 
      <xsl:message select="concat('(^|\W)(',string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')"/> 
      <xsl:analyze-string select="$srchString" regex="{concat('(^|\W)(',string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')}"> 
       <xsl:matching-substring> 
        <xsl:value-of select="regex-group(1)"/> 
        <xsl:for-each select="$index[matches(regex-group(2), concat('(^|\W)(',entry, '[s]?)($|\W)'))]/link[@target=$id]"> 
         <xsl:if test="matches($srchString, concat('(^|\W)(',entry, '[s]?)($|\W)'))"> 
          <anchor target="{@target}" id="{@id}"/> 
         </xsl:if> 
        </xsl:for-each> 
        <xsl:value-of select="regex-group(2)"/> 
        <xsl:value-of select="regex-group(3)"/> 
       </xsl:matching-substring> 
      <xsl:non-matching-substring> 
       <xsl:value-of select="."/> 
      </xsl:non-matching-substring> 
     </xsl:analyze-string> 
    </xsl:when> 
    <xsl:otherwise> 
     <xsl:sequence select="."/> 
    </xsl:otherwise> 
    </xsl:choose> 
</xsl:template> 

索引条目源XML:搜索

<div2 id="EALL-index"> 
<head>EALL3 Index</head> 
<art><simplearticle id="SIM-INDEX" entry="" volume="" page=""> 
<pseudoarticle> 
<articleentry><mainentry>Index of Terms</mainentry></articleentry></pseudoarticle><list> 
<item>Abbreviation <link type="eall" targets="SIM-0002" n="idx-abbreviation-01">Abbreviations</link>, <link type="eall" targets="SIM-0021" n="idx-abbreviation-02">Compounds</link>, <link type="eall" targets="COM-vol3-0192" n="idx-abbreviation-03">Lexicography: Monolingual Dictionaries</link>, <link type="eall" targets="COM-vol3-0276" n="idx-abbreviation-04">Punctuation</link></item> 
</list></simplearticle></art></div2> 

源XML:

<?xml version="1.0" encoding="UTF-8"?> 
<art> 
<simplearticle id="SIM-0002" entry="Abbreviations" volume="1" page="1:1a" sortcode="abbreviations"> 
<pseudoarticle> 
<articleentry> 
<mainentry>Abbreviations</mainentry> 
</articleentry> 
<p>Generally speaking, there are four main categories of abbreviations encountered in Arabic texts:</p> 
<p> 
<list> 
<label>i.</label> 
<item>i. Suspensions: abbreviation by truncation of the letters at the end of the word, e.g.&#160;&#1575;&#1604;&#1605;&#1589;&#1600; = <hi rend="italic">al-mu&#7779;annif</hi>. Perhaps the most interesting here is the case of suspensions that look like, or were considered by some to be, numerals. To this category belong the signs that resemble the numerals &#1778; and &#1635;, but which may represent the unpointed <hi rend="italic">t&#257;&#702;</hi> and <hi rend="italic">&#353;&#299;n</hi> (for <hi rend="italic">tam&#257;m</hi> and <hi rend="italic">&#353;ar&#7717;</hi>) when used in conjunction with marginal glosses.</item> 
<label>ii.</label> 
<item>ii. Contractions: abbreviating by means of omitting some letters in the middle of the word, but not the beginning or the ending, e.g. &#1602;&#1607; (<hi rend="italic">qawlu-hu</hi>).</item> 
<label>iii.</label> 
<item>iii. Sigla: using one letter to represent the whole word, e.g. &#1605; (<hi rend="italic">matn</hi>).</item> 
<label>iv.</label> 
<item>iv. Abbreviation symbols: symbols in the form of logographs used for whole words. A typical abbreviation symbol is the horizontal stroke (sometimes hooked at the end) which represents the word <hi rend="italic">sana</hi> &#8216;year&#8217;. Another example is the &#8216;two teeth stroke&#8217; (which looks like two unpointed <hi rend="italic">b&#257;</hi>'s), which represents the word &#1602;&#1601; &#8216;stop&#8217;, or the suspension &#1601;&#1578;&#1600; (for <hi rend="italic">fa-ta&#702;ammal-hu/h&#257;</hi> &#8216;reflect on it&#8217;), used in manuscripts for notabilia or side-heads.</item> 
</list> 
</p> 
<p>Closely connected with these abbreviations is a contraction of a group of words into one &#8216;portmanteau&#8217; word (<hi rend="italic">na&#7717;t</hi>; <xref target="SIM-0021">compounds</xref>), for instance <hi rend="italic">basmala</hi> (<hi rend="italic">bi-sm All&#257;h</hi>) <hi rend="italic">&#7717;amdala</hi> (<hi rend="italic">al-&#7717;amdu li-ll&#257;h</hi>) and <hi rend="italic">&#7779;alwala</hi> (<hi rend="italic">&#7779;all&#257; ll&#257;h &#703;alayhi</hi>). To all intents and purposes, the word <hi rend="italic">na&#7717;t</hi> corresponds to an acronym, i.e. a word formed from the abbreviation of, in most cases, the initial letters of each word in the construct. Most of these constructs are textual and pious formulae. Apart from <hi rend="italic">basmala, &#7717;amdala</hi>, and <hi rend="italic">&#7779;alwala</hi>, we encounter <hi rend="italic">&#7789;albaqa</hi> (<hi rend="italic">&#7789;&#257;la ll&#257;h baq&#257;&#702;a-hu</hi>), <hi rend="italic">&#7717;awqala</hi> or <hi rend="italic">&#7717;awlaqa</hi> (<hi rend="italic">l&#257; &#7717;awla wa-l&#257; quwwata &#702;ill&#257; bi-ll&#257;h</hi>), <hi rend="italic">&#7779;al&#703;ama</hi> (a synonym of <hi rend="italic">&#7779;alwala</hi>), <hi rend="italic">&#7717;asbala</hi> (<hi rend="italic">&#7717;asbun&#257; all&#257;h</hi>), <hi rend="italic">ma&#353;&#702;ala</hi> (<hi rend="italic">m&#257; &#353;&#257;&#702;a ll&#257;h</hi>), <hi rend="italic">sab&#7717;ala</hi> (<hi rend="italic">sub&#7717;&#257;na ll&#257;h</hi>), and <hi rend="italic">&#7717;ay&#703;ala</hi> (<hi rend="italic">&#7717;ayya &#703;al&#257; &#7779;-&#7779;al&#257;t</hi>) (as-Samarr&#257;'&#299; 1987; Gacek 2001).</p> 
<p>Abbreviations, especially contractions and sigla, may be (and often are) accompanied by a horizontal stroke (<hi rend="italic">tilde</hi>) placed above them. This mark may resemble the <hi rend="italic">madda</hi> but has nothing to do with the latter's function in Arabic script. Suspensions, on the other hand, were indicated by a long downward stroke, a mark that is very likely to have been borrowed from Greek and Latin paleographic practice.</p> 
<p>The use of abbreviations was quite popular among Muslim scholars, although originally some of the abbreviations, such as those relating to the prayer for the Prophet (<hi rend="italic">ta&#7779;liya, &#7779;alwala</hi>), were disapproved of. In the manuscript age, abbreviations were extensively used, not only in the body of the text but also in marginalia, ownership statements, and in the primitive critical apparatus (Ben Cheneb 1920; Ma&#7717;f&#363;&#7827; 1964).</p> 
<p>Medieval scholars could not always agree on the meaning of some of the abbreviations used in manuscripts. The letter &#1581;, for instance, which is used to separate one <hi rend="italic">&#702;isn&#257;d</hi> from another, was thought by some to stand for <hi rend="italic">&#7717;&#257;&#702;il</hi> or <hi rend="italic">&#7717;ayl&#363;la</hi> &#8216;separation&#8217; and by others for <hi rend="italic">&#7717;ad&#299;&#7791;</hi> and even <hi rend="italic">&#7779;a&#7717;&#7717;a</hi>. Some scholars even thought that the letter <hi rend="italic">&#7717;&#257;&#702;</hi> should be pointed (&#1582; &#8211; <hi rend="italic">x&#257;&#702; mu&#703;jama</hi>) to stand for <hi rend="italic">&#702;isn&#257;d &#702;&#257;xar</hi> &#8216;another <hi rend="italic">&#702;isn&#257;d</hi>&#8217;. The contemporary scholar may face a similar dilemma (see e.g. Ali&#269; 1976).</p> 
<p>Abbreviations in manuscripts are often unpointed and appear sometimes in the form of word-symbols (logographs). Here, the context, whether textual or geographical, is of great importance. Thus, for instance, what appears to be the letter &#1591; may in fact be a &#1592;, and what appears to be an <hi rend="italic">&#703;ayn</hi> or <hi rend="italic">ayn</hi>, in its initial (&#1593;&#1600;) or isolated form (&#1593;), may actually be an unpointed <hi rend="italic">n&#363;n</hi> or <hi rend="italic">x&#257;&#702;</hi> (for <hi rend="italic">nusxa &#702;uxr&#257;</hi> &#8216;another copy&#8217;). Similarly, the same word or abbreviation can have two different functions and/or meanings. For example, the words <hi rend="italic">&#7717;&#257;&#353;iya</hi> and <hi rend="italic">f&#257;&#702;ida</hi> can stand for a gloss or a side-head (&#8216;nota bene&#8217;), while the &#1589; or &#1589;&#1600; can be an abbreviation of <hi rend="italic">&#7779;a&#7717;&#7717;a</hi> (when used for an omission/ insertion or evident correction) or <hi rend="italic">&#702;a&#7779;l</hi> (the body of the text), or it can stand for <hi rend="italic">&#7693;abba</hi> &#8216;door-bolt&#8217;, a mark indicating an uncertain reading and having, for all intents and purposes, the function of a question mark or &#8216;sic&#8217;. Also, the abbreviation &#1606; may stand for <hi rend="italic">bay&#257;n</hi> &#8216;explanation&#8217; or <hi rend="italic">nusxa &#702;uxr&#257;</hi>; the latter is often found in manuscripts of Persian/ Indian provenance.</p> 
<p>The earliest use of abbreviations in the Arabic language is probably connected with its orthography and possibly the &#8216;mysterious letters&#8217; (<hi rend="italic">al-&#7717;ur&#363;f al-muqa&#7789;&#7789;a&#703;a</hi>) at the beginning of some chapters of the <hi rend="italic">Qur&#702;&#257;n</hi> (Bellamy 1973). In terms of orthography, for instance, the initial form of <hi rend="italic">j&#299;m</hi> (&#1580;&#1600;) or <hi rend="italic">m&#299;m</hi> (&#1605;&#1600;) was regarded by some scholars as an abbreviation of <hi rend="italic">jazma</hi>. Furthermore, the unpointed initial form of <hi rend="italic">&#353;&#299;n</hi> (&#1587;&#1600;) was used for <hi rend="italic">ta&#353;d&#299;d</hi> (or <hi rend="italic">&#353;adda</hi>), and the initial form of <hi rend="italic">&#7779;&#257;d</hi> (&#1589;&#1600;) was thought to represent <hi rend="italic">wa&#7779;la</hi> (or <hi rend="italic">&#7779;ila</hi>) (Wright 1967:13&#8211;14, 19; Gacek 2001:23).</p> 
<p>Most of the abbreviations are found in the body of the text. They were introduced in order to speed up the process of transcription and their usage varied according to the subject or type of a given work. Abbreviations can be found in almost all types of works, but especially in compositions on the recitation of the <hi rend="italic">Qur&#702;&#257;n</hi>, compilation and criticism of <hi rend="italic">&#7716;ad&#299;&#7791;</hi>, philosophy, lexicography, poetry, genealogy, biography, and astronomy. The lists of these are often included in prefaces and frequently concern either the names of authors or titles of compositions. In addition, we find didactic poems that were composed specifically in order to help memorize given sets of abbreviations (see, e.g., &#703;Alaw&#257;n 1972). They are especially common in works on <hi rend="italic">&#7716;ad&#299;&#7791;</hi> and jurisprudence (both Sunni and Shi&#703;i) (al-M&#257;maq&#257;n&#299; 1992; a&#7827;-Zufayr&#299; 2002), and although some abbreviations were standardized, most were specific to a given work. Among the commonly used abbreviations for major <hi rend="italic">&#7716;ad&#299;&#7789;</hi> compilations are: &#1582; (al-Bux&#257;r&#299;), &#1605; (Muslim or M&#257;lik), &#1583; (&#702;Ab&#363; D&#257;&#702;&#363;d), &#1578; (at-Tirmi&#7695;&#299;), &#1603; (M&#257;lik), &#1607; (&#702;Ab&#363; &#7694;arr or Ibn M&#257;ja), &#1606; or &#1587; (an-Nas&#257;&#702;&#299;), and the like (Gacek 1989:56).</p> 
</pseudoarticle> 
</simplearticle> 
</art> 

输出示例:

<art> 
<simplearticle id="SIM-0002" entry="Abbreviations" volume="1" sortcode="abbreviations" page="1:1a"> 
    <pseudoarticle> 
     <articleentry> 
      <mainentry><anchor target="SIM-0002" id="idx-abbreviation-01"/>Abbreviations</mainentry> 
     </articleentry> 
     <p>Generally speaking, there are four main categories of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations encountered in Arabic texts:</p> 
     <p> 
      <list type="simple" TEIform="list"> 
       <label TEIform="label">i.</label> 
       <item TEIform="item">i. Suspensions: <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation by truncation of the letters at the end of the word, e.g. المصـ = <hi rend="italic">al-muṣannif</hi>. Perhaps the most interesting here is the case of suspensions that look like, or were considered by some to be, numerals. To this category belong the signs that resemble the numerals ۲ and ٣, but which may represent the unpointed <hi rend="italic">tāʾ</hi> and <hi rend="italic">šīn</hi> (for <hi rend="italic">tamām</hi> and <hi rend="italic">šarḥ</hi>) when used in conjunction with marginal glosses.</item> 
       <label TEIform="label">ii.</label> 
       <item TEIform="item">ii. Contractions: abbreviating by means of omitting some letters in the middle of the word, but not the beginning or the ending, e.g. قه (<hi rend="italic">qawlu-hu</hi>).</item> 
       <label TEIform="label">iii.</label> 
       <item TEIform="item">iii. Sigla: using one letter to represent the whole word, e.g. م (<hi rend="italic">matn</hi>).</item> 
       <label TEIform="label">iv.</label> 
       <item TEIform="item">iv. Abbreviation symbols: symbols in the form of logographs used for whole words. A typical <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation symbol is the horizontal stroke (sometimes hooked at the end) which represents the word <hi rend="italic">sana</hi> ‘year’. Another example is the ‘two teeth stroke’ (which looks like two unpointed <hi rend="italic">bā</hi>'s), which represents the word قف ‘stop’, or the suspension فتـ (for <hi rend="italic">fa-taʾammal-hu/hā</hi> ‘reflect on it’), used in manuscripts for notabilia or side-heads.</item> 
      </list> 
     </p> 
     <p>Closely connected with these <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations is a contraction of a group of words into one ‘portmanteau’ word (<hi rend="italic">naḥt</hi>; <xref target="SIM-0021">compounds</xref>), for instance <hi rend="italic">basmala</hi> (<hi rend="italic">bi-sm Allāh</hi>) <hi rend="italic">ḥamdala</hi> (<hi rend="italic">al-ḥamdu li-llāh</hi>) and <hi rend="italic">ṣalwala</hi> (<hi rend="italic">ṣallā llāh ʿalayhi</hi>). To all intents and purposes, the word <hi rend="italic">naḥt</hi> corresponds to an <anchor target="SIM-0002" id="idx-acronym-01"/>acronym, i.e. a word formed from the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation of, in most cases, the initial letters of each word in the construct. Most of these constructs are textual and pious formulae. Apart from <hi rend="italic">basmala, ḥamdala</hi>, and <hi rend="italic">ṣalwala</hi>, we encounter <hi rend="italic">ṭalbaqa</hi> (<hi rend="italic">ṭāla llāh baqāʾa-hu</hi>), <hi rend="italic">ḥawqala</hi> or <hi rend="italic">ḥawlaqa</hi> (<hi rend="italic">lā ḥawla wa-lā quwwata ʾillā bi-llāh</hi>), <hi rend="italic">ṣalʿama</hi> (a synonym of <hi rend="italic">ṣalwala</hi>), <hi rend="italic">ḥasbala</hi> (<hi rend="italic">ḥasbunā allāh</hi>), <hi rend="italic">mašʾala</hi> (<hi rend="italic">mā šāʾa llāh</hi>), <hi rend="italic">sabḥala</hi> (<hi rend="italic">subḥāna llāh</hi>), and <hi rend="italic">ḥayʿala</hi> (<hi rend="italic">ḥayya ʿalā ṣ-ṣalāt</hi>) (as-Samarrā'ī 1987; Gacek 2001).</p> 
     <p><anchor target="SIM-0002" id="idx-abbreviation-01"/>Abbreviations, especially contractions and sigla, may be (and often are) accompanied by a horizontal stroke (<hi rend="italic">tilde</hi>) placed above them. This mark may resemble the <hi rend="italic">madda</hi> but has nothing to do with the latter's function in Arabic script. Suspensions, on the other hand, were indicated by a long downward stroke, a mark that is very likely to have been borrowed from Greek and Latin paleographic practice.</p> 
     <p>The use of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations was quite popular among Muslim scholars, although originally some of the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations, such as those relating to the prayer for the Prophet (<hi rend="italic">taṣliya, ṣalwala</hi>), were disapproved of. In the manuscript age, <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations were extensively used, not only in the body of the text but also in marginalia, ownership statements, and in the primitive critical apparatus (Ben Cheneb 1920; Maḥfūẓ 1964).</p> 
     <p>Medieval scholars could not always agree on the meaning of some of the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations used in manuscripts. The letter ح, for instance, which is used to separate one <hi rend="italic">ʾisnād</hi> from another, was thought by some to stand for <hi rend="italic">ḥāʾil</hi> or <hi rend="italic">ḥaylūla</hi> ‘separation’ and by others for <hi rend="italic">ḥadīṯ</hi> and even <hi rend="italic">ṣaḥḥa</hi>. Some scholars even thought that the letter <hi rend="italic">ḥāʾ</hi> should be pointed (خ – <hi rend="italic">xāʾ muʿjama</hi>) to stand for <hi rend="italic">ʾisnād ʾāxar</hi> ‘another <hi rend="italic">ʾisnād</hi>’. The contemporary scholar may face a similar dilemma (see e.g. Alič 1976).</p> 
     <p><anchor target="SIM-0002" id="idx-abbreviation-01"/>Abbreviations in manuscripts are often unpointed and appear sometimes in the form of word-symbols (logographs). Here, the context, whether textual or geographical, is of great importance. Thus, for instance, what appears to be the letter ط may in fact be a ظ, and what appears to be an <hi rend="italic">ʿayn</hi> or <hi rend="italic">ayn</hi>, in its initial (عـ) or isolated form (ع), may actually be an unpointed <hi rend="italic">nūn</hi> or <hi rend="italic">xāʾ</hi> (for <hi rend="italic">nusxa ʾuxrā</hi> ‘another copy’). Similarly, the same word or <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation can have two different functions and/or meanings. For example, the words <hi rend="italic">ḥāšiya</hi> and <hi rend="italic">fāʾida</hi> can stand for a gloss or a side-head (‘nota bene’), while the ص or صـ can be an <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation of <hi rend="italic">ṣaḥḥa</hi> (when used for an omission/ insertion or evident correction) or <hi rend="italic">ʾaṣl</hi> (the body of the text), or it can stand for <hi rend="italic">ḍabba</hi> ‘door-bolt’, a mark indicating an uncertain reading and having, for all intents and purposes, the function of a question mark or ‘sic’. Also, the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation ن may stand for <hi rend="italic">bayān</hi> ‘explanation’ or <hi rend="italic">nusxa ʾuxrā</hi>; the latter is often found in manuscripts of Persian/ Indian provenance.</p> 
     <p>The earliest use of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations in the Arabic language is probably connected with its orthography and possibly the ‘mysterious letters’ (<hi rend="italic">al-ḥurūf al-muqaṭṭaʿa</hi>) at the beginning of some chapters of the <hi rend="italic">Qurʾān</hi> (Bellamy 1973). In terms of orthography, for instance, the initial form of <hi rend="italic">jīm</hi> (جـ) or <hi rend="italic">mīm</hi> (مـ) was regarded by some scholars as an <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation of <hi rend="italic">jazma</hi>. Furthermore, the unpointed initial form of <hi rend="italic">šīn</hi> (سـ) was used for <hi rend="italic">tašdīd</hi> (or <hi rend="italic">šadda</hi>), and the initial form of <hi rend="italic">ṣād</hi> (صـ) was thought to represent <hi rend="italic">waṣla</hi> (or <hi rend="italic">ṣila</hi>) (Wright 1967:13–14, 19; Gacek 2001:23).</p> 
     <p>Most of the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations are found in the body of the text. They were introduced in order to speed up the process of transcription and their usage varied according to the subject or type of a given work. Abbreviations can be found in almost all types of works, but especially in compositions on the recitation of the <hi rend="italic">Qurʾān</hi>, compilation and criticism of <hi rend="italic">Ḥadīṯ</hi>, philosophy, lexicography, poetry, genealogy, biography, and astronomy. The lists of these are often included in prefaces and frequently concern either the names of authors or titles of compositions. In addition, we find didactic poems that were composed specifically in order to help memorize given sets of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations (see, e.g., ʿAlawān 1972). They are especially common in works on <hi rend="italic">Ḥadīṯ</hi> and jurisprudence (both Sunni and Shiʿi) (al-Māmaqānī 1992; aẓ-Zufayrī 2002), and although some <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations were standardized, most were specific to a given work. Among the commonly used <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations for major <hi rend="italic">Ḥadīṭ</hi> compilations are: خ (al-Buxārī), م (Muslim or Mālik), د (ʾAbū Dāʾūd), ت (at-Tirmiḏī), ك (Mālik), ه (ʾAbū Ḏarr or Ibn Māja), ن or س (an-Nasāʾī), and the like (Gacek 1989:56).</p> 
     <p>Specific to <hi rend="italic">Ḥadīṯ</hi> literature are other <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations connected with the frequent repetitions of such expressions as <hi rend="italic">ḥaddaṯ anā, ʾaxbaranā</hi>, and <hi rend="italic">ʾanbaʾanā</hi>, which were commonly abbreviated as: دثنا ,ثنا ,نا (<hi rend="italic">ḥaddaṯ anā</hi>); ابنا ,ارنا ,انا (<hi rend="italic">ʾaxbaranā</hi>); ق ثنا ,قثنا (<hi rend="italic">qāla ḥaddaṯ anā</hi>). The transition from one <hi rend="italic">ʾisnād</hi> to another, as mentioned above, was marked with ح (<hi rend="italic">ḥāʾil, taḥwīl, ḥaylula, ḥadīṯ</hi> or <hi rend="italic">ṣaḥḥa</hi>) (Gacek 1989:56), and for the evaluation of <hi rend="italic">ḥadīṯ</hi>s the following <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations were used:ض (<hi rend="italic">ḍaʿīf</hi>), صح (<hi rend="italic">ṣaḥīḥ</hi>), ح (<hi rend="italic">ḥasan</hi>); م (<hi rend="italic">majhūl</hi>), مو (<hi rend="italic">muwāfiq</hi> or <hi rend="italic">mawqūf</hi>), قف (<hi rend="italic">mawqūf</hi>); ق (<hi rend="italic">muwaṭṭaq</hi> or <hi rend="italic">muttafaq ʿalayhi</hi>), ل (<hi rend="italic">mursal</hi>) (e.g. Gacek 1985:xiv, 96).</p> 
    </pseudoarticle> 
</simplearticle> 

+0

它不能伤害,以显示你试图实现(供应输入和预期产出)。我有这样的感觉,它可以用更简单直观的方式完成。 – Maestro13 2012-03-09 07:08:24

+0

根据您的要求,我用样本XML和样本输出对我的帖子进行了编辑。希望它现在让我清楚。再次感谢。 – rhemb 2012-03-09 08:08:51

+0

这里有更多的代码来说明问题,难道你不能更简洁地解释它吗?在诸如replace()和matches()之类的XPath函数中,有一个“flags”参数,您可以将其设置为“i”以进行与个案无关的匹配。你不能在正则表达式本身中包含标志。 – 2012-03-09 08:29:03

回答

0

我的其他答案是不使用替换函数 - 这会导致不完整的结果,因为“缩写”的出现不会被替换。 所以我尝试了另一种方法,包括利用迈克尔建议的“我”标志;不幸的是,这更复杂,因为它需要序列化到锚元素的字符串。见下文。
注意:此解决方案也有一个小缺陷:案例不会保留为像“缩写”这样的事件。 此外,略有改善的连载模板将导致< anchor .../>,而不是< anchor ...>< /anchor>

EDITED下面对其中两个问题都已经解决了一个版本。

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     xmlns:f="data:f" 
     xmlns:xs="http://www.w3.org/2001/XMLSchema" 
     exclude-result-prefixes="f xs"> 

    <xsl:output method="xml" indent="yes"/> 

    <xsl:variable name="searchString" select="//simplearticle/@sortcode"/> 
    <xsl:variable name="anchor"> 
     <xsl:element name="anchor"> 
      <xsl:attribute name="target" select="//simplearticle/@id"/> 
      <xsl:attribute name="id" select="concat('idx-', $searchString,'-01')"/> 
     </xsl:element> 
    </xsl:variable> 

    <xsl:template match="/"> 
     <xsl:apply-templates mode="main"/> 
    </xsl:template> 

    <xsl:template match="@*|node()" mode="main"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*|node()" mode="main"/> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:template match="text()" mode="main"> 
     <!--<xsl:variable name="replacement" select="concat(f:serialize($anchor), $searchString)"/>--> 
     <xsl:variable name="anchorAdd" select="f:serialize($anchor)"/> 
     <xsl:variable name="replacement"> 
      <xsl:value-of select="$anchorAdd"/> 
      <xsl:value-of select="$searchString"/> 
     </xsl:variable> 
     <xsl:value-of select="replace(., $searchString, $replacement, 'i')" disable-output-escaping="yes"/> 
    </xsl:template> 

    <xsl:template match="*" mode="serialize"> 
     <xsl:text>&lt;</xsl:text> 
     <xsl:value-of select="name()"/> 
     <xsl:apply-templates mode="serialize" select="@*"/> 
     <xsl:text>&gt;</xsl:text> 
     <xsl:apply-templates mode="serialize"/> 
     <xsl:text>&lt;/</xsl:text><xsl:value-of select="name()"/><xsl:text>&gt;</xsl:text> 
    </xsl:template> 

    <xsl:template match="text()" mode="serialize"> 
     <xsl:value-of select="."/> 
    </xsl:template> 

    <xsl:template match="@*" mode="serialize"> 
     <xsl:text> </xsl:text><xsl:value-of select="name()"/>="<xsl:value-of select="."/>" 
    </xsl:template> 

    <xsl:function name="f:serialize"> 
     <xsl:param name="xml"/> 
     <xsl:apply-templates mode="serialize" select="$xml"/> 
    </xsl:function> 
</xsl:stylesheet> 

EDITED

说明:
(1)利用反向引用$ 0为字符串被替换
(2)仍然不得不使用替换变量虽然这似乎有些奇怪;原因是使用$ anchorAdd时的错误消息直接
(3)序列:测试上(文本)节点的非空 - 在这种情况下/>而不是应用模板

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     xmlns:f="data:f" 
     exclude-result-prefixes="f"> 

    <xsl:output method="xml" indent="yes"/> 

    <xsl:variable name="searchString" select="lower-case(//simplearticle/@sortcode)"/> 
    <xsl:variable name="anchor"> 
     <xsl:element name="anchor"> 
      <xsl:attribute name="target" select="//simplearticle/@id"/> 
      <xsl:attribute name="id" select="concat('idx-', $searchString,'-01')"/> 
     </xsl:element> 
    </xsl:variable> 

    <xsl:template match="/"> 
     <xsl:apply-templates mode="main"/> 
    </xsl:template> 

    <xsl:template match="@*|node()" mode="main"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*|node()" mode="main"/> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:template match="text()" mode="main"> 
     <xsl:variable name="anchorAdd" select="f:serialize($anchor)"/> 
     <xsl:variable name="replacement"> 
      <xsl:value-of select="$anchorAdd"/> 
     </xsl:variable> 
     <xsl:value-of select="replace(., $searchString, concat($replacement, '$0'), 'i')" disable-output-escaping="yes"/> 
    </xsl:template> 

    <xsl:template match="*" mode="serialize"> 
     <xsl:text>&lt;</xsl:text> 
     <xsl:value-of select="name()"/> 
     <xsl:apply-templates mode="serialize" select="@*"/> 
     <xsl:choose> 
      <xsl:when test="not (string(.))"> 
       <xsl:text>/&gt;</xsl:text> 
      </xsl:when> 
      <xsl:otherwise> 
       <xsl:text>&gt;</xsl:text> 
       <xsl:apply-templates mode="serialize"/> 
       <xsl:text>&lt;/</xsl:text><xsl:value-of select="name()"/><xsl:text>&gt;</xsl:text> 
      </xsl:otherwise> 
     </xsl:choose> 
    </xsl:template> 

    <xsl:template match="text()" mode="serialize"> 
     <xsl:value-of select="."/> 
    </xsl:template> 

    <xsl:template match="@*" mode="serialize"> 
     <xsl:text> </xsl:text><xsl:value-of select="name()"/>="<xsl:value-of select="."/>"<xsl:text/> 
    </xsl:template> 

    <xsl:function name="f:serialize"> 
     <xsl:param name="xml"/> 
     <xsl:apply-templates mode="serialize" select="$xml"/> 
    </xsl:function> 
</xsl:stylesheet> 
0

除了详细信息(比如不同的项目列表和选择从何处构建锚点以及在循环多个艺术元素时加上更改),下面应该让您再次参与(不需要正则表达式,只需要使用适当的功能)。我还没有取代文中缩写的出现。或者是代表你的错误?还应该在“缩写”旁边处理“缩写”吗?那么怎么做呢?比如拿searchString的“s”来处理那个单词呢?

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     xmlns:f="data:f" 
     exclude-result-prefixes="f"> 

    <xsl:output method="xml" indent="yes"/> 

    <xsl:variable name="searchString" select="//simplearticle/@sortcode"/> 
    <xsl:variable name="anchor"> 
     <xsl:element name="anchor"> 
      <xsl:attribute name="target" select="//simplearticle/@id"/> 
      <xsl:attribute name="id" select="concat('idx-', $searchString,'-01')"/> 
     </xsl:element> 
    </xsl:variable> 

    <xsl:template match="/"> 
     <xsl:apply-templates/> 
    </xsl:template> 

    <xsl:template match="@*|node()"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*|node()"/> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:template match="text()"> 
     <xsl:call-template name="checkWord"> 
      <xsl:with-param name="splitText" select="tokenize(., ' ')"/> 
     </xsl:call-template> 
    </xsl:template> 

    <xsl:template name="checkWord"> 
     <xsl:param name="splitText"/> 
     <xsl:for-each select="$splitText"> 
      <xsl:if test="lower-case(.) = $searchString"> 
       <xsl:copy-of select="$anchor"/> 
      </xsl:if> 
      <xsl:value-of select="."/><xsl:text> </xsl:text> 
     </xsl:for-each> 
    </xsl:template> 

</xsl:stylesheet> 
+0

谢谢你帮助我解决这个问题...但是,如果我让你感到困惑... – rhemb 2012-03-09 09:15:33

+0

@ElmerBanate感谢我接受我的答案 - 但只有当你喜欢它时:-) – Maestro13 2012-03-09 09:31:44

0

最后,我能够解决我的问题,所以我在这里发布工作XSLT。

我第一次添加的 'i' 到我的比赛功能:

test="some $term in $index[link/@target=$id]/entry satisfies matches($srchString, concat('(^|\W)(',$term, '[s]?)($|\W)'),'i'); 

的for-each select="$index[matches(regex-group(2), concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')]/link[@target=$id]

if test="matches($srchString, concat('(^|\W)(',entry, '[s]?)($|\W)'),'i') 

then add regex to match the lowercase word(s) by adding lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')) in to the analyze-string. 

analyze-string select="$srchString" regex="{concat('(^|\W)(',lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')), '[s]?|', string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')}" 

使用此正则表达式我能得到较低的情况下,带有首字母大写的“缩写”和“缩写”。

(^|\W)(abbreviation[s]?|acronym[s]?|Abbreviation[s]?|acronym)($|\W) 

特别感谢迈克尔·凯Maestro13

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl" xmlns:saxon="http://saxon.sf.net/" 
    xmlns:ati="http://www.asiatype.com/Functions" exclude-result-prefixes="#all"> 

    <xsl:output method="xml" encoding="UTF-8" indent="no"/> 

    <xsl:template match="@*|node()"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*|node()"/> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:variable name="Entries"> 
     <xsl:for-each select="for $doc in collection(iri-to-uri(concat('original-eall-xml/', '?select=*.xml'))) return $doc/descendant::art/*"> 
      <list> 
       <entry> 
        <xsl:value-of select=".//mainentry"/> 
       </entry> 
       <id> 
        <xsl:value-of select="./@id"/> 
       </id> 
       <page> 
        <xsl:value-of select="./@page"/> 
       </page> 
      </list> 
     </xsl:for-each> 
    </xsl:variable> 

    <xsl:template match="art/*"> 
     <xsl:variable name="myId"> 
      <xsl:value-of select="tokenize(@id,' ')[1]"/> 
     </xsl:variable> 
     <xsl:variable name="myStrings" select="//text()"/> 
     <xsl:variable name="myEntry" select="descendant::mainentry/text()"/> 
     <xsl:copy> 
      <xsl:copy-of select="@* except @page"/> 
      <xsl:choose> 
       <xsl:when test="$Entries//entry[.=$myEntry] and $Entries//entry[.=$myEntry]/following-sibling::id[.=$myId]"> 
        <xsl:attribute name="page"><xsl:value-of select="$Entries//entry[.=$myEntry]/following-sibling::page"/></xsl:attribute> 
       </xsl:when> 
       <xsl:otherwise> 
        <xsl:attribute name="page">0</xsl:attribute> 
        <!--<xsl:message select="concat('mainentry: &quot;', $myEntry, '&quot; - volume: &quot;', @volume, '&quot; - id: &quot;',$myId, '&quot;')"/>--> 
       </xsl:otherwise> 
      </xsl:choose> 

      <xsl:apply-templates/> 
     </xsl:copy> 

    </xsl:template>  

    <!-- create variable for EALL_Index--> 
    <xsl:variable name="index" as="element() +"> 
     <xsl:for-each select="for $doc in document('EALL_Index_test.xml') return $doc/descendant::item"> 
      <xsl:sort select="string-length(normalize-space(text()[1]))" order="descending"/> 
      <list> 
       <entry> 
        <xsl:variable name="itemString"> 
         <xsl:analyze-string select="text()[1]" regex="(\[)|(\])|(\()|(\))"> 
          <xsl:matching-substring> 
           <xsl:value-of select="replace(regex-group(1),'\[','\\[')"/> 
           <xsl:value-of select="replace(regex-group(2),'\]','\\]')"/> 
           <xsl:value-of select="replace(regex-group(3),'\(','\\(')"/> 
           <xsl:value-of select="replace(regex-group(4),'\)','\\)')"/> 
          </xsl:matching-substring> 
          <xsl:non-matching-substring> 
           <xsl:value-of select="."/> 
          </xsl:non-matching-substring> 
         </xsl:analyze-string> 
        </xsl:variable> 
        <xsl:value-of select="normalize-space($itemString)"/> 
       </entry> 
       <xsl:for-each select="link"> 
        <link target="{@targets}" id="{@n}"> 
        </link> 
       </xsl:for-each> 
      </list> 
     </xsl:for-each> 
    </xsl:variable> 

    <xsl:template match="text()[ancestor::*[self::simplearticle or self::complexarticle]]"> 
     <xsl:variable name="id" as="xs:string" select="current()/ancestor::*[self::simplearticle or self::complexarticle]/@id"/> 
     <xsl:variable name="srchString" as="xs:string" select="."/> 
     <xsl:choose> 
      <xsl:when test="some $term in $index[link/@target=$id]/entry satisfies matches($srchString, concat('(^|\W)(',$term, '[s]?)($|\W)'),'i')"> 
       <xsl:message select="concat('(^|\W)(',lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')), '[s]?|', string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')"/> 
       <xsl:analyze-string select="$srchString" regex="{concat('(^|\W)(',lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')), '[s]?|', string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')}"> 
        <xsl:matching-substring> 
         <xsl:value-of select="regex-group(1)"/> 
         <xsl:for-each select="$index[matches(regex-group(2), concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')]/link[@target=$id]"> 
          <xsl:if test="matches($srchString, concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')"> 
           <anchor target="{@target}" id="{@id}"/> 
          </xsl:if> 
         </xsl:for-each> 
         <xsl:value-of select="regex-group(2)"/> 
         <xsl:value-of select="regex-group(3)"/> 
        </xsl:matching-substring> 
       <xsl:non-matching-substring> 
        <xsl:value-of select="."/> 
       </xsl:non-matching-substring> 
      </xsl:analyze-string> 
     </xsl:when> 
     <xsl:otherwise> 
      <xsl:sequence select="."/> 
     </xsl:otherwise> 
     </xsl:choose> 
    </xsl:template> 
</xsl:stylesheet> 
相关问题