2017-02-19

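Node selection in an XML document

I am processing an unstructured (flat) XML document so that it can be converted into a structured one. The unstructured document looks like this: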

<?xml version="1.0" encoding="UTF-8"?> 
<CustomerInformation> 
    <CustomerPurchaseID>String</CustomerPurchaseID> 
    <MemberAddress>String</MemberAddress> 
    <MemberID>String</MemberID> 
    <MemberCity>String</MemberCity> 
    <MemberName>String</MemberName> 
    <MemberType>String</MemberType> 
    <MemberState>String</MemberState> 
    <MemberSince>String</MemberSince> 
    <PurchaseDate>String</PurchaseDate> 
    <CreditCardName></CreditCardName> 
    <CreditCardExpirration></CreditCardExpirration> 
    <Orders> 
     <LineItemCode>String</LineItemCode> 
     <LineItemID>String</LineItemID> 
     <LineItemDescription>String</LineItemDescription> 
     <DiscountCode>String</DiscountCode> 
    </Orders> 
    <Orders> 
     <LineItemCode>String</LineItemCode> 
     <LineItemID>String</LineItemID> 
     <LineItemDescription>String</LineItemDescription> 
     <DiscountCode>String</DiscountCode> 
    </Orders> 
    <ShipToAddress>String</ShipToAddress> 
    <ShipToCity>String</ShipToCity> 
    <ShipToFirstName>String</ShipToFirstName> 
    <ShipToLastName>String</ShipToLastName> 
    <ShipToState>String</ShipToState> 
    <ShipToZIPCode>String</ShipToZIPCode> 
    <CustomerAddressLine1>String</CustomerAddressLine1> 
    <CustomerAddressLine2>String</CustomerAddressLine2> 
    <CustomerID>String</CustomerID> 
    <CustomerCity>String</CustomerCity> 
    <CustomerEmail>String</CustomerEmail> 
    <CustomerFirstName>String</CustomerFirstName> 
    <CustomerLastName>String</CustomerLastName> 
    <CustomerHomePhone>String</CustomerHomePhone> 
    <CustomerState>String</CustomerState> 
    <CustomerZIP>String</CustomerZIP> 
    <Status>String</Status> 
    <OrderedFromName>String</OrderedFromName> 
    <CustomerIdentification></CustomerIdentification> 
    <PrimaryCustomerIndicator>String</PrimaryCustomerIndicator> 
    <OrderedFromAddressLine1Text>String</OrderedFromAddressLine1Text> 
    <OrderedFromAddressLine2Text>String</OrderedFromAddressLine2Text> 
    <OrderedFromCityName>String</OrderedFromCityName> 
    <OrderedFromStateCode>String</OrderedFromStateCode> 
    <OrderedFromZip5Code>String</OrderedFromZip5Code> 
    <OrderedFromZip4Code>String</OrderedFromZip4Code> 
</CustomerInformation>

It should be converted into something like this:

<?xml version="1.0" encoding="UTF-8"?> 
<xmlns:evt="http://www.metadata..com/Management/"> 
    <Identifier>3442=000-MNNN</Identifier> 
    <TypeCode>Purchase History</TypeCode> 
    <TypeDescription>Order Summary</TypeDescription> 
    <PurposeCode>Invoice</PurposeCode> 
    <Member> 
     <Email>String</Email> 
     <MemberSince>03/23/2000</MemberSince> 
     <MemberType> 
      <MemberShipTypeCode>String</MemberShipTypeCode> 
      <TypeDescription>String</TypeDescription> 
     </MemberType> 
     <Address> 
      <AddressLine1Text>String</AddressLine1Text> 
      <AddressLine2Text>String</AddressLine2Text> 
      <CityName>String</CityName> 
      <StateCode>String</StateCode> 
      <Zip5Code>String</Zip5Code> 
      <Zip4Code>String</Zip4Code> 
     </Address> 
     <Telephone> 
      <AreaCode>String</AreaCode> 
      <TelephoneNumber>String</TelephoneNumber> 
     </Telephone> 
    </Member> 
    <Company> 
     <CompanyName>String</CompanyName> 
     <CustomerIdentification>0.0</CustomerIdentification> 
     <PrimaryCustomerIndicator>String</PrimaryCustomerIndicator> 
     <CompanyAddress> 
      <CompanyAddressLine1Text>String</CompanyAddressLine1Text> 
      <CompanyAddressLine2Text>String</CompanyAddressLine2Text> 
      <CompanyCityName>String</CompanyCityName> 
      <CompanyStateCode>String</CompanyStateCode> 
      <CompanyZip5Code>String</CompanyZip5Code> 
      <CompanyZip4Code>String</CompanyZip4Code> 
     </CompanyAddress> 
    </Company> 
    <Orders> 
    <CreditCard> 
      <CardName>String</CardName> 
      <CardExpirationDate>1967-08-13</CardExpirationDate> 
    </CreditCard> 
    <Order> 
     <Discount>String</Discount> 
     <ShippingVendorName>String</ShippingVendorName> 
     <ShipmentTrackingNumber>String</ShipmentTrackingNumber> 
     <ShipmentTrackingLinkText>String</ShipmentTrackingLinkText> 
     <CustomerName>String</CustomerName> 
     <CustomerEmailAddressText>String</CustomerEmailAddressText> 
     <Telephone> 
      <AreaCode>String</AreaCode> 
      <TelephoneNumber>String</TelephoneNumber> 
     </Telephone> 
     <ShippingAddress> 
      <ShippingAddressLine1Text>String</ShippingAddressLine1Text> 
      <ShippingAddressLine2Text>String</ShippingAddressLine2Text> 
      <ShippingCareOfText>String</ShippingCareOfText> 
      <ShippingCityName>String</ShippingCityName> 
      <ShippingStateCode>String</ShippingStateCode> 
      <ShippingZip5Code>String</ShippingZip5Code> 
      <ShippingZip4Code>String</ShippingZip4Code> 
     </ShippingAddress> 
     <LineItem> 
      <LineItemNumber>String</LineItemNumber> 
      <LineItemQuantityCount>0</LineItemQuantityCount> 
      <ItemOrderedIndicator>String</ItemOrderedIndicator> 
      <Discount>String</Discount> 
     </LineItem> 
    </Order> 
    </Orders> 

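I was able to generate the XML by writing out the structured format by hand and simply extracting the node values of the relevant fields with the following XSLT: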

<xsl:value-of select=.../> 

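However, I feel there may be a better way to do this. I want to be able to control how the structure is generated as I navigate the unstructured (flat) document. Is there a way to group the elements for all the MemberAddress fields, for example? If I could do that, I could create the Member section of the output, and I could do the same for the other elements. My concern with hard-coding the structured document is that it may change in the future. If possible, I would rather be able to control the output. All member information in the source document should map to the Member element in the target document. Elements in the source document that start with OrderedFrom should map to the Company fields in the target document. The ShipTo elements in turn map to the shipping information in the Orders section of the target document, and so on. Please help!!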


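<xmlns:evt="http://www.metadata..com/Management/"> is not a valid start tag, and <xsl:value-of select=.../> is not a valid XSLT instruction.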

Answer

My concern with hard-coding the structured document is that it may change in the future.

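An XSLT stylesheet transforms data from one XML schema to another. It is unrealistic to expect that a change in either schema will not require rewriting the stylesheet.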

Is there a way to group the elements for all the MemberAddress fields, for example?

Yes, if you have some way to identify them. For example, you could do:

<Member> 
    <xsl:for-each select="*[starts-with(name(), 'Member')]"> 
     <xsl:element name="{substring-after(name(), 'Member')}"> 
      <xsl:value-of select="." /> 
     </xsl:element> 
    </xsl:for-each> 
</Member> 

to get:

<Member> 
    <Address>String</Address> 
    <ID>String</ID> 
    <City>String</City> 
    <Name>String</Name> 
    <Type>String</Type> 
    <State>String</State> 
    <Since>String</Since> 
</Member> 

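but that does not match your expected output. By the way, your output shows a lot of data that is not in your input, for example the member's email.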
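
The same prefix test could be applied to the other groups mentioned in the question (OrderedFrom, ShipTo) by moving the loop into a named template. The following is only a minimal XSLT 1.0 sketch under several assumptions: the helper name group-by-prefix, its parameters prefix and wrapper, and the root element Result are all hypothetical, and the wrapper names Company and ShippingAddress are merely borrowed from the expected output. It does not reproduce the exact target element names (for instance, it would emit AddressLine1Text rather than CompanyAddressLine1Text).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: wraps every child of the context node whose
       name starts with $prefix in a $wrapper element, and strips the
       prefix from each child's name. -->
  <xsl:template name="group-by-prefix">
    <xsl:param name="prefix"/>
    <xsl:param name="wrapper"/>
    <xsl:element name="{$wrapper}">
      <xsl:for-each select="*[starts-with(name(), $prefix)]">
        <xsl:element name="{substring-after(name(), $prefix)}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>

  <!-- xsl:call-template keeps CustomerInformation as the context node,
       so the helper iterates over its children. -->
  <xsl:template match="/CustomerInformation">
    <!-- "Result" is a placeholder root; the question's real root is unclear. -->
    <Result>
      <xsl:call-template name="group-by-prefix">
        <xsl:with-param name="prefix" select="'Member'"/>
        <xsl:with-param name="wrapper" select="'Member'"/>
      </xsl:call-template>
      <xsl:call-template name="group-by-prefix">
        <xsl:with-param name="prefix" select="'OrderedFrom'"/>
        <xsl:with-param name="wrapper" select="'Company'"/>
      </xsl:call-template>
      <xsl:call-template name="group-by-prefix">
        <xsl:with-param name="prefix" select="'ShipTo'"/>
        <xsl:with-param name="wrapper" select="'ShippingAddress'"/>
      </xsl:call-template>
    </Result>
  </xsl:template>
</xsl:stylesheet>
```

With this arrangement, adding another group means adding one more xsl:call-template, but any renaming beyond stripping the prefix (or any nesting such as Member/Address) would still have to be written out explicitly. The sketch also assumes no source element is named exactly like a prefix, since that would yield an empty element name.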


Yes, the file was trimmed because it was very verbose. Thank you very much. – BreenDeen