iText: parsing blocks of HTML in a loop with different pages

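I currently have a working version that creates a PDF from a few rows of data in a database. For each row in the database, it creates a new page in the PDF. This all works fine. Now I need to parse some of the fields in each row so that their HTML renders properly. I can see an example here that shows parsing an entire document, but there the input is one complete string and the whole document is parsed.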

What I need is to create individually formatted pages where only specific HTML fields are parsed. Is it possible to do this?

Here is some sample code I have that creates the new pages:

PdfFont fTimes = PdfFontFactory.CreateFont(FontConstants.TIMES_ROMAN); 
PdfFont fTimesBold = PdfFontFactory.CreateFont(FontConstants.TIMES_BOLD);      

// create the first page here 
doc.Add(new Paragraph("Abstract Submissions for " + eventName).SetFont(fTimes).SetFontSize(18).SetFontColor(Color.BLACK)); 
doc.Add(new Paragraph("Section Name: " + GetSectionName(ddlSections.SelectedValue)).SetFont(fTimes).SetFontSize(14).SetFontColor(Color.BLACK)); 
doc.Add(new Paragraph("Created: " + DateTime.Now.ToString("dddd, MMMM d, yyyy h:mm tt")).SetFont(fTimes).SetFontSize(11).SetFontColor(Color.BLACK)); 

// iterate through each of the items 
foreach (DataRow row in dsItems.Tables[0].Rows) 
{ 
    // create a new page for each abstract submission 
    doc.Add(new AreaBreak(iText.Layout.Properties.AreaBreakType.NEXT_PAGE)); 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["PresentationType"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK)); 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["PresentationTitle"], "")).SetFont(fTimes).SetFontSize(16).SetFontColor(Color.BLACK)); 
    // html field 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["Authors"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK)); 
    // html field 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["Abstract"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK)); 
} 

doc.Close();

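I should note that I'm using a MemoryStream rather than a FileStream, so the client can download the file immediately and nothing needs to be saved on the file system.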

**Edit - added sample data**

<table> 
    <tr> 
     <td>Poster</td> 
     <td>Abstract 1</td> 
     <td><strong><em>Doctor Name 1</em></strong> <strong>Doctor Name 2</strong></td> 
     <td><p>Some really long text <strong>which can have</strong> some different basic HTML <u>formatting in it</u></p></td> 
    </tr> 
    <tr> 
     <td>Presentation</td> 
     <td>Abstract 2</td> 
     <td><strong>Doctor Name 15 </strong><em>Doctor 3</em></td> 
     <td><p>Some really long text which can have some different basic HTML <em>formatting in it</em></p></td> 
    </tr> 
</table>


2017-04-19 Brenden Kehren

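Can you share a sample of the content to be parsed/rendered? Is the content a small subset of formatting, like what a rich text editor produces, or is it arbitrary generic HTML/CSS? – COeDev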

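I added some sample data, @COeDev; sorry for the poor formatting. Basically, everything inside the tags is a database column. That was the only way I could show the markup in full without the editor formatting it. –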

If there is nothing in there other than "strong", "p" and "em", and the HTML is valid XML, then it is easy to parse this and create iText elements from it. – COeDev

With a pattern like this, you can create your own XML/HTML-to-iText translation. You only need to implement the tags you need:

// requires System.Collections.Generic and System.Xml, plus the iText namespaces for IElement and Paragraph

internal interface ICustomElement
{
    IEnumerable<IElement> GetContent();
}

internal class CustomElementFactory
{
    public ICustomElement GetElement(XmlNode node)
    {
        switch (node.Name)
        {
            case "p":
                return new CustomParagraph(node, this);
            // implement the tags you need using the ICustomElement interface
            default:
                // e.g. treat unknown nodes as text (CustomParagraph only adds the node's InnerText)
                return new CustomParagraph(node, this);
        }
    }
}

public class PdfCreator
{
    public byte[] GetPdf(XmlDocument template)
    {
        PdfDocument doc ...
        CustomElementFactory factory ...
        foreach (XmlNode node in template.ChildNodes)
        {
            // pseudocode: add every element returned for this node to the document
            doc.AddElements(factory.GetElement(node).GetContent());
            // This works in such an easy, generic way because almost every iText element implements
            // the IElement interface and can therefore be added to the document like this.
            // Containers like PdfPCell take IElements as well.
            // Good job, iText guys! ;)
        }

        // pseudocode: close the document and return the PDF bytes
        return doc.CloseDocument();
    }
}

// here comes the magic:

internal class CustomParagraph : ICustomElement
{
    private readonly XmlNode node;
    private readonly CustomElementFactory factory;

    // ctor storing the XmlNode and the factory in private fields
    public CustomParagraph(XmlNode node, CustomElementFactory factory)
    {
        this.node = node;
        this.factory = factory;
    }

    public IEnumerable<IElement> GetContent()
    {
        Paragraph p = new Paragraph();
        p.Add(node.InnerText); // create an underlined or bold or whatever font here when you implement the special HTML tags

        // If the node has child elements, get their content by calling factory.GetElement(child).GetContent()
        // for each child. Then loop over the IElement.Chunks collection of each IElement and add the
        // contained chunks to the paragraph of this scope. This way you can process nested HTML tags recursively.
        // Find a way to pass the style information of this scope to the factory when processing child nodes,
        // so that you can render <strong>bold<u>underlinedANDBOLD</u></strong> correctly.

        return new List<IElement> { p };
    }
}

This takes some work and fine-tuning, but it can be done.
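
The part about passing style information down to child nodes is the trickiest bit, so here is a minimal sketch of just that step. It is not iText code and not necessarily how the answer above would end up looking: `StyledRun` and `HtmlFragmentFlattener` are hypothetical names made up for illustration, and it assumes each HTML field is well-formed XML once wrapped in a synthetic `<root>` element (HTML-only entities such as `&nbsp;` would make `LoadXml` throw). It walks the fragment recursively and produces a flat list of text runs, each carrying the bold/italic/underline flags inherited from its ancestors.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper type: one piece of text plus the inline styles that apply to it.
public sealed class StyledRun
{
    public string Text { get; }
    public bool Bold { get; }
    public bool Italic { get; }
    public bool Underline { get; }

    public StyledRun(string text, bool bold, bool italic, bool underline)
    {
        Text = text;
        Bold = bold;
        Italic = italic;
        Underline = underline;
    }
}

// Hypothetical helper: flattens a small HTML fragment into styled runs.
public static class HtmlFragmentFlattener
{
    public static List<StyledRun> Flatten(string htmlFragment)
    {
        var xml = new XmlDocument();
        // Keep whitespace-only nodes, e.g. the space between two adjacent tags.
        xml.PreserveWhitespace = true;
        // Wrap in a synthetic root so fields with several top-level tags still parse.
        xml.LoadXml("<root>" + htmlFragment + "</root>");

        var runs = new List<StyledRun>();
        Walk(xml.DocumentElement, false, false, false, runs);
        return runs;
    }

    private static void Walk(XmlNode node, bool bold, bool italic, bool underline,
                             List<StyledRun> runs)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Text
                || child.NodeType == XmlNodeType.Whitespace
                || child.NodeType == XmlNodeType.SignificantWhitespace)
            {
                // Text inherits whatever styles the enclosing tags switched on.
                runs.Add(new StyledRun(child.Value, bold, italic, underline));
                continue;
            }

            if (child.NodeType != XmlNodeType.Element)
            {
                continue; // skip comments and other node types
            }

            // Unknown tags (such as <p>) add no style; only their children are visited.
            string tag = child.Name.ToLowerInvariant();
            Walk(child,
                 bold || tag == "strong" || tag == "b",
                 italic || tag == "em" || tag == "i",
                 underline || tag == "u",
                 runs);
        }
    }
}
```

Each run could then be turned into one inline element and added to a single paragraph per field. Assuming iText 7's `Text` element and its `SetBold()`, `SetItalic()` and `SetUnderline()` methods, that would be roughly one `new Text(run.Text)` per run with the matching setters applied; with an older iText you would pick a font per run instead. Block-level tags like `<p>` are deliberately ignored here and would need their own handling if paragraph breaks inside a field matter.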


2017-04-21 05:36:48 COeDev
