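How to split the pages of a Word document into separate files in C#

I have an OCR program that converts images into a Word document. The Word document contains the text of all the images, and I want to split it into separate files, one per page.

Is there a way to do this in C#?

Thanks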

2012-08-01 Iman

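It is not easy to tell where a page ends in a Word document, although files created by Word do contain w:lastRenderedPageBreak elements.

It would be better to have your OCR program insert some kind of marker into the document between each block of converted text.

Then, depending on what type of Word document it is, process the file with an appropriate tool.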
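As a rough illustration of the marker idea, here is a minimal sketch. It assumes the OCR program can be made to write a marker line between blocks (the token `===PAGE===` is hypothetical, not something any particular OCR tool emits) and that the document text has already been extracted into a string by whatever tool suits the file type.

```csharp
using System;
using System.Collections.Generic;

static class MarkerSplitter
{
    // Hypothetical marker; choose something that cannot occur in the OCR text.
    public const string Marker = "===PAGE===";

    // Splits the text on the marker and drops blocks that are empty after trimming.
    public static List<string> SplitOnMarker(string text)
    {
        var blocks = new List<string>();
        foreach (var part in text.Split(new[] { Marker }, StringSplitOptions.None))
        {
            var trimmed = part.Trim();
            if (trimmed.Length > 0)
                blocks.Add(trimmed);
        }
        return blocks;
    }
}
```

Each returned block could then be written to its own file. Splitting on an explicit marker avoids relying on where Word happens to render page breaks.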

2012-08-01 22:18:48 JasonPlutext

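If you have Word installed, you can use the Word object model to work with Word documents from C#.

First, add a reference to the Word object model. Right-click the project, then Add Reference... -> COM -> Microsoft Word 14.0 Object Library (or similar, depending on your version of Word).

Then you can use the following code: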

using Microsoft.Office.Interop.Word;
//for older versions of Word use:
//using Word;

namespace WordSplitter {
    class Program {
        static void Main(string[] args) {
            //Create a new instance of Word
            var app = new Application();

            //Show the Word instance.
            //If the code runs too slowly, you can show the application at the end of the program
            //Make sure it works properly first; otherwise, you'll get an error in a hidden window
            //(If it still runs too slowly, there are a few other ways to reduce screen updating)
            app.Visible = true;

            //We need a reference to the source document
            //It should be possible to get a reference to an open Word document, but I haven't tried it
            var doc = app.Documents.Open(@"path\to\file.doc");
            //(Can also use .docx)

            //Information returns an untyped value, so cast it explicitly
            int pageCount = (int)doc.Range().Information[WdInformation.wdNumberOfPagesInDocument];

            //We'll hold the start position of each page here
            int pageStart = 0;

            for (int currentPageIndex = 1; currentPageIndex <= pageCount; currentPageIndex++) {
                //This Range object will contain each page.
                var page = doc.Range(pageStart);

                //Generally, the end of the current page is 1 character before the start of the next.
                //However, we need to handle the last page -- since there is no next page, the
                //GoTo method will move to the *start* of the last page.
                if (currentPageIndex < pageCount) {
                    //page.GoTo returns a new Range object, leaving the page object unaffected
                    page.End = page.GoTo(
                        What: WdGoToItem.wdGoToPage,
                        Which: WdGoToDirection.wdGoToAbsolute,
                        Count: currentPageIndex + 1
                    ).Start - 1;
                } else {
                    page.End = doc.Range().End;
                }
                pageStart = page.End + 1;

                //Copy and paste the contents of the Range into a new document
                page.Copy();
                var doc2 = app.Documents.Add();
                doc2.Range().Paste();
            }
        }
    }
}

Reference: Word Object Model Overview on MSDN

2012-08-02 06:56:03

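Thanks, dear @ZevSpitz – Iman 2012-08-03 08:11:23

This is a perfect starting point for building something useful. – 2012-10-16 15:12:45

Same as the other answer, but using an iterator (IEnumerable<Range>) written as an extension method on Document.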

using System.Collections.Generic;
using Microsoft.Office.Interop.Word;

static class PagesExtension {
    //Lazily yields one Range per page of the document
    public static IEnumerable<Range> Pages(this Document doc) {
        int pageCount = (int)doc.Range().Information[WdInformation.wdNumberOfPagesInDocument];
        int pageStart = 0;
        for (int currentPageIndex = 1; currentPageIndex <= pageCount; currentPageIndex++) {
            var page = doc.Range(pageStart);
            if (currentPageIndex < pageCount) {
                //page.GoTo returns a new Range object, leaving the page object unaffected
                page.End = page.GoTo(
                    What: WdGoToItem.wdGoToPage,
                    Which: WdGoToDirection.wdGoToAbsolute,
                    Count: currentPageIndex + 1
                ).Start - 1;
            } else {
                //Last page: there is no next page to jump to, so run to the end of the document
                page.End = doc.Range().End;
            }
            pageStart = page.End + 1;
            yield return page;
        }
    }
}

The main code then ends up like this:

static void Main(string[] args) {
    var app = new Application();
    app.Visible = true;
    var doc = app.Documents.Open(@"path\to\source\document");
    foreach (var page in doc.Pages()) {
        page.Copy();
        var doc2 = app.Documents.Add();
        doc2.Range().Paste();
    }
}
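The loop above leaves each page in a new, unsaved document. To end up with separate files, each new document still has to be saved and closed. The following is only a sketch of one way to do that: `PageSaver`, `PagePath`, `SavePages`, and the `page-001.docx` naming scheme are hypothetical names introduced for illustration, the `Pages()` extension method is the one defined above, and `SaveAs2` assumes Word 2010 or later.

```csharp
using Microsoft.Office.Interop.Word;

static class PageSaver
{
    // Builds a file path such as <outputFolder>\page-003.docx (naming scheme is an assumption).
    public static string PagePath(string outputFolder, int pageNumber)
    {
        return System.IO.Path.Combine(outputFolder, string.Format("page-{0:000}.docx", pageNumber));
    }

    // Copies every page of the source document into its own new document and saves it.
    // outputFolder is assumed to be an absolute path to a folder that already exists.
    public static void SavePages(Application app, Document source, string outputFolder)
    {
        int pageNumber = 1;
        foreach (var page in source.Pages())
        {
            page.Copy();
            var target = app.Documents.Add();
            target.Range().Paste();

            // With no format argument, Word saves in its default format.
            target.SaveAs2(FileName: PagePath(outputFolder, pageNumber));

            // The cast to _Document avoids the compiler warning about the Close event
            // and the Close method sharing a name.
            ((_Document)target).Close(SaveChanges: false);
            pageNumber++;
        }
    }
}
```

It could be called as `PageSaver.SavePages(app, doc, @"C:\output")` after opening the source document, followed by `((_Application)app).Quit()` once all pages are written. Because the copy goes through the clipboard, anything else on the clipboard is overwritten while this runs.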

2012-08-02 07:01:59
