PHP and antiword do not parse Cyrillic text correctly

I want to parse MS Office 2003 documents with antiword on my Linux server, but it does not parse Cyrillic text correctly.

It returns something like this:

??? ???? ???????????

Does anyone know a way to correctly parse Microsoft Office 2003 documents that contain Cyrillic text?

Source

2011-12-02 vladimir

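This is an encoding problem. What code are you using to parse the text? –

I ran it from the command line as 'antiword test.doc', and in my PHP code I do the same thing with 'shell_exec('antiword test.doc')'. – vladimir

@vladimir Does it work correctly when you run it from the command line? – DaveRandom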

Antiword has an encoding parameter; maybe give that a try:

shell_exec('antiword -X UTF-8 test.doc')

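Or use koi8-r as the output encoding and then convert the result in PHP with iconv().

Or try converting the document with LibreOffice in command-line mode: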

shell_exec('soffice --headless --convert-to txt test.doc')
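
The koi8-r suggestion above could look roughly like the sketch below. It assumes antiword is on the PATH and that the koi8-r.txt mapping file (named in antiword's FAQ) is installed and selected with -m; the helper names koi8r_to_utf8 and antiword_koi8r are made up for illustration and are not part of antiword or PHP.

```php
<?php
// Sketch only: assumes antiword is installed and can find koi8-r.txt.

/** Convert KOI8-R bytes (what antiword should emit with -m koi8-r.txt) to UTF-8. */
function koi8r_to_utf8(string $raw): string
{
    $utf8 = iconv('KOI8-R', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException('iconv could not convert the KOI8-R input');
    }
    return $utf8;
}

/** Hypothetical helper: run antiword on one file and return its text as UTF-8. */
function antiword_koi8r(string $path): string
{
    // escapeshellarg() keeps spaces or quotes in the file name from breaking the command.
    $raw = shell_exec('antiword -m koi8-r.txt ' . escapeshellarg($path));
    if (!is_string($raw)) {
        throw new RuntimeException('antiword produced no output');
    }
    return koi8r_to_utf8($raw);
}
```

Calling antiword_koi8r('test.doc') would then return a UTF-8 string that can be echoed into a UTF-8 page, provided the assumptions above hold on your server.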

Source

2011-12-02 10:41:38 mario

antiword doesn't have an 'X' option. – vladimir

I have a super old version, '0.32' from 2001, and it has it. – mario

The [man page](http://linux.die.net/man/1/antiword) implies that antiword already uses UTF-8. – hafichuk

I solved this problem with Cyrillic text.

Good documentation can be found here

The working code is as follows:

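// -m selects the character mapping file; escapeshellarg() protects the file name
$content = shell_exec('/usr/bin/antiword -m cp1251.txt ' . escapeshellarg($filename));
var_dump($content);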

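Pay attention to the -m parameter (the character mapping file).

You forgot to set the correct mapping file.

Here is what the documentation says about mapping files: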

Q9: Which mapping file (-m option) is correct in my situation? 
A9: The correct mapping file depends on the character set you need for output 
    in a specific language. 
    For Western European languages (like English, French, German) this is 
    8859-1.txt. (OS/2: cp1252.txt) (DOS: cp850.txt) 
    For Eastern European languages (like Polish, Czech, Slovak, Croatian) this 
    is 8859-2.txt. (OS/2: cp1250.txt) (DOS: cp852.txt) 
    For Esperanto use 8859-3.txt. 
    For Russian use 8859-5.txt or koi8-r.txt. (OS/2: cp1251.txt) 
    (DOS: cp866.txt) 
    For Ukrainian use koi8-u.txt. 
    For Arabic use 8859-6.txt. (DOS: cp864.txt) 
    For Hebrew use 8859-8.txt. (DOS: cp862.txt) 
    For Thai use 8859-11.txt. 
    If your system supports it, you might also try UTF-8.txt. 

    NOTE: UTF-8 also enables Antiword to show text in languages like Chinese, 
      Japanese and Korean.
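
If the page that displays the result is UTF-8, text produced with a single-byte mapping file such as cp1251.txt still has to be converted before output. A minimal sketch of that step follows; the mapping file names are taken from the FAQ excerpt above, while the iconv charset names paired with them and the names ANTIWORD_CHARSETS and antiword_output_to_utf8 are assumptions made for illustration.

```php
<?php
// Mapping file => iconv charset name. The pairing is an assumption, not taken from antiword.
const ANTIWORD_CHARSETS = [
    'cp1251.txt' => 'CP1251',
    'koi8-r.txt' => 'KOI8-R',
    '8859-5.txt' => 'ISO-8859-5',
    'koi8-u.txt' => 'KOI8-U',
    'UTF-8.txt'  => 'UTF-8',
];

/** Hypothetical helper: convert antiword output to UTF-8 based on the mapping file used. */
function antiword_output_to_utf8(string $mapping, string $raw): string
{
    if (!array_key_exists($mapping, ANTIWORD_CHARSETS)) {
        throw new InvalidArgumentException("Unknown mapping file: $mapping");
    }
    $charset = ANTIWORD_CHARSETS[$mapping];
    if ($charset === 'UTF-8') {
        return $raw; // already UTF-8, nothing to convert
    }
    $utf8 = iconv($charset, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("iconv could not convert from $charset");
    }
    return $utf8;
}
```

With the answer's command, this would be used as antiword_output_to_utf8('cp1251.txt', $content) before printing $content.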

Source

2012-08-21 11:37:13 Pascal
