Loading and saving an HTML file containing Polish characters

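I need to load an HTML template file (using std::ifstream), add some content, and then save it as a complete web page. This would be simple if it weren't for the Polish characters: I've tried every combination of char/wchar_t, Unicode/Multi-Byte character set, iso-8859-2/utf-8 and ANSI/utf-8, and none of them worked for me (there are always some characters that are displayed incorrectly, or some that aren't displayed at all).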

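I could paste a lot of code and files here, but I'm not sure that would help. Maybe you can just tell me: what format/encoding should the template file have, what encoding should I declare in the web page, and how should I load and save the file to get the correct result?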

(If my question isn't specific enough, or you need code/file examples, let me know.)
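For illustration, a minimal sketch of the load-modify-save workflow described above, assuming the template is saved as UTF-8 (without BOM), declares `<meta charset="utf-8">` in its `<head>`, and that the inserted text is already UTF-8. `ReadFileBytes`, `FillTemplate`, `WriteFileBytes` and the `{{CONTENT}}` marker are hypothetical names, not taken from the question. The idea is to treat the file as an opaque byte string, so the C++ I/O layer never re-encodes the Polish characters.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads a whole file as raw bytes. Binary mode disables newline
// translation, so UTF-8 sequences pass through untouched.
std::string ReadFileBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Replaces the first occurrence of a (hypothetical) placeholder marker
// with content that is assumed to be UTF-8 already.
std::string FillTemplate(std::string html, const std::string& placeholder,
                         const std::string& utf8Content)
{
    std::string::size_type pos = html.find(placeholder);
    if (pos != std::string::npos)
        html.replace(pos, placeholder.size(), utf8Content);
    return html;
}

// Writes the bytes back exactly as they are, again in binary mode.
void WriteFileBytes(const std::string& path, const std::string& bytes)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        throw std::runtime_error("cannot write " + path);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}
```

Usage would be along the lines of `WriteFileBytes("page.html", FillTemplate(ReadFileBytes("template.html"), "{{CONTENT}}", text));`, where `text` must already be UTF-8; if it comes from string literals in a source file saved in another encoding, it needs converting first.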

Edit: I tried the library suggested in the comments:

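#include <iterator>
#include <string>
#include "utf8.h" // utfcpp

std::string fix_utf8_string(std::string const & str)
{
    std::string temp;
    // Copy str into temp, replacing invalid UTF-8 sequences.
    utf8::replace_invalid(str.begin(), str.end(), std::back_inserter(temp));
    return temp; // return the repaired copy; returning str would discard it
}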

Calling it:

fix_utf8_string("wynik działania pozytywny ąśżźćńłóę");

throws utf8::not_enough_room. What am I doing wrong?
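A likely explanation, assuming the source file (and therefore the string literal) is stored as Windows-1250 rather than UTF-8: in that code page `ę` is the single byte 0xEA, whose bit pattern `1110xxxx` marks the start of a three-byte UTF-8 sequence. A string ending in `ę` therefore ends in the middle of a sequence, which the utfcpp `replace_invalid` of that time apparently reported as `not_enough_room` instead of replacing it. The hand-written check below (`LooksLikeUtf8` is a hypothetical helper, not part of utfcpp) shows the byte-level rule involved.

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check only: lead byte, length and continuation bytes.
// It does not reject overlong forms or surrogate code points.
bool LooksLikeUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if (lead < 0x80)                len = 1; // ASCII
        else if ((lead & 0xE0) == 0xC0) len = 2; // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) len = 3; // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) len = 4; // 11110xxx
        else return false;                       // stray continuation or invalid lead
        if (i + len > s.size())
            return false;                        // truncated sequence at end of input
        for (std::size_t k = 1; k < len; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;                    // expected a 10xxxxxx byte
        i += len;
    }
    return true;
}
```

For example, `LooksLikeUtf8("\xC4\x99")` (UTF-8 `ę`) is true, while `LooksLikeUtf8("pozytywny \xEA")` (Windows-1250 `ę` at the end) is false for exactly the truncation reason above.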


2013-04-30 NPS

Take a look at this library (http://utfcpp.sourceforge.net/). – 2013-04-30 09:53:38

@bash.d Please see the edit to my question. – NPS 2013-05-02 18:14:27

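@bash.d Unfortunately, that library doesn't work for me at all (even when no exception is thrown, it still doesn't convert the characters correctly). – NPS 2013-05-02 23:51:22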

I'm not sure this is the (perfect) way to go, but the following solution works for me!

I saved my HTML template file as ANSI (or at least that's what Notepad++ says) and changed every write-to-file-stream operation from:

file << std::string("some text with polish chars: ąśżźćńłóę");

to:

file << ToUtf8(std::string("some text with polish chars: ąśżźćńłóę"));

where:

#include <string>
#include <windows.h>

// Converts Windows-1250 ("ANSI", Central European) text to UTF-8 via UTF-16.
std::string ToUtf8(const std::string& ansiText)
{
    if (ansiText.empty())
        return std::string();
    const int ansiSize = static_cast<int>(ansiText.size());

    // ANSI (code page 1250) -> UTF-16; the first call only computes the length.
    int wideSize = MultiByteToWideChar(1250, 0, ansiText.c_str(), ansiSize, NULL, 0);
    std::wstring wideText(wideSize, L'\0');
    MultiByteToWideChar(1250, 0, ansiText.c_str(), ansiSize, &wideText[0], wideSize);

    // UTF-16 -> UTF-8 (CP_UTF8 is 65001). The buffer is sized from the
    // measured length, so long strings cannot overflow it.
    int utf8Size = WideCharToMultiByte(CP_UTF8, 0, wideText.c_str(), wideSize,
                                       NULL, 0, NULL, NULL);
    std::string utf8Text(utf8Size, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wideText.c_str(), wideSize,
                        &utf8Text[0], utf8Size, NULL, NULL);
    return utf8Text;
}

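The basic idea is to use the MultiByteToWideChar() and WideCharToMultiByte() functions to convert the string from ANSI (multi-byte) to wide characters, and then from wide characters to UTF-8 (more here: http://www.chilkatsoft.com/p/p_348.asp). The best part is that I don't need to change anything else: no switching from std::ofstream to std::wofstream, no third-party libraries, and no changes to how I actually use the file streams (apart from converting the strings to UTF-8 where needed)!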
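To make the two-step idea concrete without the Windows API, here is a portable sketch of the same pipeline: each byte in the ANSI code page is mapped to a Unicode code point, and the code point is then encoded as UTF-8. By assumption it covers only ASCII plus the 18 Polish letters of Windows-1250 (the code page the answer passes to MultiByteToWideChar); every other byte becomes `?`. `AppendUtf8`, `Cp1250ToCodePoint` and `PolishCp1250ToUtf8` are hypothetical names. On Windows the MultiByteToWideChar/WideCharToMultiByte pair remains the complete solution; this only shows what it does per character.

```cpp
#include <string>

// Appends the UTF-8 encoding of a code point below U+0800, which is
// enough for ASCII and the Polish letters mapped below.
void AppendUtf8(std::string& out, unsigned cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);                 // 0xxxxxxx
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));   // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F)); // 10xxxxxx
    }
}

// Maps a Windows-1250 byte to a code point (ASCII and Polish letters only).
unsigned Cp1250ToCodePoint(unsigned char b)
{
    if (b < 0x80) return b;
    switch (b) {
        case 0xA5: return 0x0104; case 0xB9: return 0x0105; // Ą ą
        case 0xC6: return 0x0106; case 0xE6: return 0x0107; // Ć ć
        case 0xCA: return 0x0118; case 0xEA: return 0x0119; // Ę ę
        case 0xA3: return 0x0141; case 0xB3: return 0x0142; // Ł ł
        case 0xD1: return 0x0143; case 0xF1: return 0x0144; // Ń ń
        case 0xD3: return 0x00D3; case 0xF3: return 0x00F3; // Ó ó
        case 0x8C: return 0x015A; case 0x9C: return 0x015B; // Ś ś
        case 0x8F: return 0x0179; case 0x9F: return 0x017A; // Ź ź
        case 0xAF: return 0x017B; case 0xBF: return 0x017C; // Ż ż
        default:   return 0x3F;                             // '?' for unmapped bytes
    }
}

// Converts Windows-1250 text (restricted as above) to UTF-8.
std::string PolishCp1250ToUtf8(const std::string& ansiText)
{
    std::string utf8;
    utf8.reserve(ansiText.size() * 2); // each input byte yields at most 2 bytes
    for (char c : ansiText)
        AppendUtf8(utf8, Cp1250ToCodePoint(static_cast<unsigned char>(c)));
    return utf8;
}
```

For instance, the Windows-1250 byte 0xEA (`ę`) becomes U+0119, which is written as the two UTF-8 bytes 0xC4 0x99; this is the kind of transformation ToUtf8 performs on every string before it reaches the file stream.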

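It should probably work for other languages too (presumably with their own code page in place of 1250), though I haven't tested it.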


2013-05-02 23:49:59 NPS
