Remove Unicode characters from a string

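Below is my code snippet. I have managed to remove some escape characters, but I can't remove the Unicode characters from the string NewOutput, which is filled by ParseLine(). I would also like to count how many lines contain Unicode.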

For example, the string NewOutput contains these 3 lines:

@KayKay121 dragged me to the library. Now I have to be productive \ud83d\udc94 https://t.co/HjZR3d5QaQ (timestamp: Thu Oct 29 17:51:50 +0000 2015)

6A decided to postpone the final vote until the executive committee hears the appeals. Looks like it's set: 7 districts. (timestamp: Thu Oct 29 17:51:51 +0000 2015)

@i_am_sknapp Thanks for following us, Seth. (timestamp: Thu Oct 29 18:10:49 +0000 2015)

Any help would be appreciated :) Thanks!

if (readtweetfile.is_open())
{
    // Loop on getline() itself; while (!readtweetfile.eof()) runs one
    // extra iteration after the last line has been read.
    while (getline(readtweetfile, output))
    {
        ParseLine(output, NewOutput);

        if (NewOutput != " ")
        {
            for (std::string::size_type i = 0; i < NewOutput.size(); ++i)
            {
                char c = NewOutput[i];

                // A backslash starts a two-character escape sequence such as \" or \n.
                if (c == '\\' && i + 1 < NewOutput.size())
                {
                    char next = NewOutput[++i];
                    switch (next)
                    {
                    case '"': case '/': case '\'': case '\\':
                        writetweetfile << next;          // unescape: keep the literal character
                        break;
                    case 'n': case 't': case ' ':
                        writetweetfile << ' ';           // whitespace escapes become a space
                        break;
                    case 'u':
                        writetweetfile << "unicode";     // placeholder only; the 4 hex digits after it are still copied
                        break;
                    default:
                        writetweetfile << c << next;     // unknown escape: copy unchanged
                        break;
                    }
                }
                else
                {
                    // Ordinary character (including a trailing one, which the
                    // old two-character window never wrote).
                    writetweetfile << c;
                }
            }
        }
        writetweetfile << std::endl;
    }
}
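A minimal sketch of the two things asked for, assuming the escapes in NewOutput are literal six-character `\uXXXX` sequences (backslash, `u`, four hex digits), as in JSON-encoded tweet text: skip each escape together with its hex digits, and report whether the line contained one so the caller can count such lines. `stripUnicodeEscapes` is a hypothetical helper, not part of the original code.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `in` with every literal \uXXXX
// escape removed. `found` is set to true if at least one escape was removed.
std::string stripUnicodeEscapes(const std::string& in, bool& found)
{
    std::string out;
    found = false;
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        // Need a backslash, a 'u', and four hex digits at i+2..i+5.
        if (in[i] == '\\' && i + 5 < in.size() && in[i + 1] == 'u' &&
            std::isxdigit(static_cast<unsigned char>(in[i + 2])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 3])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 4])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 5])))
        {
            found = true;
            i += 5;     // land on the last hex digit; the loop's ++i steps past it
            continue;
        }
        out += in[i];
    }
    return out;
}
```

In a loop like the question's, one could call this on NewOutput before the remaining escape handling and increment a line counter whenever `found` comes back true, so each line is counted once no matter how many escapes it held.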

2015-11-08 shahganesh

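Where do these strings come from? Is the file saved in some file format? For example, if the file is JSON, just use a JSON parser; it will decode these escapes. Second, '\ud83d\udc94' is a surrogate pair encoding a single character (probably an emoji). – roeland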
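To illustrate the surrogate-pair point, here is a sketch (not a JSON parser) of how a decoder turns `\ud83d\udc94` into one character: combine the UTF-16 high and low surrogates into a code point, then encode that code point as UTF-8. The function names are hypothetical, and the code assumes the caller has already checked that the two values are a valid high/low pair.

```cpp
#include <cstdint>
#include <string>

// Combine a UTF-16 high surrogate (D800-DBFF) and low surrogate (DC00-DFFF)
// into a single Unicode code point. Assumes both are in range.
std::uint32_t combineSurrogates(std::uint16_t high, std::uint16_t low)
{
    return 0x10000u + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
                    + (static_cast<std::uint32_t>(low) - 0xDC00u);
}

// Encode a code point (at most U+10FFFF, not itself a surrogate) as UTF-8.
std::string toUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {                        // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                // 4 bytes: emoji live here
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For the pair in the first sample line, `combineSurrogates(0xD83D, 0xDC94)` gives U+1F494 (BROKEN HEART), and `toUtf8` turns it into the four bytes F0 9F 92 94. This is why treating `\ud83d` and `\udc94` as two separate "Unicode characters" is misleading: together they are one emoji.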

Well, without really knowing what output you want for your 3 samples, I came up with this:

\\(u|U)[a-zA-Z0-9]{4}|\\|\t|\n

This will find the Unicode escapes and the escape characters.
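A minimal sketch of applying that pattern with C++11 `<regex>`, assuming the goal is to delete the matches (the answer itself only says the pattern finds them). `stripEscapes` and `hasUnicodeEscape` are hypothetical names; counting the lines for which `hasUnicodeEscape` returns true gives the number asked for in the question.

```cpp
#include <regex>
#include <string>

// The answer's pattern: a \u or \U escape followed by four alphanumerics,
// a lone backslash, a tab, or a newline.
const std::regex kEscapePattern(R"(\\(u|U)[a-zA-Z0-9]{4}|\\|\t|\n)");

// Only the Unicode-escape part of the pattern, for counting lines.
const std::regex kUnicodePattern(R"(\\(u|U)[a-zA-Z0-9]{4})");

// Delete every match of the answer's pattern from one line.
std::string stripEscapes(const std::string& line)
{
    return std::regex_replace(line, kEscapePattern, "");
}

// True if the line contains at least one \uXXXX-style escape.
bool hasUnicodeEscape(const std::string& line)
{
    return std::regex_search(line, kUnicodePattern);
}
```

One design note: `[a-zA-Z0-9]{4}` also matches non-hex text, so something like `\update` loses `\updat`; `[0-9a-fA-F]{4}` would restrict it to real hex escapes. Note also that removing a lone backslash turns `\/` into `/`, which matches what the question's code does for that escape.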

If you need something different, edit the question with more examples and, more importantly, the output you want to end up with.

2015-11-09 21:19:09 Nefariis
