Reading a non-English HTML page with C#

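I want to find a Hebrew string in a website. The code that reads the page is attached below.

Afterwards I tried to read the dumped file with a StreamReader, but I can't match strings in other languages. What am I doing wrong?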

    // requires: using System.IO; using System.Net;

    // used on each read operation
    byte[] buf = new byte[8192];

    // prepare the web page we will be asking for
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.webPage.co.il");

    // execute the request
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();

    // we will read data via the response stream
    Stream resStream = response.GetResponseStream();

    string tempString = null;
    int count = 0;
    FileStream fileDump = new FileStream(@"c:\dump.txt", FileMode.Create);
    do
    {
        count = resStream.Read(buf, 0, buf.Length);
        // write only the bytes actually read, not the whole buffer
        fileDump.Write(buf, 0, count);
    }
    while (count > 0); // any more data to read?

    fileDump.Close();


2010-06-09 AYBABTU

You are missing the proper encoding. Take a look at the details of the WebResponse.GetResponseStream method.

Update: the code page for Hebrew (Windows) is 1255.

Encoding encode = System.Text.Encoding.GetEncoding(1255); // Hebrew (Windows) 

// Pipe the stream to a higher level stream reader with the required encoding format. 
StreamReader readStream = new StreamReader(resStream , encode);
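To see the effect of the encoding on string matching without a network call, here is a minimal sketch. It assumes a `MemoryStream` as a stand-in for `response.GetResponseStream()`, and the class and method names (`HebrewPageSearch`, `StreamContains`) are made up for illustration. The `Encoding.RegisterProvider` call is an assumption about the runtime: it is needed on .NET Core / .NET 5+, where code page 1255 is not built in, and can be dropped on the classic .NET Framework.

```csharp
using System;
using System.IO;
using System.Text;

static class HebrewPageSearch
{
    // Hypothetical helper: decodes the whole stream with the given encoding
    // and reports whether the decoded text contains the search term.
    public static bool StreamContains(Stream source, Encoding encoding, string term)
    {
        using (var reader = new StreamReader(source, encoding))
        {
            string html = reader.ReadToEnd();
            return html.IndexOf(term, StringComparison.Ordinal) >= 0;
        }
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding(1255) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding hebrew = Encoding.GetEncoding(1255); // Hebrew (Windows)

        // "shalom" written with Unicode escapes, so the result does not
        // depend on how this source file itself is saved.
        string term = "\u05E9\u05DC\u05D5\u05DD";

        // Stand-in for the response stream: a page encoded as windows-1255.
        byte[] page = hebrew.GetBytes("<html><body>" + term + "</body></html>");

        // Decoded with the matching encoding, the term is found.
        Console.WriteLine(StreamContains(new MemoryStream(page), hebrew, term));        // True
        // Decoded as UTF-8, the Hebrew bytes turn into replacement characters.
        Console.WriteLine(StreamContains(new MemoryStream(page), Encoding.UTF8, term)); // False
    }
}
```

The search term is an ordinary .NET string in both calls; only the encoding used to decode the bytes changes the outcome.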


2010-06-09 18:04:29 volody

Still nothing... I think my problem may be related to the string I'm searching for. I mean, I can't match: str.Contains("other language code"); right? What am I doing wrong? – AYBABTU 2010-06-09 18:12:47

I tried encoding the search string, but that fails too: string messageToFind = "otherLanguage"; UTF8Encoding utf8 = new UTF8Encoding(); Byte[] encodedBytes = utf8.GetBytes(messageToFind); messageToFind = encodedBytes.ToString(); – AYBABTU 2010-06-09 18:15:51
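As a side note on the attempt in the comment above: calling `ToString()` on a byte array does not decode it, so the search term ends up as the array's type name. The sketch below illustrates this; the class name `SearchTermDemo` and the sample Hebrew word are made up for the example.

```csharp
using System;
using System.Text;

static class SearchTermDemo
{
    public static void Main()
    {
        // Sample Hebrew search term, written with Unicode escapes.
        string messageToFind = "\u05E9\u05DC\u05D5\u05DD";
        byte[] encodedBytes = new UTF8Encoding().GetBytes(messageToFind);

        // ToString() on an array returns the type name, not the text.
        Console.WriteLine(encodedBytes.ToString()); // System.Byte[]

        // Decoding needs GetString with the same encoding, and it only
        // gives back the string we started with.
        string roundTripped = Encoding.UTF8.GetString(encodedBytes);
        Console.WriteLine(roundTripped == messageToFind); // True
    }
}
```

A .NET string is already Unicode, so the search term itself needs no re-encoding; the encoding matters at the point where the page's bytes are turned into a string.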

Solved it.

The problem was choosing the wrong encoding. I had chosen UTF-8, which is not always the right answer :)

The key lines:

Encoding encode = System.Text.Encoding.GetEncoding("windows-1255"); 
StreamReader readStream = new StreamReader(ReceiveStream, encode);
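Since the right encoding differs from site to site, one option is to take the charset the server reports and fall back to windows-1255 only when that name is missing or unknown. This is a sketch under assumptions, not part of the original solution: `CharsetResolver.Resolve` is a hypothetical helper, the charset strings in `Main` stand in for a value such as `HttpWebResponse.CharacterSet`, and the `RegisterProvider` call is only needed on .NET Core / .NET 5+. A server can also report the wrong charset, in which case a fixed encoding like the one above is still required.

```csharp
using System;
using System.Text;

static class CharsetResolver
{
    // Hypothetical helper: maps a charset name to an Encoding and returns
    // the fallback when the name is missing or not recognized.
    public static Encoding Resolve(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            // Header values are sometimes quoted, e.g. charset="utf-8".
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws for names it does not know.
            return fallback;
        }
    }

    public static void Main()
    {
        // Assumption: .NET Core / .NET 5+, where code page 1255 must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding hebrew = Encoding.GetEncoding("windows-1255");

        Console.WriteLine(Resolve("utf-8", hebrew).CodePage);           // 65001
        Console.WriteLine(Resolve(null, hebrew).CodePage);              // 1255
        Console.WriteLine(Resolve("no-such-charset", hebrew).CodePage); // 1255
    }
}
```

With a real response, the resolved encoding would be passed to the `StreamReader` constructor in place of the hard-coded one.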


2010-06-09 19:28:03 AYBABTU

Please edit your original question and add this as your solution, for others who have the same problem. – Marcote 2010-06-09 19:31:01
