2010-06-09

Reading a non-English HTML page with C#

I want to find a Hebrew string in a web page. The code that reads the page is attached below.

Afterwards I try to read the dumped file with a StreamReader, but I cannot match strings in other languages. What should I do?

    // requires: using System.IO; using System.Net;

    // used on each read operation
    byte[] buf = new byte[8192];

    // prepare the web page we will be asking for
    HttpWebRequest request = (HttpWebRequest)
        WebRequest.Create("http://www.webPage.co.il");

    // execute the request
    HttpWebResponse response = (HttpWebResponse)
        request.GetResponse();

    // we will read data via the response stream
    Stream resStream = response.GetResponseStream();

    int count = 0;
    FileStream fileDump = new FileStream(@"c:\dump.txt", FileMode.Create);
    do
    {
        count = resStream.Read(buf, 0, buf.Length);
        // write only the bytes actually read, not the whole buffer
        fileDump.Write(buf, 0, count);
    }
    while (count > 0); // any more data to read?

    fileDump.Close();
    response.Close();

Answers


You are missing the proper encoding. Have a look at the details of the WebResponse.GetResponseStream method.

Update: the code page for Hebrew (Windows) is 1255.

Encoding encode = System.Text.Encoding.GetEncoding(1255); // Hebrew (Windows) 

// Pipe the stream to a higher level stream reader with the required encoding format. 
StreamReader readStream = new StreamReader(resStream , encode); 
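To illustrate why the encoding passed to the StreamReader matters for the search, here is a minimal sketch that needs no network access. It simulates the response body with a MemoryStream holding windows-1255 bytes; the class name `HebrewPageSearch`, the helper names, and the sample word are assumptions made for this example only. On .NET Core and .NET 5+ the code-page encodings have to be registered first, which is assumed here; on the classic .NET Framework that registration line is normally not needed.

```csharp
using System;
using System.IO;
using System.Text;

public static class HebrewPageSearch
{
    // Returns the windows-1255 encoding. The registration call is assumed to be
    // required on .NET Core / .NET 5+; it is harmless to call more than once.
    public static Encoding Windows1255()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("windows-1255");
    }

    // Decodes the whole stream with the given encoding and reports
    // whether the decoded text contains the search string.
    public static bool StreamContains(Stream source, Encoding encoding, string needle)
    {
        using (var reader = new StreamReader(source, encoding))
        {
            string text = reader.ReadToEnd();
            return text.Contains(needle);
        }
    }

    public static void Main()
    {
        Encoding hebrew = Windows1255();
        string needle = "שלום"; // sample search string, hypothetical

        // Stand-in for the bytes a windows-1255 page would send.
        byte[] page = hebrew.GetBytes("<html><body>" + needle + "</body></html>");

        // Decoded with the matching encoding, the string is found.
        Console.WriteLine(StreamContains(new MemoryStream(page), hebrew, needle));

        // Decoded as UTF-8, the Hebrew bytes are invalid sequences, so the search fails.
        Console.WriteLine(StreamContains(new MemoryStream(page), Encoding.UTF8, needle));
    }
}
```

The search string itself stays an ordinary C# string literal; only the bytes coming from the server need to be decoded with the right encoding.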

Still nothing... I think my problem may be with the string I am searching for. I mean, I can't match with str.contains("other language code"); right? What should I do? – AYBABTU 2010-06-09 18:12:47


I tried encoding the search string, but that failed too: string messageToFind = "otherLanguage"; UTF8Encoding utf8 = new UTF8Encoding(); Byte[] encodedBytes = utf8.GetBytes(messageToFind); messageToFind = encodedBytes.ToString(); – AYBABTU 2010-06-09 18:15:51
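As a side note on the attempt in the comment above, the following sketch shows why calling ToString() on the encoded bytes cannot help: on an array it returns the type name rather than any text. The class name `SearchStringDemo`, the helper name, and the sample word are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

public static class SearchStringDemo
{
    // Reproduces the conversion from the comment: encode, then call ToString().
    public static string BrokenConversion(string messageToFind)
    {
        byte[] encodedBytes = new UTF8Encoding().GetBytes(messageToFind);
        // ToString() on an array returns its type name, not the text.
        return encodedBytes.ToString();
    }

    public static void Main()
    {
        string messageToFind = "שלום"; // sample search string, hypothetical

        Console.WriteLine(BrokenConversion(messageToFind)); // prints "System.Byte[]"

        // Decoding the bytes with the same encoding just gives the original string back,
        // so the search string does not need any re-encoding at all.
        byte[] bytes = Encoding.UTF8.GetBytes(messageToFind);
        Console.WriteLine(Encoding.UTF8.GetString(bytes) == messageToFind); // prints "True"
    }
}
```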


Solved it.

The problem was that I had chosen the wrong encoding. I had picked UTF-8, which is not always the right answer :)

The key lines:

Encoding encode = System.Text.Encoding.GetEncoding("windows-1255"); 
StreamReader readStream = new StreamReader(ReceiveStream, encode); 

Please edit your original question and add this as your solution, for others who have the same problem. – Marcote 2010-06-09 19:31:01