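[C#] Get the source code of a website (404 error)

For a school project I have to fetch the source code of about 1000 websites. I am using HttpWebRequest in a for loop. However, more than half of the websites on my list return a 404 error, so the site cannot be found. When I open the same sites in Chrome, Firefox, or Internet Explorer, everything works fine.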
Here's my code to get the source code:
    // Requires: using System.IO; using System.Net; using System.Text;
    public string getSource(string url)
    {
        string data = null; // stays null unless the server answers 200 OK
        string urlAddress = url;
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        if (response.StatusCode == HttpStatusCode.OK)
        {
            Stream receiveStream = response.GetResponseStream();
            StreamReader readStream = null;
            if (response.CharacterSet == null)
            {
                // No charset reported: use the StreamReader default encoding
                readStream = new StreamReader(receiveStream);
            }
            else
            {
                // Decode with the charset the server reported
                readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
            }
            data = readStream.ReadToEnd();
            response.Close();
            readStream.Close();
        }
        return data;
    }
Maybe it fails because of the sheer number of websites (1000)?
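To narrow this down, here is a minimal diagnostic sketch, not a confirmed fix. `GetResponse()` throws a `WebException` for 4xx/5xx responses, so the `StatusCode` check in the code above never actually sees a 404. The sketch catches that exception and reports the failure. It also sets a `User-Agent` header, on the unverified assumption that some of these servers respond differently to clients that send none. The names `SourceFetcher` and `TryGetSource`, the User-Agent string, and the timeout value are all hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class SourceFetcher
{
    // Hypothetical helper: returns true and the page source on success,
    // otherwise false and a short description of what went wrong.
    public static bool TryGetSource(string url, out string source, out string error)
    {
        source = null;
        error = null;

        // Reject anything that is not an absolute http(s) URL before touching the network.
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            error = "Not an absolute http(s) URL";
            return false;
        }

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        // Assumption: a browser-like User-Agent; the value is only illustrative.
        request.UserAgent = "Mozilla/5.0 (compatible; SchoolProjectCrawler/1.0)";
        request.Timeout = 15000; // milliseconds, arbitrary choice

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            {
                // Fall back to UTF-8 when the charset is missing or unknown.
                Encoding encoding = Encoding.UTF8;
                if (!string.IsNullOrEmpty(response.CharacterSet))
                {
                    try { encoding = Encoding.GetEncoding(response.CharacterSet); }
                    catch (ArgumentException) { /* unknown charset: keep UTF-8 */ }
                }
                using (StreamReader reader = new StreamReader(stream, encoding))
                {
                    source = reader.ReadToEnd();
                }
                return true;
            }
        }
        catch (WebException ex)
        {
            // For HTTP errors (404, 403, 500, ...) the response is attached to the exception.
            HttpWebResponse failed = ex.Response as HttpWebResponse;
            error = failed != null
                ? "HTTP " + (int)failed.StatusCode + " " + failed.StatusDescription
                : ex.Status.ToString(); // e.g. Timeout, NameResolutionFailure
            if (failed != null) failed.Close();
            return false;
        }
    }
}
```

Calling this inside the loop and logging `error` for every URL that fails should show whether the failures really are 404s or something else, such as timeouts or DNS errors.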
Maybe you should give us a few URLs that succeed and a few that fail so we can check them. – Kell 2014-11-24 16:19:27