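Reading specific content from a web page?

I want to build an application (in C#) that has to fetch word meanings from sites such as wiktionary.com or dictionary.com. However, I have never worked with XML, or with web pages at all.

I managed to get the response from the web page for a specific word (for example, from dictionary.com), which I hope is in XML format.

This is what I got: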

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Strict//EN"> 
<!--attributes for answers reference--> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://opengraphprotocol.org/schema/"> 
<head> 
<title> 
Hello | Define Hello at Dictionary.com 
</title> 
<meta name="description" content="Hello definition, (used to express a greeting, answer a telephone, or attract attention.) See more."/> 
<meta name="keywords" content="hello, online dictionary, English dictionary, hello definition, define hello, definition of hello, hello pronunciation, hello meaning, hello origin, hello examples"/> 
<link rel="canonical" href="http://dictionary.reference.com/browse/hello"/> 
<meta property="og:title" content="the definition of hello"/> 
<meta property="og:site_name" content="Dictionary.com"/> 
<meta property="og:image" content="http://sp2.dictionary.com/en/i/dictionary/facebook/dictionary_logo.png"/>

Now I want to parse the following string out of the response for the word "hello":

used to express a greeting, answer a telephone, or attract attention.

I tried using XmlReader but got stuck. Can someone help me read this content?

Source

2011-04-11 Ankit

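Be careful with screen scraping, if that is what you are doing. It often violates the site's terms and conditions, and your implementation will be tightly coupled to their HTML format. If they change their site, your code will most likely stop working. – BrandonZeider 2011-04-11 13:22:29

You can parse the HTML easily with the HTML Agility Pack.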

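using System;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
// Replace with the path to your own saved page
doc.Load("file.htm");
// SelectNodes returns null when nothing matches, so guard against that
HtmlNodeCollection metas = doc.DocumentNode.SelectNodes("//meta[@name='description']");
if (metas != null)
{
    foreach (HtmlNode meta in metas)
    {
        // Read the content attribute; the second argument is the fallback value
        Console.WriteLine(meta.GetAttributeValue("content", ""));
    }
}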
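
The content attribute in the question holds a little more than the definition being asked for ("Hello definition, (...) See more."). Here is a minimal sketch for trimming it, assuming the definition is always the parenthesised part of the description. DefinitionExtractor and ExtractDefinition are hypothetical names for illustration, not part of the HTML Agility Pack.

```csharp
using System;

public static class DefinitionExtractor
{
    // Returns the text between the first '(' and the last ')' of the
    // description, or null when no such pair exists.
    // Assumption: the site always wraps the definition in parentheses.
    public static string ExtractDefinition(string description)
    {
        if (description == null)
        {
            return null;
        }

        int open = description.IndexOf('(');
        int close = description.LastIndexOf(')');
        if (open < 0 || close <= open)
        {
            return null;
        }

        return description.Substring(open + 1, close - open - 1).Trim();
    }
}
```

Calling DefinitionExtractor.ExtractDefinition with the value of the content attribute should give back just the definition text. If the site changes the wording of its description tag, this will need adjusting.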

Source

2011-04-11 13:18:12 mathieu

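His response is XHTML (see the header), so an XML parser will work fine. – 2011-04-11 13:20:36

According to the doctype, it is HTML. But you are right: an XML parser can work well, provided the HTML is well formed, with properly closed tags and no funky characters (for example). – mathieu 2011-04-11 13:24:04

Thanks for the reply. Could you also show how to use an XML parser here? I am trying to learn to use XML. – Ankit 2011-04-11 14:19:23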
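
On the XML question, here is a minimal sketch using XmlReader, assuming the markup really is well-formed XML. Note that the DOCTYPE in the question has a public identifier but no system identifier, which is not valid in XML, and XmlReader prohibits DTDs by default, so the page as posted would most likely be rejected; the sketch assumes the DOCTYPE line has been stripped first. MetaDescriptionReader and GetDescription are hypothetical names for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class MetaDescriptionReader
{
    // Scans well-formed XHTML for <meta name="description" .../> and
    // returns its content attribute, or null when no such element exists.
    // Assumption: the input has no DOCTYPE and every tag is properly closed.
    public static string GetDescription(string xhtml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xhtml)))
        {
            // ReadToFollowing advances to the next element with this name.
            while (reader.ReadToFollowing("meta"))
            {
                if (reader.GetAttribute("name") == "description")
                {
                    return reader.GetAttribute("content");
                }
            }
        }

        return null;
    }
}
```

XmlReader is forward-only, so it suits a single lookup like this one. For real-world pages that are not well formed, an HTML parser such as the HTML Agility Pack is the safer choice.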

You could use a web service such as http://services.aonaware.com/, which is better for you and better for the ad-supported site :-).

http://words.bighugelabs.com/api.php is another one, with a simpler API.

Source

2011-04-11 13:20:38
