2015-02-07 60 views

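Web scraping - WP8 - HTMLAgilityPack

Please tell me what is wrong with the way I fetch the lyrics from http://www.azlyrics.com/lyrics/paparoach/coffeethoughts.html. I want only the lyrics to be extracted. Thanks in advance.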

protected async override void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);
    string htmlPage = "";
    using (var client = new HttpClient())
    {
        htmlPage = await client.GetStringAsync("http://www.azlyrics.com/lyrics/paparoach/coffeethoughts.html/");
    }

    HtmlDocument htmlDocument = new HtmlDocument();
    htmlDocument.LoadHtml(htmlPage);

    List<Lyrics> lyrics = new List<Lyrics>();

    foreach (var div in htmlDocument.DocumentNode.SelectNodes("//div[@style='margin-left:10px;margin-right:10px']"))
    {
        Lyrics newMovie = new Lyrics();
        newMovie.Summary = div.SelectSingleNode("br\\").InnerText.Trim();
        //newMovie.Summary = div.SelectSingleNode(".//div[@id='lyrics']").InnerText.Trim();
        //newMovie.Title = div.SelectSingleNode(".//div[@class='title']").InnerText.Trim();
        lyrics.Add(newMovie);
    }

    lstMovies.ItemsSource = lyrics;
}

Answers

Your query is wrong.

//div[@style='margin-left:10px;margin-right:10px'] 

should be

//div[@id='main']/div[3] 
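
A minimal sketch of how that query might be applied, assuming the lyrics still sit in the third div under the div with id 'main' (the page layout may have changed since) and that the HtmlAgilityPack build in use exposes SelectSingleNode. LyricsScraper and ExtractText are hypothetical names used only for illustration; they are not from the question. The null check is there because SelectSingleNode returns null when the query matches nothing, which would otherwise surface as a NullReferenceException on InnerText.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

public static class LyricsScraper
{
    // Hypothetical helper: returns the text of the first node matched by
    // xpath, with <br> turned into line breaks, or null if nothing matches.
    public static string ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the query matches nothing,
        // so check before reading any property of the node.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
            return null;

        var sb = new StringBuilder();
        foreach (HtmlNode child in node.ChildNodes)
        {
            if (child.NodeType == HtmlNodeType.Comment)
                continue;                 // skip HTML comments inside the block
            if (child.Name == "br")
                sb.Append('\n');          // keep the line structure of the lyrics
            else
                sb.Append(HtmlEntity.DeEntitize(child.InnerText));
        }
        return sb.ToString().Trim();
    }
}
```

In OnNavigatedTo this could be called as LyricsScraper.ExtractText(htmlPage, "//div[@id='main']/div[3]"); a null result means the query did not match the downloaded page, which is worth checking before assigning anything to lstMovies.ItemsSource.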

I wrote an article about scraping, if you are interested: Get content from a webpage or “How to Scrape the Sky”


By the way, azlyrics.com is powered by musicxmatch. Maybe you should check their API instead of scraping? Safe drinking water starts at the source.

Your solution does not work. – 2015-02-09 16:44:40

What error are you getting? – aloisdg 2015-02-11 13:37:12