How to access Wikipedia from R?

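Is there any R package that allows querying Wikipedia (most probably through the MediaWiki API) to get a list of available articles relevant to a query, and to import selected articles for text mining?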

2011-05-23 mjaniec

You may find the following useful: http://www.ragtag.info/2011/feb/10/processing-every-wikipedia-article/ – James 2011-05-23 10:57:27

Use the RCurl package to retrieve the information, and the XML or RJSONIO package to parse the response.

If you are behind a proxy, set your options.

opts <- list(
    proxy = "136.233.91.120", 
    proxyusername = "mydomain\\myusername", 
    proxypassword = 'whatever', 
    proxyport = 8080 
)
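
For readers who are not behind a proxy, the same `opts` object can simply be an empty list. The following is a minimal sketch under that assumption; the `behind_proxy` flag is a hypothetical name introduced here for illustration, and the proxy values are the placeholders from the snippet above.

```r
# Hypothetical switch: set to TRUE only if requests must go through a proxy.
behind_proxy <- FALSE

opts <- if (behind_proxy) {
  list(
    proxy = "136.233.91.120",               # placeholder values, replace with your own
    proxyusername = "mydomain\\myusername",
    proxypassword = "whatever",
    proxyport = 8080
  )
} else {
  list()  # no extra curl options are passed to the request
}
```

An empty list is also the default value of the `.opts` argument of getForm, so the call below works unchanged in both cases.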

Use the getForm function to access the API.

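library(RCurl)  # provides getForm

# Wikipedia now redirects plain http requests, so the https endpoint is used here.
search_example <- getForm(
    "https://en.wikipedia.org/w/api.php",
    action = "opensearch",
    search = "Te",
    format = "json",
    .opts = opts
)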

Parse the results.

library(RJSONIO)  # provides fromJSON
fromJSON(rawToChar(search_example))
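
The parsed result can be reshaped into a table of candidate articles, which covers the "list of available articles" part of the question. The sketch below assumes the current opensearch reply layout (search term, titles, descriptions, URLs). `opensearch_to_df` is a hypothetical helper, not part of RCurl or RJSONIO, and `parsed_example` is a hand-written stand-in for the parsed reply, not real API output.

```r
# Hypothetical helper: turn a parsed opensearch reply into a data frame
# with one row per suggested article.
opensearch_to_df <- function(parsed) {
  # Assumed layout: [[1]] search term, [[2]] titles,
  # [[3]] descriptions, [[4]] URLs.
  stopifnot(length(parsed) >= 2)
  titles <- as.character(unlist(parsed[[2]]))
  urls <- if (length(parsed) >= 4) {
    as.character(unlist(parsed[[4]]))
  } else {
    rep(NA_character_, length(titles))  # older replies may omit URLs
  }
  stopifnot(length(titles) == length(urls))
  data.frame(title = titles, url = urls, stringsAsFactors = FALSE)
}

# Illustrative stand-in for fromJSON(rawToChar(search_example)).
parsed_example <- list(
  "Te",
  c("Tea", "Texas"),
  c("", ""),
  c("https://en.wikipedia.org/wiki/Tea",
    "https://en.wikipedia.org/wiki/Texas")
)

hits <- opensearch_to_df(parsed_example)
print(hits)
```

With a real reply, `opensearch_to_df(fromJSON(rawToChar(search_example)))` should give the same kind of table, provided the reply follows the layout assumed above.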


2011-05-23 13:39:24

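I had problems with some search terms when using this, but I suspect that is an issue with my network. I need volunteers to check the example code with different strings in the 'search' parameter. – 2011-05-23 13:43:19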
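
One way to check whether a particular search string is the problem is to build the request URL by hand and inspect it, or paste it into a browser. This is only a sketch using base R; `build_opensearch_url` is a hypothetical helper, and the test strings are arbitrary examples.

```r
# Hypothetical helper: build the opensearch URL explicitly so that the
# percent-encoding of the search term can be inspected.
build_opensearch_url <- function(term,
                                 base = "https://en.wikipedia.org/w/api.php") {
  # reserved = TRUE also escapes characters such as "&", "=" and spaces.
  encoded <- URLencode(term, reserved = TRUE)
  paste0(base, "?action=opensearch&format=json&search=", encoded)
}

# Arbitrary example terms, including a space and a reserved character.
terms <- c("Te", "New York", "R&D")
urls <- vapply(terms, build_opensearch_url, character(1), USE.NAMES = FALSE)
print(urls)
```

If a term works in the browser but not from R, the cause is more likely the local network or proxy settings than the search string itself.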

There is WikipediR:

library(devtools) 
install_github("Ironholds/WikipediR") 
library(WikipediR)

It describes itself as 'A MediaWiki API wrapper in R' and includes the following functions:

ls("package:WikipediR")
 [1] "wiki_catpages"      "wiki_con"           "wiki_diff"          "wiki_page"
 [5] "wiki_pagecats"      "wiki_recentchanges" "wiki_revision"      "wiki_timestamp"
 [9] "wiki_usercontribs"  "wiki_userinfo"

Here it is in use, getting contribution details and user details for a group of users:

library(RCurl) 
library(XML) 

# scrape page to get usernames of users with highest numbers of edits 
top_editors_page <- "http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits" 
top_editors_table <- readHTMLTable(top_editors_page) 
very_top_editors <- as.character(top_editors_table[[3]][1:5,]$User) 

# setup connection to wikimedia project 
con <- wiki_con("en", project = c("wikipedia")) 

# connect to API and get last 50 edits per user 
user_data <- lapply(very_top_editors, function(i) wiki_usercontribs(con, i)) 
# and get information about the users (registration date, gender, editcount, etc) 
user_info <- lapply(very_top_editors, function(i) wiki_userinfo(con, i))


2014-06-04 02:04:45 Ben
