Is there a way to add text (a link prefix) to the original data?

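I am crawling some websites. Is there a way to add text (a link prefix) to the original data?

The links I extract are not correct, so the pages do not open.

So I want to add the missing part of the link to the original data.

Or maybe there is a better way than the one I have in mind.

Please let me know if there is a good way to do this.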

-Ex-

[The wrong address]

/qna/detail.nhn?d1id=7&dirId=70111&docId=280474152

[The text I want to add]

I want to add the following address to the front of the link (see # Bulletin URL in my code):

http://~naver.com

library(httr) 
library(rvest) 
library(stringr) 


# Bulletin URL 
list.url = 'http://kin.naver.com/qna/list.nhn?m=expertAnswer&dirId=70111' 

# Vector to store title and body 
titles = c() 
contents = c() 

# Crawl pages 1 to 10 of the bulletin board
for(i in 1:10){
    url = modify_url(list.url, query=list(page=i)) # Set the page number in the bulletin URL
    h.list = read_html(url, encoding = 'utf-8') # Read the HTML of the post list from the URL

    # Post link extraction
    title.link1 = html_nodes(h.list, '.title') # Nodes with class "title"
    title.links = html_nodes(title.link1, 'a') # <a> tags inside those nodes

    article.links = html_attr(title.links, 'href') # Extract the href attribute

    for(link in article.links){
        h = read_html(link) # Get the post

        # title
        title = html_text(html_nodes(h, '.end_question._end_wrap_box h3'))

        title = str_trim(repair_encoding(title))

        titles = c(titles, title)

        # content
        content = html_nodes(h, '.end_question .end_content._endContents')

        ## Mobile question content
        no.content = html_text(html_nodes(content, '.end_ext2'))

        content = repair_encoding(html_text(content))

        ## Mobile question content
        ## ex) http://kin.naver.com/qna/detail.nhn?d1id=8&dirId=8&docId=235904020&qb=7Jes65Oc66aE&enc=utf8&section=kin&rank=19&search_sort=0&spq=1
        if (length(no.content) > 0)
        {
            content = str_replace(content, repair_encoding(no.content), '')
        }

        content <- str_trim(content)

        contents = c(contents, content)

        print(link)

    }
}

# save 
result = data.frame(titles, contents)

Source

2017-07-14 koko

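Convert the URL to a string and concatenate it with the string (http://~naver.com). Save the output and add it to the list you want. It would be smarter to do this before you request the URL. Rather than always adding it, you could first check whether the URL already starts with "http...". I would write the code and post an answer, but I don't know R ... – hansTheFranz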

I know R, but I don't understand what you are trying to do. Can you provide an expected output or something else to help me understand?

@F.Privé Can I send an email to [email protected]? (A detailed explanation, runnable files...) If you allow it, I will send it within two hours. – koko

If you add article.links <- paste0("http://kin.naver.com", article.links) before the inner for loop (the one that iterates over article.links), this seems to work (it runs).
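
A minimal sketch of that fix, combined with the "check whether the URL already has http" idea from the comments. It uses base R only and does not touch the network. The first link is the example from the question; the second link and the names base.url and is.absolute are assumptions made up for illustration.

```r
# Hrefs as they might come back from html_attr(title.links, 'href').
# The first value is the relative link from the question; the second is a
# hypothetical absolute URL, included to show that it is left untouched.
article.links <- c(
  "/qna/detail.nhn?d1id=7&dirId=70111&docId=280474152",
  "http://kin.naver.com/qna/list.nhn?m=expertAnswer&dirId=70111"
)

# Site root to prepend (assumed from the bulletin URL in the question)
base.url <- "http://kin.naver.com"

# Prefix only the links that do not already start with http:// or https://
is.absolute <- grepl("^https?://", article.links)
article.links[!is.absolute] <- paste0(base.url, article.links[!is.absolute])

print(article.links)
```

In the crawler this would go right after the html_attr() call, so that read_html(link) receives a full URL. Since rvest builds on xml2, xml2::url_absolute(article.links, base.url) may be a sturdier alternative, as it resolves relative paths against a base URL.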

Source

2017-07-15 17:53:25

Thank you!!!!!!! – koko
