
r - Is there a way to add text (a link prefix) to the raw data?

Reposted. Author: 行者123. Updated: 2023-11-30 09:19:32

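I am scraping a website.

The extracted links are incorrect, so the pages cannot be opened.

So I want to add the missing part of the link to the raw data.

Or maybe there is a better approach than the one I have in mind.

If there is a good way to do this, please let me know.

For example:

[[Incorrect address]]

/qna/detail.nhn?d1id=7&dirId=70111&docId=280474152

[[Text I want to add]]

I want to prepend the site address (see "# Bulletin URL" in the code) to each link: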

http://~naver.com

library(httr)
library(rvest)
library(stringr)


# Bulletin URL
list.url = 'http://kin.naver.com/qna/list.nhn?m=expertAnswer&dirId=70111'

# Vector to store title and body
titles = c()
contents = c()

# Crawl pages 1 to 10 of the bulletin
for(i in 1:10){
  url = modify_url(list.url, query=list(page=i)) # Change the page in the bulletin URL
  h.list = read_html(url, encoding = 'utf-8') # Read the HTML of the post list from the URL

  # Post link extraction
  title.link1 = html_nodes(h.list, '.title') # Nodes with class "title"
  title.links = html_nodes(title.link1, 'a') # <a> elements inside title.link1

  article.links = html_attr(title.links, 'href') # Extract the href attribute

  for(link in article.links){
    h = read_html(link) # Get the post

    # title
    title = html_text(html_nodes(h, '.end_question._end_wrap_box h3'))

    title = str_trim(repair_encoding(title))

    titles = c(titles, title)

    # content
    content = html_nodes(h, '.end_question .end_content._endContents')

    ## Mobile question content
    no.content = html_text(html_nodes(content, '.end_ext2'))

    content = repair_encoding(html_text(content))

    ## Remove the mobile question content from the body
    ## ex) http://kin.naver.com/qna/detail.nhn?d1id=8&dirId=8&docId=235904020&qb=7Jes65Oc66aE&enc=utf8&section=kin&rank=19&search_sort=0&spq=1
    if (length(no.content) > 0)
    {
      content = str_replace(content, repair_encoding(no.content), '')
    }

    content <- str_trim(content)

    contents = c(contents, content)

    print(link)

  }
}

# save
result = data.frame(titles, contents)

Best answer

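The href values are relative paths such as /qna/detail.nhn?..., so read_html(link) cannot open them. If you add article.links <- paste0("http://kin.naver.com", article.links) before the inner for loop (the one over article.links), this seems to work (it is running).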
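As a minimal sketch of that fix, the snippet below prefixes the site address to relative links using base R only. The name base.url and the is.relative check are illustrative assumptions, not part of the original answer, which simply calls paste0 on every link. The sample vector stands in for the result of html_attr(title.links, 'href').

```r
# Site address to prepend (assumed from the bulletin URL in the question)
base.url <- "http://kin.naver.com"

# Stand-in for html_attr(title.links, 'href'): one relative and one absolute link
article.links <- c(
  "/qna/detail.nhn?d1id=7&dirId=70111&docId=280474152",
  "http://kin.naver.com/qna/list.nhn?m=expertAnswer&dirId=70111"
)

# Only prefix links that do not already start with http:// or https://
# (this guard is an addition; the original answer prefixes everything)
is.relative <- !grepl("^https?://", article.links)
article.links[is.relative] <- paste0(base.url, article.links[is.relative])

print(article.links)
```

In the question's script, the two lines that compute is.relative and update article.links would go right after the html_attr call and before for(link in article.links). If every href on the page is known to be relative, the plain paste0 call from the answer is enough.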

Source: a similar question on Stack Overflow, "r - Is there a way to add text (a link prefix) to the raw data?": https://stackoverflow.com/questions/45110262/
