
r - Exception handling for RSelenium switchToFrame() error: ElementNotVisible

Reposted. Author: 行者123. Updated: 2023-12-04 15:38:55

I am trying to implement exception handling in RSelenium and need help. Note that I have checked permission to scrape this page with the robotstxt package.

library(RSelenium)
library(XML)
library(janitor)
library(lubridate)
library(magrittr)
library(dplyr)

remDr <- remoteDriver(
remoteServerAddr = "192.168.99.100",
port = 4445L
)
remDr$open()

# Open TightVNC to follow along as RSelenium drives the browser

# navigate to the main page
remDr$navigate("https://docs.google.com/spreadsheets/d/1o1PlLIQS8v-XSuEz1eqZB80kcJk9xg5lsbueB7mTg1U/pub?output=html&widget=true#gid=690408156")

# look for table element
tableElem <- remDr$findElement(using = "id", "pageswitcher-content")

# switch to table
remDr$switchToFrame(tableElem)

# parse html for first table
doc <- htmlParse(remDr$getPageSource()[[1]])
table_tmp <- readHTMLTable(doc)
table_tmp <- table_tmp[[1]][-2, -1]
table_tmp <- table_tmp[-1, ]
colnames(table_tmp) <- c("team_name", "team_size", "start_time", "end_time", "total_time", "puzzels_solved")
table_tmp$city <- rep("montreal", nrow(table_tmp))
table_tmp$date <- rep(Sys.Date() - 5, nrow(table_tmp))

# switch back to the main/outer frame
remDr$switchToFrame(NULL)

# I found the elements I want to manipulate with Inspector mode in a browser
webElems <- remDr$findElements(using = "css", ".switcherItem") # Month/Year tabs at the bottom
arrowElems <- remDr$findElements(using = "css", ".switcherArrows") # Arrows to scroll left and right at the bottom

# Create NULL object to be used in for loop
big_df <- NULL
for (i in seq_along(webElems)) {

# choose the i'th Month/Year tab
webElem <- webElems[[i]]
webElem$clickElement()

tableElem <- remDr$findElement(using = "id", "pageswitcher-content") # The inner table frame

# switch to table frame
remDr$switchToFrame(tableElem)
Sys.sleep(3)
# parse html with XML package
doc <- htmlParse(remDr$getPageSource()[[1]])
Sys.sleep(3)
# Extract data from HTML table in HTML document
table_tmp <- readHTMLTable(doc)
Sys.sleep(3)
# put this into a format you can use
table <- table_tmp[[1]][-2, -1]
table <- table[-1, ]
# rename the columns
colnames(table) <- c("team_name", "team_size", "start_time", "end_time", "total_time", "puzzels_solved")
# add city name to a column
table$city <- rep("Montreal", nrow(table))

# add the Month/Year this table was extracted from
today <- Sys.Date() %m-% months(i + 1)
table$date <- today

# concatenate each table together
big_df <- dplyr::bind_rows(big_df, table)

# Switch back to main frame
remDr$switchToFrame(NULL)

################################################
### I should use exception handling here ###
################################################


}

When the browser reaches the January 2018 table, it cannot find the next webElems element and throws an error:


Selenium message:Element is not currently visible and so may not be interacted with Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03' System info: host: '617e51cbea11', ip: '172.17.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.14.79-boot2docker', java.version: '1.8.0_91' Driver info: driver.version: unknown

Error: Summary: ElementNotVisible Detail: An element command could not be completed because the element is not visible on the page. class: org.openqa.selenium.ElementNotVisibleException Further Details: run errorDetails method In addition: There were 50 or more warnings (use warnings() to see the first 50)
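If the goal is to react only to this particular failure, one option is to inspect the condition message inside tryCatch() and re-raise every other error. The sketch below is not RSelenium code: fake_selenium_call() is a hypothetical stand-in, and it assumes the RSelenium error message contains the "ElementNotVisible" summary printed above.

```r
# Hypothetical stand-in for an RSelenium call; the messages imitate the
# "Summary: ... Detail: ..." format of the error shown above.
fake_selenium_call <- function(kind) {
  switch(kind,
    ok     = "clicked",
    hidden = stop("Summary: ElementNotVisible Detail: the element is not visible on the page."),
    stale  = stop("Summary: StaleElementReference Detail: the element is no longer attached.")
  )
}

# Handle ElementNotVisible only; any other error is re-raised unchanged.
call_if_visible <- function(kind) {
  tryCatch(
    fake_selenium_call(kind),
    error = function(e) {
      if (grepl("ElementNotVisible", conditionMessage(e), fixed = TRUE)) {
        return(NA_character_)  # tell the caller the element was hidden
      }
      stop(e)                  # unrelated failure: propagate it
    }
  )
}

call_if_visible("ok")      # "clicked"
call_if_visible("hidden")  # NA
```

Re-raising unrelated errors keeps genuine bugs (for example a stale element reference) from being silently treated as "tab not visible".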



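I have been handling it rather naively by including the code below at the end of the for loop. This is not a good idea for two reasons: 1) the scrolling speed is hard to figure out and will fail on other (longer) Google pages, and 2) the for loop eventually fails when it tries to click the right arrow while it is already at the end, so it does not download the last few tables.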
# click the right arrow to scroll right
arrowElem <- arrowElems[[1]]
# once you "click" the element it is "held down"; there is no way to "unclick" it to prevent it from scrolling too far
# I currently make sure it only scrolls a short distance - via Sys.sleep() before switching to outer frame
arrowElem$clickElement()
# give it "just enough time" to scroll right
Sys.sleep(0.3)
# switch back to outer frame to re-start the loop
remDr$switchToFrame(NULL)

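I would like to handle this exception by running arrowElem$clickElement() whenever this error pops up. I think one would normally use tryCatch(); however, this is also my first time learning about exception handling. I thought I could wrap the remDr$switchToFrame(tableElem) part of the for loop in it, but it does not work: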
tryCatch({
suppressMessages({
remDr$switchToFrame(tableElem)
})
},
error = function(e) {
arrowElem <- arrowElems[[1]]
arrowElem$clickElement()
Sys.sleep(0.3)
remDr$switchToFrame(NULL)
}
)
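Judging from the ElementNotVisible message, the error is most likely raised when a hidden tab is clicked (webElem$clickElement()), not by remDr$switchToFrame(), so a handler placed around switchToFrame() may never fire. The sketch below wraps the click instead and, on failure, scrolls and retries a bounded number of times. It runs without Selenium: make_tab_strip() is a hypothetical simulation whose click() and scroll_right() stand in for webElem$clickElement() and arrowElem$clickElement(), and it assumes each scroll reveals exactly one more tab, which the real page may not do.

```r
# Hypothetical simulation of the tab strip: only `width` tabs are visible at
# once, and clicking a hidden tab raises an error (as RSelenium does).
make_tab_strip <- function(n_tabs, width = 3) {
  state <- new.env()
  state$first <- 1
  list(
    click = function(i) {
      if (i < state$first || i >= state$first + width) {
        stop("ElementNotVisible: tab ", i, " is not visible")
      }
      i
    },
    # Stand-in for clicking the right arrow: reveal one more tab, never past the end
    scroll_right = function() {
      state$first <- min(state$first + 1, max(1, n_tabs - width + 1))
      invisible(state$first)
    }
  )
}

# Wrap the step that fails (the click); on error scroll and retry, up to a limit.
click_tab <- function(strip, i, max_scrolls = 10) {
  for (attempt in seq_len(max_scrolls + 1)) {
    ok <- tryCatch({ strip$click(i); TRUE }, error = function(e) FALSE)
    if (ok) return(TRUE)
    strip$scroll_right()
  }
  FALSE
}

strip <- make_tab_strip(n_tabs = 6, width = 3)
clicked <- vapply(1:6, function(i) click_tab(strip, i), logical(1))
clicked  # TRUE TRUE TRUE TRUE TRUE TRUE
```

Bounding the retries with max_scrolls means a tab that genuinely does not exist returns FALSE instead of looping forever, which addresses point 2) above.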

Best answer

I gave it a try. When handling exceptions, I like to use some form of

check <- try(expression, silent = TRUE) # or suppressMessages(try(expression, silent = TRUE))
if (inherits(check, "try-error")) {
# do stuff
}
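I find it handy and it usually works, including with Selenium. However, the problem here is that clicking the arrow once always takes me to the last visible sheet, skipping everything in between.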
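A runnable version of the try() pattern, using inherits() for the class check and recovering the original error message from the "condition" attribute that try() attaches. risky_action() is a hypothetical stand-in for a Selenium command.

```r
# Hypothetical action standing in for a Selenium command
risky_action <- function(x) {
  if (x < 0) stop("negative input not allowed")
  sqrt(x)
}

run_checked <- function(x) {
  check <- try(risky_action(x), silent = TRUE)
  if (inherits(check, "try-error")) {
    # try() stores the original condition in the "condition" attribute
    msg <- conditionMessage(attr(check, "condition"))
    return(list(ok = FALSE, value = NA_real_, message = msg))
  }
  list(ok = TRUE, value = check, message = NA_character_)
}

run_checked(16)$value    # 4
run_checked(-1)$message  # "negative input not allowed"
```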

Alternative solution
So here is an alternative that solves the task of scraping the tables, not the task of exception handling in the above sense.
Code
# Alternative: -------------------------------------------------------------

remDr <- RSelenium::remoteDriver(
remoteServerAddr = "192.168.99.100",
port = 4445L
)
remDr$open(silent = TRUE)
# navigate to the main page
# this needs to be done once before looping, otherwise the content is not available
remDr$navigate("https://docs.google.com/spreadsheets/d/1o1PlLIQS8v-XSuEz1eqZB80kcJk9xg5lsbueB7mTg1U/pub?output=html&widget=true#gid=690408156")


# I. Preliminaries:
#
# 1. build the links to all spreadsheets
# 2. define the function create_table
#
# 1.
# get page source
html <- remDr$getPageSource()[[1]]
# split it line by line
html <- unlist(strsplit(html, '\n'))
# restrict to script section
script <- grep('^\\s*var\\s+gidMatch', html, value = TRUE)
# split the script by semi-colon
script <- unlist(strsplit(script, ';'))
# retrieve information
sheet_months <- gsub('.*name:.{2}(.*?).{1},.*', '\\1',
grep('\\{name\\s*\\:', script, value = TRUE), perl = TRUE)
sheet_gid <- gsub('.*gid:.{2}(.*?).{1},.*', '\\1',
grep('gid\\s*\\:', script, value = TRUE), perl = TRUE)
sheet_url <- paste0('https://docs.google.com/spreadsheets/d/1o1PlLIQS8v-XSuEz1eqZB80kcJk9xg5lsbueB7mTg1U/pubhtml/sheet?headers%5Cx3dfalse&gid=',
sheet_gid)
#
# 2.
# table yielding function
# just for readability in the loop
create_table <- function (remDr) {
# parse html with XML package
doc <- XML::htmlParse(remDr$getPageSource()[[1]])
Sys.sleep(3)
# Extract data from HTML table in HTML document
table_tmp <- XML::readHTMLTable(doc)
Sys.sleep(3)
# put this into a format you can use
table <- table_tmp[[1]][-2, -1]
# add a check-up for size mismatch
table_fields <- as.character(t(table[1,]))
if (! any(grepl("size", tolower(table_fields)))) {
table <- table[-1, ]
# rename the columns
colnames(table) <- c("team_name", "start_time", "end_time", "total_time", "puzzels_solved")
table$team_size <- NA_character_ # character, like the scraped team_size columns, so the tables can be bound together
table <- table[,c("team_name", "team_size", "start_time", "end_time", "total_time", "puzzels_solved")]
} else {
table <- table[-1, ]
# rename the columns
colnames(table) <- c("team_name", "team_size", "start_time", "end_time", "total_time", "puzzels_solved")
}
# add city name to a column
table$city <- rep("Montreal", nrow(table))

# add a date column (note: this is today's date shifted by one month, identical for every sheet; it is not the sheet's Month/Year)
today <- Sys.Date()
lubridate::month(today) <- lubridate::month(today)+1
table$date <- today

# returns the table
table
}

# II. Scraping the content
#
# 1. selenium to generate the pages
# 2. use create_table to extract the table
#
big_df <- NULL
for (k in seq_along(sheet_url)) {
# 1. navigate to the page
remDr$navigate(sheet_url[k])
# remDr$screenshot(display = TRUE) maybe one wants to see progress
table <- create_table(remDr)

# 2. concatenate each table together
big_df <- dplyr::bind_rows(big_df, table)

# inform progress
cat(paste0('\nGathered table for: \t', sheet_months[k]))
}

# close session
remDr$close()
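Step I.1 depends on the inline script of the published sheet listing each tab's name and gid. To show what the two gsub() patterns extract, the snippet below runs the same steps on a made-up page; the real Google Sheets markup is not reproduced here, so the script format is an assumption and the patterns may need adjusting if the page changes.

```r
# Made-up page lines; the script format is an assumption, not copied from Google
script_text <- paste0(
  'var gidMatch = null; ',
  'items.push({name: "January 2018", gid: "111", initialSheet: false}); ',
  'items.push({name: "February 2018", gid: "222", initialSheet: false});'
)
page_lines <- c("<html>", script_text, "</html>")

# Same steps as part I.1: find the script line, split on ";", keep sheet entries
script <- grep("^\\s*var\\s+gidMatch", page_lines, value = TRUE, perl = TRUE)
segments <- unlist(strsplit(script, ";"))
sheet_segments <- grep("\\{name\\s*:", segments, value = TRUE, perl = TRUE)

# ".{2}" skips the space and opening quote; ".{1}," consumes the closing quote and comma
sheet_months <- gsub('.*name:.{2}(.*?).{1},.*', '\\1', sheet_segments, perl = TRUE)
sheet_gid    <- gsub('.*gid:.{2}(.*?).{1},.*', '\\1', sheet_segments, perl = TRUE)

sheet_url <- paste0(
  "https://docs.google.com/spreadsheets/d/1o1PlLIQS8v-XSuEz1eqZB80kcJk9xg5lsbueB7mTg1U/pubhtml/sheet?headers%5Cx3dfalse&gid=",
  sheet_gid
)
sheet_months  # "January 2018" "February 2018"
sheet_gid     # "111" "222"
```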
Result
Here you can see the head and tail of big_df:
head(big_df)
# team_name team_size start_time end_time total_time puzzels_solved city date
# 1 Tortoise Tortes 5 19:00 20:05 1:05 5 Montreal 2019-02-20
# 2 Mulholland Drives Over A Smelly Cat 4 7:25 8:48 1:23 5 Montreal 2019-02-20
# 3 B.R.O.O.K. 2 7:23 9:05 1:42 5 Montreal 2019-02-20
# 4 Motivate 4 18:53 20:37 1:44 5 Montreal 2019-02-20
# 5 Fighting Mongooses 3 6:31 8:20 1:49 5 Montreal 2019-02-20
# 6 B Lovers 3 6:40 8:30 1:50 5 Montreal 2019-02-20
tail(big_df)
# team_name team_size start_time end_time total_time puzzels_solved city date
# 545 Ale Mary <NA> 6:05 7:53 1:48 5 Montreal 2019-02-20
# 546 B.R.O.O.K. <NA> 18:45 20:37 1:52 5 Montreal 2019-02-20
# 547 Ridler Co. <NA> 6:30 8:45 2:15 5 Montreal 2019-02-20
# 548 B.R.O.O.K. <NA> 18:46 21:51 3:05 5 Montreal 2019-02-20
# 549 Rotating Puzzle Collective <NA> 18:45 21:51 3:06 5 Montreal 2019-02-20
# 550 Fire Team <NA> 19:00 22:11 3:11 5 Montreal 2019-02-20
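Every row above carries the same date because create_table() stamps each table with the current date shifted by one month. If a per-sheet date is wanted, one option is to derive it from sheet_months. parse_sheet_month() below is a hypothetical helper that assumes tab labels of the form "January 2018" (English month name, a space, the year); any other label becomes NA.

```r
# Hypothetical helper: turn tab labels such as "January 2018" into the first
# day of that month. Assumes "<English month name> <year>"; other labels give NA.
parse_sheet_month <- function(label) {
  parts <- strsplit(label, " ", fixed = TRUE)
  month_num <- vapply(parts, function(p) match(p[1], month.name), integer(1))
  year_num  <- vapply(parts, function(p) suppressWarnings(as.integer(p[2])), integer(1))
  ok <- !is.na(month_num) & !is.na(year_num)
  out <- as.Date(rep(NA_character_, length(label)))
  out[ok] <- as.Date(sprintf("%04d-%02d-01", year_num[ok], month_num[ok]))
  out
}

parse_sheet_month(c("January 2018", "February 2019", "Totals"))
# "2018-01-01" "2019-02-01" NA
```

Using the built-in month.name constant instead of format = "%B" avoids depending on the session's locale. Inside the scraping loop, the fixed date could then be replaced with `table$date <- parse_sheet_month(sheet_months[k])` after calling create_table().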
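Short explanation
  • To perform this task, I first generate the links to all spreadsheets in the document. To do this:
      • navigate to the document
      • extract the page source
      • extract the sheet months and the URLs via the gid numbers (using regex)

  • Once that is done, loop over the URLs, collecting and binding the tables.

  • Also, for readability, I created a small function called create_table, which returns the table in the proper format. It is mostly the code from your loop. I only added a safety measure for the number of columns (some spreadsheets do not have the team_size field; in those cases I set it to NA).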
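The team_size safety measure in create_table() can be generalized: add any missing expected column as NA and fix the column order before binding. normalize_columns() below is a hypothetical base-R helper, and the two one-row data frames are toy inputs modeled on the output above. It uses NA_character_ on the assumption that the scraped columns are character, so the bound column keeps a single type.

```r
expected_cols <- c("team_name", "team_size", "start_time", "end_time",
                   "total_time", "puzzels_solved")

# Add every missing expected column as character NA, then fix the column order.
normalize_columns <- function(df, cols = expected_cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA_character_, nrow(df))
  }
  df[, cols, drop = FALSE]
}

# Toy inputs: one sheet with team_size, one without
with_size <- data.frame(team_name = "Tortoise Tortes", team_size = "5",
                        start_time = "19:00", end_time = "20:05",
                        total_time = "1:05", puzzels_solved = "5",
                        stringsAsFactors = FALSE)
without_size <- data.frame(team_name = "Fire Team",
                           start_time = "19:00", end_time = "22:11",
                           total_time = "3:11", puzzels_solved = "5",
                           stringsAsFactors = FALSE)

combined <- rbind(normalize_columns(with_size), normalize_columns(without_size))
combined$team_size  # "5" NA
```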

    For a similar question about r - Exception handling for RSelenium switchToFrame() error: ElementNotVisible, see Stack Overflow: https://stackoverflow.com/questions/54084659/
