
html - Convert an HTML table to an R data frame

Reposted. Author: 行者123. Updated: 2023-12-03 18:23:33

<TABLE  cellspacing=1 cellpadding=7 rules=all frame=Box border=1>
<thead>
<TR>
<TD ROWSPAN=2 ALIGN=CENTER VALIGN=CENTER>&nbsp;</TD>
<TD COLSPAN=6 ALIGN=CENTER>1a. My peers make a positive impact my work environment.</TD>
<TD ALIGN=CENTER>Number</TD>
</TR>
<TR>
<TD ALIGN=CENTER>Strongly agree <br> </TD>
<TD ALIGN=CENTER>Generally agree <br> </TD>
<TD ALIGN=CENTER>Neither agree nor<br>disagree</TD>
<TD ALIGN=CENTER>Generally disagree<br> </TD>
<TD ALIGN=CENTER>Strongly disagree<br> </TD>
<TD ALIGN=CENTER>No basis to judge<br> </TD>
<TD ALIGN=CENTER>of Cases</TD>
</TR>
</thead>
<tbody>
<TR>
<TD ALIGN=LEFT VALIGN=TOP> Company-Wide </TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 44.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 44.9</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 6.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 1.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 0.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2,014</TD>
</TR>
<TR>
<TD ALIGN=LEFT VALIGN=TOP> Region 1 </TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 45.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 45.2</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 5.7</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 1.4</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 0.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 1,699</TD>
</TR>
<TR>
<TD ALIGN=LEFT VALIGN=TOP>Division 1 </TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 52.9</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 39.7</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 4.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2.5</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 0.8</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM>0</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 121</TD>
</TR>
</tbody>
</TABLE>
<hr><A NAME="IDX1">&nbsp;</A>

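I have an HTML file that contains several tables like the one above. I want to convert them into a single data frame in which each survey question, currently in the table header, appears in one column. The percentages answering each question would stay in a column, and so would the response levels. Not all questions have the same number of response options (some use a five-point scale, others a nine-point scale). I tried readHTMLTable and then ran do.call(rbind, ...) on the result, but I could not get the data frame I want because the tables have different numbers of columns. I welcome any suggestions on how to proceed. Thanks!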
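The layout described above is a "long" format. A minimal base-R sketch, assuming each table has already been parsed into a data frame with a label column, one column per response level and a case-count column: reshaping each table to long form gives every question the same five columns, so tables with five or nine response levels can be stacked with rbind(). The column names `group` and `Number of Cases`, the helper `to_long()` and the hand-built `wide` data frame are hypothetical, not output of readHTMLTable.

```r
# Hypothetical stand-in for one parsed table: a label column, one column per
# response level, and a case-count column (values copied from the table above).
wide <- data.frame(
  group = c("Company-Wide", "Region 1", "Division 1"),
  "Strongly agree" = c(44.1, 45.6, 52.9),
  "Generally agree" = c(44.9, 45.2, 39.7),
  "Neither agree nor disagree" = c(6.6, 5.7, 4.1),
  "Generally disagree" = c(2.6, 2.1, 2.5),
  "Strongly disagree" = c(1.6, 1.4, 0.8),
  "No basis to judge" = c(0.1, 0.1, 0),
  "Number of Cases" = c("2,014", "1,699", "121"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Reshape one wide table to long format: one row per group x response level.
# Every table then has the same five columns, whatever its number of levels.
to_long <- function(wide, question) {
  resp_cols <- setdiff(names(wide), c("group", "Number of Cases"))
  # Strip the thousands separator ("2,014") before converting to numbers
  n_cases <- as.numeric(gsub(",", "", wide[["Number of Cases"]]))
  data.frame(
    question = question,
    group = rep(wide$group, times = length(resp_cols)),
    response = rep(resp_cols, each = nrow(wide)),
    # unlist() walks the response columns one after another, matching the rep() calls
    percent = unlist(wide[resp_cols], use.names = FALSE),
    n_cases = rep(n_cases, times = length(resp_cols)),
    stringsAsFactors = FALSE
  )
}

long <- to_long(wide, "1a. My peers make a positive impact my work environment.")
```

Tables for other questions can be converted the same way and combined with do.call(rbind, ...), because the long versions share their column names regardless of how many response levels each question has.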

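Edit:
library(XML)    # the package name is case-sensitive: XML, not xml
library(dplyr)
# readHTMLTable() returns a list with one data frame per <table> in the file
questions <- readHTMLTable(files[8], trim = TRUE, as.data.frame = TRUE, header = TRUE)
# bind_rows() matches columns by name and fills missing ones with NA
data <- bind_rows(questions)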

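This produces the data frame I want, but because some questions have more response levels than others, the "Number of Cases" data does not always end up in the same column. Is there a way to name the last column of each table before combining them?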
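One way to do the renaming asked about here, sketched in base R: give the last column of every table the same name (the hypothetical `n_cases`), then bind. The two small data frames are made-up stand-ins for readHTMLTable() output, `name_last_col()` is a hypothetical helper, and the NA-filling loop only imitates what dplyr::bind_rows() does so that the example runs without dplyr; with dplyr loaded, calling bind_rows(tables) after the lapply() step should be enough.

```r
# Hypothetical stand-ins for two parsed tables with different numbers of
# response columns. The case count is always the last column.
q1 <- data.frame(V1 = c("Company-Wide", "Region 1"),
                 V2 = c(44.1, 45.6), V3 = c(55.9, 54.4),
                 V4 = c("2,014", "1,699"), stringsAsFactors = FALSE)
q2 <- data.frame(V1 = "Company-Wide", V2 = 60, V3 = 30, V4 = 10,
                 V5 = "2,010", stringsAsFactors = FALSE)

# Rename the last column of a table to a common name.
name_last_col <- function(df, new_name = "n_cases") {
  names(df)[ncol(df)] <- new_name
  df
}
tables <- lapply(list(q1, q2), name_last_col)

# Base-R imitation of dplyr::bind_rows(): add any column a table lacks as NA,
# put the columns in a common order, then rbind().
all_cols <- unique(unlist(lapply(tables, names)))
tables <- lapply(tables, function(d) {
  for (m in setdiff(all_cols, names(d))) d[[m]] <- NA
  d[all_cols]
})
combined <- do.call(rbind, tables)
```

After renaming, the case counts of both tables land in the single `n_cases` column, while the extra response column of the longer table (`V4`) is NA for rows coming from the shorter one.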

Best answer

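You can use the rvest package for this. You may, however, need to deal with the column names that contain spaces. I used the option fill=TRUE as a quick fix, but this can probably be done in a better way.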

library(rvest)
# html_table() returns a list of tables; here the list has a single element
my_df <- as.data.frame(read_html(text) %>% html_table(fill=TRUE))
my_df
# X1 X2 X3 X4 X5 X6 X7 X8
#1 1a. My peers make a positive impact my work environment. <NA> <NA> <NA> <NA> <NA> Number
#2 Strongly agree Generally agree Neither agree nordisagree Generally disagree Strongly disagree No basis to judge of Cases <NA>
#3 Company-Wide 44.1 44.9 6.6 2.6 1.6 0.1 2,014
#4 Region 1 45.6 45.2 5.7 2.1 1.4 0.1 1,699
#5 Division 1 52.9 39.7 4.1 2.5 0.8 0 121

For the data, I copied and pasted the HTML code from the OP and assigned it, in single quotes, to the variable text: text <- '<TABLE cellspacing=1 cellpadding=7 rules=all frame=...'.

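Some formatting details can then be corrected fairly easily:
# Shift row 2 one cell to the right: prepend "" and drop its last cell
my_df[2,] <- c("",my_df[2,][-length(my_df)])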
#> my_df
# X1 X2 X3 X4 X5 X6 X7 X8
#1 1a. My peers make a positive impact my work environment. <NA> <NA> <NA> <NA> <NA> Number
#2 Strongly agree Generally agree Neither agree nordisagree Generally disagree Strongly disagree No basis to judge of Cases
#3 Company-Wide 44.1 44.9 6.6 2.6 1.6 0.1 2,014
#4 Region 1 45.6 45.2 5.7 2.1 1.4 0.1 1,699
#5 Division 1 52.9 39.7 4.1 2.5 0.8 0 121

Essentially, in this case the entries in the second row need to be shifted one cell to the right.
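After the shift, the two header rows can be folded into column names and the value cells converted to numbers. The sketch below rebuilds `my_df` by hand from the printed output; the placement of the question text in row 1 is an assumption based on that printout and may differ between rvest versions, so the code looks for the first non-empty cell rather than a fixed column. `tidy_table()` is a hypothetical helper, not part of rvest.

```r
# Hypothetical reconstruction of my_df after the row shift, all cells as text.
my_df <- data.frame(
  X1 = c("", "", "Company-Wide", "Region 1", "Division 1"),
  X2 = c("1a. My peers make a positive impact my work environment.",
         "Strongly agree", "44.1", "45.6", "52.9"),
  X3 = c(NA, "Generally agree", "44.9", "45.2", "39.7"),
  X4 = c(NA, "Neither agree nordisagree", "6.6", "5.7", "4.1"),
  X5 = c(NA, "Generally disagree", "2.6", "2.1", "2.5"),
  X6 = c(NA, "Strongly disagree", "1.6", "1.4", "0.8"),
  X7 = c(NA, "No basis to judge", "0.1", "0.1", "0"),
  X8 = c("Number", "of Cases", "2,014", "1,699", "121"),
  stringsAsFactors = FALSE
)

tidy_table <- function(df) {
  last <- ncol(df)
  # Question text: first non-missing, non-empty cell of row 1 (excluding "Number")
  hdr1 <- unlist(df[1, -last])
  question <- hdr1[!is.na(hdr1) & nzchar(hdr1)][1]
  # Response labels come from row 2; the case-count label is split over rows 1 and 2
  col_names <- c("group", unlist(df[2, 2:(last - 1)]),
                 paste(df[1, last], df[2, last]))
  body <- df[-(1:2), ]
  names(body) <- col_names
  # Strip thousands separators ("2,014") and convert the value columns to numbers
  body[-1] <- lapply(body[-1], function(x) as.numeric(gsub(",", "", x)))
  body$question <- unname(question)
  rownames(body) <- NULL
  body
}

tidy <- tidy_table(my_df)
```

The result has one row per group, numeric columns for each response level and for `Number of Cases`, plus a `question` column, which is the shape needed before combining several tables.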

Data
text <- '<TABLE  cellspacing=1 cellpadding=7 rules=all frame=Box border=1>
<thead>
<TR>
<TD ROWSPAN=2 ALIGN=CENTER VALIGN=CENTER>&nbsp;</TD>
<TD COLSPAN=6 ALIGN=CENTER>1a. My peers make a positive impact my work environment.</TD>
<TD ALIGN=CENTER>Number</TD>
</TR>
<TR>
<TD ALIGN=CENTER>Strongly agree <br> </TD>
<TD ALIGN=CENTER>Generally agree <br> </TD>
<TD ALIGN=CENTER>Neither agree nor<br>disagree</TD>
<TD ALIGN=CENTER>Generally disagree<br> </TD>
<TD ALIGN=CENTER>Strongly disagree<br> </TD>
<TD ALIGN=CENTER>No basis to judge<br> </TD>
<TD ALIGN=CENTER>of Cases</TD>
</TR>
</thead>
<tbody>
<TR>
<TD ALIGN=LEFT VALIGN=TOP> Company-Wide </TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 44.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 44.9</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 6.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 1.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 0.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2,014</TD>
</TR>
<TR>
<TD ALIGN=LEFT VALIGN=TOP> Region 1 </TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 45.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 45.2</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 5.7</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 1.4</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 0.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 1,699</TD>
</TR>
<TR>
<TD ALIGN=LEFT VALIGN=TOP>Division 1 </TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 52.9</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 39.7</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 4.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2.5</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 0.8</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM>0</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 121</TD>
</TR>
</tbody>
</TABLE>
<hr><A NAME="IDX1">&nbsp;</A>'
#> class(text)
#[1] "character"

A similar question about converting an HTML table to an R data frame can be found on Stack Overflow: https://stackoverflow.com/questions/32400916/
