python - "ResultSet" object has no attribute 'findAll' error message in Beautiful Soup within a function


Reposted by: 行者123. Updated: 2023-12-01 03:24:12


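I am learning Python, and Beautiful Soup in particular, and I am working through Google's regular-expressions exercise, which uses a set of HTML files containing the popular baby names for different years (e.g. baby1990.html). If you are interested, the dataset is available here: https://developers.google.com/edu/python/exercises/baby-names

Each HTML file contains a table holding the baby-name data.

I wrote a function that extracts the baby names from the HTML files and stores them in dataframes: one dataframe per file, kept in a dictionary, plus a single dataframe that aggregates all of them.

Each HTML file contains two tables. The table with the baby data has the following HTML: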

<table width="100%" border="0" cellspacing="0" cellpadding="4" summary="formatting">

In this line, the distinguishing attribute is summary="formatting".

The function I wrote, edited according to the feedback I received, is as follows:

def babynames(path):
    # This function takes the path of the directory where the html files are stored and returns a list containing
    # a dataframe that aggregates all the tabular baby-names data in the files, as well as a dictionary holding
    # a separate dataframe for each html file

    # 0: Initialize objects
    dicnames = {}  # will hold the dataframes containing the tabular data of each year
    dfnames = pd.DataFrame([])  # will hold the aggregate data

    # 1: Create a list containing the full paths of the baby files in the directory indicated by the path argument of the babynames
    # function
    allfiles = files(path)

    # 2: Begin looping through the files

    for file in allfiles:
        with open(file,"r") as f: soup = bs(f.read(), 'lxml')  # Convert the file to a soup

        # 3. Initialize empty lists to hold the contents of the cells
        Rank=[]
        Baby_1 =[]
        Baby_2 =[] 
        df = pd.DataFrame([])

        # 4. Extract the Table containing the Baby data and loop through the rows of this table

        for row in soup.select("table[summary=formatting] tr"):

         # 5. Extract the cells 

            cells = row.findAll("td")

            # 6. Convert to text and append to lists
            try:
                Rank.append(cells[0].find(text=True))  
                Baby_1.append(cells[1].find(text=True))
                Baby_2.append(cells[2].find(text=True))
            except:
                print "file: " , file
                try:
                        print "cells[0]: " , cells[0]
                except:
                        print "cells[0] : NaN"
                try:
                        print "cells[1]: " , cells[1]
                except:
                        print "cells[1] : NaN"    
                try:
                        print "cells[2]: " , cells[2]
                except:
                        print "cells[2] : NaN"   

            # 7. Append the lists to the empty dataframe df
            df["Rank"] = Rank 
            df["Baby_1"] = Baby_1
            df["Baby_2"] = Baby_2

            # 8. Append the year to the dataframe as a separate column
            df["Year"] = extractyear(file)  # Call the function extractyear() defined in the environment with input
                                            # the full pathname stored in variable file and examined in the current
                                            # iteration

            # 9. Rearrange the order of columns
            # df.columns.tolist() = ['Year', 'Rank', 'Baby_1', 'Baby_2']

            #10. Store the dataframe to a dictionary as the value which key is the name of the file
            pattern = re.compile(r'.*(baby\d\d\d\d).*')
            filename = re.search(pattern, file).group(1)
            dicnames[filename] = df

    # 11. Combine the dataframes stored in the dictionary dicname to an aggregate dataframe dfnames
        for key, value in dicnames.iteritems():
             dfnames = pd.concat([dfnames, value])

    # 12. Store the dfnames and dicname in a list called result.  Return result.
        result = [dfnames, dicnames]
        return result

When I run the function with a given path (the path of the directory where the HTML files are stored), I get the following error message:

result = babynames(path)

Output:

---------------------------------------------------------------------------


file:  C:/Users/ALEX/MyFiles/JUPYTER NOTEBOOKS/google-python-exercises/babynames/baby1990.html
cells[0]:  cells[0] : NaN
cells[1]:  cells[1] : NaN
cells[2]:  cells[2] : NaN
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-72-5c9ebdc4dcdb> in <module>()
----> 1 result = babynames(path)

<ipython-input-71-a0263a6790da> in babynames(path)
     54 
     55                 # 7. Append the lists to the empty dataframe df
---> 56                 df["Rank"] = Rank
     57                 df["Baby_1"] = Baby_1
     58                 df["Baby_2"] = Baby_2

C:\users\alex\Anaconda2\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
   2355         else:
   2356             # set column
-> 2357             self._set_item(key, value)
   2358 
   2359     def _setitem_slice(self, key, value):

C:\users\alex\Anaconda2\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
   2421 
   2422         self._ensure_valid_index(value)
-> 2423         value = self._sanitize_column(key, value)
   2424         NDFrame._set_item(self, key, value)
   2425 

C:\users\alex\Anaconda2\lib\site-packages\pandas\core\frame.pyc in _sanitize_column(self, key, value)
   2576 
   2577             # turn me into an ndarray
-> 2578             value = _sanitize_index(value, self.index, copy=False)
   2579             if not isinstance(value, (np.ndarray, Index)):
   2580                 if isinstance(value, list) and len(value) > 0:

C:\users\alex\Anaconda2\lib\site-packages\pandas\core\series.pyc in _sanitize_index(data, index, copy)
   2768 
   2769     if len(data) != len(index):
-> 2770         raise ValueError('Length of values does not match length of ' 'index')
   2771 
   2772     if isinstance(data, PeriodIndex):

ValueError: Length of values does not match length of index

cells[0], cells[1] and cells[2] should have values.

As I mentioned, there is another table earlier in the file, identified by the following HTML:

<table width="100%" border="0" cellspacing="0" cellpadding="4">

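I had previously run a version of the function that did not specify the table, because I had not noticed that the HTML files contain two tables. In that version I did not get this type of error. Instead, I got an error message for line 6 saying that the indentation of the try statement was incorrect, which I do not understand, and an error message for line 9, where I tried to rearrange the columns of the dataframe, which I could not understand either.

Your advice will be appreciated.

Best answer

right_table is a ResultSet instance (basically a list of Tag instances, each representing an element), and it has no findAll() or find_all() method.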

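Instead, if you have multiple elements, loop over the elements in right_table:

# Filter on the attribute by its real name; only class_ takes a trailing underscore
right_table = soup.find_all("table", summary="formatting")

for table in right_table:
    # find_all is the current spelling; findAll is the older alias
    for row in table.find_all("tr"):
        # ...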
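
To make the difference concrete, here is a minimal sketch in Python 3 with bs4. The HTML fragment is made up for illustration and is not the real baby1990.html; it only mimics the two-table layout described in the question. It shows that find_all() returns a list-like ResultSet, that calling find_all() on that list raises AttributeError, and that each element of the list is a Tag that can be searched further.

```python
from bs4 import BeautifulSoup

# Made-up miniature of the page: a layout table followed by the data table.
HTML = """
<table width="100%"><tr><td>navigation</td></tr></table>
<table width="100%" summary="formatting">
  <tr><td>1</td><td>Michael</td><td>Jessica</td></tr>
  <tr><td>2</td><td>Christopher</td><td>Ashley</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() always returns a ResultSet, even when only one table matches.
tables = soup.find_all("table", summary="formatting")
print(type(tables).__name__, len(tables))

# The ResultSet is a list of Tags, so it has no search methods of its own.
raised_attribute_error = False
try:
    tables.find_all("tr")
except AttributeError as exc:
    raised_attribute_error = True
    print("AttributeError:", exc)

# Each element of the ResultSet is a Tag, which can be searched.
first_table = tables[0]
rows = first_table.find_all("tr")
print(len(rows), rows[0].find_all("td")[1].get_text())
```

Indexing (tables[0]) is only safe when at least one table matched; looping over the ResultSet handles the empty case without extra checks.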

Or, if there is only one, use find():

right_table = soup.find("table", summary="formatting")

Or, use a single CSS selector:
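
A short sketch of the find() variant, again on a made-up HTML fragment rather than the real exercise files. find() returns a single Tag, or None when nothing matches, so a None check is a reasonable guard. The sketch also illustrates how keyword filters behave in bs4 as I understand it: apart from class_, a keyword argument is used as the attribute name exactly as written, so a filter spelled summary_ looks for an attribute literally named "summary_" and finds nothing here.

```python
from bs4 import BeautifulSoup

# Made-up fragment that mimics the two-table layout from the question.
HTML = """
<table width="100%"><tr><td>navigation</td></tr></table>
<table width="100%" summary="formatting">
  <tr><td>1</td><td>Michael</td><td>Jessica</td></tr>
  <tr><td>2</td><td>Christopher</td><td>Ashley</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first matching Tag, or None if there is no match.
table = soup.find("table", summary="formatting")
if table is None:
    row_count = 0
else:
    row_count = len(table.find_all("tr"))
print(row_count)

# A filter value that does not occur in the document yields None.
missing = soup.find("table", summary="no-such-value")
print(missing)

# The keyword is taken as the attribute name verbatim, so "summary_"
# does not match the "summary" attribute.
misspelled = soup.find("table", summary_="formatting")
print(misspelled)
```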

for row in soup.select("table[summary=formatting] tr"):
    # ...
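
As a fuller sketch of the selector approach, the function below collects the first three cells of every data row. The function name extract_rows, the sample HTML, and the header and footnote rows in it are assumptions made for illustration; the real files may be laid out differently. Rows with fewer than three td cells are skipped, which is one way to avoid indexing into an empty cells list.

```python
from bs4 import BeautifulSoup

# Made-up fragment: a layout table, then the data table with a header row
# (th cells only) and a footnote row that has a single td.
HTML = """
<table width="100%"><tr><td>layout only</td></tr></table>
<table width="100%" summary="formatting">
  <tr><th>Rank</th><th>Male name</th><th>Female name</th></tr>
  <tr><td>1</td><td>Michael</td><td>Jessica</td></tr>
  <tr><td>2</td><td>Christopher</td><td>Ashley</td></tr>
  <tr><td colspan="3">footnote</td></tr>
</table>
"""


def extract_rows(html):
    """Return (rank, name_1, name_2) tuples from the summary="formatting" table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.select('table[summary="formatting"] tr'):
        cells = row.find_all("td")
        if len(cells) < 3:
            # Header rows and footnotes do not carry three data cells.
            continue
        rows.append(tuple(cell.get_text(strip=True) for cell in cells[:3]))
    return rows


rows = extract_rows(HTML)
print(rows)
```

The collected tuples can then be turned into a dataframe once per file, after the row loop has finished, for example with pd.DataFrame(rows, columns=["Rank", "Baby_1", "Baby_2"]). Building the frame a single time avoids assigning columns whose length grows on every iteration, which is a likely cause of the "Length of values does not match length of index" error shown in the question.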

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/41592627/
