
python - TypeError: 'list' object is not callable when web scraping to append lists/values to a column in csv file

Reposted. Author: 行者123. Updated: 2023-12-01 01:09:11

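When I call a function I defined earlier, I get TypeError: 'list' object is not callable inside a for loop. I want to call the function inside my for loop so that the csv rows are filled in automatically, column by column.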

import requests
import bs4
from bs4 import BeautifulSoup
import pandas as pd
import time
import csv

# copy and paste the url from indeed using your search term
URL = 'https://www.indeed.com/jobs?q=data+science+summer+intern&l=New+York'

# conducting a request of the stated URL above:
page = requests.get(URL)

# specifying a desired format of “page” using the html parser - this allows python to read the various components of the page, rather than treating it as one long string.
soup = BeautifulSoup(page.text, 'html.parser')

# printing soup in a more structured tree format that makes for easier reading
print(soup.prettify())

This extract_job_title_from_result() function takes the job titles from Indeed and appends them to the jobs list.

def extract_job_title_from_result(soup):
    jobs = []
    for div in soup.find_all(name='div', attrs={'class':'row'}):
        for a in div.find_all(name='a', attrs={'data-tn-element':'jobTitle'}):
            jobs.append(a['title'])
    return jobs

extract_job_title_from_result = extract_job_title_from_result(soup)
print('extract_job_title_from_result is: ', extract_job_title_from_result)

output: extract_job_title_from_result is: ['Data Engineer Summer Intern', 'Data Science Summer Intern', 'Data Scientist Summer Intern', 'Statistical Research and Data Science Intern', 'Data Scientist/Data Analytics Intern - Summer 2019', '2019 Summer Internship - Data Science Internship, Baseball Data', 'Data Science Summer 2019 Internship', 'Intern, Data Science', 'Data Science Intern (Social Media Analysis)', 'Data Science Intern']

# Set max result per city
max_results_per_city = 100
city_set = ['New+York','Chicago','San+Francisco', 'Austin', 'Seattle', 'Los+Angeles', 'Philadelphia', 'Dallas', 'Pittsburgh', 'Denver', 'Miami', 'Washington+DC','Jersey+City', 'Princeton']
columns = ['city', 'job_title', 'company_name', 'location', 'summary', 'salary']
sample_df = pd.DataFrame(columns = columns)
sample_df

output for sample_df (column headers only, no data yet): city job_title company_name location summary salary

Now I am trying to scrape and extract data from Indeed. I have written functions that work, and I can use them to save/append to columns by column name in the csv.

I want to scrape 100 results per city and use the functions I wrote to save/append this data to a csv file.

for city in city_set:
    for start in range(0, max_results_per_city, 10):
        #ensuring at least 1 second between page grabs
        time.sleep(1)
        #soup = BeautifulSoup(page.text, 'lxml', from_encoding='utf-8')
        sample_df['job_title'] = extract_job_title_from_result(soup)

### Ignore the below functions. They worked individually but not here in this for loop. I'm using a function to try to make it work first before appending all functions to csv by column name

#extract_company_from_result(soup)
#extract_location_from_result(soup)
#extract_salary_from_result(soup)
#extract_summary_from_result(soup)
#sample_df.loc[num] = job_post

sample_df.to_csv('/Users/KingKong1/AnacondaProjects/testing1.csv', encoding='utf-8')

I get "TypeError: 'list' object is not callable" from **sample_df['job_title'] = extract_job_title_from_result(soup)**

Best answer

In extract_job_title_from_result = extract_job_title_from_result(soup), you replaced the function extract_job_title_from_result with its result (a list).

So the next time you try to call it, extract_job_title_from_result is no longer the name of the function; it refers to that list.
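The rebinding can be reproduced without any scraping. The following is a minimal sketch; extract_titles is a hypothetical stand-in for extract_job_title_from_result, not code from the question.

```python
def extract_titles(items):
    """Hypothetical stand-in for extract_job_title_from_result: returns a list."""
    return [item.upper() for item in items]

# At this point the name refers to a function.
print(callable(extract_titles))

# Rebinding the name to the function's return value, as the question does.
extract_titles = extract_titles(["a", "b"])

# The name now refers to a list, so it is no longer callable.
print(callable(extract_titles))

try:
    extract_titles(["c"])  # attempting to call a list
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The first call succeeds because the name still points at the function; every later call fails because the assignment replaced it.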

Use a different name, for example:

job_title = extract_job_title_from_result(soup)
print('job_title is: ', job_title)
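Separately from the naming fix, the loop in the question reuses the same soup on every iteration and reassigns the whole job_title column each time, so it would not accumulate 100 results per city. One possible way to accumulate rows is sketched below using only the standard library. It assumes a hypothetical fetch_titles(city, start) helper that stands in for requesting one results page and extracting its titles; here it returns fake data so the sketch runs offline.

```python
import csv
import io

def fetch_titles(city, start):
    """Hypothetical stand-in: would request one results page and extract titles."""
    return [f"{city} job {start + i}" for i in range(2)]

def collect_rows(cities, max_results, page_size=10):
    """Build one row per job title across all cities and page offsets."""
    rows = []
    for city in cities:
        for start in range(0, max_results, page_size):
            for title in fetch_titles(city, start):
                rows.append({"city": city, "job_title": title})
    return rows

rows = collect_rows(["New+York", "Chicago"], max_results=20)

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "job_title"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Collecting plain dictionaries first and writing them once at the end keeps each page's results instead of overwriting them; the same list of dictionaries could also be passed to pd.DataFrame if pandas is preferred.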

For python - TypeError: 'list' object is not callable when web scraping to append lists/values to a column in csv file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55016473/
