
python - Why are there empty cells in my scraped Excel file?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:29:35

I'm trying to scrape developer jobs from indeed.nl into Excel using Python and bs4. Everything works, but when I open the file in Excel there are extra empty rows between the jobs.

Can anyone see what I'm doing wrong?

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.indeed.nl/jobs?q=developer&l='

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

#grabs each job
containers = page_soup.findAll("div",{"class":"row"})

filename = "indeedjobs.csv"

f = open(filename, "w")

headers = "Company; Job; City\n"
f.write(headers)

for container in containers:
    jobtitle = container.a["title"]
    city_container = container.findAll("span",{"class":"location"})
    City_name = city_container[0].text
    company_container = container.findAll("span",{"class":"company"})
    company_name = company_container[0].text

    print("Company: " + company_name)
    print("Job: " + jobtitle)
    print("City: " + City_name)

    f.write(company_name + ";" + jobtitle + ";" + City_name + "\n")
f.close()

Best answer

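The text of the <span class="company"> elements begins with a newline and some whitespace. Because company_name is written first in each row, that newline puts an empty line before every record, which Excel shows as an extra row. Remove the whitespace with .strip().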
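As a minimal sketch of the .strip() fix, the snippet below builds one row with and without stripping. The sample values and the build_row helper are hypothetical, standing in for the text the scraper reads from the page; they are not part of the original code.

```python
# Hypothetical sample values, standing in for company_container[0].text etc.
raw_company = "\n    Acme B.V."   # leading newline and spaces, as described above
jobtitle = "Python Developer"
raw_city = "Amsterdam"


def build_row(company, job, city):
    """Join the three fields with ';' after trimming surrounding whitespace.

    build_row is an illustrative helper, not a function from the question.
    """
    return ";".join(field.strip() for field in (company, job, city)) + "\n"


# Without strip(): the leading newline splits the record over two lines.
unstripped = raw_company + ";" + jobtitle + ";" + raw_city + "\n"

# With strip(): exactly one line per record.
cleaned = build_row(raw_company, jobtitle, raw_city)

print(repr(unstripped))
print(repr(cleaned))
```

In the question's loop, the same effect should come from calling .strip() on company_name (and, to be safe, on City_name) before writing the row.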

You might also consider the csv module for writing well-formed CSV files. The module will help you escape special characters correctly.
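A rough sketch of how the csv module could be used here, assuming the same semicolon delimiter as the question. The sample rows and the write_jobs helper are hypothetical; an in-memory buffer stands in for the output file so the result can be inspected directly.

```python
import csv
import io

# Hypothetical rows; in the scraper these would come from the loop over containers.
jobs = [
    ("\n    Acme B.V.", "Developer; Backend", "Amsterdam"),
    ("\n    Example Corp", "Python Developer", "Utrecht"),
]


def write_jobs(fileobj, rows):
    """Write a header and one stripped row per job (illustrative helper)."""
    writer = csv.writer(fileobj, delimiter=";")
    writer.writerow(["Company", "Job", "City"])
    for company, job, city in rows:
        writer.writerow([company.strip(), job.strip(), city.strip()])


# newline="" keeps the writer's own line endings untouched.
buffer = io.StringIO(newline="")
write_jobs(buffer, jobs)
output = buffer.getvalue()
print(output)
```

The job title containing a semicolon is quoted by the writer, so it stays in a single cell. To write a real file instead of a buffer, open it with open(filename, "w", newline="") and pass the file object to the same helper.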

Regarding "python - Why are there empty cells in my scraped Excel file?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49527438/
