
python - Export scraped data to a CSV with specific columns

Reposted. Author: 行者123. Updated: 2023-12-01 08:06:16

My code currently prints its results to the console.

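Desired result (see the attached screenshot): write the final output to column "a2" of a CSV file and the sku# to column "a1". The sku# is always the text after the 5th "/" in the URL.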
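A minimal sketch of the SKU rule, assuming every product URL follows the `https://<host>/c/product/<sku>/<slug>.html` layout of the example link. Splitting on `"/"` puts the segment after the 5th slash at index 5, which is what the answer below relies on; note that it yields only that one segment (up to the next `/`), not everything after the 5th slash. The function name `sku_from_url` and the error check are illustrative additions, not part of the original question or answer.

```python
def sku_from_url(url: str) -> str:
    """Return the path segment that follows the 5th '/' of a product URL.

    Assumes the layout https://<host>/c/product/<sku>/<slug>.html
    (hypothetical helper, not from the original post).
    """
    parts = url.split("/")
    # parts == ['https:', '', '<host>', 'c', 'product', '<sku>', '<slug>.html']
    if len(parts) < 6:
        raise ValueError(f"unexpected URL layout: {url!r}")
    return parts[5]


url = "https://www.bhphotovideo.com/c/product/1225875-REG/canon_1263c004_eos_80d_dslr_camera.html"
print(sku_from_url(url))  # 1225875-REG
```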

Here is the code:

```python
from bs4 import BeautifulSoup
import urllib.request
import csv

def get_bullets(url):
    page = urllib.request.urlopen(url)
    soup = BeautifulSoup(page, 'lxml')
    content = soup.find('div', class_='js-productHighlights product-highlights c28 fs14 js-close')
    bullets = content.find_all('li', class_='top-section-list-item')
    for bullet in bullets:
        print(bullet.string)

get_bullets('https://www.bhphotovideo.com/c/product/1225875-REG/canon_1263c004_eos_80d_dslr_camera.html')
```

Desired result: [screenshot]

Thanks!

Best answer

```python
from bs4 import BeautifulSoup
import urllib.request
import pandas as pd


def get_bullets(url):
    # The SKU is the path segment after the 5th '/' in the URL
    sku = url.split('/')[5]
    page = urllib.request.urlopen(url)
    soup = BeautifulSoup(page, 'lxml')
    content = soup.find('div', class_='js-productHighlights product-highlights c28 fs14 js-close')
    bullets = content.find_all('li', class_='top-section-list-item')

    # Put all bullet points into a single cell, one per line
    bullets_text = '\n'.join([bullet.text for bullet in bullets])

    # One row: first column = sku, second column = bullets
    temp_df = pd.DataFrame([[sku, bullets_text]], columns=['sku', 'bullets'])
    temp_df.to_csv('path/filename.csv', index=False)


get_bullets('https://www.bhphotovideo.com/c/product/1225875-REG/canon_1263c004_eos_80d_dslr_camera.html')
```
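The answer uses pandas, and each call to `to_csv` overwrites the file. Since the question already imports the standard-library `csv` module, here is a minimal alternative sketch that appends one row per product (SKU in the first column, bullets joined by newlines in the second) and writes the header only once. The function name `write_bullet_rows` and its `(sku, list_of_bullets)` input shape are hypothetical choices for illustration; the scraping itself is left as in the answer.

```python
import csv
import os


def write_bullet_rows(rows, path):
    """Append (sku, bullets) pairs to a two-column CSV at `path`.

    `rows` is an iterable of (sku, list_of_bullet_strings) tuples
    (hypothetical input shape). The header row is written only when
    the file does not exist yet or is empty.
    """
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["sku", "bullets"])
        for sku, bullets in rows:
            # The joined text contains "\n", so csv.writer quotes the field
            # and it stays a single cell when the file is read back.
            writer.writerow([sku, "\n".join(bullets)])
```

For example, inside `get_bullets` you could call `write_bullet_rows([(sku, [b.text for b in bullets])], 'path/filename.csv')` once per scraped page, so that scraping several URLs accumulates rows instead of replacing the file.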

Regarding "python - Export scraped data to a CSV with specific columns", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55519762/
