
python - Looping over multiple divs in Python BeautifulSoup, outputting to a DataFrame and then CSV

Reposted. Author: 行者123. Updated: 2023-11-27 22:49:50

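I'm trying to build a scraper/parser for my school's course catalog. The first step is to scrape the Coursicle database into a CSV, but right now I can only get it to output the first row.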

Here is a snippet of the HTML I'm trying to parse:

<div class="card back" style="display: block;">
<div class="addClass Back">
<i class="fa clicky fa-star Back"></i>
<i class="fa clicky fa-star-o Back"></i>
<i class="clicky icon-info-sign"></i>
</div>
<div class="courseNumberBack">
<span class="subject">ANTH</span> <span class="number">54</span>-<span class="section">001</span>
<div class="smallCourseInfo">

<span class="abbrevTitle">First-Year Seminar: The Indians' New Worlds: Southeastern Histories from 1200 to 1800</span>

</div>
</div>
<hr class="faddedLine">

<div class="courseNameBack"><div class="days">TuTh</div><br>

<div class="smallCourseInfo"> <div class="instructor">Clara Scarry</div></div>

<div class="time">3:30pm-4:45pm</div><br>
<div class="smallCourseInfo"> <div class="building">Alumni 203 </div></div>


<div class="genEds">HS US WB </div>


</div>

Here is my code:

import pandas as pd
import os
import csv
import itertools
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("/Users/as9934/Desktop/schedule/wb.htm"), "lxml")

cardback = (soup.find('div', class_='card back'))
for courseNumberBack in cardback.find_all('div', class_='courseNumberBack'):
    for subject in courseNumberBack.find_all('span', class_='subject'):
        for subjects in subject:
            print(subjects.string, ",", end=' ')

    for number in courseNumberBack.find_all('span', class_='number'):
        for numbers in number:
            print(numbers.string, ",", end=' ')

    for section in courseNumberBack.find_all('span', class_='section'):
        for sections in section:
            print(sections.string, ",", end=' ')

    for abbrevTitle in courseNumberBack.find_all('span', class_='abbrevTitle'):
        for abbrevTitles in abbrevTitle:
            print(abbrevTitles.string, ",", end=' ')


for courseNameBack in cardback.find_all('div', class_='courseNameBack'):
    for day in courseNameBack.find_all('div', class_='days'):
        for days in day:
            print(days.string, ",", end=' ')

    for instructor in courseNameBack.find_all('div', class_='instructor'):
        for instructors in instructor:
            print(instructors.string, ",", end=' ')

    for time in courseNameBack.find_all('div', class_='time'):
        for times in time:
            print(times.string, ",", end=' ')

    for building in courseNameBack.find_all('div', class_='building'):
        for buildings in building:
            print(buildings.string, ",", end=' ')

    for genEd in courseNameBack.find_all('div', class_='genEds'):
        for genEds in genEd:
            print(genEds.string, end=' ')

I also tried this:

cardback = (soup.find('div', class_='card back'))
result = dict(
    zip(
        [cardback.text for cardback in soup.select('span.subject')],
        [cardback.text for cardback in soup.select('span.number')],
        [cardback.text for cardback in soup.select('span.section')],
        [cardback.text for cardback in soup.select('span.abbrevTitle')],
        [cardback.text for carback in soup.select('div.days')],
        [cardback.text for carback in soup.select('div.instructor')],
        [cardback.text for carback in soup.select('div.time')],
        [cardback.text for carback in soup.select('div.building')],
        [cardback.text for carback in soup.select('div.genEds')]
    )
)
print(result)

But it returns this error:

ValueError: dictionary update sequence element #0 has length 9; 2 is required

Does anyone have any ideas?
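A note on the error itself: it appears to come from `dict()` rather than from BeautifulSoup. `dict()` expects an iterable of key/value pairs, while `zip()` over nine lists yields 9-element tuples, so the first element already has the wrong length. The minimal sketch below reproduces this with three made-up lists standing in for the `soup.select(...)` results (the values are illustrative, not taken from the real page). Separately, several comprehensions in the attempt bind `carback` but read `cardback.text`, so they would repeat the text of the outer `cardback` even once the `dict()` call is removed.

```python
# Toy stand-ins for the lists built from soup.select(...); values are made up.
subjects = ["ANTH", "ANTH"]
numbers = ["54", "101"]
sections = ["001", "002"]

# zip() over several lists yields one tuple per row, not key/value pairs.
rows = list(zip(subjects, numbers, sections))
print(rows)

# dict() needs 2-element items, so 3-element (or 9-element) tuples fail.
try:
    dict(zip(subjects, numbers, sections))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Keeping the zipped tuples as a list of rows, instead of passing them to `dict()`, avoids the error and gives a shape that can be handed to a DataFrame constructor.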

Best answer

When using print in Python, you have two special keyword arguments: end and sep. The end argument is exactly what you need. It looks like this:

print(something.text, ',', end=' ')
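The `end` and `sep` arguments only control how printed text is joined, so they do not by themselves produce more than the first row or a CSV file. As a hedged sketch of one way to get there (not part of the answer above), the code below assumes every course sits in its own `div` with classes `card` and `back`, loops over all of them, and collects one dict per card into a DataFrame. The inline `HTML` string, the second card's contents, and the `parse_card` helper are made up for illustration; `html.parser` is used here so the sketch does not depend on lxml.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical two-card page modeled on the snippet in the question.
# The second card is invented and deliberately has no genEds div.
HTML = """
<div class="card back">
  <div class="courseNumberBack">
    <span class="subject">ANTH</span> <span class="number">54</span>-<span class="section">001</span>
    <div class="smallCourseInfo">
      <span class="abbrevTitle">First-Year Seminar: The Indians' New Worlds: Southeastern Histories from 1200 to 1800</span>
    </div>
  </div>
  <div class="courseNameBack"><div class="days">TuTh</div>
    <div class="smallCourseInfo"><div class="instructor">Clara Scarry</div></div>
    <div class="time">3:30pm-4:45pm</div>
    <div class="smallCourseInfo"><div class="building">Alumni 203 </div></div>
    <div class="genEds">HS US WB </div>
  </div>
</div>
<div class="card back">
  <div class="courseNumberBack">
    <span class="subject">ANTH</span> <span class="number">101</span>-<span class="section">002</span>
    <div class="smallCourseInfo"><span class="abbrevTitle">Made-Up Course</span></div>
  </div>
  <div class="courseNameBack"><div class="days">MWF</div>
    <div class="smallCourseInfo"><div class="instructor">Jane Doe</div></div>
    <div class="time">9:00am-9:50am</div>
    <div class="smallCourseInfo"><div class="building">Alumni 100</div></div>
  </div>
</div>
"""

# Class names taken from the question's HTML; each becomes a CSV column.
FIELDS = ["subject", "number", "section", "abbrevTitle",
          "days", "instructor", "time", "building", "genEds"]


def parse_card(card):
    """Return one row (dict) for a single card; missing fields become None."""
    row = {}
    for field in FIELDS:
        tag = card.find(class_=field)
        row[field] = tag.get_text(strip=True) if tag else None
    return row


soup = BeautifulSoup(HTML, "html.parser")

# select() returns every matching card, whereas find() stops at the first one.
cards = soup.select("div.card.back")
df = pd.DataFrame([parse_card(card) for card in cards], columns=FIELDS)

# Write to an in-memory buffer here; a file path works the same way.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To run this against the real page, the `HTML` string would be replaced by the contents of the saved file, and `df.to_csv("courses.csv", index=False)` (a hypothetical file name) would write the result to disk.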

Regarding "python - Looping over multiple divs in Python BeautifulSoup, outputting to a DataFrame and then CSV", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59499332/
