
python - Getting the contents of a table with BeautifulSoup

Reposted. Author: 太空狗. Updated: 2023-10-30 02:59:02

I have the following table on a website, which I am extracting with BeautifulSoup. This is the URL (I have also attached an image).

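Ideally, I would like each company to occupy a single row in the CSV, but instead its cells end up spread across separate rows. Please see the attached image.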


I want the output to look like it does in column "D", but I am getting it in A1, A2, A3...

This is the code I am using for the extraction:

import csv

import requests
from bs4 import BeautifulSoup


def _writeInCSV(text):
    print "Writing in CSV File"
    with open('sara.csv', 'wb') as csvfile:
        #spamwriter = csv.writer(csvfile, delimiter='\t',quotechar='\n', quoting=csv.QUOTE_MINIMAL)
        spamwriter = csv.writer(csvfile, delimiter='\t',quotechar="\n")

        for item in text:
            spamwriter.writerow([item])

read_list=[]
initial_list=[]


url="http://www.nse.com.ng/Issuers-section/corporate-disclosures/corporate-actions/closure-of-register"
r=requests.get(url)
soup = BeautifulSoup(r._content, "html.parser")

#gdata_even=soup.find_all("td", {"class":"ms-rteTableEvenRow-3"})

gdata_even=soup.find_all("td", {"class":"ms-rteTable-default"})

for item in gdata_even:
    print item.text.encode("utf-8")
    initial_list.append(item.text.encode("utf-8"))
    print ""

_writeInCSV(initial_list)

Can anyone help?

Best answer

The idea is as follows:

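  • Read the header cells from the table
  • Read all the other rows in the table
  • Zip the cells of each data row with the headers, producing a list of dictionaries
  • Dump the result to CSV with csv.DictWriter()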
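
The zipping step in the list above can be illustrated on its own. This is only a sketch: the header names and cell values below are made-up placeholders, not data taken from the NSE page.

```python
# Hypothetical header and row values, for illustration only.
headers = ["Company", "Closure Date", "Dividend"]
rows = [
    ["Example Plc", "2015-09-01", "0.10"],
    ["Sample Ltd", "2015-09-15", "0.25"],
]

# Pair each cell with its column header: one dictionary per table row.
records = [dict(zip(headers, row)) for row in rows]

print(records[0]["Company"])
```

Because every dictionary carries all the cells of one table row, writing one dictionary per CSV line keeps each company on a single row.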

Implementation:

import csv
from pprint import pprint

from bs4 import BeautifulSoup
import requests

url = "http://www.nse.com.ng/Issuers-section/corporate-disclosures/corporate-actions/closure-of-register"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# every row of the target table; the first row holds the header cells
rows = soup.select("table.ms-rteTable-default tr")
headers = [header.get_text(strip=True).encode("utf-8") for header in rows[0].find_all("td")]

# zip each data row's cells with the headers: one dict per row
data = [dict(zip(headers, [cell.get_text(strip=True).encode("utf-8") for cell in row.find_all("td")]))
        for row in rows[1:]]

# see what the data looks like at this point
pprint(data)

# Python 2: the csv module expects the file to be opened in binary mode
with open('sara.csv', 'wb') as csvfile:
    spamwriter = csv.DictWriter(csvfile, headers, delimiter='\t', quotechar="\n")

    for row in data:
        spamwriter.writerow(row)
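
The code above targets Python 2 (binary file mode and explicit `.encode("utf-8")` calls). Under Python 3 the `csv` module expects a text-mode file opened with `newline=""`, and strings need no manual encoding. Below is a minimal sketch of the writing step only, assuming `data` is a list of dictionaries built as above. The helper name `write_rows` and the sample record are hypothetical, and the `writeheader()` call is an addition that the original answer does not include.

```python
import csv
import io


def write_rows(fileobj, headers, data):
    """Write a header line, then one tab-separated line per record."""
    writer = csv.DictWriter(fileobj, fieldnames=headers, delimiter="\t")
    writer.writeheader()
    writer.writerows(data)


# Hypothetical sample record standing in for the scraped table.
headers = ["Company", "Closure Date"]
data = [{"Company": "Example Plc", "Closure Date": "2015-09-01"}]

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open("sara.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO(newline="")
write_rows(buffer, headers, data)
print(buffer.getvalue())
```

The default quoting settings are used here instead of `quotechar="\n"`, since tab-separated cells rarely need quoting at all.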

Regarding "python - Getting the contents of a table with BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32434378/
