python - How to scrape data from <td> tags individually using BeautifulSoup?

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 10:48:27

I'm trying to scrape data from elections.in. There are three tables with the same class. Here is the site's HTML:

<h3 class="blmap">17th General (Lok Sabha) Election Results 2019 – State Wise</h3>

<table class="tableizer-table">

<thead><tr class="tableizer-firstrow"><th>State</th><th>Party</th><th>Number of Seats</th></tr></thead><tbody>

<tr><td>Andaman & Nicobar Islands</td><td>Indian National Congress</td><td>1</td></tr>

<tr><td>Andhra Pradesh</td><td>Yuvajana Sramika Rythu Congress Party</td><td>22</td></tr>

<tr><td>Andhra Pradesh</td><td>Telugu Desam</td><td>3</td></tr>

<tr><td>Arunachal Pradesh</td><td>Bharatiya Janata Party</td><td>2</td></tr>

<tr><td>Assam</td><td>Bharatiya Janata Party</td><td>9</td></tr>

<tr><td>Assam</td><td>Indian National Congress</td><td>3</td></tr>

<tr><td>Assam</td><td>All India United Democratic Front</td><td>1</td></tr>

I can get the data, but it comes out looking like this:

StatePartyNumber of Seats
Andaman & Nicobar IslandsIndian National Congress1
Andhra PradeshYuvajana Sramika Rythu Congress Party22
Andhra PradeshTelugu Desam3
Arunachal PradeshBharatiya Janata Party2
AssamBharatiya Janata Party9
AssamIndian National Congress3
AssamAll India United Democratic Front1
AssamIndependent1
BiharBharatiya Janata Party17

I want output like this:

State,Party,Number of Seats
Andaman & Nicobar Islands,Indian National Congress,1
Andhra Pradesh,Yuvajana Sramika Rythu Congress Party,22

or as a list.

This line of code produces the output above:

soup.find_all('table')[1].get_text()
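By default `get_text()` joins every text node under the tag with an empty separator, which is why the cell boundaries disappear. A minimal sketch, assuming a trimmed inline copy of the HTML above stands in for the live page: calling `get_text()` once per `<tr>` with a `","` separator and `strip=True` keeps the cells apart.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (assumption: the live page
# has the same structure)
html = """
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>State</th><th>Party</th><th>Number of Seats</th></tr></thead><tbody>
<tr><td>Andaman &amp; Nicobar Islands</td><td>Indian National Congress</td><td>1</td></tr>
<tr><td>Andhra Pradesh</td><td>Yuvajana Sramika Rythu Congress Party</td><td>22</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="tableizer-table")

# get_text() on the whole table joins all strings with "" (no separator),
# so call it once per row and put a comma between that row's cell strings.
lines = [row.get_text(",", strip=True) for row in table.find_all("tr")]
print("\n".join(lines))
```

This relies on each cell holding a single plain string. A cell with nested tags would contribute several comma-separated pieces, and a value that itself contains a comma would need CSV quoting, so the per-`<td>` approach in the answer is more robust.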

Here is my code on GitHub.

Please suggest how to achieve this.

Thanks.

Best answer

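If you are trying to parse <table> tags, go for pandas' .read_html(). It does most of the heavy lifting for you and returns a list of DataFrames. The table you are referring to is the 3rd table on the page (hence index position 2).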

import pandas as pd

url="http://www.elections.in/"
tables = pd.read_html(url)

Output:

print (tables[2].to_string())
State Party Number of Seats
0 Andaman & Nicobar Islands Indian National Congress 1
1 Andhra Pradesh Yuvajana Sramika Rythu Congress Party 22
2 Andhra Pradesh Telugu Desam 3
3 Arunachal Pradesh Bharatiya Janata Party 2
4 Assam Bharatiya Janata Party 9
5 Assam Indian National Congress 3
6 Assam All India United Democratic Front 1
7 Assam Independent 1
8 Bihar Bharatiya Janata Party 17
9 Bihar Janata Dal (United) 16
10 Bihar Lok Jan Shakti Party 6
11 Bihar Indian National Congress 1
12 Chandigarh Bharatiya Janata Party 1
13 Chhattisgarh Bharatiya Janata Party 9
14 Chhattisgarh Indian National Congress 2
15 Dadra & Nagar Haveli Independent 1
16 Daman & Diu Bharatiya Janata Party 1
17 Goa Bharatiya Janata Party 1
18 Goa Indian National Congress 1
19 Gujarat Bharatiya Janata Party 26
20 Haryana Bharatiya Janata Party 10
21 Himachal Pradesh Bharatiya Janata Party 4
22 Jammu & Kashmir Bharatiya Janata Party 3
23 Jammu & Kashmir Jammu & Kashmir National Conference 3
24 Jharkhand Bharatiya Janata Party 11
25 Jharkhand Ajsu Party 1
26 Jharkhand Indian National Congress 1
27 Jharkhand Jharkhand Mukti Morcha 1
28 Karnataka Bharatiya Janata Party 25
29 Karnataka Independent 1
30 Karnataka Indian National Congress 1
31 Karnataka Janata Dal (Secular) 1
32 Kerala Indian National Congress 15
33 Kerala Indian Union Muslim League 2
34 Kerala Communist Party Of India (Marxist) 1
35 Kerala Kerala Congress (M) 1
36 Kerala Revolutionary Socialist Party 1
37 Lakshadweep Nationalist Congress Party 1
38 Madhya Pradesh Bharatiya Janata Party 28
39 Madhya Pradesh Indian National Congress 1
40 Maharashtra Bharatiya Janata Party 23
41 Maharashtra Shivsena 18
42 Maharashtra Nationalist Congress Party 4
43 Maharashtra All India Majlis-E-Ittehadul Muslimeen 1
44 Maharashtra Independent 1
45 Maharashtra Indian National Congress 1
46 Manipur Bharatiya Janata Party 1
47 Manipur Naga Peoples Front 1
48 Meghalaya Indian National Congress 1
49 Meghalaya National People'S Party 1
50 Mizoram Mizo National Front 1
51 Nagaland Nationalist Democratic Progressive Party 1
52 NCT OF Delhi Bharatiya Janata Party 7
53 Odisha Biju Janata Dal 12
54 Odisha Bharatiya Janata Party 8
55 Odisha Indian National Congress 1
56 Puducherry Indian National Congress 1
57 Punjab Indian National Congress 8
58 Punjab Bharatiya Janata Party 2
59 Punjab Shiromani Akali Dal 2
60 Punjab Aam Aadmi Party 1
61 Rajasthan Bharatiya Janata Party 24
62 Rajasthan Rashtriya Loktantrik Party 1
63 Sikkim Sikkim Krantikari Morcha 1
64 Tamil Nadu Dravida Munnetra Kazhagam 23
65 Tamil Nadu Indian National Congress 8
66 Tamil Nadu Communist Party Of India 2
67 Tamil Nadu Communist Party Of India (Marxist) 2
68 Tamil Nadu All India Anna Dravida Munnetra Kazhagam 1
69 Tamil Nadu Indian Union Muslim League 1
70 Tamil Nadu Viduthalai Chiruthaigal Katchi 1
71 Telangana Telangana Rashtra Samithi 9
72 Telangana Bharatiya Janata Party 4
73 Telangana Indian National Congress 3
74 Telangana All India Majlis-E-Ittehadul Muslimeen 1
75 Tripura Bharatiya Janata Party 2
76 Uttar Pradesh Bharatiya Janata Party 62
77 Uttar Pradesh Bahujan Samaj Party 10
78 Uttar Pradesh Samajwadi Party 5
79 Uttar Pradesh Apna Dal (Soneylal) 2
80 Uttar Pradesh Indian National Congress 1
81 Uttarakhand Bharatiya Janata Party 5
82 West Bengal All India Trinamool Congress 22
83 West Bengal Bharatiya Janata Party 18
84 West Bengal Indian National Congress 2
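The same call can be tried offline. A minimal sketch, assuming a trimmed inline copy of the HTML from the question stands in for the live page and that `lxml` (the default parser backend for `read_html`) is installed: the `attrs` argument keeps only tables with matching HTML attributes, `to_csv(index=False)` gives the comma-separated form the question asked for, and `.values.tolist()` gives a plain list of rows.

```python
from io import StringIO

import pandas as pd

# Trimmed copy of the question's table (assumption: stands in for the live page)
html = """
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>State</th><th>Party</th><th>Number of Seats</th></tr></thead><tbody>
<tr><td>Andaman &amp; Nicobar Islands</td><td>Indian National Congress</td><td>1</td></tr>
<tr><td>Andhra Pradesh</td><td>Yuvajana Sramika Rythu Congress Party</td><td>22</td></tr>
</tbody></table>
"""

# read_html returns a list of DataFrames, one per matching <table>;
# attrs keeps only tables whose class attribute is "tableizer-table"
tables = pd.read_html(StringIO(html), attrs={"class": "tableizer-table"})
df = tables[0]

# CSV text without the DataFrame's index column
csv_text = df.to_csv(index=False)
print(csv_text)

# Plain Python list of rows (the seat counts are parsed as integers)
rows = df.values.tolist()
print(rows)
```

On the live page, three tables share this class, so `attrs` narrows the list but you would still need to pick the right index.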

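To do this with BeautifulSoup, you have to iterate through each row (the <tr> tags), then through each data cell (<td>) in that row, and append the values to a list, a DataFrame, or however else you want to store them.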

So something like this:

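import requests
from bs4 import BeautifulSoup

url = "http://www.elections.in/"

# Download the page and decode the raw bytes as UTF-8
r = requests.get(url).content
htmlDoc = r.decode("utf-8")

soup = BeautifulSoup(htmlDoc, 'html.parser')

# The state-wise results table is the third <table> on the page (index 2)
table = soup.find_all('table')[2]
rows = table.find_all('tr')

# Column names come from the <th> cells of the header row
headers = table.find_all('th')
headers = [each.text for each in headers]

list_of_rows = []
for row in rows:
    # Only data rows contain <td> cells; the header row (all <th>) is skipped
    data = row.find_all('td')
    if data != []:
        data = [each.text for each in data]
        list_of_rows.append(data)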

Output:

print (headers)
['State', 'Party', 'Number of Seats']

print (list_of_rows)
[['Andaman & Nicobar Islands', 'Indian National Congress', '1'], ['Andhra Pradesh', 'Yuvajana Sramika Rythu Congress Party', '22'], ['Andhra Pradesh', 'Telugu Desam', '3'], ['Arunachal Pradesh', 'Bharatiya Janata Party', '2'], ['Assam', 'Bharatiya Janata Party', '9'], ['Assam', 'Indian National Congress', '3'], ['Assam', 'All India United Democratic Front', '1'], ['Assam', 'Independent', '1'], ['Bihar', 'Bharatiya Janata Party', '17'], ['Bihar', 'Janata Dal (United)', '16'], ['Bihar', 'Lok Jan Shakti Party', '6'], ['Bihar', 'Indian National Congress', '1'], ['Chandigarh', 'Bharatiya Janata Party', '1'], ['Chhattisgarh', 'Bharatiya Janata Party', '9'], ['Chhattisgarh', 'Indian National Congress', '2'], ['Dadra & Nagar Haveli', 'Independent', '1'], ['Daman & Diu', 'Bharatiya Janata Party', '1'], ['Goa', 'Bharatiya Janata Party', '1'], ['Goa', 'Indian National Congress', '1'], ['Gujarat', 'Bharatiya Janata Party', '26'], ['Haryana', 'Bharatiya Janata Party', '10'], ['Himachal Pradesh', 'Bharatiya Janata Party', '4'], ['Jammu & Kashmir', 'Bharatiya Janata Party', '3'], ['Jammu & Kashmir', 'Jammu & Kashmir National Conference', '3'], ['Jharkhand', 'Bharatiya Janata Party', '11'], ['Jharkhand', 'Ajsu Party', '1'], ['Jharkhand', 'Indian National Congress', '1'], ['Jharkhand', 'Jharkhand Mukti Morcha', '1'], ['Karnataka', 'Bharatiya Janata Party', '25'], ['Karnataka', 'Independent', '1'], ['Karnataka', 'Indian National Congress', '1'], ['Karnataka', 'Janata Dal (Secular)', '1'], ['Kerala', 'Indian National Congress', '15'], ['Kerala', 'Indian Union Muslim League', '2'], ['Kerala', 'Communist Party Of India (Marxist)', '1'], ['Kerala', 'Kerala Congress (M)', '1'], ['Kerala', 'Revolutionary Socialist Party', '1'], ['Lakshadweep', 'Nationalist Congress Party', '1'], ['Madhya Pradesh', 'Bharatiya Janata Party', '28'], ['Madhya Pradesh', 'Indian National Congress', '1'], ['Maharashtra', 'Bharatiya Janata Party', '23'], ['Maharashtra', 'Shivsena', '18'], ['Maharashtra', 
'Nationalist Congress Party', '4'], ['Maharashtra', 'All India Majlis-E-Ittehadul Muslimeen', '1'], ['Maharashtra', 'Independent', '1'], ['Maharashtra', 'Indian National Congress', '1'], ['Manipur', 'Bharatiya Janata Party', '1'], ['Manipur', 'Naga Peoples Front', '1'], ['Meghalaya', 'Indian National Congress', '1'], ['Meghalaya', "National People'S Party", '1'], ['Mizoram', 'Mizo National Front', '1'], ['Nagaland', 'Nationalist Democratic Progressive Party', '1'], ['NCT OF Delhi', 'Bharatiya Janata Party', '7'], ['Odisha', 'Biju Janata Dal', '12'], ['Odisha', 'Bharatiya Janata Party', '8'], ['Odisha', 'Indian National Congress', '1'], ['Puducherry', 'Indian National Congress', '1'], ['Punjab', 'Indian National Congress', '8'], ['Punjab', 'Bharatiya Janata Party', '2'], ['Punjab', 'Shiromani Akali Dal', '2'], ['Punjab', 'Aam Aadmi Party', '1'], ['Rajasthan', 'Bharatiya Janata Party', '24'], ['Rajasthan', 'Rashtriya Loktantrik Party', '1'], ['Sikkim', 'Sikkim Krantikari Morcha', '1'], ['Tamil Nadu', 'Dravida Munnetra Kazhagam', '23'], ['Tamil Nadu', 'Indian National Congress', '8'], ['Tamil Nadu', 'Communist Party Of India', '2'], ['Tamil Nadu', 'Communist Party Of India (Marxist)', '2'], ['Tamil Nadu', 'All India Anna Dravida Munnetra Kazhagam', '1'], ['Tamil Nadu', 'Indian Union Muslim League', '1'], ['Tamil Nadu', 'Viduthalai Chiruthaigal Katchi', '1'], ['Telangana', 'Telangana Rashtra Samithi', '9'], ['Telangana', 'Bharatiya Janata Party', '4'], ['Telangana', 'Indian National Congress', '3'], ['Telangana', 'All India Majlis-E-Ittehadul Muslimeen', '1'], ['Tripura', 'Bharatiya Janata Party', '2'], ['Uttar Pradesh', 'Bharatiya Janata Party', '62'], ['Uttar Pradesh', 'Bahujan Samaj Party', '10'], ['Uttar Pradesh', 'Samajwadi Party', '5'], ['Uttar Pradesh', 'Apna Dal (Soneylal)', '2'], ['Uttar Pradesh', 'Indian National Congress', '1'], ['Uttarakhand', 'Bharatiya Janata Party', '5'], ['West Bengal', 'All India Trinamool Congress', '22'], ['West Bengal', 'Bharatiya Janata Party', '18'], ['West Bengal', 'Indian National Congress', '2']]
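Because the question mentions three tables sharing the `tableizer-table` class, a hard-coded index such as `[2]` can silently pick the wrong table if the page layout changes. A minimal sketch, assuming each results table follows an `<h3 class="blmap">` heading as in the HTML from the question: it finds the heading by its text, jumps to the next matching table with `find_next()`, and writes the header and rows with the standard `csv` module. The inline HTML, including the first (decoy) table, is a hypothetical stand-in for the live page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: two tables share the same class,
# each introduced by an <h3 class="blmap"> heading
html = """
<h3 class="blmap">Some Other Results</h3>
<table class="tableizer-table">
<tr><th>Name</th><th>Value</th></tr>
<tr><td>x</td><td>0</td></tr>
</table>
<h3 class="blmap">17th General (Lok Sabha) Election Results 2019 – State Wise</h3>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>State</th><th>Party</th><th>Number of Seats</th></tr></thead><tbody>
<tr><td>Andaman &amp; Nicobar Islands</td><td>Indian National Congress</td><td>1</td></tr>
<tr><td>Andhra Pradesh</td><td>Yuvajana Sramika Rythu Congress Party</td><td>22</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the heading by part of its text, then the first matching table after it
heading = soup.find("h3", class_="blmap", string=lambda s: s and "State Wise" in s)
table = heading.find_next("table", class_="tableizer-table")

headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
    if tr.find("td")  # skip the header row, which only has <th> cells
]

# Write CSV to an in-memory buffer; open("out.csv", "w", newline="") would write a file
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(headers)
writer.writerows(rows)
print(buf.getvalue())
```

Using `csv.writer` instead of joining with `","` by hand means any party name that contains a comma is quoted correctly.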

But as I said, pandas will do this for you with .read_html().

Regarding python - How to scrape data from <td> tags individually using BeautifulSoup?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56326397/
