
Python 3: Scraping Yellow Pages

Reposted. Author: 行者123. Updated: 2023-11-28 22:31:48

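I'm trying to scrape data from Yellow Pages, but I can't get the text of each business's name and address/phone. I'm using the code below; where am I going wrong? For now I'm only printing each business's text so I can see it right away while testing; once it works, I'll save the data to a CSV file.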

import csv
import requests
from bs4 import BeautifulSoup

#dont worry about opening this file
"""with open('cities_louisiana.csv','r') as cities:
    lines = cities.read().splitlines()
cities.close()"""

for city in lines:
    print(city)
url = "http://www.yellowpages.com/search? search_terms=businesses&geo_location_terms=amite+LA&page="+str(count)

for city in lines:
    for x in range (0, 50):
        print("http://www.yellowpages.com/search?search_terms=businesses&geo_location_terms=amite+LA&page="+str(x))
        page = requests.get("http://www.yellowpages.com/search?search_terms=businesses&geo_location_terms=amite+LA&page="+str(x))
        soup = BeautifulSoup(page.text, "html.parser")
        name = soup.find_all("div", {"class": "v-card"})
        for name in name:
            try:
                print(name.contents[0]).find_all(class_="business-name").text
                #print(name.contents[1].text)
            except:
                pass

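Best answer

You should iterate over the search results and, for each result, locate the business name (the element with the "business-name" class) and the address (the element with the "adr" class):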

for result in soup.select(".search-results .result"):
    name = result.select_one(".business-name").get_text(strip=True, separator=" ")
    address = result.select_one(".adr").get_text(strip=True, separator=" ")

    print(name, address)

.select() and .select_one() are convenient CSS selector methods.
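To see the selector approach end to end without hitting the live site, here is a minimal sketch that parses a small inline HTML sample and writes the rows as CSV, which is where the question was headed. The sample markup, the helper names build_url and parse_results, and the column names are assumptions made for illustration; only the class names come from the answer, and the real page structure may differ or have changed. The sketch also guards against select_one() returning None when a listing has no address.

```python
import csv
import io
from urllib.parse import urlencode

from bs4 import BeautifulSoup

BASE_URL = "http://www.yellowpages.com/search"


def build_url(city, page):
    # urlencode escapes spaces and joins the parameters,
    # which avoids stray characters in a hand-built query string.
    query = {
        "search_terms": "businesses",
        "geo_location_terms": city,
        "page": page,
    }
    return BASE_URL + "?" + urlencode(query)


def parse_results(html):
    """Return (name, address) tuples; a missing element becomes ""."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for result in soup.select(".search-results .result"):
        name = result.select_one(".business-name")
        address = result.select_one(".adr")
        rows.append((
            name.get_text(strip=True, separator=" ") if name else "",
            address.get_text(strip=True, separator=" ") if address else "",
        ))
    return rows


# Hypothetical markup shaped the way the selectors above expect.
SAMPLE_HTML = """
<div class="search-results">
  <div class="result">
    <a class="business-name"><span>Acme Hardware</span></a>
    <p class="adr"><span>100 Main St</span><span>Amite, LA</span></p>
  </div>
  <div class="result">
    <a class="business-name"><span>Bayou Diner</span></a>
  </div>
</div>
"""

rows = parse_results(SAMPLE_HTML)

# An in-memory buffer stands in for a file here;
# open("businesses.csv", "w", newline="") would be used the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "address"])
writer.writerows(rows)

print(build_url("amite LA", 1))
print(buffer.getvalue())
```

In a real run, the HTML passed to parse_results would come from requests.get(build_url(city, page)).text for each city and page, and the except/pass in the question could be dropped because missing elements are handled explicitly.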

This question about scraping Yellow Pages with Python 3 comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/41405185/
