
python - Web scraping to CSV issue [AttributeError: 'str' object has no attribute 'text']


I am trying to build an automated web scraper. I have spent hours watching YouTube videos and reading material here. I am new to programming (started a month ago) and new to this community...

So, using VS Code as my IDE, I followed a code layout that actually works as a web scraper (Python and Selenium):


from selenium import webdriver
import time
from selenium.webdriver.support.select import Select

with open('job_scraping_multipe_pages.csv', 'w') as file:
    file.write("Job_title, Location, Salary, Contract_type, Job_description \n")

driver = webdriver.Chrome()
driver.get('https://www.jobsite.co.uk/')

driver.maximize_window()
time.sleep(1)

cookie = driver.find_element_by_xpath('//button[@class="accept-button-new"]')
try:
    cookie.click()
except:
    pass

job_title = driver.find_element_by_id('keywords')
job_title.click()
job_title.send_keys('Software Engineer')
time.sleep(1)

location = driver.find_element_by_id('location')
location.click()
location.send_keys('Manchester')
time.sleep(1)

dropdown = driver.find_element_by_id('Radius')
radius = Select(dropdown)
radius.select_by_visible_text('30 miles')
time.sleep(1)

search = driver.find_element_by_xpath('//input[@value="Search"]')
search.click()
time.sleep(2)

for k in range(3):
    titles = driver.find_elements_by_xpath('//div[@class="job-title"]/a/h2')
    location = driver.find_elements_by_xpath('//li[@class="location"]/span')
    salary = driver.find_elements_by_xpath('//li[@title="salary"]')
    contract_type = driver.find_elements_by_xpath('//li[@class="job-type"]/span')
    job_details = driver.find_elements_by_xpath('//div[@title="job details"]/p')

    with open('job_scraping_multipe_pages.csv', 'a') as file:
        for i in range(len(titles)):
            file.write(titles[i].text + "," + location[i].text + "," + salary[i].text + "," + contract_type[i].text + "," +
                       job_details[i].text + "\n")

    next = driver.find_element_by_xpath('//a[@aria-label="Next"]')
    next.click()
file.close()
driver.close()

It worked. Then I tried to reproduce the result for another website. Instead of clicking a "Next" button, I found a way to increment the number at the end of the URL by 1. But my problem comes from the last part of the code, which gives me AttributeError: 'str' object has no attribute 'text'. Below is my code (Python and Selenium) for the site I am targeting ( https://angelmatch.io/pitch_decks/5285 ):


from selenium import webdriver
import time
from selenium.webdriver.support.select import Select

driver = webdriver.Chrome()


with open('pitchDeckResults2.csv', 'w') as file:
    file.write("Startup_Name, Startup_Description, Link_Deck_URL, Startup_Website, Pitch_Deck_PDF, Industries, Amount_Raised, Funding_Round, Year /n")


for k in range(5285, 5287, 1):

    linkDeck = "https://angelmatch.io/pitch_decks/" + str(k)

    driver.get(linkDeck)
    driver.maximize_window
    time.sleep(2)

    startupName = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[1]')
    startupDescription = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[3]/p[2]')
    startupWebsite = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[3]/a')
    pitchDeckPDF = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/button/a')
    industries = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[2]')
    amountRaised = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[1]/b')
    fundingRound = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[1]')
    year = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[2]/b')

    with open('pitchDeckResults2.csv', 'a') as file:
        for i in range(len(startupName)):
            file.write(startupName[i].text + "," + startupDescription[i].text + "," + linkDeck[i].text + "," + startupWebsite[i].text + "," + pitchDeckPDF[i].text + "," + industries[i].text + "," + amountRaised[i].text + "," + fundingRound[i].text + "," + year[i].text + "\n")

    time.sleep(1)

file.close()

driver.close()

I would appreciate any help! I am trying to get the data into a CSV with this approach!

Best Answer

Honestly, you are doing well. The only thing, and the reason for the error, is that you are trying to read a .text attribute from a value of type str; Python's str type has no text attribute. On top of that, indexing it with [i] can also raise a "list index out of range" exception, because linkDeck is a URL string, not a list of elements. What did you mean to put in place of linkDeck[i].text - maybe the page title, or something else?
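Put differently, linkDeck is a plain URL string you built yourself, not a WebElement, so it can be written as it is; a quick illustration (reusing the variable name from your code):

# linkDeck is a plain string; indexing it gives single characters, which have no .text
linkDeck = "https://angelmatch.io/pitch_decks/" + str(5285)
print(linkDeck[0])          # 'h' -- just the first character of the URL
# linkDeck[0].text          # would raise AttributeError: 'str' object has no attribute 'text'
# In the write() call, concatenate the string itself: ... + "," + linkDeck + "," + ...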

By the way, you should not close the file yourself when using a with open() statement. It is a context manager, and it closes the file for you as soon as you leave the block.
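For instance, the file is already closed the moment the with block is left:

with open('pitchDeckResults2.csv', 'a') as file:
    file.write("one row\n")
# the context manager has closed the file here,
# so a later file.close() is redundant (calling it again is harmless)
print(file.closed)  # True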

Apart from that, maximize_window needs parentheses, maximize_window(), to actually be called; I also removed one of the two file opens and just write the link as a plain string:

import time

from selenium import webdriver

driver = webdriver.Chrome()
delimeter = ';'
with open('pitchDeckResults2.csv', 'w+') as _file:
    _l = ['Startup_Name', 'Startup_Description', 'Link_Deck_URL', 'Startup_Website', 'Pitch_Deck_PDF', 'Industries',
          'Amount_Raised', 'Funding_Round', 'Year \n']
    _file.write(delimeter.join(_l))
    for k in range(5285, 5287, 1):
        linkDeck = "https://angelmatch.io/pitch_decks/" + str(k)

        driver.get(linkDeck)
        time.sleep(1)

        startupName = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[1]')
        startupDescription = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[3]/p[2]')
        startupWebsite = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[3]/a')
        pitchDeckPDF = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/button/a')
        industries = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[2]')
        amountRaised = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[1]/b')
        fundingRound = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[1]')
        year = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[2]/b')

        all_elements = [startupName.text, startupDescription.text, linkDeck, startupWebsite.text, pitchDeckPDF.text,
                        industries.text, amountRaised.text, fundingRound.text, f"{year.text}\n"]
        _str = delimeter.join(all_elements)
        _file.write(_str)

driver.close()
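One more thought: a raw file.write() will break the columns whenever a scraped field itself contains the ';' delimiter or a newline. The csv module quotes such fields for you; a minimal sketch of the same writing step (file name and rows here are made up, just to show the call):

import csv

# Sketch: csv.writer quotes any field that contains the delimiter or a newline,
# which a plain file.write() of joined strings does not.
header = ['Startup_Name', 'Startup_Description', 'Link_Deck_URL']
example_row = ['Example Inc', 'Builds widgets; ships worldwide', 'https://angelmatch.io/pitch_decks/5285']

with open('pitchDeckResults2_csvmodule.csv', 'w', newline='') as _file:
    writer = csv.writer(_file, delimiter=';')
    writer.writerow(header)
    writer.writerow(example_row)  # in the scraper, build this list from the .text values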

Let me know if I missed anything.

Regarding python - Web scraping to CSV issue [AttributeError: 'str' object has no attribute 'text'], a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66632851/
