
python - How to delete rows containing a specific word using a Python loop?

Reposted. Author: 行者123. Updated: 2023-12-01 02:42:20

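I want to delete every row that contains "Scheduled" and export the remaining rows to a new CSV file. What is wrong with my code? I get no error message and it runs without problems, but nothing happens.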

Here is my code:

def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        soup2=BeautifulSoup(html,"html.parser")
        name = soup2.h2.string
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            temp_data.append(name)
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)
            datatable.append(newlist)
        print name
        output.writerows(datatable)

def filter_unwanted_words():
    unwanted_words = {'Scheduled'}
    with open('output.csv', 'r') as f:
        for line in f:
            if set(line.split()).isdisjoint(unwanted_words):
                yield line


def write_output():
    with open('output2.csv', 'w') as f:
        f.writelines((line for line in filter_unwanted_words()))

if __name__ == '__main__':
    write_output()

    resultcsv.close()
    time.sleep(10)
    browser.close()

I tried using this filter_unwanted_words function, but it doesn't work.

Data table: (image)

Best answer

Alternative solution: consider reading the data into a pandas DataFrame.

import pandas as pd

data = [[123, 1, "Scheduled"], [345, 2, "-"]]

df = pd.DataFrame(data)
df = df[df[2] != "Scheduled"]  # keep rows whose column 2 is not "Scheduled"; the filtered result must be assigned back
df.to_csv("output.csv", header=False, index=False)  # no header row, no index column

The DataFrame looks like this:

     0  1          2
0  123  1  Scheduled
1  345  2          -

After filtering, the row containing "Scheduled" is gone:

     0  1  2
1  345  2  -
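The snippet above builds the DataFrame from a literal list. To apply the same filter to a CSV file that already exists (such as the output.csv written by scrape()), a minimal sketch could read it with pd.read_csv and write the surviving rows to a new file. It assumes the file has no header row and that "Scheduled" appears in column 2; the function name drop_rows_with_value and its parameters are hypothetical.

```python
import pandas as pd


def drop_rows_with_value(src, dst, col=2, value="Scheduled"):
    """Copy the CSV at `src` to `dst`, dropping rows whose column `col` equals `value`.

    Assumes `src` has no header row, so pandas numbers the columns 0, 1, 2, ...
    """
    df = pd.read_csv(src, header=None)
    kept = df[df[col] != value]  # boolean indexing returns a new, filtered DataFrame
    kept.to_csv(dst, header=False, index=False)  # no header row, no index column
    return kept
```

For example, drop_rows_with_value("output.csv", "output2.csv") would write a copy of output.csv without the "Scheduled" rows, leaving the original file untouched.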

A more general solution, which filters out "Scheduled" no matter which column it appears in:

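import numpy as np
import pandas as pd

data = [[123, 1, "Scheduled"], [345, 2, "-"]]

df = pd.DataFrame(data)
# One boolean column per DataFrame column: True where the cell's text contains "Scheduled"
mask = np.column_stack([df[col].astype(str).str.contains(r"Scheduled", na=False) for col in df])
# Keep only the rows in which no cell matched
df2 = df.loc[~mask.any(axis=1)]
df2.to_csv("output.csv", header=False, index=False)  # no header row, no index column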
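As for why the question's own filter_unwanted_words seems to do nothing: line.split() splits on whitespace, not on commas, so a CSV line such as 123,1,Scheduled stays a single token "123,1,Scheduled". That token never equals "Scheduled", isdisjoint is always True, and every line is copied to output2.csv unchanged. (Separately, write_output() runs before resultcsv.close() in the posted code; if resultcsv is the handle behind output.csv, some rows may not be flushed to disk yet when the file is re-read.) A minimal sketch of a fix without pandas, assuming Python 3 and using the standard csv module; filter_unwanted_rows is a hypothetical rename of the original generator:

```python
import csv

UNWANTED_WORDS = frozenset({"Scheduled"})


def filter_unwanted_rows(src="output.csv", unwanted=UNWANTED_WORDS):
    """Yield the rows of `src` in which no field equals an unwanted word."""
    with open(src, newline="") as f:
        for row in csv.reader(f):
            # csv.reader splits on commas (and honours quoting), so each field
            # is compared on its own instead of the whole comma-joined line.
            if unwanted.isdisjoint(field.strip() for field in row):
                yield row


def write_output(src="output.csv", dst="output2.csv"):
    """Write the filtered rows of `src` to a new CSV file `dst`."""
    with open(dst, "w", newline="") as f:
        csv.writer(f).writerows(filter_unwanted_rows(src))
```

This version drops a row only when a field is exactly "Scheduled" (ignoring surrounding spaces), which matches the intent of the original isdisjoint check. To drop rows where the word appears anywhere inside a field, as the str.contains version above does, the condition would become not any(w in field for field in row for w in unwanted).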

Regarding "python - How to delete rows containing a specific word using a Python loop?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45540976/
