gpt4 book ai didi

python - 解析txt的特定区域,与字符串列表进行比较,然后生成由匹配项组成的新列表

转载 作者:行者123 更新时间:2023-12-01 00:02:45 26 4
gpt4 key购买 nike

我正在尝试执行以下操作:

  1. 通读文本文件的特定部分(有已知的起点和终点)
  2. 在阅读这些行时,检查某个单词是否与我已包含在列表中的单词匹配
  3. 如果检测到匹配项,则将该特定单词添加到新列表中

我已经能够阅读文本并从中获取我需要的其他数据,但到目前为止我还无法执行上述操作。

我尝试实现以下示例:Python - Search Text File For Any String In a List但我没能正确读取它。

我还尝试调整以下内容:https://www.geeksforgeeks.org/python-finding-strings-with-given-substring-in-list/但我同样没有成功。

这是我的一些代码:

import re
from itertools import islice
import os

# list of all countries
oneCountries = "Afghanistan, Albania, Algeria, Andorra, Angola, Antigua & Deps, Argentina, Armenia, Australia, Austria, Azerbaijan, Bahamas, Bahrain, Bangladesh, Barbados, Belarus, Belgium, Belize, Benin, Bhutan, Bolivia, Bosnia Herzegovina, Botswana, Brazil, Brunei, Bulgaria, Burkina, Burma, Burundi, Cambodia, Cameroon, Canada, Cape Verde, Central African Rep, Chad, Chile, China, Republic of China, Colombia, Comoros, Democratic Republic of the Congo, Republic of the Congo, Costa Rica,, Croatia, Cuba, Cyprus, Czech Republic, Danzig, Denmark, Djibouti, Dominica, Dominican Republic, East Timor, Ecuador, Egypt, El Salvador, Equatorial Guinea, Eritrea, Estonia, Ethiopia, Fiji, Finland, France, Gabon, Gaza Strip, The Gambia, Georgia, Germany, Ghana, Greece, Grenada, Guatemala, Guinea, Guinea-Bissau, Guyana, Haiti, Holy Roman Empire, Honduras, Hungary, Iceland, India, Indonesia, Iran, Iraq, Republic of Ireland, Israel, Italy, Ivory Coast, Jamaica, Japan, Jonathanland, Jordan, Kazakhstan, Kenya, Kiribati, North Korea, South Korea, Kosovo, Kuwait, Kyrgyzstan, Laos, Latvia, Lebanon, Lesotho, Liberia, Libya, Liechtenstein, Lithuania, Luxembourg, Macedonia, Madagascar, Malawi, Malaysia, Maldives, Mali, Malta, Marshall Islands, Mauritania, Mauritius, Mexico, Micronesia, Moldova, Monaco, Mongolia, Montenegro, Morocco, Mount Athos, Mozambique, Namibia, Nauru, Nepal, Newfoundland, Netherlands, New Zealand, Nicaragua, Niger, Nigeria, Norway, Oman, Ottoman Empire, Pakistan, Palau, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Poland, Portugal, Prussia, Qatar, Romania, Rome, Russian Federation, Rwanda, St Kitts & Nevis, St Lucia, Saint Vincent & the Grenadines, Samoa, San Marino, Sao Tome & Principe, Saudi Arabia, Senegal, Serbia, Seychelles, Sierra Leone, Singapore, Slovakia, Slovenia, Solomon Islands, Somalia, South Africa, Spain, Sri Lanka, Sudan, Suriname, Swaziland, Sweden, Switzerland, Syria, Tajikistan, Tanzania, Thailand, Togo, Tonga, Trinidad & Tobago, Tunisia, Turkey, Turkmenistan, Tuvalu, Uganda, Ukraine, United Arab Emirates, United Kingdom, United States, Uruguay, Uzbekistan, Vanuatu, Vatican City, Venezuela, Vietnam, Yemen, Zambia, Zimbabwe"
countries = oneCountries.split(",")

path = "C:/Users/me/Desktop/read.txt"
thefile = open(path, errors='ignore')

countryParsing = False
for line in thefile:
line = line.strip()
# if line.startswith("Submitting Author:"):
# if re.match(r"Submitting Author:", line):
# print("blahblah1")
# countryParsing = True
# if countryParsing == True:
# print("blahblah2")
#
# res = [x for x in line if re.search(countries, x)]
# print("blah blah3: " + str(res))
# elif re.match(r"Running Head:", line):
# countryParsing = False
# if countryParsing == True:
# res = [x for x in line if re.search(countries, x)]
# print("blah blah4: " + str(res))


# for x in countries:
# if x in thefile:
# print("a country is: " + x)
# if any(s in line for s in countries):
# listOfAuthorCountries = listOfAuthorCountries + s + ", "
# if re.match(f"Submitting Author:, line"):

#commented out 行是我尝试过但未能正常工作的代码版本。

根据要求,这是我尝试从中获取数据的文本文件的示例。我已对其进行修改以删除敏感信息,但在这种特殊情况下,“新列表”应附加一定数量的“法国”条目:

    txt above....
Submitting Author:

asdf, asdf (proxy)
France
asdfasdf
blah blah
asdfasdf

asdf, Provence-Alpes-Côte d'Azu 13354
France

blah blah
France
asdf
Running Head:
...more text below

最佳答案

根据您提出的关于您想要完成的任务的三点以及我从您的代码中理解的内容(可能不是您想要的),我建议:

# list of all countries
countries = "Afghanistan, Albania, Algeria, Andorra, Angola, Antigua & Deps, Argentina, Armenia, Australia, Austria, Azerbaijan, Bahamas, Bahrain, Bangladesh, Barbados, Belarus, Belgium, Belize, Benin, Bhutan, Bolivia, Bosnia Herzegovina, Botswana, Brazil, Brunei, Bulgaria, Burkina, Burma, Burundi, Cambodia, Cameroon, Canada, Cape Verde, Central African Rep, Chad, Chile, China, Republic of China, Colombia, Comoros, Democratic Republic of the Congo, Republic of the Congo, Costa Rica, Croatia, Cuba, Cyprus, Czech Republic, Danzig, Denmark, Djibouti, Dominica, Dominican Republic, East Timor, Ecuador, Egypt, El Salvador, Equatorial Guinea, Eritrea, Estonia, Ethiopia, Fiji, Finland, France, Gabon, Gaza Strip, The Gambia, Georgia, Germany, Ghana, Greece, Grenada, Guatemala, Guinea, Guinea-Bissau, Guyana, Haiti, Holy Roman Empire, Honduras, Hungary, Iceland, India, Indonesia, Iran, Iraq, Republic of Ireland, Israel, Italy, Ivory Coast, Jamaica, Japan, Jonathanland, Jordan, Kazakhstan, Kenya, Kiribati, North Korea, South Korea, Kosovo, Kuwait, Kyrgyzstan, Laos, Latvia, Lebanon, Lesotho, Liberia, Libya, Liechtenstein, Lithuania, Luxembourg, Macedonia, Madagascar, Malawi, Malaysia, Maldives, Mali, Malta, Marshall Islands, Mauritania, Mauritius, Mexico, Micronesia, Moldova, Monaco, Mongolia, Montenegro, Morocco, Mount Athos, Mozambique, Namibia, Nauru, Nepal, Newfoundland, Netherlands, New Zealand, Nicaragua, Niger, Nigeria, Norway, Oman, Ottoman Empire, Pakistan, Palau, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Poland, Portugal, Prussia, Qatar, Romania, Rome, Russian Federation, Rwanda, St Kitts & Nevis, St Lucia, Saint Vincent & the Grenadines, Samoa, San Marino, Sao Tome & Principe, Saudi Arabia, Senegal, Serbia, Seychelles, Sierra Leone, Singapore, Slovakia, Slovenia, Solomon Islands, Somalia, South Africa, Spain, Sri Lanka, Sudan, Suriname, Swaziland, Sweden, Switzerland, Syria, Tajikistan, Tanzania, Thailand, Togo, Tonga, Trinidad & Tobago, Tunisia, Turkey, Turkmenistan, Tuvalu, Uganda, Ukraine, United Arab Emirates, United Kingdom, United States, Uruguay, Uzbekistan, Vanuatu, Vatican City, Venezuela, Vietnam, Yemen, Zambia, Zimbabwe"
countries = countries.split(",")
countries = [c.strip() for c in countries]

filename = "read.txt"
filehandle = open(filename, errors='ignore')
my_other_list = []
toParse = False
for line in filehandle:
line = line.strip()
if line.startswith("Submitting Author:"):
toParse = True
continue
elif line.startswith("Running Head:"):
toParse = False
continue
elif toParse:
for c in countries:
if c in line:
my_other_list.append(c)

编辑摘要

  1. 改编代码以处理所提供的文本示例。

  2. 修复了国家/地区列表(最初哥斯达黎加后面有两个逗号)。

关于python - 解析txt的特定区域,与字符串列表进行比较,然后生成由匹配项组成的新列表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/60224805/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com