
python - How to print values with similar strings?

Reposted. Author: 行者123. Updated: 2023-12-01 08:03:19

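My goal is to read a large CSV file and print all the similar values. The data is all about hotels, so for simplicity I will use a list of dictionaries in this code: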

S1 = [{'name': 'Holiday Inn A', 'price': '552'},
      {'name': 'Holiday Inn B', 'price': '568'},
      {'name': 'Holiday Inn C', 'price': '589'},
      {'name': 'Grand Palace', 'price': '768'},
      # and so on...
      ]

What I mean is that I want to print every value whose name contains "Holiday Inn". This is the result I want:

Holiday Inn A
Holiday Inn B
Holiday Inn C

This is my code:

import csv

name = []
value = []
linked = []
a = []

def filereader():
    line_count = 0
    with open('hotelRev.csv','r', encoding ='utf-8') as fileIn:
        reader = csv.reader(fileIn)
        for row in reader:
            line_count = line_count + 1
            if line_count == 1:
                name.append(row)
            else:
                value.append(row)

    for x in name:
        for y in value:
            linked.append(dict(zip(x,y)))

filereader()
for row in linked:
    a.append(row['name'])

b = sorted(set(a))

for row in linked:
    print(row['name']['Holiday Inn'])

Obviously this doesn't work, so does anyone know how to do this?

edit-1: By "similar" I mean classifying all the Holiday Inn elements into one big group so that they are easier to call and print.

A direct example from the dataset itself:

Holiday Inn Express & Suites Austin South                             
Holiday Inn Express & Suites Baton Rouge East
Holiday Inn Express & Suites Bethlehem
Holiday Inn Express & Suites Bloomington
Holiday Inn Express & Suites Butte
Holiday Inn Express & Suites Carmel-north Indianapolis
Holiday Inn Express & Suites Carpinteria
Holiday Inn Express & Suites Columbus - Polaris Parkway
Holiday Inn Express & Suites Columbus Univ Area - Osu
Holiday Inn Express & Suites Denver Northeast - Brighton

If possible, I would love to find a way to print them in as few lines as possible.
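
For illustration only, the grouping described in edit-1 could look roughly like the sketch below. It assumes that "belonging to a group" means the hotel name starts with a known brand string (compared case-insensitively); the `group_by_brand` helper and the `brands` list are hypothetical names and do not come from the original post.

```python
from collections import defaultdict


def group_by_brand(names, brands):
    """Group hotel names under the first brand prefix they start with.

    Assumption: a name belongs to a brand when it starts with the brand
    string, ignoring case. Names matching no brand are left out.
    """
    groups = defaultdict(list)
    for name in names:
        for brand in brands:
            if name.lower().startswith(brand.lower()):
                groups[brand].append(name)
                break  # stop at the first matching brand
    return dict(groups)


# A few names taken from the dataset sample, plus one non-matching hotel
names = [
    "Holiday Inn Express & Suites Butte",
    "Holiday Inn Express & Suites Carpinteria",
    "Grand Palace",
]

groups = group_by_brand(names, ["Holiday Inn"])

# Once grouped, printing one brand takes a single line
print("\n".join(groups["Holiday Inn"]))
```

Keeping the groups in a dictionary means each brand can be looked up and printed on its own, which is one way to read "easier to call and print".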

Best answer

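This is a basic solution using sets. I don't think it is efficient for very large datasets, but it can serve as a reference for building an efficient solution.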

import pandas as pd
import re

df = pd.read_csv('HotelNames.csv')

search_terms = input('Enter search terms: ')
# Convert to lower case
search_terms = search_terms.lower()
# Replace special characters (everything except letters and digits) with a space
search_terms = re.sub(r"[^a-zA-Z0-9]+", ' ', search_terms)

# Make a set of unique words from the string; split() without an argument
# also drops the empty strings that leading or trailing spaces would produce
search_set = set(search_terms.split())

for i in range(len(df)):
    # Normalize the hotel name in the first column the same way
    t = re.sub(r"[^a-zA-Z0-9]+", ' ', str(df.iloc[i, 0]))
    hotel_set = set(t.lower().split())

    # Find whether the searched terms are a subset of the hotel name in that particular row
    if search_set.issubset(hotel_set):
        print(df.iloc[i, 0])

HotelNames.csv currently contains a single column, the hotel name.
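
The same word-subset idea can also be applied directly to the list of dictionaries from the question, without pandas. The following is a minimal sketch rather than a definitive implementation: the helper names `words` and `find_hotels` are hypothetical, and returning nothing for an empty search is an added assumption, not something the answer specifies.

```python
import re


def words(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.sub(r"[^a-zA-Z0-9]+", " ", text.lower()).split())


def find_hotels(rows, search_terms, key="name"):
    """Return the rows whose `key` value contains every search word."""
    wanted = words(search_terms)
    if not wanted:
        # Assumption: an empty search should match nothing rather than
        # everything (the empty set is a subset of every set).
        return []
    return [row for row in rows if wanted <= words(row[key])]


S1 = [
    {"name": "Holiday Inn A", "price": "552"},
    {"name": "Holiday Inn B", "price": "568"},
    {"name": "Holiday Inn C", "price": "589"},
    {"name": "Grand Palace", "price": "768"},
]

for row in find_hotels(S1, "holiday inn"):
    print(row["name"])
```

Rows produced by `csv.DictReader` are already dictionaries keyed by the header row, so they could be passed to `find_hotels` in place of `S1`, provided the file has a `name` column.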

Regarding "python - How to print values with similar strings?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55642447/
