
python - Generating unique IDs for a list of strings with duplicates

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:09:22

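I want to generate IDs for strings that are read from a text file. If the strings are duplicates, I want the first instance of the string to have an ID containing 6 characters. For the duplicates of that string, I want the ID to be the same as the original one, but with an additional two characters. I am having trouble with the logic. Here is what I have done so far: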

from itertools import groupby
import uuid
f = open('test.txt', 'r')
addresses = f.readlines()

list_of_addresses = ['Address']
list_of_ids = ['ID']


for x in addresses:
    list_of_addresses.append(x)


def find_duplicates():

    for x, y in groupby(sorted(list_of_addresses)):
        id = str(uuid.uuid4().get_hex().upper()[0:6])
        j = len(list(y))
        if j > 1:
            print str(j) + " instances of " + x
            list_of_ids.append(id)
        print list_of_ids

find_duplicates()

How should I approach this problem?

Edit: here are the contents of test.txt:

123 Test
123 Test
123 Test
321 Test
567 Test
567 Test

Output:

3 occurences of 123 Test

['ID', 'C10DD8']
['ID', 'C10DD8']
2 occurences of 567 Test

['ID', 'C10DD8', '595C5E']
['ID', 'C10DD8', '595C5E']

Best answer

If the strings are duplicates, I want the first instance of the string to have an ID containing 6 characters. For the duplicates of that string, I want the ID to be the same as the original one, but with an additional two characters.

Try using collections.defaultdict.

Given

import ctypes
import collections as ct


filename = "test.txt"


def read_file(fname):
    """Read lines from a file."""
    with open(fname, "r") as f:
        for line in f:
            yield line.strip()

Code

dd = ct.defaultdict(list)
for x in read_file(filename):
    key = str(ctypes.c_size_t(hash(x)).value)    # make positive hashes
    if key[:6] not in dd:
        dd[key[:6]].append(x)
    else:
        dd[key[:8]].append(x)

dd

Output

defaultdict(list,
            {'133259': ['123 Test'],
             '13325942': ['123 Test', '123 Test'],
             '210763': ['567 Test'],
             '21076377': ['567 Test'],
             '240895': ['321 Test']})

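The resulting dictionary has a key of length 6 for the first occurrence of each unique line. Every subsequent duplicate of that line is stored under a key of length 8: the same 6 characters plus the next two characters sliced from the hash.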

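You can implement the keys however you like. In this example, hash() is used to associate a key with each unique line, and the desired sequence is then sliced from that key. See also the post on making positive hash values from ctypes.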
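One caveat with hash(): in Python 3, string hashes are randomized per interpreter process unless PYTHONHASHSEED is set, so the keys shown above can differ from run to run. If reproducible IDs matter, a digest from hashlib is one possible substitute. The following is a minimal sketch and not part of the original answer; the helper names stable_key and group_lines, and the choice of SHA-1 hex digits, are assumptions made for illustration.

```python
import collections as ct
import hashlib


def stable_key(line):
    """Return an uppercase hex digest that is the same on every run (hypothetical helper)."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest().upper()


def group_lines(lines):
    """Group lines as in the answer: 6-char key for the first occurrence, 8-char key for repeats."""
    dd = ct.defaultdict(list)
    for x in lines:
        key = stable_key(x)
        if key[:6] not in dd:
            dd[key[:6]].append(x)
        else:
            dd[key[:8]].append(x)
    return dd


# Same data as test.txt in the question
lines = ["123 Test", "123 Test", "123 Test", "321 Test", "567 Test", "567 Test"]
groups = group_lines(lines)
for key, values in groups.items():
    print(key, values)
```

As with the hash() version, two different lines that happen to share their first 6 characters would be treated as duplicates, so a longer prefix may be safer for large inputs.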


To inspect the results, build suitable lookup dictionaries from the defaultdict.

# Lookups
occurrences = ct.defaultdict(int)
ids = ct.defaultdict(list)

for k, v in dd.items():
    key = v[0]
    occurrences[key] += len(v)
    ids[key].append(k)

# View data
for k, v in occurrences.items():
    print("{} instances of {}".format(v, k))
    print("IDs:", ids[k])
    print()

Output

1 instances of 321 Test
IDs: ['240895']

2 instances of 567 Test
IDs: ['21076377', '210763']

3 instances of 123 Test
IDs: ['13325942', '133259']
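If the goal is one ID per input line, in file order, with each duplicate reusing the 6-character ID of the first occurrence plus two extra characters, a plain dict that remembers the base ID of each line may be enough. This is a minimal sketch under assumptions, not the answer's method: the names short_hex and assign_ids are hypothetical, and the two extra characters are taken to be random hex digits from uuid4, since the question does not say what they should be.

```python
import uuid


def short_hex(n):
    """Return n random uppercase hex characters taken from a UUID4."""
    return uuid.uuid4().hex.upper()[:n]


def assign_ids(lines):
    """Pair every line with an ID (hypothetical helper, not from the original post).

    First occurrence of a line: 6-character ID.
    Later occurrences: the same 6 characters plus 2 more (8 in total).
    """
    base_ids = {}  # line -> 6-character ID of its first occurrence
    result = []
    for line in lines:
        if line not in base_ids:
            base_ids[line] = short_hex(6)
            result.append((line, base_ids[line]))
        else:
            result.append((line, base_ids[line] + short_hex(2)))
    return result


# Same data as test.txt in the question
lines = ["123 Test", "123 Test", "123 Test", "321 Test", "567 Test", "567 Test"]
pairs = assign_ids(lines)
for line, id_ in pairs:
    print(id_, line)
```

Because the characters are random, two different lines could in principle receive the same base ID, and two duplicates of one line could receive the same suffix; checking new IDs against those already issued would close that gap.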

Regarding "python - Generating unique IDs for a list of strings with duplicates", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48630146/
