
python - Segregating dynamically structured data

Reposted. Author: 行者123. Updated: 2023-11-30 09:16:22

I have a large amount of data (about 20K rows) that looks like this:

Caller1 5:30AM Mexico USA 2-22-19
Caller2 1:30AM Mexico USA 2-22-19
Caller3 2:30AM Mexico USA 2-22-19
Caller1 5:30AM Mexico USA 2-22-19
Caller5 3:30AM Mexico USA 2-22-19
Caller3 4:30AM Mexico USA 2-22-19
Caller2 5:30AM Mexico USA 2-22-19
Caller1 7:30AM Mexico USA 2-22-19
Caller12 9:39AM Mexico USA 2-22-19
Caller14 8:36AM Mexico USA 2-22-19
Caller15 2:39AM Mexico USA 2-22-19
Caller16 3:32AM Mexico USA 2-22-19

I am looking for a way to segregate the data by CallerID, like this:

Caller1 5:30AM Mexico USA 2-22-19
Caller1 5:30AM Mexico USA 2-22-19
Caller1 7:30AM Mexico USA 2-22-19
---------------------------------
Caller2 1:30AM Mexico USA 2-22-19
Caller2 5:30AM Mexico USA 2-22-19
---------------------------------
.
.

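Initially I stored this data in a dictionary and added any new data to that dictionary.

I am having trouble segregating it because the leading field, the CallerID, is itself variable.

My code: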

>>> input = [('caller1', 'data....'), ('caller2', 'data,,,,,')]
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for k, v in input: res[k].append(v)

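I can't use this because the dataset is too large.

Is there any Python package that can segregate data based on the first word of each line?

Best answer

You can try this approach: store the data in a dictionary of lists, where each key is the string to group by, i.e. Caller1, Caller2, and so on.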

data = ["Caller1 5:30AM Mexico USA 2-22-19",
        "Caller2 1:30AM Mexico USA 2-22-19",
        "Caller3 2:30AM Mexico USA 2-22-19",
        "Caller1 5:30AM Mexico USA 2-22-19",
        "Caller5 3:30AM Mexico USA 2-22-19",
        "Caller3 4:30AM Mexico USA 2-22-19",
        "Caller2 5:30AM Mexico USA 2-22-19",
        "Caller1 7:30AM Mexico USA 2-22-19",
        "Caller12 9:39AM Mexico USA 2-22-19",
        "Caller14 8:36AM Mexico USA 2-22-19",
        "Caller15 2:39AM Mexico USA 2-22-19",
        "Caller16 3:32AM Mexico USA 2-22-19"]

grouped_data = {}

# ITERATE THE INPUT AND STORE DATA WITH KEY IN DICTIONARY OF LIST
for x in data:
    temp: list = []
    key = x.split(' ')[0]
    if key in grouped_data:
        temp = grouped_data.get(key)
    temp.append(x)
    grouped_data[key] = temp

# PRINT THE DATA AS GROUPED
for k, v in grouped_data.items():
    print(f"data for {k}")
    for d in v:
        print(d)
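
The same grouping can also be written with collections.defaultdict, which the question already imports. The following is only a sketch under a few assumptions: the helper names group_by_first_word and format_groups are made up for illustration, the CallerID is taken to be the first whitespace-separated token of each line, and the 33-dash separator is copied from the desired output shown in the question.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_first_word(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group non-empty lines by their first whitespace-separated token."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        caller_id = line.split(maxsplit=1)[0]  # assumed: CallerID is the first token
        groups[caller_id].append(line)
    return groups


def format_groups(groups: Dict[str, List[str]], separator: str = "-" * 33) -> str:
    """Return every group's lines, each group followed by a separator line."""
    blocks = []
    for rows in groups.values():
        blocks.append("\n".join(rows + [separator]))
    return "\n".join(blocks)


sample = [
    "Caller1 5:30AM Mexico USA 2-22-19",
    "Caller2 1:30AM Mexico USA 2-22-19",
    "Caller1 7:30AM Mexico USA 2-22-19",
]
grouped = group_by_first_word(sample)
print(format_groups(grouped))
```

Because group_by_first_word accepts any iterable of strings, an open file object could be passed instead of a list, so the lines are read one at a time. Roughly 20K short rows should fit comfortably in memory either way.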

Regarding "python - Segregating dynamically structured data", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55015757/
