
python - How can I bin data alphabetically using Pandas?

Repost. Author: 太空宇宙. Updated: 2023-11-03 13:55:38

I have a DataFrame with a column that contains a series of strings:

import pandas as pd

books = pd.DataFrame([[1,'In Search of Lost Time'],[2,'Don Quixote'],[3,'Ulysses'],[4,'The Great Gatsby'],[5,'Moby Dick']], columns = ['Book ID', 'Title'])

   Book ID                   Title
0        1  In Search of Lost Time
1        2             Don Quixote
2        3                 Ulysses
3        4        The Great Gatsby
4        5               Moby Dick

and a sorted list of boundaries:

boundaries = ['AAAAAAA','The Great Gatsby', 'zzzzzzzz']

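I want to use these boundaries to sort the values in the DataFrame into alphabetical bins, similar to how pd.cut() works for numeric data. My desired output looks like this: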

   Book ID                   Title                          binning
0        1  In Search of Lost Time   ['AAAAAAA','The Great Gatsby')
1        2             Don Quixote   ['AAAAAAA','The Great Gatsby')
2        3                 Ulysses  ['The Great Gatsby','zzzzzzzz')
3        4        The Great Gatsby  ['The Great Gatsby','zzzzzzzz')
4        5               Moby Dick   ['AAAAAAA','The Great Gatsby')

Is this possible?

Best answer

Use searchsorted

import numpy as np

boundaries = np.array(['The Great Gatsby'])
bins = np.array(['[A..The Great Gatsby)', '[The Great Gatsby..Z]'])

books.assign(binning=bins[boundaries.searchsorted(books.Title)])

   Book ID                   Title                binning
0        1  In Search of Lost Time  [A..The Great Gatsby)
1        2             Don Quixote  [A..The Great Gatsby)
2        3                 Ulysses  [The Great Gatsby..Z]
3        4        The Great Gatsby  [A..The Great Gatsby)
4        5               Moby Dick  [A..The Great Gatsby)
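
Note that in the output above the title 'The Great Gatsby' itself lands in the lower bin. With the default side='left', searchsorted counts only the boundaries strictly below each title, so the bins behave as right-closed intervals even though the labels read as left-closed. A minimal sketch, assuming the left-closed bins requested in the question are what is wanted, passes side='right' instead. The `titles` and `idx` variables are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

books = pd.DataFrame(
    [[1, 'In Search of Lost Time'], [2, 'Don Quixote'], [3, 'Ulysses'],
     [4, 'The Great Gatsby'], [5, 'Moby Dick']],
    columns=['Book ID', 'Title'])

boundaries = np.array(['The Great Gatsby'])
bins = np.array(['[A..The Great Gatsby)', '[The Great Gatsby..Z]'])

# Convert to a plain NumPy string array so the comparison does not
# depend on how the Series stores its strings.
titles = np.array(books['Title'].tolist())

# side='right' counts the boundaries that are <= each title, so a title
# equal to a boundary is placed in the upper bin.
idx = boundaries.searchsorted(titles, side='right')
result = books.assign(binning=bins[idx])
print(result)
```

With this change, row 3 ('The Great Gatsby') moves to the [The Great Gatsby..Z] bin, while the other rows keep the bins shown above.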

Extending this to some other boundaries:

from string import ascii_uppercase as letters
# Interior boundaries 'B'..'Y'; together with the ends they define 25 one-letter bins
boundaries = np.array([*letters[1:-1]])
bins = np.array([f'[{a}..{b})' for a, b in zip(letters, letters[1:])])

books.assign(binning=bins[boundaries.searchsorted(books.Title)])

   Book ID                   Title binning
0        1  In Search of Lost Time  [I..J)
1        2             Don Quixote  [D..E)
2        3                 Ulysses  [U..V)
3        4        The Great Gatsby  [T..U)
4        5               Moby Dick  [M..N)
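
The same idea can be wrapped so that it takes the boundary list from the question directly. The following is only a sketch. `cut_strings` is a hypothetical helper name, not a pandas function. It assumes left-closed bins, case-sensitive code-point ordering (so all uppercase letters sort before lowercase ones), and that titles outside the outermost boundaries should become missing values.

```python
import numpy as np
import pandas as pd

def cut_strings(values, boundaries):
    """Hypothetical helper: bin strings into left-closed intervals [a, b)."""
    edges = np.array(sorted(boundaries))
    labels = [f"['{a}','{b}')" for a, b in zip(edges[:-1], edges[1:])]
    vals = np.array(list(values))
    # The number of edges <= value, minus one, is the index of the
    # interval whose left edge is the largest edge not above the value.
    codes = edges.searchsorted(vals, side='right') - 1
    # Values below the first edge already have code -1; values at or
    # above the last edge are marked as missing as well.
    codes[codes >= len(labels)] = -1
    return pd.Categorical.from_codes(codes, categories=labels, ordered=True)

books = pd.DataFrame(
    [[1, 'In Search of Lost Time'], [2, 'Don Quixote'], [3, 'Ulysses'],
     [4, 'The Great Gatsby'], [5, 'Moby Dick']],
    columns=['Book ID', 'Title'])
boundaries = ['AAAAAAA', 'The Great Gatsby', 'zzzzzzzz']

result = books.assign(binning=cut_strings(books['Title'], boundaries))
print(result)
```

Returning an ordered Categorical mirrors what pd.cut() produces for numeric data, so the bins can be sorted or grouped in boundary order rather than by label text.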

Regarding "python - How can I bin data alphabetically using Pandas?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56173204/
