python - How to add a string in the middle of a column in Pandas

Reposted. Author: 太空宇宙. Last updated: 2023-11-04 04:32:08

I'm looking for a solution that uses .apply or a lambda function to loop over a list and insert a string at the desired index. I have a column with many entries:

import pandas as pd

df = pd.DataFrame(["1:77631829:-:1:77641672:-"], columns=["position"])

                    position
0  1:77631829:-:1:77641672:-

I would like to get:

                          position
0  chr1:77631829:-:chr1:77641672:-

That is, "chr" should be inserted at the beginning and after the third colon.

I thought something like this would work, but insert is not implemented for Series:

"chr" + df["position"].str.split(":").insert(3, "chr").str.join(":")

This works, but it looks very inefficient:

"chr" + df["position"].str.split(":").str[:3].str.join(":") + ":chr" + df["position"].str.split(":").str[3:].str.join(":")

Best answer

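I think you can split each value with a maximum of 3 splits, then unpack the resulting list into a head and a tail: join the head back with colons, prefix "chr" to both the head and the tail, and finally append the result to the list L: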

df = pd.DataFrame(["1:77631829:-:1:77641672:-", "1:77631829:-:1:77641672:-"],
                  columns=["position"])
print(df)
                    position
0  1:77631829:-:1:77641672:-
1  1:77631829:-:1:77641672:-

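L = []
for x in df["position"]:
    # split on the first three colons only; j keeps the rest of the string intact
    *i, j = x.split(':', 3)
    # the split consumed the third colon, so add it back before the second "chr"
    L.append("chr" + ':'.join(i) + ":chr" + j)

df['new'] = L
print(df)
                    position                              new
0  1:77631829:-:1:77641672:-  chr1:77631829:-:chr1:77641672:-
1  1:77631829:-:1:77641672:-  chr1:77631829:-:chr1:77641672:-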
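
The question asked for something usable with .apply. As a minimal sketch, the split-and-rejoin logic from the loop can be wrapped in a plain function. The name add_chr and its n_fields and prefix parameters are hypothetical, introduced only for this example, and the function assumes every value has at least n_fields colons:

```python
def add_chr(position, n_fields=3, prefix="chr"):
    """Prefix the first field and the field right after the n_fields-th colon.

    Hypothetical helper: the name and parameters are assumptions of this sketch.
    Assumes `position` contains at least `n_fields` colons.
    """
    # Split at most n_fields times: head holds the first n_fields fields,
    # tail holds everything after the n_fields-th colon, unsplit.
    *head, tail = position.split(":", n_fields)
    # The split consumed the n_fields-th colon, so it is added back explicitly.
    return prefix + ":".join(head) + ":" + prefix + tail


positions = ["1:77631829:-:1:77641672:-", "2:100:+:3:200:+"]
converted = [add_chr(p) for p in positions]
print(converted)
```

On a DataFrame the same function could be applied element-wise, for example df['new'] = df['position'].map(add_chr) or df['position'].apply(add_chr).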

A hacky solution from the comments, which works as long as "-:" occurs only at the end of the third field:

'chr' + df['position'].str.replace('-:', '-:chr')
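
The replace shortcut relies on the third field always being "-". A row such as 2:100:+:3:200:+ would only get the leading prefix. A sketch of a pattern-based alternative that anchors on the first three colon-terminated fields is shown below, using the standard re module on plain strings. FIRST_THREE and add_chr_regex are names made up for this example:

```python
import re

# Matches the first three colon-terminated fields at the start of the string.
FIRST_THREE = re.compile(r"^((?:[^:]*:){3})")


def add_chr_regex(position):
    # Put "chr" before the captured fields and again right after them.
    return FIRST_THREE.sub(r"chr\1chr", position)


minus = "1:77631829:-:1:77641672:-"
plus = "2:100:+:3:200:+"

# The literal replace only inserts the second "chr" when the third field is "-".
print("chr" + plus.replace("-:", "-:chr"))

# The anchored pattern does not depend on the content of the third field.
print(add_chr_regex(minus))
print(add_chr_regex(plus))
```

The same pattern should also work column-wise through pandas, along the lines of df['position'].str.replace(r'^((?:[^:]*:){3})', r'chr\1chr', regex=True).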

A list comprehension with an f-string is faster:

df['new'] = [f"chr{x.replace('-:', '-:chr')}" for x in df['position']]

Performance:

df = pd.DataFrame(["1:77631829:-:1:77641672:-", "1:77631829:-:1:77641672:-"],
                  columns=["position"])

# [20000 rows x 1 columns]
df = pd.concat([df] * 10000, ignore_index=True)

In [226]: %%timeit
     ...: L = []
     ...: for x in df["position"]:
     ...:     *i, j = x.split(':', 3)
     ...:     L.append("chr" + ':'.join(i) + ":chr" + j)
     ...:
     ...: df['new1'] = L
     ...:
18.9 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [227]: %%timeit
     ...: df['new2'] = "chr" + df["position"].str.split(":").str[:3].str.join(":") + ":chr" + df["position"].str.split(":").str[3:].str.join(":")
     ...:
50.8 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [228]: %%timeit
     ...: df['new3'] = 'chr' + df['position'].str.replace('-:', '-:chr')
     ...:
21.5 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [229]: %%timeit
     ...: df['new4'] = [f"chr{x.replace('-:', '-:chr')}" for x in df['position']]
     ...:
8.59 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
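
The %%timeit cells above need IPython. Outside it, the two pure-Python variants can be timed with the standard library's timeit module. This is only a rough sketch on a plain list of strings rather than a DataFrame column, so absolute numbers will differ from those reported above. split_join and replace_fstring are hypothetical names:

```python
import timeit

# Same row count as the benchmark above, but held in a plain list.
rows = ["1:77631829:-:1:77641672:-"] * 20000


def split_join(values):
    out = []
    for x in values:
        *i, j = x.split(":", 3)
        out.append("chr" + ":".join(i) + ":chr" + j)
    return out


def replace_fstring(values):
    return [f"chr{x.replace('-:', '-:chr')}" for x in values]


# Both variants must agree before their timings are worth comparing.
assert split_join(rows) == replace_fstring(rows)

for func in (split_join, replace_fstring):
    # repeat() returns the total seconds for `number` calls, once per repeat.
    best = min(timeit.repeat(lambda: func(rows), number=10, repeat=5))
    print(f"{func.__name__}: {best / 10 * 1e3:.2f} ms per loop")
```

Taking the minimum over the repeats is a common way to reduce noise from other processes; it reports the best case rather than a mean and standard deviation as %%timeit does.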

Regarding "python - How to add a string in the middle of a column in Pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52477276/
