Python: extract the last digit of a string from a Pandas column

Reposted. Author: 行者123. Updated: 2023-11-28 22:15:29
I want to store the last digit of "UserId" in a new variable (UserId is of string type).

I came up with the following, but the df is long and the loop takes a very long time. Any tips on how to optimize it or avoid the for loop?

df['LastDigit'] = np.nan
for i in range(0,len(df['UserId'])):
    df.loc[i]['LastDigit'] = df.loc[i]['UserId'].strip()[-1]

Best answer

Use str.strip, then index with str[-1]:

df['LastDigit'] = df['UserId'].str.strip().str[-1]
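
To illustrate how this vectorized form behaves, here is a minimal sketch on a made-up frame (the sample values are assumptions, not data from the question). It includes trailing whitespace and one missing value, which the .str accessor passes through as missing rather than raising an error:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: note the surrounding spaces and the missing value.
df = pd.DataFrame({'UserId': ['abc123 ', ' x7', np.nan, '9']})

# Strip surrounding whitespace, then take the last character of each string.
df['LastDigit'] = df['UserId'].str.strip().str[-1]

print(df)
```

The result is still a string column, so a conversion step would be needed if numeric digits are wanted.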

If performance matters and there are no missing values, use a list comprehension:

df['LastDigit'] = [x.strip()[-1] for x in df['UserId']]
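
The plain comprehension calls x.strip() on every element, so it fails on a missing value (a float NaN has no strip method). If missing or empty values might be present, one possible guarded variant is sketched below. The sample data and the choice to map such rows to NaN are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value and an all-whitespace string.
df = pd.DataFrame({'UserId': ['abc123 ', ' x7', np.nan, '   ']})

# Only index into real, non-empty strings; everything else becomes NaN.
df['LastDigit'] = [
    x.strip()[-1] if isinstance(x, str) and x.strip() else np.nan
    for x in df['UserId']
]

print(df)
```

The extra checks cost some of the speed advantage, so this is only worth it when the input is not known to be clean.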

Your solution is really slow; it is the last option in this list:

6) updating an empty frame (e.g. using loc one-row-at-a-time)
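
If a row-by-row loop really cannot be avoided, a sketch like the following writes each cell with a single indexer (df.at) instead of the chained df.loc[i]['LastDigit'] form, which assigns to a temporary copy. This is still expected to be far slower than the vectorized options. The sample data and the object-dtype placeholder column are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical sample data with a default 0..n-1 index.
df = pd.DataFrame({'UserId': ['abc123 ', ' x7', '9']})

# Create the target column as object dtype so it can hold strings.
df['LastDigit'] = pd.Series(index=df.index, dtype=object)

for i in df.index:
    # One indexing call with (row label, column label) writes into df itself.
    df.at[i, 'LastDigit'] = df.at[i, 'UserId'].strip()[-1]

print(df)
```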

Performance:

import numpy as np
import pandas as pd

np.random.seed(456)
users = ['joe','jan ','ben','rick ','clare','mary','tom']
df = pd.DataFrame({
    'UserId': np.random.choice(users, size=1000),
})

In [139]: %%timeit
     ...: df['LastDigit'] = np.nan
     ...: for i in range(0,len(df['UserId'])):
     ...:     df.loc[i]['LastDigit'] = df.loc[i]['UserId'].strip()[-1]
     ...:
__main__:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
57.9 s ± 1.48 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [140]: %timeit df['LastDigit'] = df['UserId'].str.strip().str[-1]
1.38 ms ± 150 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [141]: %timeit df['LastDigit'] = [x.strip()[-1] for x in df['UserId']]
343 µs ± 8.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
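
The timings above come from IPython's %timeit magic. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module, as below. The function names vectorized and comprehension are made up for this example, and the absolute numbers will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(456)
users = ['joe', 'jan ', 'ben', 'rick ', 'clare', 'mary', 'tom']
df = pd.DataFrame({'UserId': np.random.choice(users, size=1000)})


def vectorized():
    # pandas string methods: strip, then take the last character.
    return df['UserId'].str.strip().str[-1]


def comprehension():
    # Plain Python loop over the column values.
    return [x.strip()[-1] for x in df['UserId']]


# Both approaches should agree before their timings are compared.
assert vectorized().tolist() == comprehension()

for fn in (vectorized, comprehension):
    seconds = timeit.timeit(fn, number=100) / 100
    print(f'{fn.__name__}: {seconds * 1e6:.0f} us per call')
```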

For more on extracting the last digit of a string from a Pandas column, see the similar question found on Stack Overflow: https://stackoverflow.com/questions/52850192/
