python - my_dataframe.new_column = value?

Reposted by: 行者123. Updated: 2023-11-28 22:55:47

I just ran into some strange pandas behavior. Say I do:

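import random
import string

import numpy as np
import pandas as pd

m_size = (4, 3)
# random_integers(0, 10) is deprecated; randint excludes the upper bound, hence 11.
num_mat = np.random.randint(0, 11, m_size)
# One random uppercase letter per column of num_mat (not used for the frame below).
my_cols = [random.choice(string.ascii_uppercase) for x in range(num_mat.shape[1])]
mydf = pd.DataFrame(num_mat, columns=['A', 'B', 'C'])

print(mydf)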

   A   B   C
0  6   6   7
1  9  10   4
2  0  10   7
3  1   3  10

If I now do:

mydf.D = 4

I expected this to create a column D filled with the value 4, but the entries of mydf are unchanged:

print(mydf)

   A   B   C
0  6   6   7
1  9  10   4
2  0  10   7
3  1   3  10

Why? I get no warning or error, so what did mydf.D = 4 actually do?

This is all with the latest stable pandas (0.11.0).

Best answer

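Although pandas lets you read a column with df.Col, that is apparently just shorthand for df['Col'], and the shorthand does not work for creating new columns. You need to do mydf['D'] = 4.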
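A minimal sketch of the difference between the two spellings, assuming a reasonably recent pandas (the question used 0.11.0, so details may differ). The frame reuses the values printed in the question; the variable names are only for illustration.

```python
import pandas as pd

# Same values as the frame printed in the question.
mydf = pd.DataFrame({"A": [6, 9, 0, 1], "B": [6, 10, 10, 3], "C": [7, 4, 7, 10]})

# Attribute assignment with a name that is not yet a column:
# this sets a plain Python attribute and leaves the columns alone.
mydf.D = 4
has_column_after_attr = "D" in mydf.columns

# Item assignment creates the column and broadcasts the scalar to every row.
mydf["D"] = 4
has_column_after_item = "D" in mydf.columns
d_values = mydf["D"].tolist()

print(has_column_after_attr, has_column_after_item, d_values)
```

Item access (mydf["D"]) is used for reading here because the attribute set on the first assignment is still attached to the object.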

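I find this unfortunate, since I often try to do exactly what you did. The insidious part is that it actually creates a plain Python attribute named D on the DataFrame object; it is not added as a column. So you have to make sure to delete that attribute, otherwise it will shadow the column even if you later add it correctly: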

>>> import numpy as np
>>> import pandas
>>> d = pandas.DataFrame(np.random.randn(3, 2), columns=["A", "B"])
>>> d
          A         B
0 -0.931675  1.029137
1 -0.363033 -0.227672
2  0.058903 -0.362436
>>> d.Col = 8
>>> d.Col    # Attribute is there
8
>>> d['Col']    # But it is not a column, just a simple attribute
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    d['Col']
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\frame.py", line 1906, in __getitem__
    return self._get_item_cache(key)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\generic.py", line 570, in _get_item_cache
    values = self._data.get(item)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\internals.py", line 1383, in get
    _, block = self._find_block(item)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\internals.py", line 1525, in _find_block
    self._check_have(item)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\internals.py", line 1532, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named Col'
>>> d['Col'] = 100    # Create a real column
>>> d.Col    # Attribute blocks access to column
8
>>> d['Col']    # Column is available via item access
0    100
1    100
2    100
Name: Col, dtype: int64
>>> del d.Col    # Delete the attribute
>>> d.Col     # Column is now available as an attribute (!)
0    100
1    100
2    100
Name: Col, dtype: int64
>>> d['Col']    # And still as an item
0    100
1    100
2    100
Name: Col, dtype: int64
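If you suspect a leftover attribute is hiding a column, one way to check is to compare the column names against the object's instance dictionary. This is only a sketch: it assumes that pandas stores such ad-hoc attributes in the DataFrame's __dict__ (which appears to hold for recent releases), and shadowing_attributes is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def shadowing_attributes(df):
    """Return column names hidden by same-named instance attributes (hypothetical helper)."""
    instance_attrs = vars(df)  # the object's __dict__
    return [col for col in df.columns
            if isinstance(col, str) and col in instance_attrs]

d = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
d.Col = 8        # plain attribute, not a column
d["Col"] = 100   # real column, hidden from d.Col by the attribute

before = shadowing_attributes(d)
for name in before:
    delattr(d, name)  # same effect as `del d.Col`
after = shadowing_attributes(d)

col_values = d.Col.tolist()  # attribute access now reaches the column
print(before, after, col_values)
```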

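You may be a bit surprised to see that d.Col "only works after you delete it": that is, after you do del d.Col, subsequently reading d.Col actually gives you the column. This is just because of how Python's __getattr__ works, but it is still somewhat unintuitive in this context.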
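The lookup order behind this can be reproduced without pandas. Python calls __getattr__ only after normal attribute lookup has failed, so an instance attribute always wins over a column exposed through __getattr__. The Table class below is a toy stand-in invented for this illustration; it is not how pandas is implemented.

```python
class Table:
    """Toy stand-in for a DataFrame: columns live in a dict, not in attributes."""

    def __init__(self, columns):
        # Write straight into the instance dict to keep the example simple.
        self.__dict__["_columns"] = dict(columns)

    def __getattr__(self, name):
        # Reached only when normal attribute lookup has already failed.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

t = Table({"Col": [100, 100, 100]})
t.Col = 8          # ordinary instance attribute, found before __getattr__ is tried
shadowed = t.Col   # the attribute, not the column
del t.Col          # remove the instance attribute
fallback = t.Col   # normal lookup fails, so __getattr__ supplies the column

print(shadowed, fallback)
```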

Regarding python - my_dataframe.new_column = value?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16406343/
