
python - pandas: pandas.DataFrame.describe only returns information about one column

Reposted. Author: 行者123. Updated: 2023-12-04 12:50:42

For a particular Kaggle dataset (the rules forbid me from sharing the data here, but it is easy to access here),

import pandas
df_train = pandas.read_csv(
"01 - Data/act_train.csv.zip"
)
df_train.describe()

I get:
>>> df_train.describe()
            outcome
count  2.197291e+06
mean   4.439544e-01
std    4.968491e-01
min    0.000000e+00
25%    0.000000e+00
50%    0.000000e+00
75%    1.000000e+00
max    1.000000e+00

whereas for the same dataset, df_train.columns gives me:
>>> df_train.columns
Index(['people_id', 'activity_id', 'date', 'activity_category', 'char_1',
'char_2', 'char_3', 'char_4', 'char_5', 'char_6', 'char_7', 'char_8',
'char_9', 'char_10', 'outcome'],
dtype='object')

and df_train.dtypes gives me:
>>> df_train.dtypes
people_id            object
activity_id          object
date                 object
activity_category    object
char_1               object
char_2               object
char_3               object
char_4               object
char_5               object
char_6               object
char_7               object
char_8               object
char_9               object
char_10              object
outcome               int64
dtype: object

Am I missing some reason why pandas describes only one column of the dataset?

Best answer

By default, describe only works on columns with a numeric dtype. Add the keyword argument include='all'. From the documentation:

If include is the string ‘all’, the output column-set will match the input one.
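As a rough illustration of the difference, the sketch below builds a tiny stand-in for the Kaggle data (the values and the name df are made up for this example; only the column names are borrowed from the question) and compares the default call with include='all'.

```python
import pandas as pd

# Hypothetical toy data: two string columns and one integer column,
# mirroring the mix of dtypes reported in the question.
df = pd.DataFrame(
    {
        "people_id": ["ppl_1", "ppl_2", "ppl_2", "ppl_3"],
        "char_1": ["type 1", "type 2", "type 2", "type 2"],
        "outcome": [0, 1, 1, 0],
    }
)

# Default call: only the numeric column is summarised.
default_summary = df.describe()

# include="all": every input column appears in the summary.
full_summary = df.describe(include="all")

print(list(default_summary.columns))
print(list(full_summary.columns))
```

With include='all', statistics that do not apply to a column (for example mean for a string column) show up as NaN in that column.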



To clarify, the default arguments of describe are include=None, exclude=None. The resulting behavior is:

None to both (default). The result will include only numeric-typed columns or, if none are, only categorical columns.
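The fallback in that quote can be checked with a small sketch. The frame text_only below is invented for this example and deliberately contains no numeric column, so the default call should summarise the string columns instead of returning nothing.

```python
import pandas as pd

# Hypothetical frame with string columns only (no numeric dtype present).
text_only = pd.DataFrame(
    {
        "char_1": ["a", "b", "b"],
        "char_2": ["x", "x", "x"],
    }
)

# With include=None and exclude=None, describe falls back to the
# non-numeric columns because there is nothing numeric to report.
summary = text_only.describe()

print(summary)
```

In the question's data the integer outcome column is present, so this fallback never triggers and the string columns are left out.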



Also, from the Notes section:

The output DataFrame index depends on the requested dtypes:

For numeric dtypes, it will include: count, mean, std, min, max, and lower, 50, and upper percentiles.

For object dtypes (e.g. timestamps or strings), the index will include the count, unique, most common, and frequency of the most common. Timestamps also include the first and last items.
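To see how the row labels differ by dtype, here is a minimal sketch on two made-up Series (the names numbers and labels are assumptions for this example). The percentiles argument is passed explicitly, including 0.5, so the lower, 50 and upper percentile rows are easy to spot.

```python
import pandas as pd

# Hypothetical data: one numeric Series and one string Series.
numbers = pd.Series([0, 1, 1, 0, 1], name="outcome")
labels = pd.Series(["a", "b", "b", "c", "b"], name="char_1")

# Numeric summary with explicit lower, 50 and upper percentiles.
numeric_rows = list(numbers.describe(percentiles=[0.1, 0.5, 0.9]).index)

# String summary: count, number of distinct values, most common value
# and how often that value occurs.
label_summary = labels.describe()
label_rows = list(label_summary.index)

print(numeric_rows)
print(label_rows)
```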

The original question, "python - pandas: pandas.DataFrame.describe only returns information about one column", is on Stack Overflow: https://stackoverflow.com/questions/39201526/
