
Python Pandas DataFrame assignment

Reposted. Author: 太空狗. Updated: 2023-10-30 00:48:04

I am following a Lynda tutorial, which uses the following code:

import pandas as pd
import seaborn
flights = seaborn.load_dataset('flights')
flights_indexed = flights.set_index(['year','month'])
flights_unstacked = flights_indexed.unstack()
flights_unstacked['passengers','total'] = flights_unstacked.sum(axis=1)

In the tutorial this works fine. In my case, however, the code fails: I keep getting an error on the last line.

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

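I know they use Python 2 in the video, but I am using Python 3 because I'm learning it for work (where we use Python 3). I've managed to work out most of the differences, but I can't figure out how to create this new column, 'total', containing the sum of passengers.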

Best answer

The root cause of this error message is that the month column is categorical:

In [42]: flights.dtypes
Out[42]:
year int64
month category
passengers int64
dtype: object

In [43]: flights.month.cat.categories
Out[43]: Index(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'], dtype='object')

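After unstacking, the month level of the column index is therefore a CategoricalIndex, and assigning flights_unstacked['passengers','total'] tries to add a new category, total, to it. Pandas refuses to insert a label that is not already one of the existing categories, hence the TypeError.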
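To see the restriction in isolation, here is a minimal sketch on a tiny hypothetical Series (not the flights data). It shows the same rule at the level of a categorical Series: a value outside the fixed set of categories is rejected until it has been registered with add_categories. Note that, depending on the pandas version, inserting a new label into a CategoricalIndex may instead fall back to object dtype, so the exact error from the question may not reproduce on recent releases.

```python
import pandas as pd

# A categorical Series only accepts values from its fixed set of categories.
months = pd.Series(pd.Categorical(["January", "February"]))

raised = False
try:
    months.iloc[0] = "total"  # "total" is not an existing category
except (TypeError, ValueError) as exc:  # TypeError in recent pandas, ValueError in some older releases
    raised = True
    print(type(exc).__name__, exc)

# Registering the new label first makes the assignment legal.
# add_categories returns a new object; its inplace= argument was removed in pandas 2.0.
months = months.cat.add_categories("total")
months.iloc[0] = "total"
print(months.tolist())  # ['total', 'February']
```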

Workaround:

In [45]: flights['month'] = flights['month'].cat.add_categories('total')

In [46]: x = flights.pivot(index='year', columns='month', values='passengers')

In [47]: x['total'] = x.sum(axis=1)

In [48]: x
Out[48]:
month January February March April May June July August September October November December total
year
1949 112.0 118.0 132.0 129.0 121.0 135.0 148.0 148.0 136.0 119.0 104.0 118.0 1520.0
1950 115.0 126.0 141.0 135.0 125.0 149.0 170.0 170.0 158.0 133.0 114.0 140.0 1676.0
1951 145.0 150.0 178.0 163.0 172.0 178.0 199.0 199.0 184.0 162.0 146.0 166.0 2042.0
1952 171.0 180.0 193.0 181.0 183.0 218.0 230.0 242.0 209.0 191.0 172.0 194.0 2364.0
1953 196.0 196.0 236.0 235.0 229.0 243.0 264.0 272.0 237.0 211.0 180.0 201.0 2700.0
1954 204.0 188.0 235.0 227.0 234.0 264.0 302.0 293.0 259.0 229.0 203.0 229.0 2867.0
1955 242.0 233.0 267.0 269.0 270.0 315.0 364.0 347.0 312.0 274.0 237.0 278.0 3408.0
1956 284.0 277.0 317.0 313.0 318.0 374.0 413.0 405.0 355.0 306.0 271.0 306.0 3939.0
1957 315.0 301.0 356.0 348.0 355.0 422.0 465.0 467.0 404.0 347.0 305.0 336.0 4421.0
1958 340.0 318.0 362.0 348.0 363.0 435.0 491.0 505.0 404.0 359.0 310.0 337.0 4572.0
1959 360.0 342.0 406.0 396.0 420.0 472.0 548.0 559.0 463.0 407.0 362.0 405.0 5140.0
1960 417.0 391.0 419.0 461.0 472.0 535.0 622.0 606.0 508.0 461.0 390.0 432.0 5714.0
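The same workaround can be sketched without seaborn (whose load_dataset downloads the data over the network). This assumes a hypothetical two-month excerpt of the flights data, with values copied from the January and February columns of the 1949 and 1950 rows above, and reassigns the result of add_categories instead of relying on inplace=True, which pandas 2.0 no longer accepts.

```python
import pandas as pd

# Hypothetical two-month excerpt of the flights data (values from the table above).
flights = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": pd.Categorical(
        ["January", "February", "January", "February"],
        categories=["January", "February"],
    ),
    "passengers": [112, 118, 115, 126],
})

# Register "total" as a valid category first; reassign rather than use inplace=True.
flights["month"] = flights["month"].cat.add_categories("total")

# One row per year, one column per observed month.
x = flights.pivot(index="year", columns="month", values="passengers")

# Row-wise sum over the month columns, stored in a new "total" column.
x["total"] = x.sum(axis=1)
print(x)
```

Because the sum is computed before the new column exists, "total" only adds up the month columns.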

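Update: alternatively, if you don't want to modify the original DF, you can replace the categorical month level of the flights_unstacked columns with a plain (non-categorical) one: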

In [76]: flights_unstacked.columns = \
...: flights_unstacked.columns \
...: .set_levels(flights_unstacked.columns.get_level_values(1).categories,
...: level=1)
...:

In [77]: flights_unstacked['passengers','total'] = flights_unstacked.sum(axis=1)

In [78]: flights_unstacked
Out[78]:
passengers
month January February March April May June July August September October November December total
year
1949 112 118 132 129 121 135 148 148 136 119 104 118 1520
1950 115 126 141 135 125 149 170 170 158 133 114 140 1676
1951 145 150 178 163 172 178 199 199 184 162 146 166 2042
1952 171 180 193 181 183 218 230 242 209 191 172 194 2364
1953 196 196 236 235 229 243 264 272 237 211 180 201 2700
1954 204 188 235 227 234 264 302 293 259 229 203 229 2867
1955 242 233 267 269 270 315 364 347 312 274 237 278 3408
1956 284 277 317 313 318 374 413 405 355 306 271 306 3939
1957 315 301 356 348 355 422 465 467 404 347 305 336 4421
1958 340 318 362 348 363 435 491 505 404 359 310 337 4572
1959 360 342 406 396 420 472 548 559 463 407 362 405 5140
1960 417 391 419 461 472 535 622 606 508 461 390 432 5714
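The second approach can likewise be sketched on the same hypothetical two-month excerpt. As a small variation on the answer's get_level_values(1).categories, this sketch converts the existing level with levels[1].astype(object); both yield the same plain labels here. The original flights DataFrame is left untouched.

```python
import pandas as pd

# Hypothetical two-month excerpt of the flights data (values from the table above).
flights = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": pd.Categorical(
        ["January", "February", "January", "February"],
        categories=["January", "February"],
    ),
    "passengers": [112, 118, 115, 126],
})

# Same reshaping as the tutorial: (year, month) index, then months become columns.
flights_unstacked = flights.set_index(["year", "month"]).unstack()

# Swap the categorical month level for an object-dtype Index with the same labels.
flights_unstacked.columns = flights_unstacked.columns.set_levels(
    flights_unstacked.columns.levels[1].astype(object), level=1
)

# A non-categorical level accepts the new "total" label without complaint.
flights_unstacked["passengers", "total"] = flights_unstacked.sum(axis=1)
print(flights_unstacked)
```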

A similar question about Python Pandas DataFrame assignment can be found on Stack Overflow: https://stackoverflow.com/questions/42076272/
