python - Convert pandas.DataFrame to numpy tensor using factor levels for shapes

Reposted. Author: 行者123. Updated: 2023-12-01 01:24:55

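I have data from a full factorial experiment. For example, for each of N samples, I have J measurement types and K measurement sites. I receive the data in long format, e.g.,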

import numpy as np
import pandas as pd
import itertools
from numpy.random import normal as rnorm

# [[N], [J], [K]]
levels = [[1,2,3,4], ['start', 'stop'], ['gene1', 'gene2', 'gene3']]

# fully crossed
exp_design = list(itertools.product(*levels))

df = pd.DataFrame(exp_design, columns=["sample", "mode", "gene"])

# some fake data
df['x'] = rnorm(size=len(exp_design))

This yields 24 observations (x), with one column for each of the three factors.

> df.head()
sample mode gene x
0 1 start gene1 -1.229370
1 1 start gene2 1.129773
2 1 start gene3 -1.155202
3 1 stop gene1 -0.757551
4 1 stop gene2 -0.166129

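I would like to convert these observations into the corresponding (N, J, K)-shaped tensor (numpy array). I thought that pivoting to wide format with a MultiIndex and then extracting the values would yield the right tensor, but it just comes out as a column vector: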

> df.pivot_table(values='x', index=['sample', 'mode', 'gene']).values
array([[-1.22936989],
[ 1.12977346],
[-1.15520216],
...,
[-0.1031641 ],
[ 1.1296491 ],
[ 1.31113584]])

Is there a quick way to get the data into tensor form from a long-format pandas.DataFrame?

Best answer

Try using nunique to count the levels of each factor, then reshape x to those dimensions:

df.agg('nunique')

Out[69]:
sample 4
mode 2
gene 3
x 24
dtype: int64
s=df.agg('nunique')
df.x.values.reshape(s['sample'],s['mode'],s['gene'])
Out[71]:
array([[[-2.78133759e-01, -1.42234420e+00, 5.42439121e-01],
[ 2.15359867e+00, 6.55837886e-01, -1.01293568e+00]],
[[ 7.92306679e-01, -1.62539763e-01, -6.13120335e-01],
[-2.91567999e-01, -4.01257702e-01, 7.96422763e-01]],
[[ 1.05088264e-01, -7.23400925e-02, 2.78515041e-01],
[ 2.63088568e-01, 1.47477886e+00, -2.10735619e+00]],
[[-1.71756374e+00, 6.12224005e-04, -3.11562798e-02],
[ 5.26028807e-01, -1.18502045e+00, 1.88633760e+00]]])
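Note that the reshape above relies on the rows already being ordered by sample, then mode, then gene, which holds for the itertools.product output in the question. If the long-format data might arrive in another order, a minimal sketch along the following lines sorts by the factor columns first. The helper name to_tensor and its checks are hypothetical and not part of the original answer; it assumes every factor combination appears exactly once.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the example design from the question (seeded for repeatability).
levels = [[1, 2, 3, 4], ["start", "stop"], ["gene1", "gene2", "gene3"]]
df = pd.DataFrame(list(itertools.product(*levels)),
                  columns=["sample", "mode", "gene"])
df["x"] = np.random.default_rng(0).normal(size=len(df))


def to_tensor(frame, factors, value):
    """Hypothetical helper: long-format frame -> array shaped by factor levels.

    Assumes a fully crossed design with one row per factor combination.
    """
    shape = tuple(frame[f].nunique() for f in factors)
    # Guard against missing or repeated cells, which would make the
    # reshape silently misalign values.
    if len(frame) != np.prod(shape) or frame.duplicated(factors).any():
        raise ValueError("design is not fully crossed with one row per cell")
    # Sort so the first factor varies slowest and the last varies fastest,
    # matching numpy's default row-major reshape.
    ordered = frame.sort_values(factors)
    return ordered[value].to_numpy().reshape(shape)


factors = ["sample", "mode", "gene"]

# Shuffle the rows to mimic data that does not arrive in design order.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
tensor = to_tensor(shuffled, factors, "x")
print(tensor.shape)
```

Each axis is ordered by the sorted values of its factor, so tensor[i, j, k] corresponds to the i-th sorted sample, j-th sorted mode, and k-th sorted gene.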

Regarding python - Convert pandas.DataFrame to numpy tensor using factor levels for shapes, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53455041/
