
python - Problem sorting the rows of a data frame and changing the values of every other row

Reposted. Author: 行者123. Updated: 2023-12-01 07:48:50

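I have been working with a data frame, trying first to sort it by the values of a column and then to change the values of every other row in certain columns. To sort the rows I do the following: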

df['key'] = df['Direction'].apply(lambda x: x.split()[0])
# Take the second number to ensure the order is kept
df['key2'] = df['Direction'].apply(lambda x: x.split()[2])

class_determiner_df = df.sort_values(['key', 'key2'])

This sorts the rows as expected; it comes from a previous question, Sort the rows of a data frame.
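As a side note on the sort keys, here is a minimal sketch on a made-up five-row `Direction` column (the data is hypothetical and not taken from the question's frame). It shows that string keys sort lexicographically, which matches the 0, 1, 10, 100, ... order in the output below, and how casting the keys to integers would give numeric order instead, should that be preferred.

```python
import pandas as pd

# Hypothetical miniature of the 'Direction' column from the question.
df = pd.DataFrame(
    {"Direction": ["10 -> 11", "2 -> 3", "1 -> 2", "0 -> 202", "0 -> 1"]}
)

# Split "parent -> child" on whitespace; columns 0 and 2 hold the node ids as strings.
parts = df["Direction"].str.split(expand=True)
df["key"] = parts[0]
df["key2"] = parts[2]

# String keys sort lexicographically, so "10" comes before "2".
sorted_as_text = df.sort_values(["key", "key2"])

# Integer keys sort numerically, so 2 comes before 10.
sorted_as_numbers = df.assign(
    key=parts[0].astype(int), key2=parts[2].astype(int)
).sort_values(["key", "key2"])

print(sorted_as_text["Direction"].tolist())
print(sorted_as_numbers["Direction"].tolist())
```

Either way, `sort_values` keeps the original index labels attached to the rows, so the index is no longer in positional order afterwards.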

I then have the following data frame:

     Node       Feature  Indicator  Scaled    Class   Direction
  0     0            km         <=   0.181  class_4      0 -> 1
201   201          gini          =   0.000  class_5    0 -> 202
  1     1           WPS         <=   0.074  class_5      1 -> 2
 64    64          gini          =   0.000  class_4     1 -> 65
 10    10      funktion         <=   0.500  class_2    10 -> 11
 17    17          gini          =   0.000  class_5    10 -> 18
100   100           SPW         <=   0.282  class_5  100 -> 101
101   101          gini          =   0.000  class_5  100 -> 102
102   102      words_nb         <=   0.322  class_3  102 -> 103
123   123          gini          =   0.496  class_2  102 -> 124
103   103      words_nb         <=   0.125  class_2  103 -> 104
104   104          gini          =   0.000  class_2  103 -> 105
105   105           SPW         <=   0.290  class_4  105 -> 106
106   106          gini          =   0.000  class_4  105 -> 107
107   107      words_nb         <=   0.197  class_3  107 -> 108
116   116          gini          =   0.000  class_4  107 -> 117
108   108           SPW         <=   0.330  class_3  108 -> 109
109   109          gini          =   0.000  class_3  108 -> 110
 11    11   auftragnehm         <=   0.500  class_2    11 -> 12
 16    16          gini          =   0.000  class_2    11 -> 17
110   110     Comp_conj         <=   0.125  class_3  110 -> 111
115   115          gini          =   0.000  class_4  110 -> 116
111   111      words_nb         <=   0.138  class_3  111 -> 112
112   112          gini          =   0.000  class_3  111 -> 113
113   113   weird_words         <=   0.167  class_3  113 -> 114
114   114          gini          =   0.000  class_3  113 -> 115
117   117      polarity         <=   0.175  class_2  117 -> 118
118   118          gini          =   0.000  class_2  117 -> 119
119   119  Aux_Start_no         <=   0.500  class_3  119 -> 120
120   120          gini          =   0.000  class_3  119 -> 121
 ..   ...           ...        ...     ...      ...         ...

Next, I try to make every other row of df['Feature'] and df['Scaled'] equal to the row above it, and to set df['Indicator'] to '>' on those rows.

To do this I use the following, taken from this answer: Adjust every other row of a data frame

# Adjust every other row
class_determiner_df.loc[1::2, 'Feature'] = None
class_determiner_df.loc[1::2, 'Scaled'] = None
class_determiner_df.loc[1::2, 'Indicator'] = '>'
# fillna() method of DataFrame scans rows from top, and when it finds a python None value (equivalent to numpy.NaN)
# it replaces the None value with the last significant value from the same column
class_determiner_df.fillna(method='ffill', inplace=True)

This produces the following incorrect data frame:


     Node       Feature  Indicator  Scaled    Class   Direction
  0     0            km         <=   0.181  class_4      0 -> 1
201   201          gini          =   0.000  class_5    0 -> 202
  1     1          gini          >   0.000  class_5      1 -> 2
 64    64          gini          =   0.000  class_4     1 -> 65
 10    10          gini          >   0.000  class_2    10 -> 11
 17    17          gini          =   0.000  class_5    10 -> 18
100   100          gini          >   0.000  class_5  100 -> 101
101   101          gini          =   0.000  class_5  100 -> 102
102   102          gini          >   0.000  class_3  102 -> 103
123   123          gini          =   0.496  class_2  102 -> 124
103   103          gini          >   0.496  class_2  103 -> 104
104   104          gini          =   0.000  class_2  103 -> 105
105   105          gini          >   0.000  class_4  105 -> 106
106   106          gini          =   0.000  class_4  105 -> 107
107   107          gini          >   0.000  class_3  107 -> 108
116   116          gini          =   0.000  class_4  107 -> 117
108   108          gini          >   0.000  class_3  108 -> 109
109   109          gini          =   0.000  class_3  108 -> 110
 11    11          gini          >   0.000  class_2    11 -> 12
 16    16          gini          =   0.000  class_2    11 -> 17
110   110          gini          >   0.000  class_3  110 -> 111
115   115          gini          =   0.000  class_4  110 -> 116
111   111          gini          >   0.000  class_3  111 -> 112
112   112          gini          =   0.000  class_3  111 -> 113
113   113          gini          >   0.000  class_3  113 -> 114
114   114          gini          =   0.000  class_3  113 -> 115
117   117          gini          >   0.000  class_2  117 -> 118
118   118          gini          =   0.000  class_2  117 -> 119
119   119          gini          >   0.000  class_3  119 -> 120
120   120          gini          =   0.000  class_3  119 -> 121
 ..   ...           ...        ...     ...      ...         ...

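The "gini" from the second row has replaced the feature in every row after it. Is there a better way to make sure the data frame looks like this: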

     Node       Feature  Indicator  Scaled    Class   Direction
  0     0            km         <=   0.181  class_4      0 -> 1
201   201            km          >   0.181  class_5    0 -> 202
  1     1           WPS         <=   0.074  class_5      1 -> 2
 64    64           WPS          >   0.074  class_4     1 -> 65
 10    10      funktion         <=   0.500  class_2    10 -> 11
 17    17      funktion          >   0.500  class_5    10 -> 18
100   100           SPW         <=   0.282  class_5  100 -> 101
101   101           SPW          >   0.282  class_5  100 -> 102
102   102      words_nb         <=   0.322  class_3  102 -> 103
123   123      words_nb          >   0.322  class_2  102 -> 124
105   105           SPW         <=   0.290  class_4  105 -> 106
106   106           SPW          >   0.290  class_4  105 -> 107
...

I am not sure why the following does not work, since it seems to be what I need:

class_determiner_df.loc[1::2, 'Feature'] = None
class_determiner_df.loc[1::2, 'Scaled'] = None
class_determiner_df.loc[1::2, 'Indicator'] = '>'
# fillna() method of DataFrame scans rows from top, and when it finds a python None value (equivalent to numpy.NaN)
# it replaces the None value with the last significant value from the same column
class_determiner_df.fillna(method='ffill', inplace=True)

Best answer

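This is because loc uses index labels rather than positions. After sorting, the labels are no longer in positional order, so the slice 1::2 starts at the row labelled 1 (the third row here) rather than at the second row. You can fix this easily with DataFrame.reset_index: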

class_determiner_df.reset_index(inplace=True, drop=True)

# Adjust every other row
class_determiner_df.loc[1::2, 'Feature'] = None
class_determiner_df.loc[1::2, 'Scaled'] = None
class_determiner_df.loc[1::2, 'Indicator'] = '>'
# ffill() scans each column from the top; when it finds a missing value (None or NaN)
# it replaces it with the last valid value above it in the same column.
# (fillna(method='ffill') does the same but is deprecated in recent pandas releases.)
class_determiner_df.ffill(inplace=True)
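
The label-versus-position difference can be checked on a small frame. The sketch below uses made-up data shaped like the first six rows of the question's frame (the variable names `by_label`, `by_position`, `odd` and `fixed` are illustrative only). It also shows one possible alternative to `reset_index`, a positional boolean mask, for cases where the original index labels should be kept.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the sorted frame; the index labels are out of positional order.
df = pd.DataFrame(
    {
        "Feature": ["km", "gini", "WPS", "gini", "funktion", "gini"],
        "Indicator": ["<=", "=", "<=", "=", "<=", "="],
        "Scaled": [0.181, 0.000, 0.074, 0.000, 0.500, 0.000],
    },
    index=[0, 201, 1, 64, 10, 17],
)

# Label-based: the slice starts at the row *labelled* 1, which sits at position 2.
by_label = df.loc[1::2].index.tolist()
# Position-based: every second row, starting from the second one.
by_position = df.iloc[1::2].index.tolist()
print(by_label, by_position)

# Alternative fix that keeps the index: select the odd positions with a boolean mask.
fixed = df.copy()
odd = np.arange(len(fixed)) % 2 == 1
fixed.loc[odd, ["Feature", "Scaled"]] = np.nan
fixed.loc[odd, "Indicator"] = ">"
# Forward-fill only the two blanked columns from the row above.
fixed[["Feature", "Scaled"]] = fixed[["Feature", "Scaled"]].ffill()
print(fixed)
```

Here `loc[1::2]` picks the rows labelled 1 and 10, while `iloc[1::2]` picks the rows labelled 201, 64 and 17, which are the ones the question intends to change.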

Regarding python - Problem sorting the rows of a data frame and changing the values of every other row, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56326137/
