I have a df where the last row is the median.
我有一个df,最后一行是中位数。
print(income.head(7))
geo_code 1 2 3 ... 114 115 116
1 228 801 - 2 457 600 NaN NaN NaN ... NaN NaN NaN
1228801 - 2457600 305.0 104.0 74.0 ... 6.0 251.0 15.0
153601 - 307200 2028.0 2330.0 2341.0 ... 153.0 2256.0 1149.0
153 801 - 307 600 NaN NaN NaN ... NaN NaN NaN
19201 - 38400 408.0 642.0 505.0 ... 2215.0 659.0 1006.0
19 601 - 38 200 NaN NaN NaN ... NaN NaN NaN
1 - 4800 28.0 38.0 31.0 ... 497.0 80.0 106.0
print(income.tail(3))
geo_code 1 2 3 ... 114 115 116
9601 - 19200 167.0 401.0 237.0 ... 1551.0 476.0 583.0
9601 - 19 600 NaN NaN NaN ... NaN NaN NaN
median 408.0 627.0 505.0 ... 497.0 659.0 494.0
I need the index (the row) of the median please. How do I return the row that matches the last value in a column?
So the median
of column 1
, which is 408
, will return: 19201 - 38400
.
请给我中位数的指数(排)。如何返回与列中最后一个值匹配的行?因此,第1列的中位数为408,将返回:19201-38400。
更多回答
You can find the median row using .tail(1).squeeze()
and then iterate through the columns, finding the row index where the median value is located. Then a dictionary median_rows
stores the row indices of the median values for each column. Note that the .index[0]
part in (income[column_name] == median_value
) extracts the index (row number) of the first row where the condition is met, assuming that there is only one median value per column.
您可以使用.ail(1).Squeeze()找到中位数行,然后迭代列,找到中位数所在的行索引。然后,字典MIDENT_ROWS存储每一列的中值的行索引。请注意,(Income[Column_Name]==Medium_Value)中的.index[0]部分提取满足条件的第一行的索引(行号),假设每列只有一个中值。
# Calculate the median row for each column
median_row = income.tail(1).squeeze()
# Iterate through columns and find the row with the median value
median_rows = {}
for column_name, median_value in median_row.items():
median_rows[column_name] = income[income[column_name] == median_value].index[0]
# Print
for column_name, median_index in median_rows.items():
print(f"Median of column {column_name}: {median_row[column_name]} is in row {median_index}")
IIUC, you can use idxmax
:
IIUC,您可以使用idxmax:
df.loc[df.iloc[:-1, 1:].eq(df.iloc[-1, 1:], axis=1).idxmax(), 'geo_code']
Output:
产出:
4 19201 - 38400
0 1 228 801 - 2 457 600
4 19201 - 38400
6 1 - 4800
4 19201 - 38400
0 1 228 801 - 2 457 600
Name: geo_code, dtype: object
If geo_code
is the Index, you can simplify to:
如果Geo_code是索引,则可以简化为:
out = df.iloc[:-1].eq(df.iloc[-1], axis=1).idxmax()
And if you can have no-matches for some columns you further need to mask:
如果某些列没有匹配项,则需要进一步掩码:
m = df.iloc[:-1].eq(df.iloc[-1], axis=1)
m.idxmax().where(m.any())
1 19201 - 38400
2 NaN
3 19201 - 38400
114 1 - 4800
115 19201 - 38400
116 NaN
dtype: object
更多回答
income.tail(1).squeeze()
can be simplified to income.iloc[-1]
.
可以将income.ail(1).挤压()简化为income.iloc[-1]。
In OP's df, geo_code
is the index. That lets you simplify to df.iloc[:-1].eq(df.iloc[-1], axis=1).idxmax()
or df.loc[df.index != 'median'].eq(df.loc['median'], ...
, with the result being indexed by column instead of by row.
在op的df中,geo_code是索引。这使您可以简化为df.iloc[:-1].eq(df.iloc[-1],轴=1).idxmax()或df.loc[df.index!=‘Medium’].eq(df.loc[‘Medium’],...),结果按列而不是按行索引。
It's worth noting that the data in the question is incomplete, so the two results at row 0 are incorrect.
值得注意的是,问题中的数据不完整,因此第0行的两个结果是不正确的。
@wjandrea good point, I assumed this was a column but the formatting suggests it could be otherwise. I included other options and showed how to mask the values should there be no match
@wjandrea很好,我以为这是一个列,但格式显示它可能不是。我还包括了其他选项,并展示了如何在没有匹配的情况下屏蔽值
我是一名优秀的程序员,十分优秀!