Return row containing value

Reposted. Author: bug小助手. Updated: 2023-10-25 09:46:22



I have a df where the last row is the median.


print(income.head(7))
geo_code 1 2 3 ... 114 115 116
1 228 801 - 2 457 600 NaN NaN NaN ... NaN NaN NaN
1228801 - 2457600 305.0 104.0 74.0 ... 6.0 251.0 15.0
153601 - 307200 2028.0 2330.0 2341.0 ... 153.0 2256.0 1149.0
153 801 - 307 600 NaN NaN NaN ... NaN NaN NaN
19201 - 38400 408.0 642.0 505.0 ... 2215.0 659.0 1006.0
19 601 - 38 200 NaN NaN NaN ... NaN NaN NaN
1 - 4800 28.0 38.0 31.0 ... 497.0 80.0 106.0

print(income.tail(3))
geo_code 1 2 3 ... 114 115 116
9601 - 19200 167.0 401.0 237.0 ... 1551.0 476.0 583.0
9601 - 19 600 NaN NaN NaN ... NaN NaN NaN
median 408.0 627.0 505.0 ... 497.0 659.0 494.0

I need the index label (the row) where the median sits. How do I return the row whose value matches the last value in a column?

For example, the median of column 1 is 408, so it should return 19201 - 38400.


Comments:

I just noticed that the rows with spaces in the numbers seem to be duplicates. So, for the sake of a minimal reproducible example, you could remove them. See also reproducible pandas examples.
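Following that suggestion, a minimal reproducible frame might look like the sketch below. It is an assumption built only from the values visible in the printout: columns 1 to 3, the rows without spaces in their labels, and the income brackets as the row index. The last row is copied from the question rather than recomputed, so it is not the true median of this subset. The sketch then answers the single-column form of the question by comparing the data rows of column 1 with its last value.

```python
import pandas as pd

# Hypothetical minimal subset of the question's data: columns 1-3 and the
# visible rows without spaces in their labels. The "median" row is copied
# from the question, not recomputed from these rows.
income = pd.DataFrame(
    {
        1: [305.0, 2028.0, 408.0, 28.0, 167.0, 408.0],
        2: [104.0, 2330.0, 642.0, 38.0, 401.0, 627.0],
        3: [74.0, 2341.0, 505.0, 31.0, 237.0, 505.0],
    },
    index=[
        "1228801 - 2457600",
        "153601 - 307200",
        "19201 - 38400",
        "1 - 4800",
        "9601 - 19200",
        "median",
    ],
)

# Single-column version of the question: which data rows hold column 1's median?
col = income[1]
data = col.iloc[:-1]                     # every value except the median row
rows = data.index[data == col.iloc[-1]]  # labels whose value equals the last one
print(rows.tolist())                     # ['19201 - 38400']
```

Column 2's median (627) does not occur among these visible rows, which makes this subset a handy test case for the no-match situation discussed in the answers.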

Recommended answers:

You can get the median row with .tail(1).squeeze() and then iterate through the columns, finding the index label of the row where each median value is located. The dictionary median_rows then stores, for each column, the row label of its median value. Note that .index[0] in income[income[column_name] == median_value].index[0] returns the label of the first row that meets the condition, so if several rows share the median value, only the first is reported. Because the median row itself also meets the condition, a column whose median matches no other row reports the median row's own label.


# Take the last row (the medians) as a Series indexed by column name
median_row = income.tail(1).squeeze()

# Iterate through the columns and find the first row holding each median value
median_rows = {}
for column_name, median_value in median_row.items():
    median_rows[column_name] = income[income[column_name] == median_value].index[0]

# Print the result for each column
for column_name, median_index in median_rows.items():
    print(f"Median of column {column_name}: {median_row[column_name]} is in row {median_index}")
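A variant of the loop above, sketched on a hypothetical subset of the data (columns 1 to 3, visible rows only, medians copied from the question): it searches only the rows before the median row and stores None when no row holds the exact median. That case can arise in practice, because with an even number of values Series.median() averages the two middle values, so the result need not appear in any row.

```python
import pandas as pd

# Hypothetical subset of the question's data; the last row holds the medians
income = pd.DataFrame(
    {
        1: [305.0, 2028.0, 408.0, 28.0, 167.0, 408.0],
        2: [104.0, 2330.0, 642.0, 38.0, 401.0, 627.0],
        3: [74.0, 2341.0, 505.0, 31.0, 237.0, 505.0],
    },
    index=["1228801 - 2457600", "153601 - 307200", "19201 - 38400",
           "1 - 4800", "9601 - 19200", "median"],
)

median_row = income.iloc[-1]  # the median row as a Series indexed by column
data = income.iloc[:-1]       # every row except the median row

median_rows = {}
for column_name, median_value in median_row.items():
    # Labels of all data rows equal to this column's median
    hits = data.index[data[column_name] == median_value]
    # First matching label, or None if no data row holds the exact median
    median_rows[column_name] = hits[0] if len(hits) > 0 else None

for column_name, median_index in median_rows.items():
    print(f"Median of column {column_name}: {median_row[column_name]} is in row {median_index}")
```

Returning None instead of the median row's label makes a missing match explicit rather than looking like a valid result.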


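IIUC (if I understand correctly), you can compare every row except the last with the last row and then use idxmax, which returns the label of the first True value in each column: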


df.loc[df.iloc[:-1, 1:].eq(df.iloc[-1, 1:], axis=1).idxmax(), 'geo_code']

Output:


4            19201 - 38400
0    1 228 801 - 2 457 600
4            19201 - 38400
6                 1 - 4800
4            19201 - 38400
0    1 228 801 - 2 457 600
Name: geo_code, dtype: object
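To see what idxmax is doing, the sketch below builds a hypothetical subset of the data with geo_code as an ordinary first column (the layout that df.iloc[:, 1:] assumes) and prints the intermediate boolean mask. Column 2's median does not occur in these rows, so that column is all False and idxmax falls back to the first row label, which is presumably what produced the row-0 entries in the output above.

```python
import pandas as pd

# Hypothetical subset with geo_code as a regular first column
df = pd.DataFrame(
    {
        "geo_code": ["1228801 - 2457600", "153601 - 307200", "19201 - 38400",
                     "1 - 4800", "9601 - 19200", "median"],
        1: [305.0, 2028.0, 408.0, 28.0, 167.0, 408.0],
        2: [104.0, 2330.0, 642.0, 38.0, 401.0, 627.0],
        3: [74.0, 2341.0, 505.0, 31.0, 237.0, 505.0],
    }
)

# Compare each data row (all but the last) with the last row, column by column;
# axis=1 aligns the last row's labels with the frame's columns
mask = df.iloc[:-1, 1:].eq(df.iloc[-1, 1:], axis=1)
print(mask)

# idxmax returns, per column, the label of the first True
# (or simply the first label when the column is all False)
first_hit = mask.idxmax()
result = df.loc[first_hit, "geo_code"]
print(result)
```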

If geo_code is the Index, you can simplify to:


out = df.iloc[:-1].eq(df.iloc[-1], axis=1).idxmax()

If some columns may have no match, you also need to mask the result, because idxmax on an all-False column simply returns the first row label:


m = df.iloc[:-1].eq(df.iloc[-1], axis=1)

m.idxmax().where(m.any())

Output:

1      19201 - 38400
2                NaN
3      19201 - 38400
114         1 - 4800
115    19201 - 38400
116              NaN
dtype: object
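Masking still keeps only the first match per column. If a column can hold its median in several rows and you want all of them, one option (a sketch, again on a hypothetical subset of the data with the income brackets as the index) is to read the matching labels straight from the boolean mask; an empty list then marks a column with no match.

```python
import pandas as pd

# Hypothetical subset: income brackets as the index, medians in the last row
df = pd.DataFrame(
    {
        1: [305.0, 2028.0, 408.0, 28.0, 167.0, 408.0],
        2: [104.0, 2330.0, 642.0, 38.0, 401.0, 627.0],
        3: [74.0, 2341.0, 505.0, 31.0, 237.0, 505.0],
    },
    index=["1228801 - 2457600", "153601 - 307200", "19201 - 38400",
           "1 - 4800", "9601 - 19200", "median"],
)

# True where a data row equals its column's value in the last row
m = df.iloc[:-1].eq(df.iloc[-1], axis=1)

# Every matching row label per column (an empty list means no match)
all_matches = {col: m.index[m[col]].tolist() for col in m.columns}
print(all_matches)
```

A plain dict comprehension is used here rather than DataFrame.apply, so the list-valued results are not reshaped.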

Comments:

income.tail(1).squeeze() can be simplified to income.iloc[-1].
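A quick check of that suggestion on a small hypothetical frame: both expressions return the same Series when the frame has several columns. They differ on a single-column frame, where squeeze() collapses the 1x1 result to a scalar while iloc[-1] still returns a Series, which is one reason to prefer iloc[-1] here.

```python
import pandas as pd

# Hypothetical two-column frame with the medians in the last row
df = pd.DataFrame(
    {1: [305.0, 408.0], 2: [104.0, 627.0]},
    index=["1228801 - 2457600", "median"],
)

a = df.tail(1).squeeze()  # one-row DataFrame squeezed to a Series
b = df.iloc[-1]           # last row selected directly as a Series
pd.testing.assert_series_equal(a, b)  # raises if they differ

# On a single-column frame, squeeze() also drops the column axis
one_col = df[[1]]
print(type(one_col.tail(1).squeeze()))  # a scalar, not a Series
print(type(one_col.iloc[-1]))           # still a Series
```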

In OP's df, geo_code is the index. That lets you simplify to df.iloc[:-1].eq(df.iloc[-1], axis=1).idxmax() or df.loc[df.index != 'median'].eq(df.loc['median'], ..., with the result being indexed by column instead of by row.

It's worth noting that the data in the question is incomplete, so the two results at row 0 are incorrect.

@wjandrea Good point. I assumed this was a column, but the formatting suggests it could be otherwise. I included other options and showed how to mask the values should there be no match.

Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号