python - Remove outliers from each column based on lower_bound and upper_bound from another DataFrame

Reposted by: 行者123  Updated: 2023-12-04 03:26:51

Python 3.8, pandas 1.2.4

MRE:

import pandas as pd

a = pd.DataFrame({"mean":[3.3, 2.9, 3.2, 5, 3.7, 5.3,5.8, 5.7],
                  "lower_bound":[1, 1, 1, 2, 3, 3, 4, 5],
                  "upper_bound":[4, 4, 6, 7, 8, 9, 9, 9]})

data = pd.DataFrame({0:[3,2,4,3,0,5,5,3,1,2,3,4,5,6],
                     1:[1,3,2,4,5,5,0,6,3,4,2,1,2,3],
                     2:[3,4,2,5,5,4,2,4,3,2,1,2,3,5],
                     3:[1,1,2,3,4,3,9,7,6,7,6,7,7,7],
                     4:[3,2,2,2,1,2,3,4,6,4,6,8,9,0],
                     5:[2,4,5,3,4,6,7,5,3,4,7,8,9,7],
                     6:[3,4,6,6,5,5,7,6,5,7,4,7,8,8],
                     7:[3,4,5,6,6,6,8,7,5,7,5,6,7,5]})

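For each column of the data DataFrame, I want to set every value that falls outside the range [lower_bound, upper_bound] to NaN. Column i of data uses the bounds in row i of a.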

Expected output:

    0   1   2   3   4   5   6   7
0   3   1   3   NaN ..
1   2   3   4   NaN ..
2   4   2   2   2   
3   3   4   5   3   ..
4   NaN NaN 5   4   
5   NaN NaN 4   3   ..          ..

Thanks in advance.

Edit: additionally, I would like to replace the values outside [lower_bound, upper_bound] with the mean value from the DataFrame a.

Best answer

Let's try using where:

import pandas as pd
import numpy as np

a = pd.DataFrame({"mean": [3.3, 2.9, 3.2, 5, 3.7, 5.3, 5.8, 5.7],
                  "lower_bound": [1, 1, 1, 2, 3, 3, 4, 5],
                  "upper_bound": [4, 4, 6, 7, 8, 9, 9, 9]})

data = pd.DataFrame({0: [3, 2, 4, 3, 0, 5, 5, 3, 1, 2, 3, 4, 5, 6],
                     1: [1, 3, 2, 4, 5, 5, 0, 6, 3, 4, 2, 1, 2, 3],
                     2: [3, 4, 2, 5, 5, 4, 2, 4, 3, 2, 1, 2, 3, 5],
                     3: [1, 1, 2, 3, 4, 3, 9, 7, 6, 7, 6, 7, 7, 7],
                     4: [3, 2, 2, 2, 1, 2, 3, 4, 6, 4, 6, 8, 9, 0],
                     5: [2, 4, 5, 3, 4, 6, 7, 5, 3, 4, 7, 8, 9, 7],
                     6: [3, 4, 6, 6, 5, 5, 7, 6, 5, 7, 4, 7, 8, 8],
                     7: [3, 4, 5, 6, 6, 6, 8, 7, 5, 7, 5, 6, 7, 5]})

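# Comparing a Series with a DataFrame aligns the index of `a` with the columns of `data`
mask = (a['lower_bound'] <= data) & (data <= a['upper_bound'])
# Keep values where mask is True; everything else becomes NaN
data = data.where(mask, np.nan)
print(data)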

Output:

      0    1  2    3    4    5    6    7
0   3.0  1.0  3  NaN  3.0  NaN  NaN  NaN
1   2.0  3.0  4  NaN  NaN  4.0  4.0  NaN
2   4.0  2.0  2  2.0  NaN  5.0  6.0  5.0
3   3.0  4.0  5  3.0  NaN  3.0  6.0  6.0
4   NaN  NaN  5  4.0  NaN  4.0  5.0  6.0
5   NaN  NaN  4  3.0  NaN  6.0  5.0  6.0
6   NaN  NaN  2  NaN  3.0  7.0  7.0  8.0
7   3.0  NaN  4  7.0  4.0  5.0  6.0  7.0
8   1.0  3.0  3  6.0  6.0  3.0  5.0  5.0
9   2.0  4.0  2  7.0  4.0  4.0  7.0  7.0
10  3.0  2.0  1  6.0  6.0  7.0  4.0  5.0
11  4.0  1.0  2  7.0  8.0  8.0  7.0  6.0
12  NaN  2.0  3  7.0  NaN  9.0  8.0  7.0
13  NaN  3.0  5  7.0  NaN  7.0  8.0  5.0
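
The comparison in the answer relies on label alignment: the index of a (0 to 7) is matched against the column labels of data. The sketch below spells that alignment out with ge and le on a smaller, made-up frame. It assumes the bounds table is indexed by the same labels as the data columns; the names bounds, small, in_range, and cleaned are illustrative and do not come from the original post.

```python
import pandas as pd

# Hypothetical bounds table: one row per column of `small`.
bounds = pd.DataFrame({"lower_bound": [1, 2], "upper_bound": [4, 5]})

# Hypothetical data: column labels 0 and 1 match the index of `bounds`.
small = pd.DataFrame({0: [0, 3, 5], 1: [2, 6, 1]})

# ge/le with axis=1 make the alignment explicit:
# each bound is applied to the column that carries the same label.
in_range = (small.ge(bounds["lower_bound"], axis=1)
            & small.le(bounds["upper_bound"], axis=1))

# Keep in-range values; out-of-range values become NaN (the default `other`).
cleaned = small.where(in_range)
print(cleaned)
```

Because NaN cannot be stored in an integer column, any column that receives a NaN is converted to float, which is why most columns in the output above show values such as 3.0.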

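Edit: option to replace with the mean instead:

mask = (a['lower_bound'] <= data) & (data <= a['upper_bound'])
# axis=1 aligns a['mean'] with the columns of data, so each column gets its own mean
data = data.where(mask, a['mean'], axis=1)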

Output:

      0    1  2  3    4    5    6    7
0   3.0  1.0  3  5  3.0  5.3  5.8  5.7
1   2.0  3.0  4  5  3.7  4.0  4.0  5.7
2   4.0  2.0  2  2  3.7  5.0  6.0  5.0
3   3.0  4.0  5  3  3.7  3.0  6.0  6.0
4   3.3  2.9  5  4  3.7  4.0  5.0  6.0
5   3.3  2.9  4  3  3.7  6.0  5.0  6.0
6   3.3  2.9  2  5  3.0  7.0  7.0  8.0
7   3.0  2.9  4  7  4.0  5.0  6.0  7.0
8   1.0  3.0  3  6  6.0  3.0  5.0  5.0
9   2.0  4.0  2  7  4.0  4.0  7.0  7.0
10  3.0  2.0  1  6  6.0  7.0  4.0  5.0
11  4.0  1.0  2  7  8.0  8.0  7.0  6.0
12  3.3  2.0  3  7  3.7  9.0  8.0  7.0
13  3.3  3.0  5  7  3.7  7.0  8.0  5.0
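
Both variants differ only in what is passed as the replacement, so they can be wrapped in one function. The following is a minimal sketch under the same assumption as the answer, namely that the bounds table is indexed by the column labels of the data. The helper name replace_outliers and its fill parameter are hypothetical and are not part of pandas or of the original answer.

```python
import pandas as pd

def replace_outliers(data, bounds, fill=None):
    """Hypothetical helper: handle values outside per-column bounds.

    `bounds` needs 'lower_bound' and 'upper_bound' columns and an index
    matching the columns of `data`. With fill=None, outliers become NaN;
    otherwise `fill` names the column of `bounds` that supplies the
    replacement value for each data column.
    """
    in_range = (data.ge(bounds["lower_bound"], axis=1)
                & data.le(bounds["upper_bound"], axis=1))
    if fill is None:
        return data.where(in_range)
    # axis=1 aligns the replacement Series with the columns of `data`.
    return data.where(in_range, bounds[fill], axis=1)

# Made-up example data, not taken from the post.
bounds = pd.DataFrame({"mean": [2.5, 3.5],
                       "lower_bound": [1, 2],
                       "upper_bound": [4, 5]})
small = pd.DataFrame({0: [0, 3, 5], 1: [2, 6, 1]})

with_nan = replace_outliers(small, bounds)
with_mean = replace_outliers(small, bounds, fill="mean")
print(with_nan)
print(with_mean)
```

The function returns a new DataFrame and leaves its input untouched, which avoids overwriting data before the second variant is computed.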

Regarding "python - Remove outliers from each column based on lower_bound and upper_bound from another DataFrame", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/67443527/
