Draw a curve defining the bounds of a scatter plot to detect anomalous (sparse or low density) points in MATLAB

Reposted by: bug小助手 | Updated: 2023-10-28 13:04:32


I have thousands of x,y data pairs of observed and theoretical temperatures in a given region, and I need to identify the data points that are anomalous for the purposes of my work. I have tried PCA, DBSCAN, statistical thresholds, curve fitting, etc., but none of them worked. I then tried a KDE-based approach, which seems to work better for my data. To achieve the results shown below I used dscatter from https://it.mathworks.com/matlabcentral/fileexchange/8430-flow-cytometry-data-reader-and-visualization as:

[hAxes,col,ctrs1,ctrs2,F] = dscatter(X,Y,'BINS',[250,250]);
% Please note: I modified dscatter to also return hAxes, col, ctrs1 and ctrs2,
% and then plot the contour line as:
contour(ctrs1,ctrs2,F,[0.0015 0.0015],'k-'); % [v v] draws a single contour at level v; or use 0.015 ---> see figure below

Using this approach I was able to draw a contour line around the main cluster of my data, and in most cases the line separated the 'main cluster' from the 'anomalous' points. However, if I have too many 'anomalous' data points (see xy4 in the figure below), the method fails because it uses a fixed threshold. Besides, I have to adjust the threshold for the region of each x,y pair, and I have no idea how to find the correct threshold level (see the figure below: for the same region, a threshold level of 0.0015 seems to work when there are only a few anomalous points, but a threshold of 0.015 is needed when the points are more spread out).

What I would really like to do is draw a curve around the main cluster of my scatter plot, so as to separate out the anomalous data points. I fully understand this may be a challenging task, but I hope you can suggest some good alternatives and/or solutions.
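One way to turn a density level into an explicit boundary curve is sketched below. It is only an illustration under several assumptions: the data are synthetic stand-ins, the density is a smoothed 2-D histogram built with base MATLAB functions rather than dscatter, and the level (5% of the peak density) and the variable names are hypothetical choices, not values from the question.

```matlab
% Synthetic stand-in data (assumption: replace with one of your x,y pairs)
rng(0);
x = [randn(3000,1); 8*rand(150,1)-4];
y = [0.8*x(1:3000) + 0.4*randn(3000,1); 8*rand(150,1)-4];

% Smoothed 2-D histogram as a simple density estimate
nb = 80;
xe = linspace(min(x), max(x), nb+1);   xc = (xe(1:end-1) + xe(2:end))/2;
ye = linspace(min(y), max(y), nb+1);   yc = (ye(1:end-1) + ye(2:end))/2;
H  = histcounts2(x, y, xe, ye).';      % transpose so rows = y, columns = x
g  = exp(-(-8:8).^2/(2*2^2));  g = g/sum(g);   % 1-D Gaussian kernel, sigma = 2 bins
F  = conv2(g, g, H, 'same')/numel(x);

% Hypothetical level: 5% of the peak density
level = 0.05*max(F(:));

% Extract all contour segments at that level and keep the largest one
C = contourc(xc, yc, F, [level level]);
best = [];  bestArea = 0;  k = 1;
while k < size(C,2)
    n  = C(2,k);                        % number of vertices in this segment
    px = C(1,k+1:k+n);  py = C(2,k+1:k+n);
    a  = polyarea(px, py);
    if a > bestArea
        bestArea = a;
        best = [px(:) py(:)];
    end
    k = k + n + 1;
end

% Points outside the largest contour are flagged as anomalous
isAnom = ~inpolygon(x, y, best(:,1), best(:,2));

plot(x(~isAnom), y(~isAnom), 'b.', x(isAnom), y(isAnom), 'r.', ...
     best(:,1), best(:,2), 'k-');
```

Keeping only the largest contour discards small islands of density formed by clumps of scattered points, which may or may not be what is wanted for a given region.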


Another option, since the density-based approach seems to work, might be to define the density threshold level automatically, but I don't really know where to start.
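A possible starting point for an automatic level is to evaluate the estimated density at every data point and take a low quantile of those values, so that the level follows the data instead of being a fixed number. The sketch below assumes synthetic data, a smoothed 2-D histogram in place of dscatter, and a hypothetical parameter frac (the fraction of points allowed below the level). Its density values are on a different scale from dscatter's F, so the resulting level is not comparable with 0.0015 or 0.015.

```matlab
% Synthetic stand-in data (assumption: replace with one of your x,y pairs)
rng(1);
x = [randn(2000,1); 6*rand(100,1)-3];
y = [x(1:2000) + 0.3*randn(2000,1); 6*rand(100,1)-3];
frac = 0.05;                            % hypothetical: at most ~5% of points flagged

% Smoothed 2-D histogram; binX/binY give the bin of every point
nb = 100;
xe = linspace(min(x), max(x), nb+1);
ye = linspace(min(y), max(y), nb+1);
[H, ~, ~, binX, binY] = histcounts2(x, y, xe, ye);   % H(i,j): x bin i, y bin j
g = exp(-(-6:6).^2/(2*2^2));  g = g/sum(g);          % 1-D Gaussian kernel
F = conv2(g, g, H, 'same')/numel(x);

% Density at each point, read from the bin the point falls in
d = F(sub2ind(size(F), binX, binY));

% Level = the frac-quantile of the per-point densities
ds    = sort(d);
level = ds(max(1, round(frac*numel(d))));

isAnom = d < level;
fprintf('level = %.4g, flagged %d of %d points\n', level, nnz(isAnom), numel(x));
```

This replaces one tuning knob (the density level) with another (the fraction), so it only helps if the share of anomalous points is roughly stable from one data set to the next.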


Below, you can see 4 examples (xy1 to xy4 are attached; note that col1 and col2 = xy1, col3 and col4 = xy2, etc. You can find the data here: https://www.mashupstack.com/share/64fc9e4fb3683).

[Figure: Example]

To the left is the simple x,y scatter plot. In the middle is the ideal curve (in red, manually sketched) defining the boundary of my scatter plot. To the right is the KDE-based approach discussed above. Please note the last figure, where a threshold of 0.0015 fails and a threshold of 0.015 is needed instead.

Any help is greatly appreciated!

Any other approach to identify the points outside the red boundaries is more than welcome!

More answers

Related - note a keyword which might help your search is "convex hull"
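To make the "convex hull" keyword concrete: the hull of all points contains every point by definition, so it cannot separate anything on its own. One variant that can is hull peeling, where the outermost hull layers are stripped repeatedly and the hull of what remains is used as the boundary. The sketch below is only an illustration of that idea, with synthetic data and a hypothetical peeling budget nPeel; a convex boundary may fit poorly if the main cluster is curved.

```matlab
% Synthetic stand-in data (assumption: replace with one of your x,y pairs)
rng(2);
x = [randn(1500,1); 8*rand(60,1)-4];
y = [randn(1500,1); 8*rand(60,1)-4];
nPeel = round(0.05*numel(x));           % hypothetical: peel at least ~5% of points

% Strip successive convex hull layers until the budget is reached
keep = true(size(x));
while nnz(~keep) < nPeel
    idx = find(keep);
    k   = convhull(x(idx), y(idx));     % indices (into idx) of the current hull
    keep(idx(k)) = false;               % remove this layer
end

% Hull of the remaining points is the boundary curve
idx = find(keep);
k   = convhull(x(idx), y(idx));
hx  = x(idx(k));  hy = y(idx(k));

% Everything outside that hull is flagged
isAnom = ~inpolygon(x, y, hx, hy);

plot(x(~isAnom), y(~isAnom), 'b.', x(isAnom), y(isAnom), 'r.', hx, hy, 'k-');
```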

