
Balancing Data for Multiple-Instance Learning with Unbalanced Classes

Reposted. Author: bug小助手. Updated: 2023-10-22 17:35:02





Problem Statement (Simplified):



I have a CSV file in which each row is labeled as either class A or class B. Class A has 906 instances and class B has 255. I want to classify the data with this multiple-instance learning (MIL) classifier: https://github.com/garydoranjr/misvm. However, the classes are clearly imbalanced (roughly 3.5 to 1).



Additional Details:



I'm analyzing time-series patterns of specific activities, in particular brain activity. Each row in the CSV file represents one 5-second window of a single instance's recording. If the experiment lasts n seconds, each instance yields approximately n/5 windows; the windows are taken with a 1-second shift between them (ignore this if you are unfamiliar with the concept). The total number of rows in the CSV file is therefore roughly:



Total Rows = 906 * (n/5) + 255 * (n/5)
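How many rows each instance contributes depends on how the windows are cut. The sketch below is only an illustration under stated assumptions: the helper names `window_count` and `total_rows` and the 60-second duration are hypothetical, and the windowing rule is the usual sliding-window count, not something confirmed by the question. With a 5-second step (non-overlapping windows) it reproduces the n/5 figure used above; with a 1-second step the count would be closer to n - 4 per instance.

```python
def window_count(duration_s, window_s=5, step_s=1):
    """Number of full windows of length window_s that fit in duration_s
    when consecutive windows start step_s seconds apart."""
    if duration_s < window_s:
        return 0
    return (duration_s - window_s) // step_s + 1


def total_rows(duration_s, n_class_a=906, n_class_b=255, window_s=5, step_s=1):
    """Rows in the CSV if every instance contributes one row per window."""
    per_instance = window_count(duration_s, window_s, step_s)
    return (n_class_a + n_class_b) * per_instance


if __name__ == "__main__":
    n = 60  # hypothetical experiment duration in seconds
    print(window_count(n, step_s=5))  # non-overlapping windows: n / 5
    print(window_count(n, step_s=1))  # 1-second shift: n - 4
    print(total_rows(n, step_s=5))
```

Whichever rule applies, the ratio between the two classes stays at 906 to 255, because every instance contributes the same number of rows.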


Question:



I'm considering duplicating the rows of class B a certain number of times (e.g., 3 times) to balance the dataset. Is this a valid approach? Please also tell me whether there are other approaches to tackle this kind of problem. Thanks in advance!
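To make the duplication idea concrete, here is a minimal sketch in plain Python. It assumes the rows have already been grouped into bags (one list of windows per instance), which is how MIL classifiers typically consume data; the function name `oversample_minority` and the toy placeholder bags are hypothetical and are not part of misvm. Since 906 / 255 is about 3.55, no whole number of copies balances the classes exactly, so the sketch samples minority bags with replacement until the counts match.

```python
import random
from collections import Counter


def oversample_minority(bags, labels, seed=0):
    """Return copies of bags/labels in which every minority class has been
    padded with randomly re-drawn bags up to the majority class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]

    out_bags, out_labels = list(bags), list(labels)
    for label, count in counts.items():
        if label == majority_label:
            continue
        # All original bags of this minority class.
        pool = [b for b, l in zip(bags, labels) if l == label]
        # Draw with replacement until this class matches the majority.
        for _ in range(majority_count - count):
            out_bags.append(rng.choice(pool))
            out_labels.append(label)
    return out_bags, out_labels


if __name__ == "__main__":
    # Placeholder bags: each bag would really hold that instance's windows.
    labels = ["A"] * 906 + ["B"] * 255
    bags = [[i] for i in range(len(labels))]
    new_bags, new_labels = oversample_minority(bags, labels)
    print(Counter(new_labels))
```

If duplication is used, it is usually safer to apply it only to the training split, so that copies of the same bag do not end up in both the training and the evaluation data.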


