
artificial-intelligence - How are thresholds for numeric attributes computed in Quinlan's C4.5 algorithm?

Reposted. Author: 行者123. Updated: 2023-12-04 18:45:06

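I am trying to find out how the C4.5 algorithm determines the threshold for a numeric attribute. I have researched it but cannot understand it; in most places I found the following information: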

The training samples are first sorted on the values of the attribute Y being considered. There are only a finite number of these values, so let us denote them in sorted order as {v1,v2, …,vm}. Any threshold value lying between vi and vi+1 will have the same effect of dividing the cases into those whose value of the attribute Y lies in {v1, v2, …, vi} and those whose value is in {vi+1, vi+2, …, vm}. There are thus only m-1 possible splits on Y, all of which should be examined systematically to obtain an optimal split.

It is usual to choose the midpoint of each interval: (vi +vi+1)/2 as the representative threshold. C4.5 chooses as the threshold a smaller value vi for every interval {vi, vi+1}, rather than the midpoint itself.
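The enumeration described in the quoted passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `candidate_thresholds` is a hypothetical helper name, and the input is the set of humidity values {70, 85, 90, 95} discussed in this question.

```python
def candidate_thresholds(values):
    """Return (lower, upper, midpoint) for each pair of adjacent distinct values."""
    distinct = sorted(set(values))  # v1 < v2 < ... < vm
    return [(lo, hi, (lo + hi) / 2) for lo, hi in zip(distinct, distinct[1:])]


# Humidity values for the sunny cases, as given in the question.
sunny_humidity = [70, 85, 90, 95]

candidates = candidate_thresholds(sunny_humidity)
for lo, hi, mid in candidates:
    # The midpoint is the usual representative; the quoted text says
    # C4.5 reports the lower value of the interval instead.
    print(f"interval ({lo}, {hi}): midpoint {mid}, lower value {lo}")
```

With m = 4 distinct values there are m - 1 = 3 candidate splits, which matches the count given in the passage.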



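I am working through the Play/Don't Play example (value table) and do not understand how the number 75 is obtained for the humidity attribute (generated tree) when the outlook is sunny, because the humidity values for the sunny cases are {70,85,90,95}.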

Does anyone know?

Best answer

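As your generated tree image suggests, the attributes are considered in order. Your 75 example belongs to the Outlook = sunny branch. If you filter the data by Outlook = sunny, you get the following table.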

outlook  temperature  humidity  windy  play
sunny    69           70        FALSE  yes
sunny    75           70        TRUE   yes
sunny    85           85        FALSE  no
sunny    80           90        TRUE   no
sunny    72           95        FALSE  no

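As you can see, the humidity threshold under this condition is "< 75": the two play = yes rows have humidity 70 and the three play = no rows have humidity 85 or higher, so a cut at 75 separates the two classes completely.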
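The table alone does not explain why the tree shows 75 rather than 70 or the midpoint 77.5. A common explanation is that C4.5 replaces the midpoint with the largest attribute value found anywhere in the training set that does not exceed it, so that every threshold in the tree actually occurs in the data. The sketch below illustrates that rule and should be read as an assumption rather than something the table confirms: `greatest_value_below` is a name chosen here for illustration, and the extra humidity value 75 stands in for a row outside the sunny branch, since the full value table is not reproduced here.

```python
def greatest_value_below(all_values, cut):
    """Largest observed value that does not exceed `cut`."""
    return max(v for v in all_values if v <= cut)


# (humidity, play) pairs from the sunny rows in the table above.
sunny = [(70, "yes"), (70, "yes"), (85, "no"), (90, "no"), (95, "no")]

# Midpoint of the interval between the adjacent values 70 and 85.
midpoint = (70 + 85) / 2

# Any cut between 70 and 85 produces the same partition of the sunny rows.
left = [label for h, label in sunny if h <= midpoint]
right = [label for h, label in sunny if h > midpoint]
print(left, right)  # ['yes', 'yes'] ['no', 'no', 'no']

# ASSUMPTION: the full training set contains a humidity of 75 in some
# non-sunny row; only the sunny rows are shown in the table.
all_humidity = [h for h, _ in sunny] + [75]

threshold = greatest_value_below(all_humidity, midpoint)
print(threshold)  # 75
```

Under that assumption the reported threshold is 75 even though no sunny row has that humidity, and the partition is identical to the one produced by the midpoint.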

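J4.8 (Weka's implementation of C4.5) is a successor of the ID3 algorithm. It uses information gain and entropy to decide the best split. According to Wikipedia: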
The attribute with the smallest entropy is used to split the set on this iteration. The higher the entropy, the higher the potential to improve the classification here.
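To make the entropy criterion concrete, here is a minimal sketch that scores each candidate humidity cut on the sunny rows by plain information gain, as this answer describes. The function names are hypothetical, and a full C4.5 implementation would typically rank splits by gain ratio rather than raw gain, so treat this as a simplification.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, threshold):
    """Entropy reduction from splitting (value, label) rows at value <= threshold."""
    labels = [label for _, label in rows]
    left = [label for v, label in rows if v <= threshold]
    right = [label for v, label in rows if v > threshold]
    # Weighted entropy of the two subsets; empty subsets contribute nothing.
    remainder = sum(len(p) / len(rows) * entropy(p) for p in (left, right) if p)
    return entropy(labels) - remainder


# (humidity, play) pairs from the Outlook = sunny table.
sunny = [(70, "yes"), (70, "yes"), (85, "no"), (90, "no"), (95, "no")]

# Score the midpoint of every interval between adjacent distinct values.
values = sorted({v for v, _ in sunny})
gains = {
    (lo + hi) / 2: information_gain(sunny, (lo + hi) / 2)
    for lo, hi in zip(values, values[1:])
}

best = max(gains, key=gains.get)
for cut, gain in gains.items():
    print(f"cut {cut}: gain {gain:.3f}")
print("best cut:", best)
```

The cut between 70 and 85 leaves both subsets pure, so it removes all of the entropy of the sunny branch and scores highest among the three candidates.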

Regarding "artificial-intelligence - How are thresholds for numeric attributes computed in Quinlan's C4.5 algorithm?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16097189/
