
r - How do we find support and confidence for apriori rules?

Reposted. Author: 行者123. Updated: 2023-12-01 22:17:34

I am doing item association on transaction data, using the arules package in R to build rules. I am sharing my sample data via this link: https://1drv.ms/u/s!Ak1rt2E1f2gFgV9t7hMVAn0P4gd0

library(arules)
library(arulesViz)
df = read.csv("trans.csv")
# One transaction per bill: split the Item column by Billno
trans = as(split(df[,"Item"], df[,"Billno"]), "transactions")
inspect(trans[1:20])
summary(trans)
# Mine rules with minimum support 0.6 and minimum confidence 0.6
rules1 = apriori(trans, parameter = list(support = 0.6, confidence = 0.6,
                                         target = "rules"))
summary(rules1) ## Output is "set of 0 rules"

The output I get is:

summary(rules1)

set of 0 rules

I read https://stats.stackexchange.com/questions/56034/association-analysis-returns-0-useful-rules before posting this. I also tried random values for support and confidence, but nothing worked.

Best Answer

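Finding the right minimum support and minimum confidence values, and ending up with 0 frequent itemsets or 0 association rules, is a very common problem. Read this if you need a refresher on what support and confidence mean exactly.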
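As a quick refresher, here is a minimal base-R sketch of the three measures that apriori reports. The transactions and item names (bread, milk, butter) are hypothetical toy data, not taken from the asker's file, and the helper functions are illustrative, not part of arules: the support of an itemset is the fraction of transactions that contain it, the confidence of X => Y is supp(X and Y together) / supp(X), and the lift is that confidence divided by supp(Y).

```r
# Hypothetical toy data: each list element is one transaction (a set of items)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# Support: fraction of transactions that contain every item of `itemset`
support <- function(itemset, trans) {
  mean(vapply(trans, function(t) all(itemset %in% t), logical(1)))
}

# Confidence of lhs => rhs: supp(lhs and rhs together) / supp(lhs)
confidence <- function(lhs, rhs, trans) {
  support(union(lhs, rhs), trans) / support(lhs, trans)
}

# Lift: confidence relative to how often rhs occurs anyway
lift <- function(lhs, rhs, trans) {
  confidence(lhs, rhs, trans) / support(rhs, trans)
}

support(c("bread", "milk"), transactions)  # 3 of 5 transactions: 0.6
confidence("bread", "milk", transactions)  # 0.6 / 0.8 = 0.75
lift("bread", "milk", transactions)        # 0.75 / 0.8 = 0.9375
```

A lift below 1, as here, means milk appears slightly less often with bread than its overall frequency would suggest.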

Let's first look at your transaction data:

summary(trans)
transactions as itemMatrix in sparse format with
2531 rows (elements/itemsets/transactions) and
6632 columns (items) and a density of 0.0005951533

most frequent items:
AR845311 AR800369 AR828249 AR839869 AR831167  (Other)
      84       35       31       29       24     9787

element (itemset/transaction) length distribution:
sizes
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4
 23  24  25  27  28  32  34  36  48
  3   4   2   3   1   1   1   1   1

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.000   2.000   3.947   5.000  48.000

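The first issue to deal with is the minimum support. The summary shows that your most frequent item (AR845311) occurs only 84 times in the dataset. In general, your items have very low support: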

summary(itemFrequency(trans))

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0003951 0.0003951 0.0003951 0.0005952 0.0003951 0.0331900

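You used a minimum support of 0.6, but the most frequent single item only has a support of 0.033! You need to lower your minimum support. If you want to find itemsets/rules that occur at least 10 times in your data, you can set the minimum support to: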

10/length(trans)

[1] 0.003951008
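The same arithmetic can be spelled out in a small base-R sketch. The numbers 2531 (transactions) and 84 (occurrences of AR845311) come from the summary above, and 10 is the target minimum count; the helper name `min_support_for_count` is hypothetical, not an arules function.

```r
n_trans   <- 2531  # number of transactions (bills), from summary(trans)
top_count <- 84    # occurrences of the most frequent item, AR845311

# Hypothetical helper: relative support corresponding to an absolute count
min_support_for_count <- function(count, n) count / n

# Support of the most frequent single item, about 0.0332
top_support <- top_count / n_trans

# With support = 0.6 an itemset would have to appear in at least this many
# bills; no single item comes anywhere close, so apriori finds nothing
needed_for_0.6 <- ceiling(0.6 * n_trans)  # 1519

# Minimum support that keeps itemsets seen in at least 10 transactions
min_supp <- min_support_for_count(10, n_trans)  # about 0.003951
```

Since every superset of an itemset has at most the support of that itemset, no rule can ever reach a support above the 0.033 of the most frequent item.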

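The second issue is that your data is very sparse: the summary shows a density of about 0.0006, i.e., on average a transaction contains only about 3.9 of the 6632 distinct items. Most of your transactions are also very short (i.e., contain only a few items):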

table(size(trans))

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4
 23  24  25  27  28  32  34  36  48
  3   4   2   3   1   1   1   1   1
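To see where the density figure comes from, here is a base-R sketch that rebuilds the length distribution printed above and derives the mean transaction length and the density (mean length divided by the number of distinct items, 6632). All numbers are copied from the output; nothing here calls arules.

```r
# Transaction lengths and how many transactions have each length,
# copied from table(size(trans))
sizes  <- c(1:21, 23, 24, 25, 27, 28, 32, 34, 36, 48)
counts <- c(767, 509, 306, 238, 160, 112, 100, 52, 69, 50, 31, 27, 18,
            12, 13, 15, 9, 10, 7, 5, 4, 3, 4, 2, 3, 1, 1, 1, 1, 1)
n_items <- 6632  # distinct items (columns), from summary(trans)

n_trans     <- sum(counts)            # 2531 transactions
total_items <- sum(sizes * counts)    # 9990 item occurrences in total
mean_len    <- total_items / n_trans  # about 3.947 items per bill
density     <- mean_len / n_items     # about 0.000595

# Share of transactions that hold only one or two items, about 0.50
short_share <- sum(counts[sizes <= 2]) / n_trans
```

The total of 9990 also matches the item counts in the "most frequent items" line (84 + 35 + 31 + 29 + 24 + 9787), which is a useful sanity check on the copied numbers.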

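Short transactions mean that the confidence of rules is likely to be low: the confidence of X => Y is the fraction of transactions containing X that also contain Y, and when a transaction holds only a few items, X rarely appears together with any particular Y. For your data the confidence turns out to be very low, so I start with a minimum confidence of 0.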

rules <- apriori(trans,
                 parameter = list(support = 0.004, confidence = 0, target = "rules"))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
          0    0.1    1 none FALSE            TRUE       5   0.004      1     10
 target   ext
  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 10

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6632 item(s), 2531 transaction(s)] done [0.00s].
sorting and recoding items ... [40 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [46 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
summary(rules)
set of 46 rules

rule length distribution (lhs + rhs):sizes
 1  2
40  6

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    1.00    1.00    1.13    1.00    2.00

summary of quality measures:
    support           confidence           lift            count
 Min.   :0.004346   Min.   :0.004346   Min.   : 1.000   Min.   :11.00
 1st Qu.:0.004741   1st Qu.:0.004840   1st Qu.: 1.000   1st Qu.:12.00
 Median :0.005531   Median :0.005729   Median : 1.000   Median :14.00
 Mean   :0.006803   Mean   :0.057301   Mean   : 3.316   Mean   :17.22
 3rd Qu.:0.007112   3rd Qu.:0.008890   3rd Qu.: 1.000   3rd Qu.:18.00
 Max.   :0.033188   Max.   :0.705882   Max.   :21.269   Max.   :84.00

mining info:
  data ntransactions support confidence
 trans          2531   0.004          0

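The results show that there is at least one rule with a confidence of about 0.7 (the maximum is 0.706), so you can run apriori again with a higher minimum confidence. Here are the rules with the highest confidence: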

inspect(head(rules, by = "confidence"))
    lhs           rhs        support     confidence lift     count
[1] {AR835501} => {AR845311} 0.004741209 0.7058824  21.26891 12
[2] {AR743988} => {AR845311} 0.004346108 0.6470588  19.49650 11
[3] {AR800369} => {AR845311} 0.007111814 0.5142857  15.49592 18
[4] {AR845311} => {AR800369} 0.007111814 0.2142857  15.49592 18
[5] {AR845311} => {AR835501} 0.004741209 0.1428571  21.26891 12
[6] {AR845311} => {AR743988} 0.004346108 0.1309524  19.49650 11
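The confidence and lift columns can be checked by hand. The sketch below copies the six printed rules into a plain data.frame (a stand-in, not an arules rules object), recomputes confidence and lift for rules [1] and [5] from counts, and shows which rules a minimum confidence of 0.5 would keep if apriori were re-run with it. The count of 17 bills containing AR835501 is not printed in the answer; it is inferred from confidence = 12 / count(AR835501) = 0.7058824.

```r
# The six rules printed by inspect(), copied by hand
top_rules <- data.frame(
  lhs        = c("AR835501", "AR743988", "AR800369",
                 "AR845311", "AR845311", "AR845311"),
  rhs        = c("AR845311", "AR845311", "AR845311",
                 "AR800369", "AR835501", "AR743988"),
  confidence = c(0.7058824, 0.6470588, 0.5142857,
                 0.2142857, 0.1428571, 0.1309524),
  count      = c(12, 11, 18, 18, 12, 11)
)

n_trans    <- 2531  # transactions
n_both     <- 12    # bills containing both AR835501 and AR845311
n_ar835501 <- 17    # inferred from 12 / 0.7058824
n_ar845311 <- 84    # from the most-frequent-items summary

conf_1 <- n_both / n_ar835501  # {AR835501} => {AR845311}, about 0.706
conf_5 <- n_both / n_ar845311  # {AR845311} => {AR835501}, about 0.143

# Lift divides by the support of the rhs, which makes it symmetric in
# lhs and rhs; confidence is not symmetric
lift_1 <- conf_1 / (n_ar845311 / n_trans)  # about 21.27
lift_5 <- conf_5 / (n_ar835501 / n_trans)  # the same value

# Rules that would survive a minimum confidence of 0.5
kept <- top_rules[top_rules$confidence >= 0.5, ]
nrow(kept)  # 3
```

This is why rules [1] and [5] share the same support and lift but have very different confidence: the 12 shared bills are a large fraction of AR835501's 17 bills but a small fraction of AR845311's 84.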

A complete example of how to use association rule mining can be found here.

Hope this helps!

Regarding "r - How do we find support and confidence for apriori rules?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43588163/
