java - What is the algorithm or mathematics behind Fisher's exact test?

Reposted by: 行者123  Updated: 2023-11-28 20:33:38

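I need to perform Fisher's exact test on an n x m matrix. I have been searching for hours and found only one piece of example code, and it is written in Fortran. I have been working from Wolfram and I am close to finishing, but I am missing the very last bit.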

    /**
     * Performs Fisher's Exact Test on a matrix m x n
     * @param matrix Any matrix m x n.
     * @return The Fisher's Exact value of the matrix
     * @throws IllegalArgumentException If the rows are not of equal length
     * @author Ryan Amos
     */
    public static double getFisherExact(int[][] matrix){
        System.out.println("Working with matrix: ");
        printMatrix(matrix);
        for (int[] array : matrix) {
            if(array.length != matrix[0].length)
                throw new IllegalArgumentException();
        }
        boolean chiSq = matrix.length != 2 || matrix[0].length != 2;
        int[] rows = new int[matrix.length];
        int[] columns = new int[matrix[0].length];
        int n;
        //compute R and C values
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                rows[i] += matrix[i][j];
                columns[j] += matrix[i][j];
            }
            System.out.println("rows[" + i + "] = " + rows[i]);
        }

        for (int i = 0; i < columns.length; i++) {
            System.out.println("columns[" + i + "] = " + columns[i]);
        }

        //compute n
        n = 0;
        for (int i = 0; i < columns.length; i++) {
            n += columns[i];
        }

        int[][][] perms = findAllPermutations(rows, columns);
        double sum = 0;
        //int count = 0;
        double cutoff = chiSq ? getChiSquaredValue(matrix, rows, columns, n) : getConditionalProbability(matrix, rows, columns, n);
        System.out.println("P cutoff = " + cutoff + "\n");
        for (int[][] is : perms) {
            System.out.println("Matrix: ");
            printMatrix(is);
            double val = chiSq ? getChiSquaredValue(is, rows, columns, n) : getConditionalProbability(is, rows, columns, n);
            System.out.print("Value: " + val); 
            if(val <= cutoff){
                //count++;
                System.out.print(" is below " + cutoff);
//              sum += (chiSq) ? getConditionalProbability(is, rows, columns, n) : val;
//              sum += val;
                double p = getConditionalProbability(is, rows, columns, n);
                System.out.print("\np = " + p + "\nsum = " + sum + " + p = ");
                sum += p;
                System.out.print(sum);
            } else {
                System.out.println(" is above " + cutoff + "\np = " + getConditionalProbability(is, rows, columns, n));
            }
            System.out.print("\n\n");
        }
        return sum;
        //return count / (double)perms.length;
    }

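All of the other methods have been tested and debugged. The problem is that I am not sure where to go after finding all the possible matrices (all matrices with the same row and column sums). I do not know how to take those matrices and turn them into a p-value. I read something about chi-squared, so I found a chi-squared algorithm.

So my question is: given what I have (all the permutations of the matrix), how do I compute the p-value? All of my attempts are in the last for loop, either active or commented out.

Here is the full code: http://pastie.org/private/f8lga9oj6f8vrxiw348q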

Best answer

Edit:

Looking at Wolfram, it seems that the n x m case can be solved with the following:

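    // Requires: import java.math.BigDecimal; import java.util.ArrayList;
    // add, multiplyFactorials and getFactorial are helpers from the gist mentioned below.
    public static BigDecimal getHypergeometricDistribution(
            int[][] a, int scale, int roundingMode
    ) throws OutOfMemoryError, NullPointerException {
        ArrayList<Integer> R = new ArrayList<Integer>(); // row sums
        ArrayList<Integer> C = new ArrayList<Integer>(); // column sums
        ArrayList<Integer> E = new ArrayList<Integer>(); // individual cell values
        int n = 0;                                       // grand total

        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                if (a[i][j] < 0)
                    return null; // negative counts are invalid

                n += a[i][j];
                add(C, j, a[i][j]);
                add(R, i, a[i][j]);
                E.add(a[i][j]);
            }
        }
        // Numerator: product of the factorials of all column sums and row sums.
        BigDecimal term1 =
                new BigDecimal(multiplyFactorials(C).multiply(multiplyFactorials(R)));
        // Denominator: n! times the product of the factorials of all cells.
        BigDecimal term2 =
                new BigDecimal(getFactorial(n).multiply(multiplyFactorials(E)));

        return term1.divide(term2, scale, roundingMode);
    }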

For getBinomialCoefficient, getFactorial, and comments, see my gist.
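For reference, the quantity this method computes is the conditional probability of a single table given its row and column totals. Writing $R_i$ for the row sums, $C_j$ for the column sums, $a_{ij}$ for the cells and $n$ for the grand total (these match the lists R, C, E and the counter n in the code), the two terms being divided are:

```latex
P(a) \;=\; \frac{\left(\prod_i R_i!\right)\left(\prod_j C_j!\right)}{n!\,\prod_{i,j} a_{ij}!}
```

As a check against the example table {{5, 0}, {1, 4}} used in this answer: the row sums are 5 and 5, the column sums are 6 and 4, and n = 10, so

```latex
P \;=\; \frac{5!\,5!\,6!\,4!}{10!\;5!\,0!\,1!\,4!} \;=\; \frac{5!\,6!}{10!} \;=\; \frac{86400}{3628800} \;=\; \frac{1}{42} \approx 0.0238095
```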

Factorials grow very fast, for example:

a long can hold factorials only up to 20!.
a double can hold factorials only up to 170!.
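These two limits can be checked directly. The following is a small sketch, not part of the original answer; the class and method names are made up for illustration. It multiplies upward until the long overflows or the double becomes infinite:

```java
final class FactorialLimits {
    /** Largest n for which n! still fits in a signed 64-bit long. */
    static int maxFactorialInLong() {
        long f = 1;
        int n = 0;
        while (true) {
            try {
                // multiplyExact throws instead of silently wrapping around
                f = Math.multiplyExact(f, (long) (n + 1));
            } catch (ArithmeticException overflow) {
                return n;
            }
            n++;
        }
    }

    /** Largest n for which n! is still a finite double. */
    static int maxFactorialInDouble() {
        double f = 1.0;
        int n = 0;
        while (!Double.isInfinite(f * (n + 1))) {
            f *= (n + 1);
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(maxFactorialInLong());   // 20
        System.out.println(maxFactorialInDouble()); // 170
    }
}
```

This is why the method above works with arbitrary-precision values rather than primitive types.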

The Wolfram example case:

    int[][] a = { { 5, 0 }, { 1, 4 } };
    // 60 = scale (decimal places), 6 = BigDecimal.ROUND_HALF_EVEN
    System.out.println(hdMM.getHypergeometricDistribution(a, 60, 6));

which results in:

0.023809523809523809523809523809523809523809523809523809523810
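The value above is the probability of one table, not yet a p-value. Under the usual two-sided convention for r x c tables, the p-value is the sum of this probability over every table with the same row and column totals whose probability is no larger than that of the observed table. The following is a minimal sketch of that step, not the answer's own code: the class name, the log-factorial arithmetic in double precision, and the small relative tolerance used when comparing probabilities are all assumptions made for illustration.

```java
final class FisherExactSketch {
    private final int[][] cell;     // table currently being enumerated
    private final int[] rowLeft;    // part of each row total not yet placed
    private final int[] colLeft;    // part of each column total not yet placed
    private final double[] lnFact;  // lnFact[k] = ln(k!)
    private final double lnMargins; // ln(prod R_i! * prod C_j! / n!)
    private final double cutoff;    // probability of the observed table
    private double sum = 0.0;

    private FisherExactSketch(int[][] observed) {
        int rows = observed.length, cols = observed[0].length, n = 0;
        cell = new int[rows][cols];
        rowLeft = new int[rows];
        colLeft = new int[cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowLeft[i] += observed[i][j];
                colLeft[j] += observed[i][j];
                n += observed[i][j];
            }
        }
        lnFact = new double[n + 1];
        for (int k = 1; k <= n; k++) lnFact[k] = lnFact[k - 1] + Math.log(k);
        double m = -lnFact[n];
        for (int r : rowLeft) m += lnFact[r];
        for (int c : colLeft) m += lnFact[c];
        lnMargins = m;
        cutoff = probability(observed);
    }

    /** Conditional probability of a table that has the observed margins. */
    private double probability(int[][] table) {
        double ln = lnMargins;
        for (int[] row : table) for (int x : row) ln -= lnFact[x];
        return Math.exp(ln);
    }

    /** Fills cells in row-major order, visiting every table with the fixed margins. */
    private void fill(int i, int j) {
        int rows = cell.length, cols = cell[0].length;
        if (i == rows) { // all cells placed, so this is a complete valid table
            double p = probability(cell);
            if (p <= cutoff * (1 + 1e-9)) sum += p; // tolerance guards against rounding
            return;
        }
        boolean lastCol = (j == cols - 1);
        int lo = 0;
        int hi = Math.min(rowLeft[i], colLeft[j]);
        if (lastCol) lo = Math.max(lo, rowLeft[i]);       // last column must use up the row
        if (i == rows - 1) lo = Math.max(lo, colLeft[j]); // last row must use up the column
        for (int v = lo; v <= hi; v++) {
            cell[i][j] = v;
            rowLeft[i] -= v;
            colLeft[j] -= v;
            fill(lastCol ? i + 1 : i, lastCol ? 0 : j + 1);
            rowLeft[i] += v;
            colLeft[j] += v;
        }
    }

    /** Two-sided p-value; assumes a rectangular table of non-negative counts. */
    static double pValue(int[][] observed) {
        FisherExactSketch s = new FisherExactSketch(observed);
        s.fill(0, 0);
        return s.sum;
    }

    public static void main(String[] args) {
        int[][] a = { { 5, 0 }, { 1, 4 } };
        System.out.println(pValue(a)); // approximately 0.047619 (= 1/21)
    }
}
```

For the example table there are five tables with the same margins, with probabilities 6/252, 60/252, 120/252, 60/252 and 6/252. The observed table has probability 6/252 = 1/42, so the two tables at that level add up to 12/252 = 1/21. The number of tables grows very quickly with the totals and the dimensions, so plain enumeration like this is only practical for small inputs.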

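Edit 2:

My method is fast but not memory-efficient, which can be a problem if the sum of the input matrix elements exceeds 10000. The reason is the memory taken up by the stored factorials.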
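One way to sidestep the large factorials, at the cost of giving up arbitrary precision, is to work with logarithms of factorials in plain double arithmetic. The sketch below is an alternative to the BigDecimal method above, not a description of it; the class and method names are hypothetical, and the result is only accurate to roughly double precision.

```java
final class LogTableProbability {
    /** ln(k!) for k = 0..n, built with one running sum (n + 1 doubles). */
    static double[] logFactorials(int n) {
        double[] lf = new double[n + 1];
        for (int k = 1; k <= n; k++) {
            lf[k] = lf[k - 1] + Math.log(k);
        }
        return lf;
    }

    /** Probability of table a given its row and column totals. */
    static double tableProbability(int[][] a) {
        int rows = a.length, cols = a[0].length; // assumes a rectangular table
        int[] r = new int[rows];
        int[] c = new int[cols];
        int n = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                if (a[i][j] < 0) throw new IllegalArgumentException("negative count");
                r[i] += a[i][j];
                c[j] += a[i][j];
                n += a[i][j];
            }
        }
        double[] lf = logFactorials(n);
        // ln P = sum ln R_i! + sum ln C_j! - ln n! - sum ln a_ij!
        double logP = -lf[n];
        for (int ri : r) logP += lf[ri];
        for (int cj : c) logP += lf[cj];
        for (int[] row : a) for (int x : row) logP -= lf[x];
        return Math.exp(logP);
    }

    public static void main(String[] args) {
        int[][] a = { { 5, 0 }, { 1, 4 } };
        System.out.println(tableProbability(a)); // approximately 0.0238095 (= 1/42)
    }
}
```

With a grand total of 10000 this needs about 10001 doubles instead of factorials with tens of thousands of digits, so memory stays small; the trade-off is that the 60-digit output shown above is no longer available.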

A nearly equivalent function in Mathematica, which does not have this problem:

FeT1::usage = "Fisher's exact Test, 1 tailed. For more information:
    http://mathworld.wolfram.com/FishersExactTest.html";
FeT1[a_List, nr_Integer: 6] := Module[{},
   SumRow[array_] := Total[Transpose[array]]; 
   SumTotal[array_] := Total[Total[array]]; 
   SumColumn[array_] := Total[array]; 
   TF[list_] := Times @@ (list!); 
   N[(TF[SumColumn[a]]*TF[SumRow[a]])/(SumTotal[a]!* TF[Flatten[a]]), nr]
 ];

and example usage:

a = {{5, 0}, {1, 4}};
FeT1[a, 59]

yields

0.023809523809523809523809523809523809523809523809523809523810

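Mathematica also has statistics packages available in which Fisher's exact test is already implemented. IMHO, writing this in Java might run 20% faster, but it takes roughly 200% of the effort and 400% of the development time.

Regarding java - What is the algorithm or mathematics behind Fisher's exact test?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6379058/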
