machine-learning - Linear regression: Substituting the non-numerical discrete domain of a predictor with numerical one


Reposted by: 行者123 Updated: 2023-11-30 08:33:50


I have a training set in which one attribute has the following domain:

A = {Type1, Type2, Type3, ... ,Type5}

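If the domain stays in this form, I cannot apply linear regression, because the mathematical hypothesis is not well defined, for example: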

H = TxA + T1xB + T2xC + ...

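(That is, assuming every attribute except A is numeric, you cannot multiply a real-valued parameter by a type.)

Can I replace the domain with equivalent discrete numerical values, so that I can run linear regression on this problem without trouble?

A = {1, 2, 3, ..., 5}

Is this best practice? If not, can you give me an alternative for cases like this?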

Best answer

Best practice is to use one-hot (one-of-K) encoding: for each value that A can take, define a separate indicator feature. So with five "types", A = type1 would be

[1, 0, 0, 0, 0]

and A = type3 would be

[0, 0, 1, 0, 0]

Then concatenate these vectors with your other features, so that your hypothesis becomes

H = w[Atype1] * [A=type1] + ... + w[Atype5] * [A=type5] + w[B] * B + ...

where [] denotes the indicator function: [A=type1] is 1 when A is type1 and 0 otherwise.
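As a rough illustration of the encoding described above, here is a minimal NumPy sketch. The helper names `one_hot` and `design_matrix`, the category labels, and the sample values for B are assumptions made up for this example; they are not part of the original answer.

```python
import numpy as np

# Hypothetical category labels for attribute A (assumed for illustration).
TYPES = ["Type1", "Type2", "Type3", "Type4", "Type5"]

def one_hot(values, categories=TYPES):
    """Return a (len(values), len(categories)) matrix of 0/1 indicators."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out

def design_matrix(a_values, b_values):
    """Concatenate the indicator columns for A with the numeric feature B."""
    b = np.asarray(b_values, dtype=float).reshape(-1, 1)
    return np.hstack([one_hot(a_values), b])

# Made-up sample rows: three observations of (A, B).
a = ["Type1", "Type3", "Type5"]
b = [0.5, 1.5, 2.5]
X = design_matrix(a, b)
```

Each row of `X` holds the five indicator values for A followed by B, so an ordinary least-squares fit on `X` learns one weight per type plus one weight for B. Because the five indicators always sum to 1, they already play the role of an intercept; adding a separate constant column would make the columns linearly dependent.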

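This avoids the main problem with your approach, which is that it introduces many (probably incorrect) biases, for example that type5 = type2 + type3. To learn more about why this is better than your encoding, see this answer of mine.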
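To make that bias concrete, the sketch below compares the two encodings on synthetic, noiseless data. The per-type effects are invented for this example and chosen so that they are not linear in the integer code. With integer codes a single weight w must serve all types, so the contribution of type5 (5w) is forced to equal that of type2 plus type3 (2w + 3w); with indicators each type gets its own weight.

```python
import numpy as np

# Made-up per-type effects (an assumption for illustration only).
effects = np.array([3.0, -1.0, 4.0, 0.0, 2.0])
codes = np.tile(np.arange(5), 4)   # 20 samples, each type appears four times
y = effects[codes]                 # noiseless target: depends only on the type

# Integer encoding A in {1, ..., 5}: fit y ~ w * A + b.
X_int = np.column_stack([codes + 1, np.ones(len(codes))])
w_int, *_ = np.linalg.lstsq(X_int, y, rcond=None)
sse_int = float(np.sum((X_int @ w_int - y) ** 2))

# One-hot encoding: one indicator column, and one weight, per type.
X_hot = np.eye(5)[codes]
w_hot, *_ = np.linalg.lstsq(X_hot, y, rcond=None)
sse_hot = float(np.sum((X_hot @ w_hot - y) ** 2))

print("squared error, integer encoding:", sse_int)
print("squared error, one-hot encoding:", sse_hot)
```

On this data the one-hot fit recovers each type's effect, while the integer-coded fit cannot, because a straight line through the codes 1 to 5 cannot pass through effects that go up and down.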

Regarding machine-learning - Linear regression: Substituting the non-numerical discrete domain of a predictor with numerical one, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/19512863/
