scala - Dataset filter: eta expansion is not done automatically

Reposted by: 行者123 | Updated: 2023-12-02 04:25:46

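If I have a simple Scala collection of Ints and define a simple method isPositive that returns true when the value is greater than 0, I can pass the method directly to the collection's filter method, as in the following example: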

def isPositive(i: Int): Boolean = i > 0

val aList = List(-3, -2, -1, 1, 2, 3)
val newList = aList.filter(isPositive)

> newList: List[Int] = List(1, 2, 3)

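As I understand it, the compiler automatically converts the method into a function instance through eta-expansion and then passes that function as the argument.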
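To make that conversion concrete, here is a minimal sketch in plain Scala (no Spark needed). The object name EtaExpansionSketch and the value names are made up for illustration; the hand-written lambda only approximates what the compiler generates.

```scala
object EtaExpansionSketch {
  def isPositive(i: Int): Boolean = i > 0

  val aList = List(-3, -2, -1, 1, 2, 3)

  // Roughly what eta-expansion produces: a function value wrapping the method.
  val expanded: Int => Boolean = (i: Int) => isPositive(i)

  // Passing the function value explicitly.
  val viaExpanded = aList.filter(expanded)

  // Passing the method name; the expected type Int => Boolean triggers eta-expansion.
  val viaMethod = aList.filter(isPositive)

  def main(args: Array[String]): Unit = {
    println(viaExpanded)
    println(viaMethod)
  }
}
```

Both calls give the same list because List.filter has a single signature, so the compiler knows the expected parameter type before it types the argument.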

However, if I do the same thing with a Spark Dataset:

val aDataset = aList.toDS
val newDataset = aDataset.filter(isPositive)

> error

It fails with the well-known "missing arguments for method" error. To make it work, I have to convert the method to a function explicitly with "_":

val newDataset = aDataset.filter(isPositive _)

> newDataset: org.apache.spark.sql.Dataset[Int] = [value: int]

With map, however, it works as expected:

val newDataset = aDataset.map(isPositive)

> newDataset: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]

Looking at the signatures, I see that Dataset's filter is very similar to List's filter:

// Dataset:
def filter(func: T => Boolean): Dataset[T]

// List (Defined in TraversableLike):
def filter(p: A => Boolean): Repr

So why doesn't the compiler perform eta-expansion for the Dataset filter operation?

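Best answer

This is caused by the way overloaded methods interact with eta-expansion. Unlike List.filter, Dataset.filter is overloaded: besides T => Boolean it also accepts a Column, a SQL expression String, and a FilterFunction[T]. The Stack Overflow question "Eta-expansion between methods and functions with overloaded methods in Scala" explains why this fails.

The gist is as follows (emphasis mine):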

when overloaded, applicability is undermined because there is no expected type (6.26.3, infamously). When not overloaded, 6.26.2 applies (eta expansion) because the type of the parameter determines the expected type. When overloaded, the arg is specifically typed with no expected type, hence 6.26.2 doesn't apply; therefore neither overloaded variant of d is deemed to be applicable.

......

Candidates for overloading resolution are pre-screened by "shape". The shape test encapsulates the intuition that eta-expansion is never used because args are typed without an expected type. This example shows that eta-expansion is not used even when it is "the only way for the expression to type check."
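The effect can be reproduced without Spark. The sketch below assumes Scala 2 and uses made-up names (MiniDataset, GreaterThan) as stand-ins for Dataset and one of its non-function overloads; it is not Spark's implementation. The failing call is left commented out because its behavior depends on the compiler version.

```scala
object OverloadedFilterSketch {
  // Hypothetical stand-in for a non-function overload such as a Column-based condition.
  final case class GreaterThan(bound: Int)

  // Hypothetical miniature of a Dataset[Int]; only the overloading matters here.
  final class MiniDataset(val values: List[Int]) {
    def filter(func: Int => Boolean): MiniDataset =
      new MiniDataset(values.filter(func))
    def filter(cond: GreaterThan): MiniDataset =
      new MiniDataset(values.filter(_ > cond.bound))
  }

  def isPositive(i: Int): Boolean = i > 0

  val ds = new MiniDataset(List(-3, -2, -1, 1, 2, 3))

  // ds.filter(isPositive)
  // Left commented out: with an overloaded filter the argument is typed without
  // an expected type, so the compiler in the question rejects the bare method
  // name. Newer compiler versions may behave differently.

  // Explicit conversion: the argument is already a function when it is typed.
  val viaUnderscore = ds.filter(isPositive _)

  // A function literal has the shape of a function, so only one overload fits.
  val viaLambda = ds.filter(i => isPositive(i))

  // Converting up front, where the declared type supplies the expected type.
  val asFunction: Int => Boolean = isPositive
  val viaVal = ds.filter(asFunction)

  def main(args: Array[String]): Unit = {
    println(viaUnderscore.values)
    println(viaLambda.values)
    println(viaVal.values)
  }
}
```

All three working forms hand the compiler something that is already a function before overload resolution starts, which is why they avoid the problem.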

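As @DanielDePaula points out, the reason we don't see this effect with Dataset.map is that its overloaded variant takes an additional Encoder[U] parameter, so a call with a single argument can only match the first alternative and func is typed against T => U: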

def map[U : Encoder](func: T => U): Dataset[U] = withTypedPlan {
  MapElements[T, U](func, logicalPlan)
}

def map[U](func: MapFunction[T, U], encoder: Encoder[U]): Dataset[U] = {
  implicit val uEnc = encoder
  withTypedPlan(MapElements[T, U](func, logicalPlan))
}
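A Spark-free sketch of the same situation, assuming that overloads differing in argument count are narrowed down before the argument is typed. JMapFunction, Tag and MiniDataset are made-up stand-ins for MapFunction, Encoder and Dataset, and the implicit Encoder of the first variant is omitted for brevity.

```scala
object ArityOverloadSketch {
  // Hypothetical stand-ins for Spark's MapFunction and Encoder.
  trait JMapFunction[T, U] { def call(value: T): U }
  final class Tag[U](val name: String)

  // Hypothetical miniature of Dataset[T] with two map overloads of different arity.
  final class MiniDataset[T](val values: List[T]) {
    def map[U](func: T => U): MiniDataset[U] =
      new MiniDataset(values.map(func))
    def map[U](func: JMapFunction[T, U], tag: Tag[U]): MiniDataset[U] =
      new MiniDataset(values.map(v => func.call(v)))
  }

  def isPositive(i: Int): Boolean = i > 0

  val ds = new MiniDataset(List(-3, -2, -1, 1, 2, 3))

  // Only the one-argument overload accepts a single argument, so the expected
  // type Int => U is known and the method name is eta-expanded.
  val mapped = ds.map(isPositive)

  def main(args: Array[String]): Unit =
    println(mapped.values)
}
```

If both map variants took exactly one argument, the call would be in the same position as filter and would need isPositive _ or a lambda.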

Regarding scala - Dataset filter: eta expansion is not done automatically, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45591980/
