java - Why does distinct work via flatMap, but not via a "sub-stream" of map?

Reposted by: 搜寻专家 · Updated: 2023-11-01 02:06:00

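I am reading lines of text and building a list of their unique words (after lowercasing them). I can make it work with flatMap, but not with a "sub"-stream of map. flatMap looks more concise and "better", but why does distinct work in one context and not in the other?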

Top of the class:

import static java.util.stream.Collectors.toList;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class GetListOfAllWordsInLinesOfText {

   private static final String INPUT = "Line 1\n" +
                              "Line 2, which is a really long line\n" +
                              "A moderately long line 3\n" +
                              "Line 4\n";
   private static final Pattern WORD_SEPARATOR_PATTERN = Pattern.compile("\\W+");

   public static void main(String[] args) {

Why does this distinct let duplicates through:

      final List<String> wordList = new ArrayList<>();
      Arrays.stream(INPUT.split("\n"))
            .forEach(line -> WORD_SEPARATOR_PATTERN.splitAsStream(line).
                        map(String::toLowerCase).
                        distinct().
                        forEach(wordList::add));

      System.out.println("Output via map:");
      wordList.stream().forEach(System.out::println);

      System.out.println("--------");

Output:

Output via map:
line
1
line
2
which
is
a
really
long
a
moderately
long
line
3
line
4

But this one correctly eliminates the duplicates?

      final List<String> wordList2 = Arrays.stream(INPUT.split("\n")).flatMap(
            WORD_SEPARATOR_PATTERN::splitAsStream).map(String::toLowerCase).
            distinct()
            .collect(toList());

      System.out.println("Output via flatMap:");
      wordList2.stream().forEach(System.out::println);
   }
}

Output:

line
1
2
which
is
a
really
long
moderately
3
4

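Here is the full output, produced with the peek calls in the full code below. You can see that the flatMap version correctly filters out the duplicates, while the map version does not: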

map:

map before distinct -> line
map after distinct -> line
map before distinct -> 1
map after distinct -> 1
map before distinct -> line
map after distinct -> line
map before distinct -> 2
map after distinct -> 2
map before distinct -> which
map after distinct -> which
map before distinct -> is
map after distinct -> is
map before distinct -> a
map after distinct -> a
map before distinct -> really
map after distinct -> really
map before distinct -> long
map after distinct -> long
map before distinct -> line
map before distinct -> a
map after distinct -> a
map before distinct -> moderately
map after distinct -> moderately
map before distinct -> long
map after distinct -> long
map before distinct -> line
map after distinct -> line
map before distinct -> 3
map after distinct -> 3
map before distinct -> line
map after distinct -> line
map before distinct -> 4
map after distinct -> 4
Output via map:
line
1
line
2
which
is
a
really
long
a
moderately
long
line
3
line
4
--------

flatMap:

flatMap before distinct -> line
flatMap after distinct -> line
flatMap before distinct -> 1
flatMap after distinct -> 1
flatMap before distinct -> line
flatMap before distinct -> 2
flatMap after distinct -> 2
flatMap before distinct -> which
flatMap after distinct -> which
flatMap before distinct -> is
flatMap after distinct -> is
flatMap before distinct -> a
flatMap after distinct -> a
flatMap before distinct -> really
flatMap after distinct -> really
flatMap before distinct -> long
flatMap after distinct -> long
flatMap before distinct -> line
flatMap before distinct -> a
flatMap before distinct -> moderately
flatMap after distinct -> moderately
flatMap before distinct -> long
flatMap before distinct -> line
flatMap before distinct -> 3
flatMap after distinct -> 3
flatMap before distinct -> line
flatMap before distinct -> 4
flatMap after distinct -> 4
Output via flatMap:
line
1
2
which
is
a
really
long
moderately
3
4

Full code:

import static java.util.stream.Collectors.toList;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class GetListOfAllWordsInLinesOfText {

   private static final String INPUT = "Line 1\n" +
                              "Line 2, which is a really long line\n" +
                              "A moderately long line 3\n" +
                              "Line 4\n";
   private static final Pattern WORD_SEPARATOR_PATTERN = Pattern.compile("\\W+");

   public static void main(String[] args) {

      final List<String> wordList = new ArrayList<>();
      Arrays.stream(INPUT.split("\n"))
            .forEach(line -> WORD_SEPARATOR_PATTERN.splitAsStream(line).map(String::toLowerCase)
                  .peek(word -> System.out.println("map before distinct -> " + word)).
                        distinct().
                        peek(word -> System.out.println("map after distinct -> " + word)).
                        forEach(wordList::add));

      System.out.println("Output via map:");
      wordList.stream().forEach(System.out::println);

      System.out.println("--------");

      final List<String> wordList2 = Arrays.stream(INPUT.split("\n")).flatMap(
            WORD_SEPARATOR_PATTERN::splitAsStream).map(String::toLowerCase).
                  peek(word -> System.out.println("flatMap before distinct -> " + word)).
            distinct()
                  .peek(word -> System.out.println("flatMap after distinct -> " + word))
            .collect(toList());

      System.out.println("Output via flatMap:");
      wordList2.stream().forEach(System.out::println);
   }
}

Best answer

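The first snippet uses forEach to process each line and calls distinct inside that forEach. Every line therefore gets its own stream with its own distinct, so duplicates are eliminated, but only within a single line, not globally.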
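A minimal sketch of the difference in scope, using a smaller input than the question's. The class name DistinctScopeDemo, the two helper methods, and the sample input are made up for illustration; forEachOrdered is used instead of forEach only to make the ordering guarantee explicit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original question or answer.
class DistinctScopeDemo {

   private static final Pattern WORD_SEPARATOR_PATTERN = Pattern.compile("\\W+");

   // One stream per line: each distinct() only ever sees the words of its own line.
   static List<String> distinctPerLine(String input) {
      final List<String> words = new ArrayList<>();
      for (String line : input.split("\n")) {
         WORD_SEPARATOR_PATTERN.splitAsStream(line)
               .map(String::toLowerCase)
               .distinct()
               .forEachOrdered(words::add);
      }
      return words;
   }

   // One stream for all lines: flatMap merges the per-line streams first,
   // so the single distinct() sees every word of the input.
   static List<String> distinctAcrossLines(String input) {
      return Arrays.stream(input.split("\n"))
            .flatMap(WORD_SEPARATOR_PATTERN::splitAsStream)
            .map(String::toLowerCase)
            .distinct()
            .collect(Collectors.toList());
   }

   public static void main(String[] args) {
      final String input = "a b a\nb c\n";
      System.out.println(distinctPerLine(input));     // [a, b, b, c]
      System.out.println(distinctAcrossLines(input)); // [a, b, c]
   }
}
```

In the per-line variant the second "a" on the first line is dropped, but the "b" on the second line survives because it is the first "b" that second stream has seen.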

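Look at the output for the second line of input: the repeated 'line' is in fact eliminated there, because it is repeated within the same line.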
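If the per-line loop has to stay, one possible workaround (not from the original answer) is to let all the per-line streams write into one shared, insertion-ordered set instead of calling distinct on each of them. The sketch below assumes this approach; the class name SharedSetDemo and the method uniqueWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical demo class; an assumed alternative, not the answer's method.
class SharedSetDemo {

   private static final String INPUT = "Line 1\n" +
                              "Line 2, which is a really long line\n" +
                              "A moderately long line 3\n" +
                              "Line 4\n";
   private static final Pattern WORD_SEPARATOR_PATTERN = Pattern.compile("\\W+");

   static List<String> uniqueWords(String input) {
      // The set outlives the per-line streams, so it remembers words
      // seen on earlier lines; LinkedHashSet keeps first-seen order.
      final Set<String> seen = new LinkedHashSet<>();
      for (String line : input.split("\n")) {
         WORD_SEPARATOR_PATTERN.splitAsStream(line)
               .map(String::toLowerCase)
               .forEachOrdered(seen::add);
      }
      return new ArrayList<>(seen);
   }

   public static void main(String[] args) {
      // Prints: [line, 1, 2, which, is, a, really, long, moderately, 3, 4]
      System.out.println(uniqueWords(INPUT));
   }
}
```

This yields the same words in the same order as the flatMap version in the question, though the flatMap pipeline remains the shorter way to express it.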

Regarding "java - Why does distinct work via flatMap, but not via a "sub-stream" of map?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33046844/
