java - Why does distinct work via flatMap, but not via map's "sub-stream"?

Reposted. Author: 搜寻专家. Updated: 2023-11-01 02:06:00

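I am reading lines of text and building a list of the unique words in them (after lowercasing them). I can make this work with flatMap, but not with a "sub" stream inside map. flatMap looks more concise and "better", but why does distinct work in one context and not in the other?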

Top of the class:

import static java.util.stream.Collectors.toList;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class GetListOfAllWordsInLinesOfText {

    private static final String INPUT = "Line 1\n" +
            "Line 2, which is a really long line\n" +
            "A moderately long line 3\n" +
            "Line 4\n";
    private static final Pattern WORD_SEPARATOR_PATTERN = Pattern.compile("\\W+");

    public static void main(String[] args) {

Why does this distinct let duplicates through:

        final List<String> wordList = new ArrayList<>();
        Arrays.stream(INPUT.split("\n"))
                .forEach(line -> WORD_SEPARATOR_PATTERN.splitAsStream(line)
                        .map(String::toLowerCase)
                        .distinct()
                        .forEach(wordList::add));

        System.out.println("Output via map:");
        wordList.stream().forEach(System.out::println);

        System.out.println("--------");

Output:

Output via map:
line
1
line
2
which
is
a
really
long
a
moderately
long
line
3
line
4

But this correctly eliminates the duplicates?

        final List<String> wordList2 = Arrays.stream(INPUT.split("\n"))
                .flatMap(WORD_SEPARATOR_PATTERN::splitAsStream)
                .map(String::toLowerCase)
                .distinct()
                .collect(toList());

        System.out.println("Output via flatMap:");
        wordList2.stream().forEach(System.out::println);
    }
}

Output:

line
1
2
which
is
a
really
long
moderately
3
4

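Here is the complete output, including the peek calls from the full code below. You can see that the flatMap version filters the duplicates correctly, but the map version does not: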

map:

map before distinct -> line
map after distinct -> line
map before distinct -> 1
map after distinct -> 1
map before distinct -> line
map after distinct -> line
map before distinct -> 2
map after distinct -> 2
map before distinct -> which
map after distinct -> which
map before distinct -> is
map after distinct -> is
map before distinct -> a
map after distinct -> a
map before distinct -> really
map after distinct -> really
map before distinct -> long
map after distinct -> long
map before distinct -> line
map before distinct -> a
map after distinct -> a
map before distinct -> moderately
map after distinct -> moderately
map before distinct -> long
map after distinct -> long
map before distinct -> line
map after distinct -> line
map before distinct -> 3
map after distinct -> 3
map before distinct -> line
map after distinct -> line
map before distinct -> 4
map after distinct -> 4
Output via map:
line
1
line
2
which
is
a
really
long
a
moderately
long
line
3
line
4
--------

flatMap:

flatMap before distinct -> line
flatMap after distinct -> line
flatMap before distinct -> 1
flatMap after distinct -> 1
flatMap before distinct -> line
flatMap before distinct -> 2
flatMap after distinct -> 2
flatMap before distinct -> which
flatMap after distinct -> which
flatMap before distinct -> is
flatMap after distinct -> is
flatMap before distinct -> a
flatMap after distinct -> a
flatMap before distinct -> really
flatMap after distinct -> really
flatMap before distinct -> long
flatMap after distinct -> long
flatMap before distinct -> line
flatMap before distinct -> a
flatMap before distinct -> moderately
flatMap after distinct -> moderately
flatMap before distinct -> long
flatMap before distinct -> line
flatMap before distinct -> 3
flatMap after distinct -> 3
flatMap before distinct -> line
flatMap before distinct -> 4
flatMap after distinct -> 4
Output via flatMap:
line
1
2
which
is
a
really
long
moderately
3
4

Full code:

import static java.util.stream.Collectors.toList;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class GetListOfAllWordsInLinesOfText {

    private static final String INPUT = "Line 1\n" +
            "Line 2, which is a really long line\n" +
            "A moderately long line 3\n" +
            "Line 4\n";
    private static final Pattern WORD_SEPARATOR_PATTERN = Pattern.compile("\\W+");

    public static void main(String[] args) {

        final List<String> wordList = new ArrayList<>();
        Arrays.stream(INPUT.split("\n"))
                .forEach(line -> WORD_SEPARATOR_PATTERN.splitAsStream(line)
                        .map(String::toLowerCase)
                        .peek(word -> System.out.println("map before distinct -> " + word))
                        .distinct()
                        .peek(word -> System.out.println("map after distinct -> " + word))
                        .forEach(wordList::add));

        System.out.println("Output via map:");
        wordList.stream().forEach(System.out::println);

        System.out.println("--------");

        final List<String> wordList2 = Arrays.stream(INPUT.split("\n"))
                .flatMap(WORD_SEPARATOR_PATTERN::splitAsStream)
                .map(String::toLowerCase)
                .peek(word -> System.out.println("flatMap before distinct -> " + word))
                .distinct()
                .peek(word -> System.out.println("flatMap after distinct -> " + word))
                .collect(toList());

        System.out.println("Output via flatMap:");
        wordList2.stream().forEach(System.out::println);
    }
}

Best answer

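The first snippet uses forEach to process each line and calls distinct inside that forEach. A new stream, with its own distinct, is created for every line, so duplicates are eliminated, but only within a single line, not globally.

Look at the output for the second input line: the repeated 'line' is in fact eliminated there, because it repeats within the same line. In the peek trace, that second 'line' has a "map before distinct" entry but no matching "map after distinct" entry.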
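To make the scope of each distinct visible, here is a minimal sketch that is not part of the original post. The class name DistinctScopeSketch and the two helper methods distinctPerLine and distinctAcrossLines are made up for illustration. The first helper collects each line's sub-stream into its own list, which is effectively what the forEach version does before appending everything to one list; the second is the flatMap version from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration class; names are not from the original question.
class DistinctScopeSketch {

    private static final Pattern WORD_SEPARATOR_PATTERN = Pattern.compile("\\W+");

    // One distinct() per line: each sub-stream only remembers the words
    // it has seen itself, so a word can reappear in a later line's list.
    static List<List<String>> distinctPerLine(String input) {
        return Arrays.stream(input.split("\n"))
                .map(line -> WORD_SEPARATOR_PATTERN.splitAsStream(line)
                        .map(String::toLowerCase)
                        .distinct()
                        .collect(Collectors.toList()))
                .collect(Collectors.toList());
    }

    // One distinct() on the flattened stream: a single pipeline sees the
    // words of every line, so duplicates are removed across lines.
    static List<String> distinctAcrossLines(String input) {
        return Arrays.stream(input.split("\n"))
                .flatMap(WORD_SEPARATOR_PATTERN::splitAsStream)
                .map(String::toLowerCase)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "Line 1\n"
                + "Line 2, which is a really long line\n"
                + "A moderately long line 3\n"
                + "Line 4\n";
        // Prints one list per input line; "line" shows up in each of them.
        distinctPerLine(input).forEach(System.out::println);
        // Prints a single list in which every word occurs once.
        System.out.println(distinctAcrossLines(input));
    }
}
```

With the question's input, the per-line lists each start with or contain "line", while the flattened list contains it once. Concatenating the per-line lists gives the "Output via map" shown in the question.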

Regarding java - Why does distinct work via flatMap, but not via map's "sub-stream"?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33046844/
