
caching - Does Spark automatically cache some results?

Reposted · Author: 行者123 · Updated: 2023-12-02 10:32:12

I ran the same action twice, and the second run took far less time, so I suspect that Spark automatically caches some results. However, I could not find any source confirming this.

I am using Spark 1.4.

import re

doc = sc.textFile('...')
doc_wc = doc.flatMap(lambda x: re.split(r'\W', x)) \
            .filter(lambda x: x != '') \
            .map(lambda word: (word, 1)) \
            .reduceByKey(lambda x, y: x + y)
%%time
doc_wc.take(5) # first time
# CPU times: user 10.7 ms, sys: 425 µs, total: 11.1 ms
# Wall time: 4.39 s

%%time
doc_wc.take(5) # second time
# CPU times: user 6.13 ms, sys: 276 µs, total: 6.41 ms
# Wall time: 151 ms
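As an aside, the per-record logic in the pipeline above can be checked outside Spark. A minimal pure-Python sketch of the same tokenize/filter/count steps (the sample text here is made up), which also shows why the `filter(lambda x: x != '')` step is needed: `re.split(r'\W', ...)` emits empty strings between adjacent separators:

```python
import re
from collections import Counter

lines = ["to be, or not to be"]  # made-up sample input

# flatMap + filter: split each line on non-word characters and drop the
# empty strings that re.split(r'\W', ...) produces between separators
words = [w for line in lines for w in re.split(r'\W', line) if w != '']

# map + reduceByKey: count occurrences per word
counts = Counter(words)
print(counts['to'])  # 2
```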

Best Answer

From the documentation:

Spark also automatically persists some intermediate data in shuffle operations (e.g. reduceByKey), even without users calling persist. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call persist on the resulting RDD if they plan to reuse it.

The underlying file system will also cache reads from disk.

Regarding "caching - Does Spark automatically cache some results?", there is a similar question on Stack Overflow: https://stackoverflow.com/questions/31180592/
