
java - How to solve the "GC overhead limit exceeded" error in Scala code


I am just reading records from a Solr node. My code only reads records within a given date range. I checked that it works for 50K records, but when I tried 100K I ran into the "GC overhead limit exceeded" error.

My Scala code looks like this:

import org.apache.http.auth.{AuthScope, UsernamePasswordCredentials}
import org.apache.http.client.CredentialsProvider
import org.apache.http.conn.ssl.{SSLConnectionSocketFactory, SSLContextBuilder, TrustSelfSignedStrategy}
import org.apache.http.impl.client.{BasicCredentialsProvider, CloseableHttpClient, HttpClients}
import org.apache.solr.client.solrj.impl.HttpSolrClient
import org.apache.solr.client.solrj.response.QueryResponse
import org.apache.solr.client.solrj.{SolrClient, SolrQuery}
import org.apache.solr.common.{SolrDocument, SolrDocumentList}
import org.slf4j.LoggerFactory
import scala.collection.JavaConverters._

val logger = LoggerFactory.getLogger("SolrReader")

def querySolr(core: String, selectQuery: String, server: SolrClient,
              pageNum: Int, pageStart: Int, pageSize: Int): (Long, SolrDocumentList) = {
  // The client's base URL already points at the core, so the query itself
  // only needs the select expression and the paging window.
  val query = new SolrQuery()
  query.setQuery(selectQuery)
  query.setStart(pageStart)  // Solr's start parameter is a 0-based offset
  query.setRows(pageSize)
  val response: QueryResponse = server.query(query)
  val results: SolrDocumentList = response.getResults
  (results.getNumFound, results)
}

def pageCalc(page: Int, pageSize: Int, totalItems: Long): (Int, Long, Long) = {
  val from = ((page - 1) * pageSize) + 1         // 1-based index of the first item on the page
  val to = totalItems min (from + pageSize - 1)  // 1-based index of the last item
  val totalPages = (totalItems / pageSize) + (if (totalItems % pageSize > 0) 1 else 0)
  (from, to, totalPages)
}

def getRecordsFromSolr(core: String, solrhost: String, userName: String, password: String,
                       query: String): List[SolrDocument] = {

  val startTime = System.nanoTime()
  val solrPort = 8983
  val url = "https://" + solrhost + ":" + solrPort + "/solr/" + core

  // Trust the node's self-signed certificate and skip hostname verification.
  val builder: SSLContextBuilder = new SSLContextBuilder()
  builder.loadTrustMaterial(null, new TrustSelfSignedStrategy())
  val sslsf: SSLConnectionSocketFactory = new SSLConnectionSocketFactory(
    builder.build(), SSLConnectionSocketFactory.ALLOW_ALL_HOSTNAME_VERIFIER)

  // Basic authentication against the Solr endpoint.
  val credsProvider: CredentialsProvider = new BasicCredentialsProvider()
  credsProvider.setCredentials(
    new AuthScope(solrhost, solrPort),
    new UsernamePasswordCredentials(userName, password))

  val httpclient: CloseableHttpClient = HttpClients.custom()
    .setSSLSocketFactory(sslsf)
    .setDefaultCredentialsProvider(credsProvider)
    .build()

  val server: SolrClient = new HttpSolrClient(url, httpclient)
  logger.info("solr connection completed")

  val pageSize = 1000
  var pageNum = 1
  var offset: Long = 0

  // First page starts at offset 0; every following page advances by pageSize.
  var totalResult = querySolr(core, query, server, pageNum, 0, pageSize)
  var total = totalResult._1
  var results: List[SolrDocument] = totalResult._2.asScala.toList
  while (total > offset) {
    offset += pageSize
    pageNum += 1
    val nextPage = pageCalc(pageNum, pageSize, total)
    // pageCalc returns a 1-based "from", while Solr's start is 0-based, hence the -1.
    totalResult = querySolr(core, query, server, pageNum, nextPage._1 - 1, pageSize)
    total = totalResult._1
    results = results ++ totalResult._2.asScala.toList  // every page is accumulated in memory
  }
  results
}

java.lang.OutOfMemoryError: GC overhead limit exceeded

How can I avoid this memory leak? I have already tried 8 GB per core, and the table contains millions of records.

With 60K records I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 0:0 was 18311053 bytes, which exceeds max allowed: spark.akka.frameSize (10485760 bytes) - reserved (204800 bytes). Consider increasing spark.akka.frameSize or using broadcast variables for large values.

Best Answer

The OutOfMemoryError usually occurs when the Solr response being read is too large.

So the solution is to minimize the Solr response:

  1. Limit the number of rows returned per request.
  2. Limit the list of returned fields (the fl parameter). Fields that store large indexed documents (e.g. PDFs) in particular can grow very large; a sketch of both settings follows this list.
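
A minimal sketch of both limits applied to the SolrQuery from the question; the field names "id" and "timestamp" are placeholders for whatever small fields your documents actually need:

// Keep each response small: a modest page size and only the fields you need.
val query = new SolrQuery()
query.setQuery(selectQuery)
query.setRows(100)                  // fewer rows per request than the original 1000
query.setFields("id", "timestamp")  // fl: placeholder field names, adjust to your schema
// Large stored fields (e.g. extracted PDF bodies) are then left out of the response.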

If that does not help, I suggest analyzing your Solr response: work out the actual Solr query being sent and execute it in the browser.
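
For example, a hand-built select URL of the following shape (host, core name, and query are placeholders) shows exactly what the client has to deserialize:

https://solrhost:8983/solr/mycore/select?q=*:*&rows=10&fl=id&wt=json

If even ten rows already come back as megabytes of JSON, the fl list is the first thing to trim.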

Regarding "java - How to solve the GC overhead limit exceeded error in Scala code", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35377485/
