
rgdal: efficiently reading a large multi-band raster

Reposted by 行者123 · Updated 2023-12-04

I am working on an image classification script in R using the rgdal package. The raster in question is a PCIDSK file with 28 channels: one training-data channel, one validation-data channel, and 26 channels of spectral data. The goal is to populate a data frame containing, for every pixel whose value in the training channel is non-zero, that training value together with the associated spectral values from the 26 bands.

In Python/NumPy I can easily import all bands of the whole image into a multidimensional array, but because of memory limits the only option in R seems to be importing the data block by block, which is very slow:

library(rgdal)

raster = "image.pix"
training_band = 2
validation_band = 1
BlockWidth = 500
BlockHeight = 500

# Get some metadata about the whole raster
myinfo = GDALinfo(raster)
ysize = myinfo[[1]]      # rows
xsize = myinfo[[2]]      # columns
numbands = myinfo[[3]]   # bands

# Iterate through the image in blocks and retrieve the training data.
# The clipped block size is computed per iteration so that a short edge
# block does not shrink the blocks used for later columns.
column = 0
training_data = NULL
while (column < xsize) {
  width = min(BlockWidth, xsize - column)
  row = 0
  while (row < ysize) {
    height = min(BlockHeight, ysize - row)

    # Read one block: the training band plus the 26 spectral bands
    myblock = readGDAL(raster, region.dim = c(height, width),
                       offset = c(row, column),
                       band = c(training_band, 3:numbands), silent = TRUE)

    # The data slot is already a pixel-by-band data frame, so the
    # eval(parse(...)) loop over band names is unnecessary
    blockdata = myblock@data
    names(blockdata) = paste0("V", seq_along(blockdata))  # V1 = training band

    # Keep only pixels with a non-zero training label
    blockdata = blockdata[blockdata[, 1] > 0, , drop = FALSE]
    if (nrow(blockdata) > 0) {
      training_data = rbind(training_data, blockdata)
    }

    row = row + height
  }
  column = column + width
}
remove(blockdata, myblock, row, column, width, height)
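The tiling arithmetic in the loop above (clipping edge blocks to the raster extent, and resetting the block size for every column so a short final row-block does not shrink later columns) is language-agnostic. A minimal sketch in Python, with illustrative sizes:

```python
def block_windows(xsize, ysize, block_w, block_h):
    """Yield (xoff, yoff, width, height) windows covering the raster exactly once."""
    for xoff in range(0, xsize, block_w):
        for yoff in range(0, ysize, block_h):
            # clip the window at the right and bottom edges of the raster
            yield (xoff, yoff, min(block_w, xsize - xoff), min(block_h, ysize - yoff))

# Example: a 1200 x 700 raster tiled with 500 x 500 blocks gives 6 windows,
# with the edge windows clipped to 200 pixels.
windows = list(block_windows(xsize=1200, ysize=700, block_w=500, block_h=500))
```

Each window can then be handed to a block reader (such as readGDAL's offset/region.dim pair) without any mutable loop state to reset.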

Is there a faster/better way to do the same thing without running out of memory?

The next step after collecting the training data is to build the classifier (randomForest package), which also needs a lot of memory depending on the number of trees requested. That brings me to my second problem: given how much memory the training data already occupies, building a forest of 500 trees is not possible:
# Note: in an R formula "V3:V27" means an interaction, not a column range,
# so the additive right-hand side has to be pasted together explicitly
myformula = as.formula(paste("as.factor(V1) ~",
                             paste0("V", 3:ncol(training_data), collapse = " + ")))
r_tree = randomForest(formula = myformula, data = training_data,
                      ntree = 500, keep.forest = TRUE)

Is there a way to allocate more memory? Am I missing something? Thanks...

[Edit]
As Jan suggested, using the raster package is much faster; however, as far as I can tell it does not solve the memory problem of collecting the training data, because that data eventually has to sit in a data frame, in memory:
library(raster)
library(randomForest)

# Set some user variables
fn = "image.pix"
training_band = 2
validation_band = 1

# Get the training data
myraster = stack(fn)
training_class = subset(myraster, training_band)
training_class[training_class == 0] = NA                    # mask unlabelled pixels
training_class = Which(training_class != 0, cells = TRUE)   # cell numbers of labelled pixels
training_data = extract(myraster, training_class)           # spectral values at those cells
training_data = as.data.frame(training_data)

So while this is faster (and needs less code), it still does not solve the problem of not having enough free memory to build the classifier... Is there some raster package function I have not discovered that would accomplish this? Thanks...
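The Which/extract pattern above amounts to streaming over the raster and keeping only the labelled pixels, so the full image never has to be resident at once. A minimal language-agnostic sketch of that idea (all names here are illustrative; a real script would read each chunk via GDAL/rgdal rather than slicing an in-memory list):

```python
def extract_training_pixels(read_block, ysize, block_height):
    """read_block(row, nrows) -> iterable of (training_value, spectral_tuple).

    Scans the raster in row chunks and keeps only labelled pixels, so peak
    memory is one chunk plus the (much smaller) training set.
    """
    rows = []
    row = 0
    while row < ysize:
        nrows = min(block_height, ysize - row)
        for training_value, spectra in read_block(row, nrows):
            if training_value != 0:  # keep labelled pixels only
                rows.append((training_value,) + spectra)
        row += nrows
    return rows

# Tiny fake raster: 4 "rows"; training label first, two spectral bands after.
fake = [(0, (10, 11)), (3, (12, 13)), (0, (14, 15)), (1, (16, 17))]
result = extract_training_pixels(lambda r, n: fake[r:r + n],
                                 ysize=4, block_height=2)
# result holds only the two labelled pixels with their spectral values
```

This keeps the extraction memory-bounded; the remaining problem, as the question notes, is that the resulting training set itself must still fit in memory for the classifier.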

Best Answer

Have a look at the raster package. It provides a convenient wrapper around rgdal that lets you work with rasters without loading them into memory.

http://raster.r-forge.r-project.org/

Hope this helps.

The 'raster' package deals with basic spatial raster (grid) data access and manipulation. It defines raster classes; can deal with very large files (stored on disk); and includes standard raster functions such as overlay, aggregation, and merge.

The purpose of the 'raster' package is to provide easy to use functions for raster manipulation and analysis. These include high level functions such as overlay, merge, aggregate, projection, resample, distance, polygon to raster conversion. All these functions work for very large raster datasets that cannot be loaded into memory. In addition, the package provides lower level functions such as row by row reading and writing (to many formats via rgdal) for building other functions.



By using the raster package you can avoid filling up your memory before ever getting to randomForest.

[Edit] To address the memory problem with random forests: it may help if you can learn your random forest on a subsample (of size <
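The suggestion above is cut off, but it presumably points at growing several smaller forests on subsamples and merging them; in R, randomForest::combine does exactly this merge. The merge itself is just voting across the sub-forests, which can be sketched with the standard library alone (illustrative class labels, not the author's data):

```python
from collections import Counter

def combine_votes(predictions_per_forest):
    """Majority vote per pixel across the prediction vectors of each sub-forest."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_forest)]

# Three sub-forests, each predicting a class for the same three pixels:
combined = combine_votes([["a", "b", "a"],
                          ["a", "b", "b"],
                          ["a", "a", "b"]])
```

Since each sub-forest only ever sees its subsample, peak memory during training is bounded by the subsample size rather than the full training set.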

On rgdal efficiently reading a large multi-band raster, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4186507/
