gpt4 book ai didi

Interpolation of missing values in 3D data-array in python(在PYTHON中实现三维数据数组中缺失值的内插)

转载 作者:bug小助手 更新时间:2023-10-28 13:30:22 26 4
gpt4 key购买 nike



I am struggeling with the following issue for quite some time and could not find a working solution yet.

我为以下问题挣扎了很长一段时间,还没有找到一个可行的解决方案。


I want to interpolate missing values (nan) in my data-array [in Python]. The array has the dimension lon, lat, time – it is a raster dataset over time.

我想在我的数据数组中插入缺失值(NaN)[在Python中]。该数组具有维度Lon、Late、Time--它是随时间变化栅格数据集。


Unfortunately there are some timesteps where all values are missing and where an interpolation with lon and lat (2D) is not possible. This is why I came up with the idea to interpolate over the time axis. I want the missing value of one timesteps gets interpolated with the value of the timestep before and after at the exact same pixel.

遗憾的是,在某些时间步长中,所有值都丢失,并且无法使用Lon和Lat(2D)进行插补。这就是为什么我想出了在时间轴上插补的想法。我希望在完全相同的像素上,用前后时间步长的值来对一个时间步长的缺失值进行内插。


Do you have any ideas how to do that?

你知道怎么做吗?


My current attempt was:

我目前的尝试是:


" def arr_interp(array):
arrN=np.array(array,copy=False)
arrN[np.isnan(arrN)]=interpolate.interp2d(my_array.lat, my_array.lon, my_array.time, fill_value="nan")

“def arr_interp(数组):arrN=np.array(数组,拷贝=FALSE)arrN[np.isnan(arrN)]=interpolate.interp2d(my_array.lat,my_array.lon,my_array.time,Fill_Value=”nan“)


arr_interp(my_array)"

ARR_INTERP(MY_ARRAY)“


更多回答
优秀答案推荐

The problem is that the NaN data may form blocks, where you cannot interpolate from the neighbors.

问题是,NaN数据可能会形成块,而您无法从邻居处进行内插。


A solution is to do Gauss Seidel interpolation solving the Laplace equation, (which creates data minimizing the curvature of the function).

一种解决方案是进行Gauss Seidel插值法求解拉普拉斯方程(这将创建最小化函数曲率的数据)。


This code finds the NaN values and does a 3D interpolation. I do not have access to your data, so it is done over synthetic data.

此代码找到NaN值并执行3D内插。我无法访问您的数据,因此它是在合成数据上完成的。


import numpy as np
import matplotlib.pyplot as plt


# create data
print("Creating data...")
size = 10 # 3D matrix of size: size³
# create x,y,z grid
x, y, z = np.meshgrid(np.arange(0, size), np.arange(
0, size), np.arange(0, size))


def f(x, y, z):
"""function to create synthetic data"""
return np.sin((x+y+z)/2)


data = np.zeros((size, size, size))
data[x, y, z] = f(x, y, z)

# create corrupted data
sizeCorruptedData = int(data.size*.2) # 20% of data is corrupted
# create random x,y,z index for NaN values
xc, yc, zc = np.random.randint(0, size, (3, sizeCorruptedData))

corruptedData = data.copy()
corruptedData[xc, yc, zc] = np.nan

# Interpolate on NaN values
print("Interpolating...")

# get index of nan in corrupted data
nanIndex = np.isnan(corruptedData).nonzero()

interpolatedData = data.copy()
# make an initial guess for the interpolated data using the mean of the non NaN values
interpolatedData[nanIndex] = np.nanmean(corruptedData)


def sign(x):
"""returns the sign of the neighbor to be averaged for boundary elements"""
if x == 0:
return [1, 1]
elif x == size-1:
return [-1, -1]
else:
return [-1, 1]

#calculate kernels for the averages on boundaries/non boundary elements
for i in range(len(nanIndex)):
nanIndex = *nanIndex, np.array([sign(x) for x in nanIndex[i]])

# gauss seidel iteration to interpolate Nan values with neighbors
# https://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method
for _ in range(100):
for x, y, z, dx, dy, dz in zip(*nanIndex):
interpolatedData[x, y, z] = (
(interpolatedData[x+dx[0], y, z] + interpolatedData[x+dx[1], y, z] +
interpolatedData[x, y+dy[0], z] + interpolatedData[x, y+dy[1], z] +
interpolatedData[x, y, z+dz[0]] + interpolatedData[x, y, z+dz[1]]) / 6)


# plot results
f, axarr = plt.subplots(2, 2)
axarr[0, 0].imshow(data[:, :, 1])
axarr[0, 0].title.set_text('Original Data')
axarr[0, 1].imshow(corruptedData[:, :, 1])
axarr[0, 1].title.set_text('Corrupted Data')
axarr[1, 0].imshow(interpolatedData[:, :, 1])
axarr[1, 0].title.set_text('Fixed Data')
axarr[1, 1].imshow(data[:, :, 1]-interpolatedData[:, :, 1])
axarr[1, 1].title.set_text('Error = Original-Fixed')
f.tight_layout()
plt.show()

enter image description here



I dont yet have enough reputation to comment however to add to @Colim's answer replacing the for loop with a vectorised version will increase performance significantly

然而,我还没有足够的声誉来评论@Colim的答案,用矢量化版本取代for循环将显着提高性能


for _ in tqdm(range(100)):
x, y, z, dx, dy, dz = nanIndex
interpolatedData[x, y, z] = (
(interpolatedData[x+dx[:,0], y, z] + interpolatedData[x+dx[:,1], y, z] +
interpolatedData[x, y+dy[:,0], z] + interpolatedData[x, y+dy[:,1], z] +
interpolatedData[x, y, z+dz[:,0]] + interpolatedData[x, y, z+dz[:,1]]) / 6)

更多回答

When I use your method (with an array of the shape 246 x 93 x114) I get the an index error at the Gauss-Seidl Interpolation step --> IndexError: index 114 is out of bounds for axis 2 with size 114. Can it be that the code is only working when the dimension lengh is the same?

当我使用您的方法(使用形状为246x93x114的数组)时,我在Gauss-Seidl内插步骤-->IndexError:索引114超出了大小为114的轴2的界限。难道代码只有在尺寸长度相同时才起作用吗?

And can you also explain why the lengh you are interpolating over is 100 (Gauss Seidl Interpolation --> ("is for _ in range(100)"). Because of the 10x10 axis?

你还能解释为什么你插值的长度是100(高斯塞德尔插值-->(“is for _ in range(100)”)。由于10 x10轴?

Another uncertainty came up: You made a meshgrid of which you created the array. I have an xarray and convert it into an numpy array. As I have to state the dimensions I use: #name dimensions of nan values xc= nanIndex[0] # time yc= nanIndex[1] # lat zc= nanIndex[2] # lon Is this right? or is there a differerence to the created meshgrid dimensions? Thank you very much in advance for your help!

另一个不确定性出现了:你制作了一个网格,并在其中创建了数组。我有一个xarray值,并将其转换为NumPy数组。因为我必须说明我使用的维度:#名称NaN值的维度xc=nanIndex[0]#time yc=nanIndex[1]#lat zc=nanIndex[2]#lon,对吗?或者与创建的网格尺寸有什么不同?非常感谢您的帮助!

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com