
string - Referencing objects using variable strings in R

Reposted. Author: 行者123. Updated: 2023-12-01 10:05:59

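Edit: Thanks to those who have responded so far. I'm very much a beginner in R and have just taken on a large project for my MSc dissertation, so I'm a bit overwhelmed by the initial processing. The data I'm using look like this (from the WMO's publicly available rainfall data):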



<pre><code>120 6272100 KHARTOUM 15.60 32.55 382 1899 1989 0.0
1899 0.03 0.03 0.03 0.03 0.03 1.03 13.03 12.03 9999 6.03 0.03 0.03
1900 0.03 0.03 0.03 0.03 0.03 23.03 80.03 47.03 23.03 8.03 0.03 0.03
1901 0.03 0.03 0.03 0.03 0.03 17.03 23.03 17.03 0.03 8.03 0.03 0.03
(...)
120 6272101 JEBEL AULIA 15.20 32.50 380 1920 1988 0.0
1920 0.03 0.03 0.03 0.00 0.03 6.90 20.00 108.80 47.30 1.00 0.01 0.03
1921 0.03 0.03 0.03 0.00 0.03 0.00 88.00 57.00 35.00 18.50 0.01 0.03
1922 0.03 0.03 0.03 0.00 0.03 0.00 87.50 102.30 10.40 15.20 0.01 0.03
(...)
</code></pre>

<p>There are ~100 observation stations that I'm interested in, each with its own start and end date for rainfall measurements. They're formatted as above in a single data file, with stations separated by a header line of the form "120 (station number) (station name)".</p>

<p>I first need to separate this file by station, then extract March, April, May and June for each year, and then total those months for each year. So far I'm messing around with loops (as below), but I understand this isn't the right way to go about it and would rather learn a better technique.
Thanks again for the help!</p>

<p>(Original question:)
I've got a large data set containing rainfall by season for ~100 years over 100+ locations. I'm trying to separate this data into more manageable arrays, and in particular I want to retrieve the sum of the rainfall for March, April, May and June for each station for each year.
The following is a simplified version of my code so far: </p>

<pre><code>a <- array(1, dim = c(10, 12))
for (i in 1:5) {

  # all data:
  assign(paste("station_", i, sep = ""), a)

  # march - june data:
  assign(paste("station_", i, "_mamj", sep = ""), a[, 4:7])
}
</code></pre>

<p>So this gives me <code>station_(i)_mamj</code>, which contains the data for the months I'm interested in for each station. Now I want to sum each row of this array and enter it in a new array called <code>station_(i)_mamj_tot</code>. Simple enough in theory, but I can't work out how to reference <code>station_(i)_mamj</code> so that it follows the value of <code>i</code> on each iteration. Any help is much appreciated!</p>
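For the literal step described here, looking up an object whose name is built from `i`, a minimal sketch follows. It assumes a 13-column toy array (year in column 1, then twelve months), so that columns 4:7 are March to June as in the loop above; `mamj_cols`, `stations` and `mamj_tot` are names invented for the illustration. `get()` is the counterpart of `assign()`; the named list is shown as an alternative that avoids creating one variable per station.

```r
# Toy data (assumption): 10 years x (year column + 12 monthly columns)
a <- cbind(1901:1910, matrix(1, nrow = 10, ncol = 12))
mamj_cols <- 4:7   # March..June when column 1 holds the year

# Option 1: keep assign(), and use get() to fetch an object by its name
for (i in 1:5) {
  mamj_name <- paste0("station_", i, "_mamj")
  assign(mamj_name, a[, mamj_cols])
  assign(paste0(mamj_name, "_tot"), rowSums(get(mamj_name)))
}

# Option 2: one named list instead of many variables, indexed by string
stations <- setNames(replicate(5, a, simplify = FALSE), paste0("station_", 1:5))
mamj_tot <- lapply(stations, function(x) rowSums(x[, mamj_cols]))
mamj_tot[["station_3"]]   # same values as station_3_mamj_tot
```

With the list, a station is selected by a string (`mamj_tot[[paste0("station_", i)]]`), and the whole collection can be passed to `lapply()` or `sapply()` without any name pasting.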

Best answer

This is crying out for a data frame; once you have one, the whole task is a one-liner with a powerful tool like ddply (from the plyr package):

<pre><code>tot_mamj <- ddply(rain[rain$month %in% 3:6, -2], 'year', colwise(sum))
</code></pre>

This gives the M/A/M/J totals by year (the station values here are made-up test data):

<pre><code>  year station_1 station_2 station_3 station_4 station_5 ...
1 1972  8.618960  5.697739 10.083192  9.264512 11.152378 ...
2 1973 18.571748 18.903280 11.832462 18.262272 10.509621 ...
3 1974 22.415201 22.670821 32.850745 31.634717 20.523778 ...
4 1975 16.773286 17.683704 18.259066 14.996550 19.007762 ...
...
</code></pre>

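Here is the complete working code. We create a data frame whose column names are 'station_n', plus extra columns for year and month (month as a factor, or as an integer if you're lazy; see the footnote). You can then do arbitrary analysis by month or by year, using plyr's split-apply-combine paradigm: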

<pre><code>library(plyr)  # for ddply, colwise
#library(reshape)  # for melt

# Parameterize everything here, it's crucial for testing/debugging
all_years <- 1970:2011
nYears <- length(all_years)
nStations <- 101
# We want station names as a character vector (as opposed to simple indices)
station_names <- paste0('station_', 1:nStations)

# One row per (year, month) pair: months run 1..12 within each year
rain <- data.frame(
  year  = rep(all_years, each = 12),
  month = rep(1:12, times = nYears)
)
# Fill in NAs for all data
rain[, station_names] <- NA_real_
# Make 'month' a factor, to prevent any numerical funny stuff, e.g. accidentally 'aggregating' it
rain$month <- factor(rain$month)

# For convenience, store the row indices for all years, M/A/M/J
I.mamj <- which(rain$month %in% 3:6)

# Insert made-up seasonal data for M/A/M/J for testing... leave everything else NA intentionally
rain[I.mamj, station_names] <- c(3, 5, 9, 6) * runif(4 * nYears * nStations)

# Get our aggregate of MAMJ totals, by year
# The '-2' column index means: "exclude month, to prevent it also getting 'aggregated'"
excludeMonthCol <- -2
tot_mamj <- ddply(rain[rain$month %in% 3:6, excludeMonthCol], 'year', colwise(sum))

# voila!! Example output (the data come from runif(), so values differ from run to run):
#   year station_1 station_2 station_3 station_4 station_5
# 1 1972  8.618960  5.697739 10.083192  9.264512 11.152378
# 2 1973 18.571748 18.903280 11.832462 18.262272 10.509621
# 3 1974 22.415201 22.670821 32.850745 31.634717 20.523778
# 4 1975 16.773286 17.683704 18.259066 14.996550 19.007762
</code></pre>
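For readers who prefer not to depend on plyr, the same split-apply-combine step can be sketched in base R with `aggregate()`. The small data frame below is made up for the illustration (two hypothetical stations whose rainfall equals the month number, or twice it), so the totals are easy to check by hand. It is a sketch under those assumptions, not a drop-in replacement for the code above.

```r
# Made-up data: 3 years x 12 months, two hypothetical stations
rain <- data.frame(
  year  = rep(2001:2003, each = 12),
  month = rep(1:12, times = 3)
)
rain$station_1 <- rain$month        # "rainfall" equal to the month number
rain$station_2 <- 2 * rain$month    # twice the month number

# Keep March..June rows and drop the month column so it is not summed
mamj <- rain[rain$month %in% 3:6, names(rain) != "month"]

# Sum every remaining column within each year
tot_mamj <- aggregate(. ~ year, data = mamj, FUN = sum)
tot_mamj
# Each year: station_1 = 3 + 4 + 5 + 6 = 18, station_2 = 36
```

Dropping `month` by name rather than by position (`-2`) keeps the subset correct even if the column order changes.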

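As a footnote: before I converted month from numeric to factor, it was silently getting 'aggregated' along with everything else (until I added the '-2' exclude-column index). Better still, once month is a factor it refuses point-blank to be aggregated and throws an error, which is what you want when debugging: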

<pre><code>ddply(rain[rain$month %in% 3:6, ], 'year', colwise(sum))
Error in Summary.factor(c(3L, 3L, 3L, 3L, 3L, 3L), na.rm = FALSE) :
  sum not meaningful for factors
</code></pre>
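The answer starts from a data frame that already exists. As a bridge from the raw file layout shown in the question, here is a rough sketch of one way to get per-station, per-year M/A/M/J totals straight from the text lines. Several things are assumptions rather than facts from the source: that header lines always begin with `120 `, that each data line holds a year followed by twelve monthly values, and that `9999` is a missing-value code. The inline `lines` vector stands in for `readLines()` on the real file, and `is_header`, `station_no`, `vals` and `result` are names invented for the illustration.

```r
# Excerpt in the question's layout; in practice: lines <- readLines(<your file>)
lines <- trimws(c(
  "120 6272100 KHARTOUM 15.60 32.55 382 1899 1989 0.0",
  "1899 0.03 0.03 0.03 0.03 0.03 1.03 13.03 12.03 9999 6.03 0.03 0.03",
  "1900 0.03 0.03 0.03 0.03 0.03 23.03 80.03 47.03 23.03 8.03 0.03 0.03",
  "120 6272101 JEBEL AULIA 15.20 32.50 380 1920 1988 0.0",
  "1920 0.03 0.03 0.03 0.00 0.03 6.90 20.00 108.80 47.30 1.00 0.01 0.03",
  "1921 0.03 0.03 0.03 0.00 0.03 0.00 88.00 57.00 35.00 18.50 0.01 0.03"
))

is_header  <- grepl("^120\\s", lines)   # assumption: headers start with "120 "
station_id <- cumsum(is_header)         # which station block each line belongs to

# The station number is the second field of each header line
station_no <- vapply(strsplit(lines[is_header], "\\s+"), `[`, character(1), 2)

# Data lines: year + 12 monthly values; treat 9999 as missing (assumption)
vals <- read.table(text = lines[!is_header],
                   col.names = c("year", month.abb),
                   na.strings = "9999")
vals$station <- station_no[station_id[!is_header]]

# March-June total per station and year (NA if any of the four months is missing)
vals$mamj_tot <- rowSums(vals[, c("Mar", "Apr", "May", "Jun")])
result <- vals[, c("station", "year", "mamj_tot")]
result
```

Keeping the result in long form (one row per station and year) sidesteps the per-station variable names entirely; it can be reshaped to one column per station later if the wide layout used in the answer is preferred.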

Regarding "string - Referencing objects using variable strings in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/10588008/
