gpt4 book ai didi

r - 这是将数据从 EUROSTAT 获取到 R 的解决方案吗?

转载 作者:行者123 更新时间:2023-12-01 19:34:08 25 4
gpt4 key购买 nike

我经常使用来自 EUROSTAT 的数据,发现数据无法直接加载到 R 中非常烦人。我编写此代码片段是为了获取 EUROSTAT http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&dir=dic%2Fen 的批量下载工具提供的任何数据集。

还有更好的办法吗? ..这个对我有用:

    #this library is used to download data from eurostat and to find datasets
#later extend to extend to find datasets with certain dimensions

#download data from eurostat
#unpack and convert to dataframe
#load label descriptions
#load factors
#save as r data object

datasetname="ebd_all"

LANGUAGE="en"

install.packages("RCurl")
library(RCurl)
library(data.table)
library(reshape)
library(stringr)

baseurl="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=data%2F"

fullfilename=paste(datasetname,".tsv.gz",sep="")
temp <- paste(tempfile(),".gz",sep="")
download.file(paste(baseurl,fullfilename,sep=""),temp)
dataconnection <- gzfile(temp)
d=read.delim(dataconnection)
longdata=melt(d,id=colnames(d)[1])

firstname=colnames(d)[1] # remove .time and count how many headings are there
firstname=substr(firstname,1,nchar(firstname)-nchar(".time"))
headings=toupper(strsplit(firstname,".",fixed=TRUE)[[1]])
headingcount=length(headings)
colnames(longdata)=c("dimensions","time","value")


#get the data on the dimension tables
df=data.frame(dimensions=as.character(longdata[,"dimensions"]))
df = transform(df, dimensions= colsplit(dimensions, split = "\\,",names=headings))
dimensions=data.table(df$dimensions)

#download the dimension labels - save headings as better variable
dimfile=paste("http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=dic%2F",LANGUAGE,"%2Fdimlst.dic",sep="")

temp <- paste(tempfile(),".gz",sep="")
download.file(dimfile,temp)
dataconnection <- gzfile(temp)
dimdata=read.delim(dataconnection,header=FALSE)
colnames(dimdata)=c("colname","desc")
lab=dimdata$desc
names(lab)=dimdata$colname

#create headings that speak for themselves for columns
speakingheadings=as.character(lab[headings])

#download factors for each heading and add
for(heading in headings){
factorfile=paste("http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=dic%2F",LANGUAGE,"%2F",tolower(heading),".dic",sep="")
temp <- paste(tempfile(),".gz",sep="")
download.file(factorfile,temp)
dataconnection <- gzfile(temp)
factordata=read.delim(dataconnection,header=FALSE)
colnames(factordata)=c(heading,paste(heading,"_desc",sep=""))
#join the heading to the heading dataset
dimensions=merge(dimensions,factordata,by=heading,all.x=TRUE)
}


#at the end at speaking headings
setnames(dimensions,colnames(dimensions)[1:length(speakingheadings)],speakingheadings)

#add data columns by writing and reading again---FASTER ;-)
temp=tempfile()
values=data.frame(value=as.character(longdata$value))
values = transform(values, value= colsplit(value, split = "\\ ",names=c("value","flag")))
values=values$value
values=data.table(values)

values$value=as.character(values$value)
values$flag=as.character(values$flag)
values[value==flag,flag:=NA]
values$value=as.double(values$value)

eurostatdata=cbind(dimensions,time=longdata$time,values)
save(eurostatdata,file=paste(datasetname,".RData"))

最佳答案

查看SmarterPoland包,其中有直接从EUROSTAT下载(并进入R)数据的功能。

这里是例子:

library(SmarterPoland)
# info about passagers
grepEurostatTOC("split of passenger transport")
## get table
tmp <- getEurostatRCV("tsdtr210")
summary(tmp)

## vehicle geo time value
## BUS_TOT:756 AT : 63 1990 : 108 Min. : 0.0
## CAR :756 BE : 63 1991 : 108 1st Qu.: 6.9
## TRN :756 BG : 63 1992 : 108 Median :12.9
## CH : 63 1993 : 108 Mean :33.6
## CY : 63 1994 : 108 3rd Qu.:77.4
## CZ : 63 1995 : 108 Max. :93.4
## (Other):1890 (Other):1620 NA's :397

来源:www.smarterpoland.pl

关于r - 这是将数据从 EUROSTAT 获取到 R 的解决方案吗?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/12762431/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com