r - Pivot data of varying width from wide to long using a flexible call (for use in a loop)

Reposted by: 行者123 · Updated: 2023-12-03 14:53:29

I need to pivot some wide time-series data of varying width into long format, i.e. using tidyr's pivot_longer().

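The data are quarterly, but I receive them in yearly blocks (four quarters) and six-month blocks (only two quarters), so the data sets differ in width.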

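I am looking for a simple and flexible solution that can be used in a loop, because I need to import many yearly and six-month blocks (and, since I need to convince my research group to use R, I am asking for a solution that is, preferably, tidyverse-based).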

The data in the yearly blocks look something like this,

dta_wide1 <- structure(list(V1 = c("", "", "", "", "", "", "", "peach", "dragonfruit", "honeydew", "huckleberry", "", ""), V2 = c("ABC", "some info", "Store A", "", "As of 31/03/2019", "label1", "", "7", "5", "6", "1", "(a) some useless clutter", "(b) more not relevent information"), V3 = c("", "", "", "", "", "", "label2", "0.5", "0.4", "0.8", "0.3", "", ""), V4 = c("", "", "", "", "", "label4", "label4a", "21", "21", "87", "21", "", ""), V5 = c("", "", "", "", "", "", "label4b", "0.3", "0.1", "0.4", "0.2", "", ""), V6 = c("", "", "", "", "As of 30/06/2019", "label1", "", "5", "2", "3", "7", "", ""), V7 = c("", "", "", "", "", "", "label2", "0.46", "0.72", "0.7", "0.8", "", ""), V8 = c("", "", "", "", "", "label4", "label4a", "19", "22", "85", "25", "", ""), V9 = c("", "", "", "", "", "", "label4b", "0.4", "0.1", "0.3", "0.2", "", ""), V10 = c("", "", "", "", "As of 30/09/2019", "label1", "", "4", "1", "4", "8", "", ""), V11 = c("", "", "", "", "", "", "label2", "0.1", "0.3", "0.6", "0.22", "", ""), V12 = c("", "", "", "", "", "label4", "label4a", "21", "23", "71", "27", "", ""), V13 = c("", "", "", "", "", "", "label4b", "0.3", "0.1", "0.4", "0.2", "", ""), V14 = c("", "", "", "", "As of 31/12/2019", "label1", "", "8", "6", "9", "9", "", ""), V15 = c("", "", "", "", "", "", "label2", "0.7", "0.87", "0.55", "0.33", "", ""), V16 = c("", "", "", "", "", "label4", "label4a", "24", "25", "99", "35", "", ""), V17 = c("", "", "", "", "", "", "label4b", "0.3", "0.1", "0.4", "0.2", "", "")), class = "data.frame", row.names = c(NA, -13L))

and like this in the six-month blocks,

dta_wide2 <- structure(list(V1 = c("", "", "", "", "", "", "", "peach", "dragonfruit", "honeydew", "huckleberry", "", ""), V2 = c("ABC", "some info", "Store A", "", "As of 31/03/2020", "label1", "", "2", "3", "4", "8", "(a) some useless clutter", "(b) more not relevent information"), V3 = c("", "", "", "", "", "", "label2", "0.1", "0.2", "0.3", "0.8", "", ""), V4 = c("", "", "", "", "", "label4", "label4a", "10", "11", "12", "9", "", ""), V5 = c("", "", "", "", "", "", "label4b", "0.3", "0.1", "0.4", "0.2", "", ""), V6 = c("", "", "", "", "As of 30/06/2020", "label1", "", "4", "6", "8", "16", "", ""), V7 = c("", "", "", "", "", "", "label2", "0.22", "0.33", "0.44", "0.55", "", ""), V8 = c("", "", "", "", "", "label4", "label4a", "11", "12", "13", "10", "", ""), V9 = c("", "", "", "", "", "", "label4b", "0.4", "0.1", "0.3", "0.2", "", "")), class = "data.frame", row.names = c(NA, -13L))

i.e. (for the six-month block)

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tidyverse)
dta_wide2 %>% as_tibble
# A tibble: 13 x 9
   V1       V2             V3     V4      V5     V6       V7    V8     V9
   <chr>    <chr>          <chr>  <chr>   <chr>  <chr>    <chr> <chr>  <chr>
 1 ""       "ABC"          ""     ""      ""     ""       ""    ""     ""
 2 ""       "some info"    ""     ""      ""     ""       ""    ""     ""
 3 ""       "Store A"      ""     ""      ""     ""       ""    ""     ""
 4 ""       ""             ""     ""      ""     ""       ""    ""     ""
 5 ""       "As of 31/03/~ ""     ""      ""     "As of ~ ""    ""     ""
 6 ""       "label1"       ""     "label~ ""     "label1" ""    "labe~ ""
 7 ""       ""             "labe~ "label~ "labe~ ""       "lab~ "labe~ "labe~
 8 "peach"  "2"            "0.1"  "10"    "0.3"  "4"      "0.2~ "11"   "0.4"
 9 "dragon~ "3"            "0.2"  "11"    "0.1"  "6"      "0.3~ "12"   "0.1"
10 "honeyd~ "4"            "0.3"  "12"    "0.4"  "8"      "0.4~ "13"   "0.3"
11 "huckle~ "8"            "0.8"  "9"     "0.2"  "16"     "0.5~ "10"   "0.2"
12 ""       "(a) some use~ ""     ""      ""     ""       ""    ""     ""
13 ""       "(b) more not~ ""     ""      ""     ""       ""    ""     ""

In dta_wide2 the date keys float around like this

> dta_wide2[5,] %>% str_sub(start= -10) %>% lubridate::dmy()
[1] NA           "2020-03-31" NA           NA           NA          
[6] "2020-06-30" NA           NA           NA

so I tried to tidy it up like this

dta_wide2 %>% 
   add_column(date1 = dta_wide2[5,2] %>% str_sub(start= -10) %>% lubridate::dmy(), .before = 2)  %>% 
   add_column(date2 = dta_wide2[5,6] %>% str_sub(start= -10) %>% lubridate::dmy(), .before = 6) %>% 
   add_column(store = dta_wide2[3,2], .before = 2) %>% as_tibble

# A tibble: 13 x 12
   V1    store date1      V2    V3    V4    date2      V5    V6    V7   
   <chr> <chr> <date>     <chr> <chr> <chr> <date>     <chr> <chr> <chr>
 1 ""    Stor~ 2020-03-31 "ABC" ""    ""    2020-06-30 ""    ""    ""   
 2 ""    Stor~ 2020-03-31 "som~ ""    ""    2020-06-30 ""    ""    ""   
 3 ""    Stor~ 2020-03-31 "Sto~ ""    ""    2020-06-30 ""    ""    ""   
 4 ""    Stor~ 2020-03-31 ""    ""    ""    2020-06-30 ""    ""    ""   
 5 ""    Stor~ 2020-03-31 "As ~ ""    ""    2020-06-30 ""    "As ~ ""   
 6 ""    Stor~ 2020-03-31 "lab~ ""    "lab~ 2020-06-30 ""    "lab~ ""   
 7 ""    Stor~ 2020-03-31 ""    "lab~ "lab~ 2020-06-30 "lab~ ""    "lab~
 8 "pea~ Stor~ 2020-03-31 "2"   "0.1" "10"  2020-06-30 "0.3" "4"   "0.2~
 9 "dra~ Stor~ 2020-03-31 "3"   "0.2" "11"  2020-06-30 "0.1" "6"   "0.3~
10 "hon~ Stor~ 2020-03-31 "4"   "0.3" "12"  2020-06-30 "0.4" "8"   "0.4~
11 "huc~ Stor~ 2020-03-31 "8"   "0.8" "9"   2020-06-30 "0.2" "16"  "0.5~
12 ""    Stor~ 2020-03-31 "(a)~ ""    ""    2020-06-30 ""    ""    ""   
13 ""    Stor~ 2020-03-31 "(b)~ ""    ""    2020-06-30 ""    ""    ""   
# ... with 2 more variables: V8 <chr>, V9 <chr>

Now I need to pivot this longer, with pivot_longer if I understand correctly. My challenge is how to do that in a flexible way that works for both dta_wide1 (four quarters) and dta_wide2 (two quarters).

I have been working on this for a while; any help getting it to work, simplifying it, or cleaning it up would be much appreciated.

This is where I am right now, but it is not correct, not flexible, and not simple

dta_wide2_foo <- dta_wide2
names(dta_wide2_foo) <- c('goods', paste0(dta_wide2[6,2:5], dta_wide2[7,2:5], sep = '_1'), paste0(dta_wide2[6,2:5], dta_wide2[7,2:5], sep = '_2'))
dta_wide2_foo %>% 
   add_column(date1 = dta_wide2[5,2] %>% str_sub(start= -10) %>% lubridate::dmy(), .before = 2)  %>% 
   add_column(date2 = dta_wide2[5,6] %>% str_sub(start= -10) %>% lubridate::dmy(), .before = 6) %>% 
   add_column(store = dta_wide2[3,2], .before = 2) %>% as_tibble %>% .[8:11,]  %>%
   pivot_longer(-c(goods, store, date1, date2), values_to = "Value", names_to = "variable") %>% print(n = 100)

Or, a generic snippet, neither simple nor clever nor clean, that can be used in a loop to get the positions of the dates in both sample data sets

dta <- dta_wide2
dta[5,] %>% str_sub(start= -10) %>% lubridate::dmy() %>% { which(!is.na(.)) }
[1] 2 6

or, cleaner,

dta <- dta_wide1
dta[5,] %>% grep("As ",.)
[1]  2  6 10 14
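
As an aside, the positions found this way can be turned into block boundaries that do not depend on how many quarters a file contains. The sketch below uses a toy stand-in for row 5 of dta_wide2 rather than the full data frame; the names date_row, block_start, block_end and blocks are made up for illustration and are not part of the original post.

```r
# Toy stand-in for row 5 of dta_wide2: one "As of <date>" cell per quarter,
# empty strings elsewhere (assumed layout, copied from the sample above).
date_row <- c("", "As of 31/03/2020", "", "", "", "As of 30/06/2020", "", "", "")

# Columns in which a quarter block starts
block_start <- grep("^As of ", date_row)

# Each block runs up to the column before the next start (or the last column)
block_end <- c(block_start[-1] - 1L, length(date_row))

# Parse the dates with base R (day/month/year)
block_date <- as.Date(sub("^As of ", "", date_row[block_start]), format = "%d/%m/%Y")

# One row per quarter: first column, last column, and date of the block
blocks <- data.frame(start = block_start, end = block_end, date = block_date)
print(blocks)
```

With the four-quarter layout of dta_wide1 the same lines would simply yield four rows instead of two, which is what makes the approach usable in a loop.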

Update 2020-06-08 07:45:18Z

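My goal is to combine the long data sets in order to plot the data (as Wimpel suggests below, I combine the different wide data sets, i.e. dta_wide1, dta_wide2, ... dta_widen, using an lapply() call). I imagine the data looking like this,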

> dta_long
# A tibble: 96 x 5
   product label   value date       store  
   <chr>   <chr>   <dbl> <date>     <chr>  
 1 peach   label1   7    2019-03-31 Store A
 2 peach   label2   0.5  2019-03-31 Store A
 3 peach   label4a 21    2019-03-31 Store A
 4 peach   label4b  0.3  2019-03-31 Store A
 5 peach   label1   5    2019-06-30 Store A
 6 peach   label2   0.46 2019-06-30 Store A
 7 peach   label4a 19    2019-06-30 Store A
 8 peach   label4b  0.4  2019-06-30 Store A
 9 peach   label1   4    2019-09-30 Store A
10 peach   label2   0.1  2019-09-30 Store A
# ... with 86 more rows

and then plot the values against the dates with ggplot2, something like this,

dta_long %>% filter(label == 'label1') %>% ggplot(aes(date, value, colour = product)) + 
  geom_line() + 
  scale_x_date(date_breaks = "3 months", date_labels = "%b-%y", 
               limits = c(min(dta_long$date) - 34, max(dta_long$date)))

Best answer

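I saved your two sample data sets and stored them in separate .xlsb files.
The data look like this: (screenshot of the worksheet not preserved)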

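Maybe this will help... The solution works for the two sample sets provided, so give it a try.
The code assumes that all data have the same format, so that each piece of information is always in the same row and the store name is always in the same column.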

library( readxlsb )
library( cellranger )
library( tidyverse )
library( data.table )

#get file names to read
read.these.files <- list.files( path = "./temp/", 
                                pattern = ".*\\.xlsb",
                                full.names = TRUE,
                                recursive = FALSE )
#now read the data to a list, using lapply()
#  assuming the data needed is in the first sheet of the .xlsb-file
L <- lapply( read.these.files, readxlsb::read_xlsb, sheet = 1, range = cellranger::cell_limits() )
#now we can loop over the read in data in list 'L', and perform operations
L.dt <- lapply( L, function(x) {
  #get store_name
  store_name = x[2,2]
  #get the data
  df1 <- x[7:10,]
  #set the column names (=labels) right
  colnames <- x[5:6,]
  colnames[ colnames == "" ] <- NA
  names(df1) <- colnames %>% tidyr::fill( names(colnames) ) %>% slice(2)
  names(df1)[1] <- "product"
  #melt df1 to long format
  df1 <- df1 %>% tidyr::pivot_longer( cols = tidyselect::starts_with("label"), names_to = "label" )
  #set the dates right
  dates <- x[4, ]
  dates <- dates %>% tidyr::pivot_longer( cols = tidyselect::everything())
  dates[ dates == "" ] <- NA
  dates <- tidyr::fill( dates, value ) %>% dplyr::slice(2:n() )
  #add the dates and storename and tidy the .copy column
  df1 <- df1 %>% 
    dplyr::mutate( date  = rep( dates$value, nrow(df1) / length( dates$value) ),
            store = store_name ) %>%
    dplyr::select( -.copy )
})
#create a named list, based on the source file names
names(L.dt) <- basename( read.these.files )
#now, bind the list of altered data together into one _long_ data set
L.dt_tbl <- bind_rows(L.dt, .id = 'id')
#keep the last 10 characters of each date string and parse them as day/month/year
L.dt_tbl <- L.dt_tbl %>% dplyr::mutate(date = str_sub(date, start= -10) %>% lubridate::dmy() )
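
For readers without the .xlsb files, the same idea can be sketched directly on an in-memory data frame. This is a variation on the approach, not the answer's code: pivot_block() is a hypothetical helper, the toy data are a cut-down copy of the dta_wide2 layout, and the row positions (store name in row 3, dates in row 5, labels in rows 6 and 7) are assumptions taken from the dput() samples in the question. Product rows are taken to be the rows whose first column is non-empty, and the first value column is assumed to carry a date.

```r
library(tidyr)
library(dplyr)

# Toy data in the layout of dta_wide2, reduced to two products and two labels
# per quarter (hypothetical values).
toy <- data.frame(
  V1 = c("", "", "", "", "", "", "", "peach", "honeydew"),
  V2 = c("ABC", "some info", "Store A", "", "As of 31/03/2020", "label1", "", "2", "4"),
  V3 = c("", "", "", "", "", "", "label2", "0.1", "0.3"),
  V4 = c("", "", "", "", "As of 30/06/2020", "label1", "", "4", "8"),
  V5 = c("", "", "", "", "", "", "label2", "0.22", "0.44"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: works for any number of quarters because nothing
# depends on the number of columns.
pivot_block <- function(x, store_cell = c(3, 2), date_row = 5, label_rows = 6:7,
                        data_rows = which(x[[1]] != "")) {
  value_cols <- 2:ncol(x)
  store <- x[store_cell[1], store_cell[2]]

  # Dates: blank cells inherit the last date seen to their left
  dates <- unlist(x[date_row, value_cols], use.names = FALSE)
  dates <- dates[cummax(seq_along(dates) * (dates != ""))]

  # Labels: use the lower label row where it is filled, else the upper one
  top    <- unlist(x[label_rows[1], value_cols], use.names = FALSE)
  bottom <- unlist(x[label_rows[2], value_cols], use.names = FALSE)
  labels <- ifelse(bottom != "", bottom, top)

  # Unique column names of the form "<label>__<date>", split again when pivoting
  body <- x[data_rows, ]
  names(body) <- c("product", paste(labels, dates, sep = "__"))

  body %>%
    pivot_longer(-product, names_to = c("label", "date"), names_sep = "__") %>%
    mutate(value = as.numeric(value),
           date  = as.Date(sub("^As of ", "", date), format = "%d/%m/%Y"),
           store = store)
}

toy_long <- pivot_block(toy)
print(toy_long)
```

Under these assumptions, something like bind_rows(lapply(list(dta_wide1, dta_wide2), pivot_block)) should stack both sample blocks. Encoding label and date in the column name sidesteps the duplicated column names that the answer handles via the .copy column.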

Convert value to double,

dta_long <- type_convert(L.dt_tbl, cols(
  product = col_character(),
  label = col_character(),
  value = col_double(),
  store = col_character()
))
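
A minimal illustration of what readr::type_convert() does here, on a made-up two-row data frame (toy_chr and toy_num are hypothetical names): character columns are re-parsed according to the column specification, so value goes from character to double while product stays character.

```r
library(readr)

# Made-up data: numbers stored as text, as they are after reading the sheets
toy_chr <- data.frame(
  product = c("peach", "honeydew"),
  value   = c("7", "0.5"),
  stringsAsFactors = FALSE
)

# Re-parse the character columns using an explicit column specification
toy_num <- type_convert(toy_chr, col_types = cols(
  product = col_character(),
  value   = col_double()
))

str(toy_num)
```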

Final data,

dta_long
# A tibble: 96 x 6
   id             product label   value date       store  
   <chr>          <chr>   <chr>   <dbl> <date>     <chr>  
 1 dta_wide1.xlsb peach   label1   7    2019-03-31 Store A
 2 dta_wide1.xlsb peach   label2   0.5  2019-03-31 Store A
 3 dta_wide1.xlsb peach   label4a 21    2019-03-31 Store A
 4 dta_wide1.xlsb peach   label4b  0.3  2019-03-31 Store A
 5 dta_wide1.xlsb peach   label1   5    2019-06-30 Store A
 6 dta_wide1.xlsb peach   label2   0.46 2019-06-30 Store A
 7 dta_wide1.xlsb peach   label4a 19    2019-06-30 Store A
 8 dta_wide1.xlsb peach   label4b  0.4  2019-06-30 Store A
 9 dta_wide1.xlsb peach   label1   4    2019-09-30 Store A
10 dta_wide1.xlsb peach   label2   0.1  2019-09-30 Store A
# ... with 86 more rows

A similar question about "r - Pivot data of varying width from wide to long using a flexible call (for use in a loop)" can be found on Stack Overflow: https://stackoverflow.com/questions/62153859/
