r - fwrite.data.table and `yyyy-mm-dd hh:mm:ss` format optimization with a fixed UTC offset

Reposted by: 行者123  Updated: 2023-12-05 07:27:48

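I want to use fwrite from R's data.table to write timestamps in YYYY-MM-DD hh:nn:ss format (in the Etc/GMT+8 time zone, which does not observe DST) instead of the default (ISO 8601) YYYY-MM-DDThh:nn:ssZ format. Some of the timestamps have fractional seconds, which I want to round to the nearest second.

Using lubridate, I have been able to read the dates with fread, then apply x:=with_tz(x, "Etc/GMT+8"), followed by x:=force_tz(x,"GMT").

However, on my test dataset (6.5 million entries across 12 columns), most of my solutions are slow, and I am looking for a better way to solve the problem. I do not want to use fwrite(..., dateTimeAs="write.csv"), because that ignores the fixed UTC offset in favor of local time.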
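
For reference, the target output can be sketched in base R alone. This is only a minimal illustration with made-up sample timestamps, not the poster's code, and it assumes the system time zone database provides `Etc/GMT+8`. It is also the slow kind of `format()` call that the question is trying to avoid on millions of rows.

```r
# Two UTC timestamps with fractional seconds (made-up sample values).
x <- as.POSIXct(c("2018-12-17 20:00:00.4", "2018-12-18 08:30:59.6"), tz = "UTC")

# format() truncates fractional seconds, so add half a second first
# to get round-to-nearest behaviour, then render in the fixed UTC-08:00 zone.
out <- format(x + 0.5, "%Y-%m-%d %H:%M:%S", tz = "Etc/GMT+8")
print(out)
```

Adding 0.5 s before formatting matters because `format()` drops the fractional part instead of rounding it.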

(The various solutions have been moved to my "answer" below.)

Are there any other optimizations you can think of?

Best answer

Best solution so far: base-R + data.table + fasttime

#!/usr/bin/env Rscript
# above this point: set d_f and o_f to valid file paths

totTime<-proc.time()
install.load <- function(package.name)
{
  if (!require(package.name, character.only=TRUE)) install.packages(package.name)
  library(package.name, character.only=TRUE)
}
pp<-function(...) {
  print(paste0(...))
}
ISO2Human<-function(x) {
  ot<-substr(x,1,19) # ignore fractional seconds and "Z"
  substr(ot,11,11)<-" " # replace the "T" separator with a space
  if(anyNA(ot)) ot<-substr(x,1,10) # if any value is NA, keep only the first 10 characters
  return(ot)
}

install.load('data.table')
install.load('fasttime')
pp("parameters read and libraries loaded: ",timetaken(totTime))

main <- function() {
  dat<-fread(d_f,fill=TRUE)
  # notably dat has a "d_utc" column in YYYY-MM-DD hh:nn:ss format
  pp("data file Read: ",timetaken(totTime)) # 5.200sec

  # A fair amount of code is inserted here. Highlights include
  #   1. As computations appear to be faster in double/numeric form 
  #      than POSIXct (and starts as character), I adjust it as follows:
  #        dat[,d_utc:=setattr(fastPOSIXct(d_utc,tz="GMT"),"class","numeric")]
  #   2. dat gets merged with another DT using foverlaps, producing fo (see https://stackoverflow.com/q/53858287/4228193)
  # as we resume code, 8.690sec have elapsed

  # As my target timezone is UTC-08:00 (POSIXct ETC/GMT+8), I subtract 28800 seconds.
  # But to protect against a rounding error in the double type
  # (and because I have some fractional second data that I want to round)
  # I add 0.5 to this value.
  fo[,d_pst:=setattr(d_utc-28799.5,"class",c("POSIXct","POSIXt"))][,d_utc:=NULL]
  pp("timestamps adjusted to PST (UTC-08:00): ",timetaken(totTime)) # 16.8sec

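This is the specific part of the code I was trying to optimize in this question; in the process, though, I found that some of the type conversions used above also seem to be faster.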

  tf<-tempfile()
  fwrite(fo,file=tf)
  fo<-fread(tf)
  # fread reads in as character, not timestamps
  # POSIXct's as.character and format calls are much slower than fwrite + fread (!!!)
  fo[,DetectDate:=ISO2Human(DetectDate)] 
  # this truncates seconds, effectively rounding due to the previous adjustment of 0.5s
  unlink(tf) # delete file
  pp("coerced to string: ",timetaken(totTime)) # 26.9sec

  fwrite(fo, file = o_f, quote = FALSE)
  pp("output file written: ",timetaken(totTime)) # 27.1sec
  # aren't SSDs awesome?
}
main()
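
The two core ideas in the script above (shifting epoch seconds by a fixed offset plus 0.5 s, then turning ISO 8601 text into the target format with plain string surgery) can be isolated in a small base-R sketch. The helper name `iso_to_human` and the sample epoch values are hypothetical, and `format()` here only stands in for the ISO 8601 text that the fwrite + fread round trip would produce.

```r
# Hypothetical helper mirroring the ISO2Human idea:
# ISO 8601 text -> "YYYY-MM-DD hh:mm:ss".
iso_to_human <- function(x) {
  out <- substr(x, 1, 19)      # drop fractional seconds and the trailing "Z"
  substr(out, 11, 11) <- " "   # replace the "T" separator with a space
  out
}

# Sample epoch seconds in UTC (assumed values, with fractional parts).
utc <- c(1545076800.4, 1545121859.6)

# Shift to UTC-08:00 (-28800 s) and add 0.5 s so later truncation rounds.
pst <- structure(utc - 28799.5, class = c("POSIXct", "POSIXt"), tzone = "UTC")

# Stand-in for the ISO 8601 text written by fwrite and read back by fread.
iso <- format(pst, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

human <- iso_to_human(iso)
print(human)
```

Because the offset is already baked into the numeric value, the result is rendered as if it were UTC and no time zone lookup is needed at formatting time.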

Other solutions

Lubridate-based block (no temp file). The times at the top are mm:ss.

# 01:17
j<-copy(fo)
tt<-proc.time()
j[,c("dd","dt"):=IDateTime(d_pst, ms="nearest")]
# if adding 0.5 seconds, trunc rather than nearest
j[,d_pst:=paste(dd,dt)][,c("dd","dt"):=NULL]
timetaken(tt) # 1:17
j
j[,lapply(.SD,class)]
rm(j)

Converting base-R POSIXct to a string with as.character or format

# 01:02
j<-copy(fo)
tt<-proc.time()
j[,DD2:=format(DetectDate,"%Y-%m-%d %H:%M:%S")]
timetaken(tt) # 1:02
j
j[,lapply(.SD,class)]
rm(j)

base-R implicit conversion to character + pasting date and time together

# 12:36
j<-copy(fo)
tt<-proc.time()
j[,DD2:=paste(lapply(DetectDate,substr,1,10),lapply(DetectDate,substr,12,19))] 
timetaken(tt) # 12:36
j
j[,lapply(.SD,class)]
rm(j)

base-R, avoiding lapply (silly me)

# 02:29
j<-copy(fo)
tt<-proc.time()
j[,DD2:=paste(substr(DetectDate,1,10),substr(DetectDate,12,19))]
timetaken(tt) # 2:29
j
j[,lapply(.SD,class)] # just to confirm our target column is character
rm(j)

data.table + base-R, but using data.table's tstrsplit and paste instead of grabbing a character range

# 00:24
j<-copy(fo)
tt<-proc.time()
tf<-tempfile()
fwrite(j,file=tf)
fo2<-fread(tf)
fo2[,c("compDate","compTime","compMS"):=tstrsplit(DetectDate,"[TZ.]")][
    ,DD2:=paste(compDate,compTime)]
unlink(tf)
timetaken(tt) # 0:24
fo2
fo2[,lapply(.SD,class)]
rm(j,tf,fo2)
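
To see what the split on `[TZ.]` yields, here is a base-R stand-in that uses `strsplit` in place of data.table's `tstrsplit`. It is only a sketch on made-up sample strings shaped like ISO 8601 timestamps, not the code that was timed above.

```r
# Sample ISO 8601 strings, with and without fractional seconds (assumed values).
iso <- c("2018-12-17T12:00:00.900Z", "2018-12-18T00:31:00Z")

# Split on "T", "Z" or "." so each element becomes date, time and
# (when present) the fractional part.
parts <- strsplit(iso, "[TZ.]")

# Keep only the date and time pieces and join them with a space.
human <- vapply(parts, function(p) paste(p[1], p[2]), character(1))
print(human)
```

The fractional part lands in a third piece that is simply discarded, so this variant truncates the seconds just like the substr-based approach.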

Essentially the best solution, though re-using variable and field names, but cutting it down to 10 seconds

# 00:14    
fap<-function(x) {
  ot<-substr(x,1,19)
  substr(ot,11,12)<-" "
  if(anyNA(ot)) ot<-substr(x,1,10)
  return(ot)
}
j<-copy(fo)
tt<-proc.time()
tf<-tempfile()
fwrite(j,file=tf)
fo2<-fread(tf)
fo2[,DD2:=fap(DetectDate)]
unlink(tf)
timetaken(tt) # 0:14
fo2
fo2[,lapply(.SD,class)]
rm(j,tf,fo2,fap)

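I am using an SSD, which probably speeds up the temp-file solutions considerably compared with a "standard" setup.

Regarding r - fwrite.data.table and `yyyy-mm-dd hh:mm:ss` format optimization with a fixed UTC offset, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53825194/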
