
r - How to get the slope of an interpolation at regular time points on a cumulative sum plot?

Reposted. Author: 行者123. Updated: 2023-12-02 03:38:29

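On Cross Validated I asked a question about analyzing data by date without generating spurious peaks and troughs by binning the data by month. For example, if one pays a bill on the last day of every month but on one occasion pays a few days late, then one month will show zero expense and the following month will show double the usual expense. All of it spurious garbage.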

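One of the answers to my question explains the concept of interpolation using linear spline smoothing on the cumulative sum to overcome the hiccups of binning. I am intrigued by it and would like to implement it in R, but I cannot find any examples online. I do not just want to print plots. I want to get the instantaneous slope at each time point (perhaps each day), but that slope should be derived from a spline that takes in points from a few days (or perhaps a few weeks or months) before the time point to a few days after it. In other words, at the end of the day I want something such as a data frame in which one column is money per day or patients per week, but which is not subject to vagaries such as whether I paid a few days late or whether the month happened to have 5 operating days (as opposed to the usual 4).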

Here is some simplified simulation and plotting to show what I am up against.

library(lubridate)
library(ggplot2)
library(reshape2)
library(plyr)#needed for ddply() further down
dates <- seq(as.Date("2010-02-01"), length=24, by="1 month") - 1
dates[5] <- dates[5]+3 #we are making one payment date that is 3 days late
dates#look how the payment date is the last day of every month except for
#2010-05 where it takes place on 2010-06-03 - naughty boy!
amounts <- rep(50,each=24)# pay $50 every month
register <- data.frame(dates,amounts)#this is the starting register or ledger
ggplot(data=register,aes(dates,amounts))+geom_point()#look carefully and you will see that 2010-05 has no dots in it and 2010-06 has two dots
#create a summary of totals by month (ddply comes from the plyr package), but it ends up omitting a month in which nothing happened.
#Further badness is that it creates a new dataframe where one is not needed.
register.by.month <- ddply(register,.(y=year(dates),month=month(dates)),summarise,month.tot=sum(amounts))
#Instead I created a new variable that allocates each date into a particular "zone" such as month or
register$cutmonth <- as.Date(cut(register$dates, breaks = "month"))#until recently I did not know that the cut function can handle dates
table(register$cutmonth)#see how there are two payments in the month of 2010-06
#now lets look at what we paid each month. What is the total for each month
ggplot(register, aes(cutmonth, amounts))+ stat_summary(fun = sum, geom = "bar")#that is the truth but it is a useless truth (older ggplot2 versions spell the argument fun.y)

When one is late with a payment by a couple of days it appears as if the expense was zero in one month and double in the next. That is spurious

#so let's use cumulative expense over time
register$cumamount <- cumsum(register$amounts)
cum <- ggplot(data=register,aes(dates,cumamount))+geom_point()
cum+stat_smooth()

cumulative amount over time smooths out variability that changes an item's bin

#That was for everything the same every month. Now let's introduce a trend: in the second year the amounts start to go up.
increase <- c(rep(1,each=12),seq(from=1.01,to=1.9,length.out=12))
amounts.up <- round(amounts*increase,digits=2)#this is the monthly amount with a growth of amount in each month of the second year
register <- cbind(register,amounts.up)#add the variable to the data frame
register$cumamount.up <- cumsum(register$amounts.up) #work out the cumulative sum for the new scenario
ggplot(data=register,aes(x=dates))+
geom_point(aes(y=amounts, colour="amounts",shape="amounts"))+
geom_point(aes(y=amounts.up, colour="amounts.up",shape="amounts.up"))# the plot of amount by date
#I am now going to plot the cumulative amount over time, but now that I have two scenarios it is easier to deal with the data frame in long format (melted) rather than wide format (casted)
#before I can melt, the reshape2 package unfortunately can't handle date class, so I will have to turn them into characters and then back again.
register[,c("dates","cutmonth")] <- lapply(register[,c("dates","cutmonth")],as.character)
#call the generic melt(); reshape2 does not export melt.data.frame for direct use
register.long <- melt(register,measure.vars=c("amounts","amounts.up"))
register.long[,c("dates","cutmonth")] <- lapply(register.long[,c("dates","cutmonth")],as.Date)
ggplot(register.long, aes(cutmonth,value))+ stat_summary(fun = sum, geom = "bar")+facet_grid(. ~ variable) #that is the truth but it is a useless truth,
cum <- ggplot(data=register,aes(dates,cumamount))+geom_point()
#that is the truth but it is a useless truth. Furthermore it appears as if 2010-06 is similar to what is going on in 2011-12
#that is patently absurd. All that happened was that the 2010-05 payment was delayed by 3 days.

two scenarios but showing the amount of money paid in each month

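#so let's use cumulative expense over time
#each row of register.long has to use the cumulative column that matches its scenario
ggplot(data=register.long,aes(dates,ifelse(variable=="amounts",cumamount,cumamount.up),colour=variable))+geom_point() + scale_y_continuous(name='cumulative sum of amounts ($)')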

Here we see the cumulative sum data for the two scenarios

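So for the simple plot, the variable interpolate.daily would be about $50/30.4 = $1.64 per day for every day of the year. The second plot, where the amount paid each month starts going up in each month of the second year, would show a daily rate of $1.64 per day for every day in the first year, and for dates in the second year one would see the daily rate gradually increasing from $1.64 per day to about $3.12 per day.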
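The daily rates quoted above can be checked with a few lines of arithmetic. This is only a sanity check, assuming an average month length of 30.4 days (365/12, rounded); the names `days.per.month`, `rate.year1` and `rate.final` are illustrative and do not appear in the original code.

```r
# Assumption: an average month has about 365/12 = 30.4 days
days.per.month <- 30.4

# Flat scenario: $50 per month
rate.year1 <- 50 / days.per.month

# Last month of the growth scenario: $50 * 1.9 = $95
rate.final <- 50 * 1.9 / days.per.month

# Expected daily rates at the start and at the end of the second scenario
print(c(rate.year1, rate.final))
```

Any interpolation scheme that works as intended should return values close to these two numbers at the start and end of the second scenario.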

Thank you very much for reading this all the way to the end. You must be as intrigued as I am!

Best answer

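Here is one basic way to do it. There are, of course, more sophisticated options and parameters to tune, but this should be a good starting point.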

dates <- seq(as.Date("2010-02-01"), length=24, by="1 month") - 1
dates[5] <- dates[5]+3
amounts <- rep(50,each=24)
increase <- c(rep(1,each=12),seq(from=1.01,to=1.9,length.out=12))
amounts.up <- round(amounts*increase,digits=2)

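# cumulative sum of the growing payments, one row per payment date
df = data.frame(dates=dates, cumamount.up=cumsum(amounts.up))

# cubic spline through the cumulative sum; splinefun() returns a function
df.spline = splinefun(df$dates, df$cumamount.up)

# evaluate the first derivative of the spline on a daily grid
newdates = seq(min(df$dates), max(df$dates), by=1)
money.per.day = df.spline(newdates, deriv=1)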

If you plot it, you can see the interesting behavior of the spline:

plot(newdates, money.per.day, type='l')
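One of the parameters that can be tuned is the `method` argument of `splinefun()`. A cumulative sum never decreases, so a monotone interpolation such as `method = "monoH.FC"` may be a reasonable variant: its derivative should not go negative, whereas an unconstrained cubic spline can overshoot around unevenly spaced points. The sketch below is a variation on the code above, not part of the original answer; `mono.spline` and `money.per.day.mono` are names made up for illustration.

```r
# Rebuild the data from the answer so this block runs on its own
dates <- seq(as.Date("2010-02-01"), length.out = 24, by = "1 month") - 1
dates[5] <- dates[5] + 3
increase <- c(rep(1, 12), seq(from = 1.01, to = 1.9, length.out = 12))
amounts.up <- round(rep(50, 24) * increase, digits = 2)
cumamount.up <- cumsum(amounts.up)

# Monotone cubic Hermite interpolation (Fritsch-Carlson);
# the dates are converted to day numbers explicitly
mono.spline <- splinefun(as.numeric(dates), cumamount.up, method = "monoH.FC")

# First derivative on a daily grid = interpolated money per day
newdates <- seq(min(dates), max(dates), by = 1)
money.per.day.mono <- mono.spline(as.numeric(newdates), deriv = 1)

plot(newdates, money.per.day.mono, type = "l")
```

Whether this is preferable depends on how much smoothness matters compared with keeping the daily rate non-negative.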

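The question asks for a slope that draws on points some days or weeks before and after each day, delivered as a data frame. A minimal sketch of one way to get that, assuming a linear interpolation of the cumulative sum via `approxfun()` and a centred window, is shown below. The window width `half.window` (45 days here) is an arbitrary, hypothetical choice, and `cum.fun`, `day.num` and `daily` are illustrative names.

```r
# Rebuild the data from the answer so this block runs on its own
dates <- seq(as.Date("2010-02-01"), length.out = 24, by = "1 month") - 1
dates[5] <- dates[5] + 3
increase <- c(rep(1, 12), seq(from = 1.01, to = 1.9, length.out = 12))
amounts.up <- round(rep(50, 24) * increase, digits = 2)
cumamount.up <- cumsum(amounts.up)

# Linear interpolation of the cumulative sum (a linear spline through the points)
cum.fun <- approxfun(as.numeric(dates), cumamount.up)

# Hypothetical window: look 45 days back and 45 days ahead of each day
half.window <- 45

# Only days whose whole window lies inside the observed date range
day <- seq(min(dates) + half.window, max(dates) - half.window, by = 1)
day.num <- as.numeric(day)

# Average slope of the interpolated cumulative sum across the window
money.per.day <- (cum.fun(day.num + half.window) -
                  cum.fun(day.num - half.window)) / (2 * half.window)

daily <- data.frame(day = day, money.per.day = money.per.day)
head(daily)
```

A wider window flattens the effect of a single late payment more strongly, at the cost of reacting more slowly to a real trend and losing more days at both ends of the series.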

Regarding "r - How to get the slope of an interpolation at regular time points on a cumulative sum plot?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/8408760/
