
R - Sum if date is within range

Reposted. Author: 行者123. Updated: 2023-12-04 09:36:52

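I've managed to do this easily in Excel, but I'd like to do it in R.

I have two data frames:

MediaPlanDF (215 observations, 29 variables)

I only care about 4 of the variables:

Start date (flight date),
End date (flight date),
Daily spend,
Daily impressions

OutputDF (35 obs, 1 var)

Date: OutputDF[[1]]

35 observations, from 8/31/15 to 10/4/15

That's the setup, or at least how I've set it up (all dates are correctly formatted).

What I need to do is add 2 columns to OutputDF: Daily Impressions and Daily Spend.

Daily Impressions = if the date in OutputDF falls within the start and end dates of a MediaPlanDF row, sum #dailyimpressions over all rows that meet that condition.

Daily Spend = if the date in OutputDF falls within the start and end dates of a MediaPlanDF row, sum #dailyspend over all rows that meet that condition.

Here is a sample of the two DFs: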

MediaPlanDF (the part I care about):

#daysinflight  #dailyimpressions  #dailyspend  Campaign name      Campaign ID  Campaign flight start date  Campaign flight end date
35             392857.1429        1571.428571  A Real Advertiser  RAND0M       8/31/2015                   10/4/2015
35             85714.28571        428.5714286  A Real Advertiser  RAND0M       8/31/2015                   10/4/2015
35             142857.1429        714.2857143  A Real Advertiser  RAND0M       8/31/2015                   10/4/2015
35             62857.14286        942.8571429  A Real Advertiser  RAND0M       8/31/2015                   10/4/2015

OutputDF:
Date
8/31/2015
9/1/2015
9/2/2015
9/3/2015
9/4/2015
9/5/2015
...
10/4/2015

Thanks in advance for any help.

I'm getting this error:

Error in as.POSIXlt.character(as.character(x), ...) :
  character string is not in a standard unambiguous format

Here is the dput:
structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Accuen", 
"Amazon", "Bleacher Report", "Brightroll", "Buzzfeed", "CBSi",
"College Humor", "Complex", "ESPN", "GDN", "HULU", "IGN", "Millennial",
"Nativo", "NBA", "NBC Sports", "Pandora", "Reddit", "Spotify",
"Tremor", "TrueX", "Twitch", "Wikia", "Woven", "Yahoo!", "YouTube"
), class = "factor"), daysinflight = c(35L, 35L, 35L, 35L, 35L,
35L), dailyimpressions = c(392857.1429, 85714.28571, 142857.1429,
62857.14286, 17142.85714, 72380.94286), dailyspend = c(1571.428571,
428.5714286, 714.2857143, 942.8571429, 428.5714286, 1085.714286
), Campaign.name = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "A Real Advertiser", class = "factor"),
Campaign.ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "RAND0M", class = "factor"),
startdate = structure(c(16678, 16678, 16678, 16678, 16678,
16678), class = "Date"), enddate = structure(c(16712, 16712,
16712, 16712, 16712, 16712), class = "Date"), Campaign.budget = c(5100206L,
5100206L, 5100206L, 5100206L, 5100206L, 5100206L), Campaign.planned.cost = c(4663350.2,
4663350.2, 4663350.2, 4663350.2, 4663350.2, 4663350.2), Campaign.buy.total = c(4663350.2,
4663350.2, 4663350.2, 4663350.2, 4663350.2, 4663350.2), Supplier = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("ACCUEN", "AMAZON (AMZN MED GP)",
"BRIGHTROLL", "BUZZFEED.COM", "CBS DIGITAL MEDIA", "COLLEGE HUMOR",
"COMPLEX.COM", "ESPN.COM", "GOOGLE DISPLAY NTWK", "HULU",
"IGN.COM", "MILLENNIAL MEDIA", "NATIVO.NET", "NBC.COM", "PANDORA MEDIA, INC.",
"REDDIT.COM", "SPOTIFY.COM", "TREMORMEDIA.COM", "TRUEX MEDIA INC.",
"TURNER MEDIA GROUP", "TWITCH.TV", "WIKIA.COM", "WOVENDIGITAL.COM",
"YAHOO! US", "YOUTUBE, LLC."), class = "factor"), Site = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("ACCUEN", "AMAZON.COM", "BLEACHER REPORT",
"BRIGHTROLL", "BUZZFEED.COM", "CBS INTERACTIVE", "COLLEGEHUMOR",
"COMPLEX", "COMPLEX.COM", "ELECTUS", "ESPN.COM", "GOOGLE DISPLAY NTWK",
"HULU", "IGN.COM", "MILLENNIAL MEDIA", "NATIVO", "NBA", "NBCSPORTS.COM",
"PANDORA", "REDDIT", "SPOTIFY", "TREMOR VIDEO", "TRUEX",
"TWITCH", "WIKIA", "WOVEN", "YAHOO", "YOUTUBE, LLC."), class = "factor"),
Flight.start.date = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("10/3/2015",
"8/31/2015", "9/1/2015", "9/10/2015", "9/11/2015", "9/13/2015",
"9/14/2015", "9/15/2015", "9/16/2015", "9/17/2015", "9/18/2015",
"9/2/2015", "9/20/2015", "9/21/2015", "9/24/2015", "9/25/2015",
"9/27/2015", "9/28/2015", "9/7/2015", "9/9/2015"), class = "factor"),
Flight.end.date = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("10/3/2015",
"10/4/2015", "9/11/2015", "9/13/2015", "9/15/2015", "9/17/2015",
"9/18/2015", "9/2/2015", "9/20/2015", "9/27/2015", "9/30/2015"
), class = "factor"), Cost.method = structure(c(3L, 3L, 3L,
3L, 3L, 3L), .Label = c("CPC", "CPE", "CPM", "Flat", "Free"
), class = "factor"), Rate = c(43, 15, 5, 125, 25, 15), Planned.unit.amount = c(13750000L,
3000000L, 5000000L, 2200000L, 600000L, 2533333L), Cost = c(55000,
15000, 25000, 33000, 15000, 38000), Excluded = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "No", class = "factor"), Company.ID = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "skrt", class = "factor"),
Person.ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "smgboi", class = "factor"),
Exported.by = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "some guy", class = "factor"),
Exported.on = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2015-10-06 18:53:12, EDT", class = "factor"),
Exported.from = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "media", class = "factor")), .Names = c("site",
"daysinflight", "dailyimpressions", "dailyspend", "Campaign.name",
"Campaign.ID", "startdate", "enddate", "Campaign.budget", "Campaign.planned.cost",
"Campaign.buy.total", "Supplier", "Site", "Placement.name", "Buy.details",
"Positioning", "Unit.dimensions", "Flight.start.date", "Flight.end.date",
"Cost.method", "Rate", "Planned.unit.amount", "Cost", "Excluded",
"Company.ID", "Person.ID", "Exported.by", "Exported.on", "Exported.from"
), row.names = c(NA, 6L), class = "data.frame")

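Best answer

First, we need to make sure your date formats and so on are correct. I'm going to assume they are; my versions of the data are at the end. You haven't provided a dput and you have some odd column names, so double-check spacing and capitalization.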
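
The error quoted in the question typically appears when a date string such as 8/31/2015 is parsed without an explicit format. Below is a minimal sketch of the conversion, assuming the raw columns are factors in month/day/year form like Flight.start.date in the question's dput. The helper name parse_mdy and the two sample vectors are hypothetical and only serve the illustration.

```r
# Hypothetical sample values that mirror the factor columns in the question
flight_start <- factor(c("8/31/2015", "9/1/2015"))
flight_end   <- factor(c("10/4/2015", "9/30/2015"))

# Hypothetical helper: factor -> character -> Date, with an explicit
# month/day/year format so R does not have to guess the layout
parse_mdy <- function(x) as.Date(as.character(x), format = "%m/%d/%Y")

start_date <- parse_mdy(flight_start)
end_date   <- parse_mdy(flight_end)

print(start_date)
print(class(start_date))
```

Going through as.character() first matters for factors, because converting a factor directly to a number yields its internal level codes rather than the date text.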

Next, let's construct an interval object for each event, using the lubridate library:

library(lubridate)
MediaPlanDF$interval <- interval(MediaPlanDF$Campaign.flight.1, MediaPlanDF$end.date)

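Now we can test whether each item in OutputDF falls within each interval and, if so, sum it (in your test data, every date falls within every interval):
output <- do.call(rbind, lapply(OutputDF$Date, function(x) {
  # TRUE for every flight whose interval contains the date x
  index <- x %within% MediaPlanDF$interval
  list(impressions = sum(MediaPlanDF$dailyimpressions[index]),
       spend = sum(MediaPlanDF$dailyspend[index]))
}))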

This gives the output:
cbind(output, OutputDF)
  impressions    spend       Date
1    684285.7 3657.143 2015-08-31
2    684285.7 3657.143 2015-09-01
3    684285.7 3657.143 2015-09-02
4    684285.7 3657.143 2015-09-03
5    684285.7 3657.143 2015-09-04
6    684285.7 3657.143 2015-09-05
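
The same range test can also be written in base R without lubridate. This is only a sketch of an alternative, not the approach used above. It assumes the flight dates are already of class Date, uses the shortened column names start and end, and introduces a hypothetical helper sum_in_flight. The numbers are the four sample rows from the question.

```r
# Relevant columns of the sample data (column names start/end are assumptions)
MediaPlanDF <- data.frame(
  dailyimpressions = c(392857.1429, 85714.28571, 142857.1429, 62857.14286),
  dailyspend       = c(1571.428571, 428.5714286, 714.2857143, 942.8571429),
  start            = as.Date("2015-08-31"),
  end              = as.Date("2015-10-04")
)

# One row per day of the 35-day window described in the question
OutputDF <- data.frame(
  Date = seq(as.Date("2015-08-31"), as.Date("2015-10-04"), by = "day")
)

# Hypothetical helper: sum `values` over the flights that are active on `day`
# (both the start and the end date count as in range)
sum_in_flight <- function(day, values) {
  active <- MediaPlanDF$start <= day & day <= MediaPlanDF$end
  sum(values[active])
}

days <- seq_along(OutputDF$Date)
OutputDF$impressions <- vapply(
  days,
  function(i) sum_in_flight(OutputDF$Date[i], MediaPlanDF$dailyimpressions),
  numeric(1)
)
OutputDF$spend <- vapply(
  days,
  function(i) sum_in_flight(OutputDF$Date[i], MediaPlanDF$dailyspend),
  numeric(1)
)

# A day after every flight has ended contributes nothing
outside <- sum_in_flight(as.Date("2015-10-05"), MediaPlanDF$dailyspend)

print(head(OutputDF))
```

Because vapply returns plain numeric vectors, the new columns are ordinary numeric columns of OutputDF, which may be easier to work with downstream than the list-based result of rbind.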

Data:

OutputDF:
structure(list(Date = structure(c(16678, 16679, 16680, 16681, 
16682, 16683), class = "Date")), .Names = "Date", row.names = c(NA,
-6L), class = "data.frame")

MediaPlanDF:
structure(list(daysinflight = c(35L, 35L, 35L, 35L), dailyimpressions = c(392857.1429, 
85714.28571, 142857.1429, 62857.14286), dailyspend = c(1571.428571,
428.5714286, 714.2857143, 942.8571429), Campaign.name = structure(c(1L,
1L, 1L, 1L), .Label = "A", class = "factor"), Campaign.ID = structure(c(1L,
1L, 1L, 1L), .Label = "Real", class = "factor"), Campaign.flight = structure(c(1L,
1L, 1L, 1L), .Label = "Advertiser", class = "factor"), start.date = structure(c(1L,
1L, 1L, 1L), .Label = "RAND0M", class = "factor"), Campaign.flight.1 = structure(c(16678,
16678, 16678, 16678), class = "Date"), end.date = structure(c(16712,
16712, 16712, 16712), class = "Date")), .Names = c("daysinflight",
"dailyimpressions", "dailyspend", "Campaign.name", "Campaign.ID",
"Campaign.flight", "start.date", "Campaign.flight.1", "end.date"
), row.names = c(NA, -4L), class = "data.frame")

For the original question, "R - Sum if date is within range", see Stack Overflow: https://stackoverflow.com/questions/33049441/
