
r - How to split a CSV file and write new CSV files by date/day in R?


Hi, I have an 8 GB file that I need to analyze, but my machine doesn't have much RAM. To work within that limit, I decided to split my CSV file by rows using the following code:

library(tidyverse)

sample_df <- readr::read_csv("sample.csv") # read in the csv file
dput(sample_df)

# break the large CSV into pieces so RAM and RStudio don't crash

groups <- split(sample_df, (seq(nrow(sample_df)) - 1) %/% 20) # 20 rows per file until the last row is reached

for (i in seq_along(groups)) {
  write.csv(groups[[i]], paste0("sample_output_file", i, ".csv")) # iterate and write each piece
}

This worked very well until my senior mentor asked me to do the analysis on a per-date/per-day basis. Because I split by row count, a single date ends up spread across multiple CSVs. When I then try to read 3-4 of those CSVs back in for a per-day analysis, I run into the same RAM and memory-management problems.

A sample file is here: https://github.com/THsTestingGround/SO_splitbydate_question/blob/master/sample.csv

So could someone help me split the sample CSV file I read in above by date instead? I want all the April 1 rows in one CSV file, all the April 2 rows in another, and so on. I did try this myself, but I couldn't get it to work.

I'm also wondering whether readr::read_csv_chunked could help here in any way? I can't see anything specific in the documentation.
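(For reference, one way read_csv_chunked could help: process the big file a chunk at a time and append each chunk's rows to a per-date CSV, so the full 8 GB never has to fit in memory. This is a sketch, not from the original post; it assumes the createdAt format shown in the dput below, and the output file-name pattern is made up for illustration.)

```r
library(readr)
library(dplyr)
library(stringr)
library(purrr)

# Callback invoked once per chunk: tag each row with its "Apr 01"-style
# day, split the chunk by day, and append each piece to that day's file.
write_by_day <- function(chunk, pos) {
  chunk %>%
    mutate(month_day = str_replace(createdAt,
                                   "^\\w+\\s+(\\w+\\s+\\d+).*", "\\1")) %>%
    group_split(month_day) %>%
    walk(function(g) {
      out <- paste0("sample_", gsub("\\s+", "_", g$month_day[1]), ".csv")
      # append = TRUE skips the header on every write after the first
      write_csv(select(g, -month_day), out, append = file.exists(out))
    })
}

read_csv_chunked("sample.csv",
                 SideEffectChunkCallback$new(write_by_day),
                 chunk_size = 10000)
```

With chunk_size = 10000, only 10,000 rows are ever held in memory at once, and rows from the same date that span chunk boundaries still land in the same output file because each write appends.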

Here is the dput of the csv file:

dput(sample_df)
structure(list(createdAt = c("Fri Apr 01 04:04:32 +0000 2020",
"Fri Apr 01 04:04:36 +0000 2020", "Fri Apr 01 04:04:37 +0000 2020",
"Fri Apr 02 04:04:40 +0000 2020", "Fri Apr 02 04:04:44 +0000 2020",
"Fri Apr 02 04:04:46 +0000 2020", "Fri Apr 02 04:04:54 +0000 2020",
"Fri Apr 02 04:04:56 +0000 2020", "Fri Apr 02 04:05:07 +0000 2020",
"Fri Apr 02 04:05:12 +0000 2020", "Fri Apr 03 04:05:12 +0000 2020",
"Fri Apr 03 04:05:19 +0000 2020", "Fri Apr 03 04:05:27 +0000 2020",
"Fri Apr 03 04:05:33 +0000 2020", "Fri Apr 03 04:05:36 +0000 2020",
"Fri Apr 03 04:06:11 +0000 2020", "Fri Apr 03 04:07:08 +0000 2020",
"Fri Apr 03 04:07:14 +0000 2020", "Fri Apr 03 04:07:15 +0000 2020",
"Fri Apr 03 04:07:20 +0000 2020", "Fri Apr 03 04:07:30 +0000 2020",
"Fri Apr 03 04:07:51 +0000 2020", "Fri Apr 03 04:08:04 +0000 2020",
"Fri Apr 03 04:08:09 +0000 2020", "Fri Apr 03 04:08:15 +0000 2020",
"Fri Apr 03 04:08:22 +0000 2020", "Fri Apr 03 04:08:36 +0000 2020",
"Fri Apr 03 04:08:46 +0000 2020", "Fri Apr 03 04:08:46 +0000 2020",
"Fri Apr 03 04:09:01 +0000 2020", "Fri Apr 03 04:09:08 +0000 2020",
"Fri Apr 03 04:09:10 +0000 2020", "Fri Apr 03 04:09:15 +0000 2020",
"Fri Apr 03 04:09:26 +0000 2020", "Fri Apr 03 04:09:27 +0000 2020",
"Fri Apr 03 04:09:28 +0000 2020", "Fri Apr 03 04:09:28 +0000 2020",
"Fri Apr 03 04:09:35 +0000 2020", "Fri Apr 03 04:09:36 +0000 2020",
"Fri Apr 03 04:09:41 +0000 2020", "Fri Apr 03 04:09:45 +0000 2020",
"Fri Apr 03 04:10:16 +0000 2020", "Fri Apr 03 04:10:19 +0000 2020",
"Fri Apr 03 04:10:22 +0000 2020", "Fri Apr 03 04:10:26 +0000 2020",
"Fri Apr 03 04:10:31 +0000 2020", "Fri Apr 03 04:10:48 +0000 2020",
"Fri Apr 04 04:11:19 +0000 2020", "Fri Apr 04 04:11:32 +0000 2020",
"Fri Apr 04:11:44 +0000 2020"), timestamp = c(1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12), id_str = c(1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.25e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18), text = c("Finally. Make your own mask. Protect yourself and others. #coronavirus",
"@ArvinderSoin do you feel the use of only masks for IPD rounds, in an environment where no patients have been teste…",
"India, you actually deserve him for electing him.\n\nAb batti bhujao aur #corona bhagav.\n\nNo testing kits, no masks,…",
"great picture to sum up everything\n#mask #maskefficiency #noclothmask #maskprotection #surgicalmask #N95 #FFP1…",
"The greatest hazard to public health is official misinformation.\n\nAsian countries were wearing masks from the begin…",
"#Florida official says @3M is selling face masks to foreign countries instead of his state amid #COVID19 crisis.\n",
"Wearing masks is one of the protective measures preventing catching the novel #Coronavirus as the pandemic spreads…",
"It took Americans two and a half months to start wearing masks. Think about why, maybe it could explain why the peo…",
"#coronavirus watching me put on the same surgical mask 2 shifts in a row\n\n#COVID<U+30FC>19 #nurse",
"Back in stock! NIOSH N95, go to our website.\nOnly 11,000 masks \n\n#facemask #facemasks #N95…",
"Hence the vital importance of wearing masks when outside - #coronavirus #coronavirusindia #COVID2019india…",
"@Read5000YrLeap @SenSchumer buy trump facemasks. support trump 2020 and be safe. ships from midwest. #Boycott3M… ",
"When going out for essential activities, members of the public should wear reusable, non-medical cloth face coverin…",
"@jmcmaccarr buy trump facemasks. support trump 2020 and be safe. ships from midwest. #Boycott3M @seanhannity…",
"It took Americans two and a half months to start wearing masks. Think about why, maybe it could explain why the peo…",
"@CNN Just #WearMask People wearing a mask Nationwide ... SAVES…",
"That is less than 4 million per week. In Taiwan, everyone is allocated 3 surgical masks per week. For Australia t…",
"@Constitution999 @ChuckCallesto @realDonaldTrump buy trump facemasks. support trump 2020 and be safe. ships from mi…",
"Regard the debate of face mask in general public, the evidence of effectiveness is quite clear #Covid19…",
"Normalize putting on of masks. #COVID19 came to change the world order.",
"@TwitterSafety the Honduran gov’t is lying on Twitter. Saying that they are making thousands of masks, protective v…",
"Trump explaining that if you need a mask you can go to Walmart. Also that Costco has some great deals on caskets an…",
"When lockdown is over... I just may add this to my “don’t forget..” along with my wallet, gloves, mask, hand saniti…",
"Make your own mask: #covid19\n", "Please, everyone should wear a mask in public. Use whatever you can get hold of. Something is better than nothing (…",
"@kittywuv1 So incredibly mesmerizing, even with the custom #covid19 mask!<U+0001F970><U+0001F60D><U+0001F618><U+0001F637><U+0001F497>",
"@BeauTFC Happy to report that we’ve developed a 3-D printed mask. Passed N95 equivalent fit-test with Bitrex (surgi…",
"On a lighter note. \n\nIt is questionable if these common surgical masks and cloth masks will protect us from…",
"Medical workers face big mask shortage. This UF doctor came up with way to make many \n\n…",
"Homemade face coverings. Well, I tried it didn't come out straight but it should work. <U+0001F637> #homemade #facecoverings…",
"#covid19 In Africa, \"where are no masks, no treatment, no reanimation\", \"the same way experimental treatment for AI…",
"@theblondeMD Happy to report that we’ve developed a 3-D printed mask. Passed N95 equivalent fit-test with Bitrex (s…",
"I wouldn’t do a thing anyone from #China says to do. The masks they keep sending around the world are faulty, they…",
"@TIME [covid19],important:\n1.from_air-&gt;mask-&gt;mask_reuse.\n2.from_touch-&gt;clean_hands.\n\nps1.20200328.…",
"@3M stop selling masks to foreign companies. We WILL remember this!\n#COVID19Pandemic \n#covid19\n#N95masks",
"Awareness for using mask by @WHO #recommendations @CMOTamilNadu #COVID19 #Corona @MoHFW_INDIA #TNHealth #CVB…",
"@Rakshitwa @beingdumber @taapsee Nitish Kumar asked for 10 lakh N95 masks but got 50,000. Sought five lakh PPE kits…",
"@CNN You mean the masks everyone was saying #Covit19 #COVID<U+30FC>19 #coronavirus can pass right through as per what was…",
"2 BILLION masks = global production capacity in 2.5 MONTHS = quantity of what China imported in 5 WEEKS since Jan…",
"@CDCgov @CDCDirector @SF_DPH Please remember those with #COPD #LungDisease #HeartDisease when requiring #masks for…",
"If you have to go out and can’t avoid being around people, wear a mask. Masks are a complement to social distancin…",
"@CTVVancouver According to Dr \"doom\" Bonnie Henry, masks aren't of any use to the general public, in fact, she clai…",
"@maddow Next time you talk about the government stating everyone needs to wear a mask ask a government official whe…",
"Wear a mask in you are unwell or taking care of a person with suspected 2019-nCoV infection.\nInfo source: WHO…",
"7/9 For those who need a #COVID19 mask ASAP and have no talent, time or materials to make a mask. We give you the e…",
"jasminesade_art\nIs taking orders for masks (w/ filter pocket) \nMsg jasminesade_art if interested <U+0001F496> \n.\n.\n.\n.\n.\n.",
"What China do to cut down the spread dramatically are only to make people stay at home and wear masks!!!!!@PHE_uk…",
"@CNN hey i thought we were boycotting China\nthen why the Americans need Chinese masks?\ngo fuck yourself \n#BoycottChina #coronavirus",
"@CNN @CillizzaCNN [covid19],important:\n1.from_air-&gt;mask-&gt;mask_reuse.\n2.from_touch-&gt;clean_hands.\n\nps1.20200328.…",
"@kr3at #WearMask Everyone !!!\n\n\nSimply wearing a mask Nationwide ... SAVES #CZECHOSLOVAKIA…"
), retweetCount = c(1372, 9, NA, 8, 30, NA, NA, NA, NA, NA, 34,
NA, NA, NA, NA, NA, 192, NA, NA, NA, 50, NA, 221, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 17, 1948, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 53, NA, 1948, NA), favorite_count = c(3488,
23, NA, 7, 46, NA, NA, NA, NA, NA, 62, NA, NA, NA, NA, NA, 710,
NA, NA, NA, 48, NA, 506, NA, NA, NA, NA, NA, NA, NA, NA, NA,
29, 4963, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 164,
NA, 4963, NA), url = c("twitter.com/33617860/status/1245925124483809280",
"twitter.com/1106803026/status/1245925141046935552", "twitter.com/421517829/status/1245925143479595008",
"twitter.com/1245594213795778560/status/1245925159724171264",
"twitter.com/2178012643/status/1245925173858975744", "twitter.com/1220529001241989120/status/1245925183010963456",
"twitter.com/1115874631/status/1245925217790124032", "twitter.com/1243781317747077120/status/1245925225327235072",
"twitter.com/2729830110/status/1245925273230438400", "twitter.com/1240114893178667008/status/1245925291374964736",
"twitter.com/88875512/status/1245925292972969984", "twitter.com/1245907384993812480/status/1245925320282136576",
"twitter.com/3431854829/status/1245925357116481536", "twitter.com/1245907384993812480/status/1245925380973871104",
"twitter.com/1243781317747077120/status/1245925393095217152",
"twitter.com/1230706447257751552/status/1245925541644992512",
"twitter.com/4437322348/status/1245925779117985792", "twitter.com/1245907384993812480/status/1245925802442555392",
"twitter.com/829633267942903808/status/1245925807211663360",
"twitter.com/403961389/status/1245925829755969536", "twitter.com/17183161/status/1245925869010292736",
"twitter.com/1408320152/status/1245925960550993920", "twitter.com/1245663286881902592/status/1245926011679600640",
"twitter.com/244306637/status/1245926036321103872", "twitter.com/24327965/status/1245926059318448128",
"twitter.com/1164222471639318528/status/1245926089068646400",
"twitter.com/16328861/status/1245926148967727104", "twitter.com/6125082/status/1.24592618943e+18",
"twitter.com/3685052935/status/1245926191850065920", "twitter.com/868528766355558400/status/1245926251455365120",
"twitter.com/1223273206636851200/status/1245926283093012480",
"twitter.com/16328861/status/1245926292274311168", "twitter.com/1160039103905390592/status/1245926310670565376",
"twitter.com/1236738668905127936/status/1245926356468162560",
"twitter.com/400431217/status/1245926363833532416", "twitter.com/1244269086088945664/status/1245926365116809216",
"twitter.com/850227053139853312/status/1245926366781902848",
"twitter.com/244314850/status/1245926393822605312", "twitter.com/1244446404178665472/status/1245926398578978816",
"twitter.com/3184694718/status/1245926421601509376", "twitter.com/82208845/status/1245926438143807488",
"twitter.com/1216588869530836992/status/1245926569303891968",
"twitter.com/4770303330/status/1245926579936432128", "twitter.com/1245580876047499264/status/1245926591806361600",
"twitter.com/904740870817120256/status/1245926610181574656",
"twitter.com/934146138/status/1245926629022433280", "twitter.com/1223547711468777472/status/1245926703257366528",
"twitter.com/840838036707393536/status/1245926832618131456",
"twitter.com/1236738668905127936/status/1245926888087773184",
"twitter.com/1230706447257751552/status/1245926935042994176"),
friendCount = c(1018, 326, 1205, 48, 3690, 1584, 55, 42,
580, 11, 3610, 13, 110, 13, 42, 382, 43, 13, 106, 4195, 599,
8, 89, 414, 280, 931, 5001, 1602, 1327, 227, 310, 5001, 26,
65, 2371, 31, 523, 228, 8, 671, 499, 1324, 333, 5, 852, 5457,
7, 48, 65, 382), screenNames = c("DayssiOK", "DrAmbrishMithal",
"LuvAminaKausar", "Sunnie09370280", "balajis", "World_In_Mins",
"CGTNOfficial", "a7BdaSSeyL4czNw", "ShellBell915", "remedair",
"RitasArtCafe", "trumpfacemasks", "SCC_OES", "trumpfacemasks",
"a7BdaSSeyL4czNw", "REX38225222", "e2p71828", "trumpfacemasks",
"lamsonlinshen", "SteveJumaaa", "patfloTO", "tenforadollar",
"sashir_milne", "rdesai711", "agrothey", "foreskinjim1",
"rover223", "scanman", "AlDubest2Evry1", "HurtadoMarleen",
"johnmik63542947", "rover223", "CowlSolomon", "spacetinyearth",
"jmegown52302", "DrPonnarasu", "pankajupa120", "JoaoNewman",
"LalalaHK1", "SaturniaC", "NYCMediaMix", "ToscasReturn",
"JamesDallas9175", "cornzal", "CEDRdigital", "NadraRae",
"SiluMa4", "1Wa49R41L3pVzQj", "spacetinyearth", "REX38225222"
), userID = c(33617860, 1106803026, 421517829, 1.24559e+18,
2178012643, 1.22e+18, 1115874631, 1.24e+18, 2729830110, 1.24e+18,
88875512, 1.24591e+18, 3431854829, 1.24591e+18, 1.24e+18,
1.23071e+18, 4437322348, 1.24591e+18, 8.29633e+17, 403961389,
17183161, 1408320152, 1.24566e+18, 244306637, 24327965, 1.16422e+18,
16328861, 6125082, 3685052935, 8.68529e+17, 1.22327e+18,
16328861, 1.16004e+18, 1.24e+18, 400431217, 1.24427e+18,
8.50227e+17, 244314850, 1.24445e+18, 3184694718, 82208845,
1.22e+18, 4770303330, 1.24558e+18, 9.04741e+17, 934146138,
1.22355e+18, 8.40838e+17, 1.24e+18, 1.23071e+18), language = c("en",
"en", "en", "en", "en", "en", "en", "en", "en", "en", "en",
"en", "en", "en", "en", "en", "en", "en", "en", "en", "en",
"en", "en", "en", "en", "en", "en", "en", "en", "en", "en",
"en", "en", "en", "en", "en", "en", "en", "en", "en", "en",
"en", "en", "en", "en", "en", "en", "en", "en", "en"), replyToScreenName = c("None",
"ArvinderSoin", "None", "None", "None", "World_In_Mins",
"None", "None", "None", "None", "None", "Read5000YrLeap",
"None", "jmcmaccarr", "None", "CNN", "None", "Constitution999",
"None", "None", "TwitterSafety", "None", "None", "None",
"None", "kittywuv1", "BeauTFC", "None", "None", "None", "None",
"theblondeMD", "None", "TIME", "3M", "None", "Rakshitwa",
"CNN", "None", "CDCgov", "None", "CTVVancouver", "maddow",
"None", "CEDRdigital", "None", "None", "CNN", "CNN", "kr3at"
), replyToID = c("None", "1.13442E+18", "None", "None", "None",
"1.22053E+18", "None", "None", "None", "None", "None", "154243839",
"None", "48150879", "None", "759251", "None", "1.04747E+18",
"None", "None", "95731075", "None", "None", "None", "None",
"1.21653E+18", "1.05676E+18", "None", "None", "None", "None",
"230792524", "None", "14293310", "378197959", "None", "9.81585E+17",
"759251", "None", "146569971", "None", "16313405", "16129920",
"None", "9.04741E+17", "None", "None", "759251", "759251",
"139283160"), retweetUserScreenName = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), retweetUserID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), followersCount = c(1452,
3844, 2398, 1, 179896, 1283, 14036740, 24, 329, 3, 7133,
2, 1050, 2, 24, 121, 4, 2, 38, 2533, 235, 2, 5, 148, 2312,
265, 1572, 8067, 1265, 167, 13, 1574, 1, 2, 972, 1, 107,
7, 0, 73, 295, 1160, 849, 1, 7519, 1749, 0, 4, 2, 121), userMentions = c(NA,
"ArvinderSoin", NA, NA, NA, "3M", NA, NA, NA, NA, NA, "Read5000YrLeap",
NA, "jmcmaccarr", NA, "CNN", NA, "Constitution999", NA, NA,
"TwitterSafety", NA, NA, NA, NA, "kittywuv1", "BeauTFC",
NA, NA, NA, NA, "theblondeMD", NA, "TIME", "3M", "WHO", "Rakshitwa",
"CNN", NA, "CDCgov", NA, "CTVVancouver", "maddow", NA, NA,
NA, NA, "CNN", "CNN", "kr3at"), userMentionsID = c(NA, 1.13442e+18,
NA, NA, NA, 378197959, NA, NA, NA, NA, NA, 154243839, NA,
48150879, NA, 759251, NA, 1.05e+18, NA, NA, 95731075, NA,
NA, NA, NA, 1.21653e+18, 1.05676e+18, NA, NA, NA, NA, 230792524,
NA, 14293310, 378197959, 14499829, 9.81585e+17, 759251, NA,
146569971, NA, 16313405, 16129920, NA, NA, NA, NA, 759251,
759251, 139283160), hashtag1 = c("coronavirus", NA, "corona",
"mask", NA, "Florida", "Coronavirus", NA, "coronavirus",
"facemask", "coronavirus", "Boycott3M", NA, "Boycott3M",
NA, "WearMask", NA, NA, "Covid19", "COVID19", NA, NA, NA,
"covid19", NA, "covid19", NA, NA, NA, "homemade", "covid19",
NA, "China", NA, "COVID19Pandemic", "recommendations", NA,
"Covit19", NA, "COPD", NA, NA, NA, NA, "COVID19", NA, NA,
"BoycottChina", NA, "WearMask"), hashtag2 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA), mediatype = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), mediaURL = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA)), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -50L), spec = structure(list(
cols = list(createdAt = structure(list(), class = c("collector_character",
"collector")), timestamp = structure(list(), class = c("collector_double",
"collector")), id_str = structure(list(), class = c("collector_double",
"collector")), text = structure(list(), class = c("collector_character",
"collector")), retweetCount = structure(list(), class = c("collector_double",
"collector")), favorite_count = structure(list(), class = c("collector_double",
"collector")), url = structure(list(), class = c("collector_character",
"collector")), friendCount = structure(list(), class = c("collector_double",
"collector")), screenNames = structure(list(), class = c("collector_character",
"collector")), userID = structure(list(), class = c("collector_double",
"collector")), language = structure(list(), class = c("collector_character",
"collector")), replyToScreenName = structure(list(), class = c("collector_character",
"collector")), replyToID = structure(list(), class = c("collector_character",
"collector")), retweetUserScreenName = structure(list(), class = c("collector_logical",
"collector")), retweetUserID = structure(list(), class = c("collector_logical",
"collector")), followersCount = structure(list(), class = c("collector_double",
"collector")), userMentions = structure(list(), class = c("collector_character",
"collector")), userMentionsID = structure(list(), class = c("collector_double",
"collector")), hashtag1 = structure(list(), class = c("collector_character",
"collector")), hashtag2 = structure(list(), class = c("collector_logical",
"collector")), mediatype = structure(list(), class = c("collector_logical",
"collector")), mediaURL = structure(list(), class = c("collector_logical",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))

Best Answer

We can create a variable from createdAt and then do a group_split into a list of data frames. Here we use str_replace to extract the relevant substring: the pattern removes the first word and the space after it while capturing the next word, a space, and some digits, and the replacement keeps only that captured group.

library(dplyr)
library(stringr)

sample_df %>%
  mutate(month_day = str_replace(createdAt,
                                 "^\\w+\\s+(\\w+\\s+\\d+).*", "\\1")) %>%
  group_split(month_day)

Note: the mutate isn't strictly needed; month_day can be created on the fly inside group_split itself.
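A sketch of that on-the-fly form, extended to write each per-day data frame to its own CSV (which was the original goal). The "sample_Apr_01.csv" naming pattern is an illustration, not something specified in the question:

```r
library(dplyr)
library(stringr)
library(purrr)

# group_split forwards named expressions to group_by, so the day column
# can be computed inline instead of via a prior mutate.
sample_df %>%
  group_split(month_day = str_replace(createdAt,
                                      "^\\w+\\s+(\\w+\\s+\\d+).*", "\\1")) %>%
  walk(~ write.csv(.x,
                   paste0("sample_", gsub("\\s+", "_", .x$month_day[1]), ".csv"),
                   row.names = FALSE))
```

Each element of the list produced by group_split holds all rows for one date, so April 1, April 2, etc. each end up in their own file.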

The original question, "How to split and make new csv files based on date/day in r?", is on Stack Overflow: https://stackoverflow.com/questions/61309591/
