
r - How to extract specific rows from two data frames with different dimensions and generate multiple .csv files?

Reposted. Author: 行者123. Updated: 2023-12-04 11:56:24

Data frame one:

  structure(list(trial_id = c(2022L, 2023L, 2123L, 2184L, 3883L, 
4434L), ctri_number = c("CTRI/2018/02/011794 ", "CTRI/2017/08/009517 ",
"CTRI/2019/05/019036 ", "CTRI/2017/12/010935 ", "CTRI/2017/09/009746 ",
"CTRI/2016/06/007055 "), name = c("National Institute of Allergy and Infectious Diseases NIAIDMaryland USA",
"Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy",
"Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"), type_of_sponsor = c("' Government funding agency '",
"' Government medical college '", "' Research institution '",
"' Pharmaceutical industry-Global '", " Other [Self sponsored] '",
"' Private hospital/clinic '"), address = c("' USA '", "' Jawaharlal Nehru Medical College, Aligarh Muslim University, Aligarh-202001 '",
"' KLEU Ayurveda Pharmacy, Khasbhag, Belgaum, Karnataka '", "' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '",
"' Room no 32 ,Department of Periodontics , Government Dental college , Trivandrum '",
"' ALVAS EDUCATION FOUNDATION ALVAS COLLEGE OF PHYSIOTHERAPY\n\n\nMoodabidri - 574227\n\n\nSouth Canara District\n\n\nKarnataka '"
)), row.names = c(NA, 6L), class = "data.frame")
Data frame two:
    structure(list(distinctOrganizations = c("A AMMU", "A and U tibbia college and hospital", 
"A Arumuga kani", "A KIREETI", "AAMIR ZUBAIR SHAIKH", "Aansu Susan Varghese"
)), row.names = c(NA, 6L), class = "data.frame")
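For every entry in data frame two (distinctOrganizations), I have to extract the rows of data frame one whose name column matches that entry.
However, each entry should produce its own .csv file.
How can I do this?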

Possible result: a CSV file like the one in the image.
The image shows a CSV file that contains only the rows related to AIIMS and its variants. I need a separate CSV file for each such name.

Best answer

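First: your sample data does not produce any matches (df2 contains none of the names that appear in your sample df1).
If I understood your question correctly, you could use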

library(dplyr)
library(purrr)
library(readr)

df1 %>%
  inner_join(df2, by = c("name" = "distinctOrganizations")) %>%
  split(f = .$name) %>%
  walk(~write_csv(.x, paste0(unique(.x$name), ".csv")))
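  • We use inner_join to drop every row of df1 that has no match in df2.
  • We then split the resulting data.frame by name, which creates a new data.frame for each (distinct) organization.
  • Finally, we use purrr's walk function to write a .csv file for each of these organizations. This produces .csv files such as Amgen Inc.csv and ALVAS EDUCATION FOUNDATION.csv.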
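
For comparison, the same three steps can be sketched in base R without dplyr, purrr, or readr. This is only an illustration under assumptions: the toy df1 and df2 below are cut-down stand-ins for the question's data, out_dir is a hypothetical output folder inside tempdir(), and filtering with %in% behaves like the inner join only as long as df2 has no duplicate names.

```r
# Toy stand-ins for df1 and df2 (names taken from the question's data)
df1 <- data.frame(
  trial_id = c(2184L, 3883L, 4434L),
  name = c("Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  distinctOrganizations = c("Amgen Inc", "ALVAS EDUCATION FOUNDATION", "A KIREETI"),
  stringsAsFactors = FALSE
)

# Assumed output folder; the original answer writes to the working directory
out_dir <- file.path(tempdir(), "org_csv")
dir.create(out_dir, showWarnings = FALSE)

# Step 1: keep only the rows of df1 whose name appears in df2
matched <- df1[df1$name %in% df2$distinctOrganizations, , drop = FALSE]

# Step 2: one data frame per organization
by_org <- split(matched, matched$name)

# Step 3: write each piece to "<name>.csv"
for (org in names(by_org)) {
  write.csv(by_org[[org]],
            file.path(out_dir, paste0(org, ".csv")),
            row.names = FALSE)
}
```

Calling list.files(out_dir) afterwards lists one file per matched organization. Names that contain characters such as "/" would need sanitizing before being used as file names.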

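  • Note: the address column contains some line breaks ( \n ). You should consider removing them, since they might cause trouble in your .csv files and in later processing steps. The type_of_sponsor column also has leading and trailing whitespace that you may want to remove.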
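
One possible way to do that cleanup in base R is sketched below. clean_text is a hypothetical helper, not part of the original answer, and the two-row df1 is a toy example that mimics the problematic columns. The stray single quotes inside the values are deliberately left alone.

```r
# Hypothetical helper: tidy a character vector before writing it to .csv
clean_text <- function(x) {
  x <- gsub("[\r\n]+", " ", x)   # replace runs of line breaks with one space
  x <- gsub("\\s+", " ", x)      # collapse repeated whitespace
  trimws(x)                      # drop leading and trailing whitespace
}

# Toy data mimicking the two columns mentioned in the note
df1 <- data.frame(
  type_of_sponsor = c("' Research institution '", " Other [Self sponsored] '"),
  address = c("' USA '",
              "' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '"),
  stringsAsFactors = FALSE
)

df1$address <- clean_text(df1$address)
df1$type_of_sponsor <- clean_text(df1$type_of_sponsor)
```

Running this before the join and split means every generated file gets the cleaned values.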
    Data
    I modified df2 to get two matches:
    df2 <- structure(list(distinctOrganizations = c("Amgen Inc", "A and U tibbia college and hospital", 
    "ALVAS EDUCATION FOUNDATION", "A KIREETI", "AAMIR ZUBAIR SHAIKH",
    "Aansu Susan Varghese")), row.names = c(NA, 6L), class = "data.frame")
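
Before writing any files, it can help to check which names will actually match. The sketch below is an addition, not part of the original answer: df1_names simply copies the name column of the question's df1, and intersect and setdiff compare it with the modified df2.

```r
# Names copied from the name column of the question's df1
df1_names <- c(
  "National Institute of Allergy and Infectious Diseases NIAIDMaryland USA",
  "Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy",
  "Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"
)

# The modified df2 from the answer
df2 <- structure(list(distinctOrganizations = c("Amgen Inc",
  "A and U tibbia college and hospital", "ALVAS EDUCATION FOUNDATION",
  "A KIREETI", "AAMIR ZUBAIR SHAIKH", "Aansu Susan Varghese")),
  row.names = c(NA, 6L), class = "data.frame")

# Organizations that will get a .csv file
matches <- intersect(df2$distinctOrganizations, df1_names)

# Organizations with no rows in df1, so no file is written for them
unmatched <- setdiff(df2$distinctOrganizations, df1_names)
```

Here matches holds the two organizations present in both data frames. The comparison is exact, so differences in case or surrounding whitespace count as a mismatch.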

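    Regarding "r - How to extract specific rows from two data frames with different dimensions and generate multiple .csv files?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69386568/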
