
r - How to match file names in a directory with the names in a CSV column in R

Reposted. Author: 行者123. Last updated: 2023-12-05 04:25:41

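I am trying to write an R script that compares the file names in a directory with the file names listed in a CSV file, so that I can tell which files I have already downloaded and which I still need to download. The code I have so far reads the file names from the directory into a data frame and reads in the CSV file. However, I have not been able to trim the file names down to the string I want, nor to match them against the name column of the CSV file. Ideally, I would also like to create a new spreadsheet that shows which files matched, so that I know what has been downloaded. Here is what I have so far.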

library(dplyr)   # provides %>%
library(readxl)  # provides read_excel()

# read files from directory and list as df
file_names <- list.files(path = "peaches/",
                         pattern = "jpg",
                         all.files = TRUE,
                         full.names = TRUE,
                         recursive = TRUE) %>%
  # turn into df (the piped value is already the first argument)
  as.data.frame()

# read in xl file
name_data <- read_excel("peaches/all_data.xlsx")

# change the file_name from the string peaches//fruit/1234/12pink.jpg.txt to -> 12pink
# match the file name with the name column in name_data
# create a new spreadsheet that pulls the id and row if it has been downloaded

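Best answer

Example files/directory

Let's create an example directory containing some example files. This lets us demonstrate that the solution works, and it is key to making the solution reproducible.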

library(dplyr)
library(writexl)
library(readxl)

# Example directory with example files
dir.create(path = "peaches")
write.csv(data.frame(x = 5), file = "peaches/foo.csv")
write.csv(data.frame(x = 20), file = "peaches/foo.nrrd.csv")
write.csv(data.frame(x = 1), file = "peaches/foo2.nrrd.csv")
write.csv(data.frame(z = 2), file = "peaches/bar.csv")
write.csv(data.frame(z = 5), file = "peaches/bar.rrdr.csv")

# Example Excel file
write_xlsx(data.frame(name = c("foo", "hotdog")),
           path = "peaches/all_data.xlsx")
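If you would rather not create files in your working directory, the same kind of example can be built in a temporary location. The following is a minimal base R sketch, not part of the original answer: `example_dir`, `example_files` and `created` are hypothetical names, and the files are left empty because only their names matter for matching.

```r
# Build the example inside a temporary directory (an assumption of this
# sketch; the answer above writes to "peaches/" in the working directory).
example_dir <- file.path(tempdir(), "peaches")
dir.create(example_dir, showWarnings = FALSE, recursive = TRUE)

# Empty placeholder files: only the file names are used for matching
example_files <- c("foo.csv", "foo.nrrd.csv", "foo2.nrrd.csv",
                   "bar.csv", "bar.rrdr.csv")
created <- file.create(file.path(example_dir, example_files))

# Confirm that the files exist before running any matching code
list.files(example_dir)
```

To use it with the solution, pass `example_dir` as the `path` argument of `list.files()`.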

Solution

We can now use our example files and directory to demonstrate a solution to the problem.

# Get file paths in a data.frame for those that contain ".nrrd".
# The example files above use ".nrrd"; for the files in the question,
# use "\\.jpg" in both patterns below instead.
# Use data.frame() rather than as.data.frame() to avoid row names
# Need to use \\ to escape the period in the regular expression
file_names <- list.files(
  path = "peaches/",
  pattern = "\\.nrrd",
  all.files = TRUE,
  full.names = TRUE,
  recursive = TRUE
) %>%
  data.frame(paths = .)

# Extract the part of the file name (i.e. removing directory substrings)
# that comes before .nrrd and add it as a column. basename() gives the
# file name, and a regular expression removes the rest.
file_names$match_string <- file_names %>%
  pull(paths) %>%
  basename() %>%
  gsub(pattern = "\\.nrrd.*", replacement = "")

file_names$match_string
#> [1] "foo" "foo2"
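For the path format quoted in the question, the same two steps (`basename()` followed by a regular-expression substitution) can be checked on a single string. This is a small base R sketch; the variable names `path`, `file_part` and `match_string` are chosen here for illustration only.

```r
# Path format quoted in the question (note the doubled slash)
path <- "peaches//fruit/1234/12pink.jpg.txt"

# basename() drops the directory part, leaving "12pink.jpg.txt"
file_part <- basename(path)

# Remove ".jpg" and everything that follows it
match_string <- sub(pattern = "\\.jpg.*$", replacement = "", x = file_part)

match_string
#> [1] "12pink"
```

Escaping the period matters: an unescaped `.` matches any character, so the pattern could match more than the literal extension.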

# Read in excel file with file names to match (if possible)
name_data <- read_excel("peaches/all_data.xlsx")

name_data$name
#> [1] "foo" "hotdog"

# Create match indicator and row number
name_data <- name_data %>%
  mutate(
    matched = case_when(name %in% file_names$match_string ~ 1,
                        TRUE ~ 0),
    rowID = row_number()
  )
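The indicator above records whether a name was found, but not where the file lives. If the path is also wanted in the spreadsheet, base R's `match()` can carry it across. The sketch below is an alternative, not the answer's method, and it uses small hypothetical stand-ins for `file_names` and `name_data` so that it runs on its own.

```r
# Hypothetical stand-ins for the data frames built in the answer
file_names <- data.frame(
  paths = c("peaches//foo.nrrd.csv", "peaches//foo2.nrrd.csv"),
  match_string = c("foo", "foo2"),
  stringsAsFactors = FALSE
)
name_data <- data.frame(name = c("foo", "hotdog"), stringsAsFactors = FALSE)

# match() returns the position of each name within match_string,
# or NA when the name has no corresponding file
idx <- match(name_data$name, file_names$match_string)

# 1 if a file was found, 0 otherwise
name_data$matched <- as.integer(!is.na(idx))

# Path of the matching file; NA for names with no file
name_data$path <- file_names$paths[idx]

name_data
```

Note that `match()` returns only the first position, so this assumes each extracted name appears at most once among the files.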

# Create excel spreadsheet of files already downloaded
name_data %>%
  filter(matched == 1) %>%
  write_xlsx(path = "peaches/already_downloaded.xlsx")
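The question also asks which files still need to be downloaded. That is the complement of the matched set, and a minimal base R sketch could look like the following. The names `downloaded`, `to_download`, `unlisted` and `out_file` are hypothetical, the input data are stand-ins, and the result is written as a CSV to a temporary file purely for illustration.

```r
# Hypothetical stand-ins: names extracted from files on disk, and the
# names listed in the spreadsheet
downloaded <- c("foo", "foo2")
name_data <- data.frame(name = c("foo", "hotdog"), stringsAsFactors = FALSE)

# Rows of the spreadsheet with no matching file: still to be downloaded
to_download <- name_data[!(name_data$name %in% downloaded), , drop = FALSE]

# Files on disk that the spreadsheet does not list
unlisted <- setdiff(downloaded, name_data$name)

# Save the outstanding names (temporary location, for illustration)
out_file <- file.path(tempdir(), "still_to_download.csv")
write.csv(to_download, file = out_file, row.names = FALSE)
```

With the answer's own objects, the equivalent filter would be `filter(matched == 0)` on `name_data`.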

Regarding "r - How to match file names in a directory with the names in a CSV column in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73179799/
