
r - Frequency count of comma-separated categories in a column

Reposted. Author: 行者123. Updated: 2023-12-05 02:17:30

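I have a data frame in R with multiple rows and columns. One column holds comma-separated values naming vehicle makes (Toyota, Honda, and so on). For each row, I want to count how often each make occurs and output the top three makes by frequency. The dataset looks like this: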

Zip                   Make
12325 Toyota, Honda, Toyota, Mitsubishi, Mercedes
85271 Toyota,Honda,Toyota,Honda,Toyota,Toyota,Volvo,Nissan,Nissan,Nissan, Nissan
56098 Toyota,Honda,Toyota,Mitsubishi,Chevrolet,Acura,Chevrolet,Chevrolet, Honda

This is the output I want:

(The desired output was attached as an image, which is not reproduced here.)

Can anyone help write working R code based on this example?

Best answer

A solution using the tidyverse. dt_final is the final output.

library(tidyverse)

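# Split the comma-separated makes into one row per make.
# An explicit separator (comma plus optional spaces) keeps multi-word makes intact.
dt2 <- dt %>% separate_rows(Make, sep = ",\\s*")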

# Calculate the frequency
dt3 <- dt2 %>% count(Zip, Make)

# Prepare the Frequency column
dt4 <- dt3 %>%
  mutate(n = paste0("(", n, ")")) %>%
  unite(Frequency, Make, n, sep = " ") %>%
  group_by(Zip) %>%
  summarise(Frequency = paste0(Frequency, collapse = ", "))

# Prepare the Top 3 Make column
dt5 <- dt3 %>%
  group_by(Zip) %>%
  # Rank by descending frequency so that rank 1 is the most common make
  mutate(Rank = dense_rank(desc(n))) %>%
  filter(Rank <= 3) %>%
  arrange(Zip, Rank, Make) %>%
  select(Zip, Make) %>%
  summarise(`Top 3 Make (per frequency)` = paste0(Make, collapse = ", "))

# Join the results
dt_final <- reduce(list(dt, dt4, dt5), left_join, by = "Zip")

dt_final
# Zip Make
# 1 12325 Toyota, Honda, Toyota, Mitsubishi, Mercedes
# 2 85271 Toyota,Honda,Toyota,Honda,Toyota,Toyota,Volvo,Nissan,Nissan,Nissan, Nissan
# 3 56098 Toyota,Honda,Toyota,Mitsubishi,Chevrolet,Acura,Chevrolet,Chevrolet, Honda
# Frequency
# 1 Honda (1), Mercedes (1), Mitsubishi (1), Toyota (2)
# 2 Honda (2), Nissan (4), Toyota (4), Volvo (1)
# 3 Acura (1), Chevrolet (3), Honda (2), Mitsubishi (1), Toyota (2)
# Top 3 Make (per frequency)
# 1 Toyota, Honda, Mercedes, Mitsubishi
# 2 Nissan, Toyota, Honda, Volvo
# 3 Chevrolet, Honda, Toyota, Acura, Mitsubishi
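
Because dense_rank() keeps ties, a row can list more than three makes, as the output above shows. If exactly three makes per Zip are wanted, one possible variant is sketched below. It is not part of the original answer: it assumes dplyr 1.0.0 or later (for slice_max()), the name top3_strict is made up for illustration, and which of several tied makes is kept should not be relied on.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
dt <- data.frame(
  Zip = c(12325, 85271, 56098),
  Make = c(
    "Toyota, Honda, Toyota, Mitsubishi, Mercedes",
    "Toyota,Honda,Toyota,Honda,Toyota,Toyota,Volvo,Nissan,Nissan,Nissan, Nissan",
    "Toyota,Honda,Toyota,Mitsubishi,Chevrolet,Acura,Chevrolet,Chevrolet, Honda"
  ),
  stringsAsFactors = FALSE
)

# top3_strict is a hypothetical name used only in this sketch
top3_strict <- dt %>%
  separate_rows(Make, sep = ",\\s*") %>%   # one row per make
  count(Zip, Make) %>%                     # frequency per Zip and make
  group_by(Zip) %>%
  # Keep the three rows with the largest n in each Zip; ties are cut off
  slice_max(n, n = 3, with_ties = FALSE) %>%
  summarise(Top3 = paste(Make, collapse = ", "))

top3_strict
```

The trade-off is that with_ties = FALSE drops some makes that are tied with a kept one, so the dense_rank() version is the safer choice when ties matter.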

Data

dt <- read.table(text = "Zip                   Make
12325 'Toyota, Honda, Toyota, Mitsubishi, Mercedes'
85271 'Toyota,Honda,Toyota,Honda,Toyota,Toyota,Volvo,Nissan,Nissan,Nissan, Nissan'
56098 'Toyota,Honda,Toyota,Mitsubishi,Chevrolet,Acura,Chevrolet,Chevrolet, Honda'",
header = TRUE, stringsAsFactors = FALSE)
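
For readers who prefer not to load the tidyverse, the same per-row counting can be sketched in base R. This is an illustrative alternative rather than the answer's method: top_makes and the Top3 column are hypothetical names, and the helper keeps every make that falls in the three highest frequency tiers, so ties can yield more than three makes.

```r
# Hypothetical helper (not from the original answer): return the makes in the
# n_tiers highest frequency tiers of one comma-separated string.
top_makes <- function(x, n_tiers = 3) {
  # Split on commas and strip surrounding whitespace
  makes <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  counts <- table(makes)
  # Distinct counts, highest first; keep the top n_tiers of them
  tiers <- head(sort(unique(as.vector(counts)), decreasing = TRUE), n_tiers)
  kept <- counts[counts %in% tiers]
  # Order by descending count, then alphabetically by make
  kept <- kept[order(-as.vector(kept), names(kept))]
  paste(names(kept), collapse = ", ")
}

# Same sample data as in the question
dt <- data.frame(
  Zip = c(12325, 85271, 56098),
  Make = c(
    "Toyota, Honda, Toyota, Mitsubishi, Mercedes",
    "Toyota,Honda,Toyota,Honda,Toyota,Toyota,Volvo,Nissan,Nissan,Nissan, Nissan",
    "Toyota,Honda,Toyota,Mitsubishi,Chevrolet,Acura,Chevrolet,Chevrolet, Honda"
  ),
  stringsAsFactors = FALSE
)

# Apply the helper to each row
dt$Top3 <- vapply(dt$Make, top_makes, character(1), USE.NAMES = FALSE)
dt[, c("Zip", "Top3")]
```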

Regarding "r - Frequency count of comma-separated categories in a column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47682889/
