
r - Convert a structured text file with a non-standard layout into a data frame in R

Reposted. Author: 行者123. Updated: 2023-12-04 14:31:39

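I'm new to R and trying to learn basic data I/O and preprocessing. I have a text file in the format shown below. It is a non-standard format (unlike CSV, JSON, etc.). I need to convert this structure into a table-like format, more precisely a data frame like the one we get from reading a CSV file.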

Input

product/productId: B000H13270
review/userId: A3J6I70Z9Q0HRX
review/profileName: Lindey H. Magee
review/helpfulness: 1/3
review/score: 5.0
review/time: 1261785600
review/summary: it's fabulous, but *not* from amazon!
review/text: the price on this product certainly raises my attention on compairing amazon price with the local stores. i can get a can of this rotel at my local kroger for $1. dissapointing!

product/productId: B000H13270
review/userId: A1YLOZQKBX3J1S
review/profileName: R. Lee Dailey "Lee_Dailey"
review/helpfulness: 1/4
review/score: 3.0
review/time: 1221177600
review/summary: too expensive
review/text: howdy y'all,<br /><br />the actual product is VERY good - i'd rate the item a 4 on it's own. however, it's only ONE dollar at the local grocery and - @ twenty eight+ dollars per twelve pack - these are running almost two and a half dollars each.<br /><br />as i said, TOO EXPENSIVE. [*sigh ...*] i was really hoping to get them at something approaching the local cost.<br /><br />take care,<br />lee

Output

product/productId | review/userId ......... | review/text
B000H13270 | A3J6I70Z9Q0HRX | the price on this .... dissapointing!
B000H13270 | A1YLOZQKBX3J1S | howdy y'all,<br /> ..... lee

In Python, I can do the same thing as follows:

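from collections import defaultdict

revDict = defaultdict(list)  # field name -> list of values, one per record
with open('filename') as dataFile:
    for item in dataFile:
        if ':' not in item:  # skip the blank lines between records
            continue
        key, value = item.split(':', 1)  # split on the first colon only
        revDict[key].append(value.strip())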

How can I achieve something similar in R? Is there an equivalent in R?

Best answer

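Here is a quick and dirty approach. It splits each line on the colon (every colon except the first one on each line is removed from the text) and then reshapes the data from long to wide: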

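mytxt <- readLines("mytext.txt")
# Keep the first colon on each line as the separator and drop every later colon
mytable <- read.table(text = gsub("^([^:]*:)|:", "\\1", mytxt), sep = ":", quote = "",
                      comment.char = "", strip.white = TRUE)
# Number the records, assuming exactly 8 lines per record
mytable$id <- rep(1:(nrow(mytable)/8), each = 8)
# Reshape from long (one row per field) to wide (one row per record)
res <- reshape(mytable, direction = "wide", timevar = "V1", idvar = "id")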

This gives:

  id V2.product/productId V2.review/userId V2.review/profileName V2.review/helpfulness V2.review/score V2.review/time V2.review/summary V2.review/text
1 1 B000H13270 A3J6I70Z9Q0HRX Lindey H. Magee 1/3 5.0 1261785600 it's fabulous, but *not* from amazon! the price on this product certainly raises my attention on compairing amazon price with the local stores. i can get a can of this rotel at my local kroger for $1. dissapointing!
9 2 B000H13270 A1YLOZQKBX3J1S R. Lee Dailey "Lee_Dailey" 1/4 3.0 1221177600 too expensive howdy y'all,<br /><br />the actual product is VERY good - i'd rate the item a 4 on it's own. however, it's only ONE dollar at the local grocery and - @ twenty eight+ dollars per twelve pack - these are running almost two and a half dollars each.<br /><br />as i said, TOO EXPENSIVE. [*sigh ...*] i was really hoping to get them at something approaching the local cost.<br /><br />take care,<br />lee
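
The colon handling in the gsub() call above can be checked on a single line. The snippet below is only a small illustration: the sample string is made up (it is not from the review file) and was chosen because its value contains extra colons. It shows that the first colon survives as the field separator while the later ones are dropped, so values such as times or URLs would be altered by this approach.

```r
# Made-up sample line whose value contains colons (not taken from the data file)
x <- "review/text: open 9:00 to 17:00"

# Same pattern as in the answer: the first alternative matches everything up to
# and including the first colon and puts it back via \\1; the second alternative
# matches any later colon, where \\1 is empty, so that colon is deleted
out <- gsub("^([^:]*:)|:", "\\1", x)
print(out)
```

If the colons inside the values matter, the line can instead be split at the position of the first colon without touching the rest of the text.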

The code above assumes that every record consists of exactly 8 lines.
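
If the number of lines per record cannot be relied on, one possible variation is to use the blank lines as record separators and to split each line at its first colon only. The following is a minimal base R sketch under those assumptions, not the method from the answer itself: `parse_records` is a hypothetical helper name, the input is held in a character vector instead of being read with readLines(), and the last field of the first sample record is invented so that it contains a colon.

```r
# Sample input; in practice this would come from readLines("mytext.txt").
# The first review/text value is made up to show that colons are preserved.
lines <- c(
  "product/productId: B000H13270",
  "review/userId: A3J6I70Z9Q0HRX",
  "review/summary: it's fabulous, but *not* from amazon!",
  "review/text: ratio was 2:1 in the store",
  "",
  "product/productId: B000H13270",
  "review/userId: A1YLOZQKBX3J1S",
  "review/summary: too expensive",
  "review/text: howdy y'all"
)

# parse_records is a hypothetical helper, not part of base R
parse_records <- function(lines) {
  lines <- trimws(lines)
  blank <- lines == ""
  # Every blank line starts a new record, so the running count of blank
  # lines seen so far serves as a record id
  record_id <- cumsum(blank) + 1L
  lines <- lines[!blank]
  record_id <- record_id[!blank]

  # Position of the first colon on each line (-1 when there is none)
  pos <- as.integer(regexpr(":", lines, fixed = TRUE))
  keep <- pos > 0
  lines <- lines[keep]
  record_id <- record_id[keep]
  pos <- pos[keep]

  # Text before the first colon is the field name, the rest is the value
  key <- substr(lines, 1, pos - 1)
  value <- trimws(substr(lines, pos + 1, nchar(lines)))

  long <- data.frame(id = record_id, key = key, value = value,
                     stringsAsFactors = FALSE)
  # Long to wide: one row per record, one column per field name
  wide <- reshape(long, direction = "wide", idvar = "id", timevar = "key")
  names(wide) <- sub("^value\\.", "", names(wide))
  rownames(wide) <- NULL
  wide
}

res <- parse_records(lines)
str(res)
```

With this layout a record that lacks a field simply gets NA in that column. If a field name is repeated within one record, reshape() keeps the first value and issues a warning.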

Regarding "r - Convert a structured text file with a non-standard layout into a data frame in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32752337/
