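I need to parse this file into a Hive table. It is a movie review dataset from Amazon. I am having trouble building a regular expression that parses the .txt file and creates a table with the correct column types.
The .txt file: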
product/productId: B0001G6PZC
review/userId: A3F3THLLZXURQN
review/profileName: A. Y
review/helpfulness: 3/3
review/score: 4.0
review/time: 1199664000
review/summary: Good story, Good action. Good Drama. Good Movie
review/text: When I first heard of this movie, I didn't think it would be that great, so I never bothered to go see it in theaters. Later on, I ended up downloading the movie, and didn't think much of it.<br /><br />But now after watching the movie on BD, I think that the movie is quite outstanding. Its got a good story behind it, with some level of historical basis behind it with Samurai becoming phased out into Japan's modernization.<br /><br />It does a good job in immersing you into the conflicts that warriors must endure... and yet, find peace with the way of the Samurai as they are a warrior race and not savages.<br /><br />4/5 stars.
product/productId: B0001G6PZC
review/userId: A3J78KAIPW6KAH
review/profileName: Joan Paolo De Bastos "conde_almasy"
review/helpfulness: 3/3
review/score: 4.0
review/time: 1198540800
review/summary: Good Movie. Wonderful Visuals. A Great Way to SHOW OFF you Hi-Def System
review/text: Last Samurai is no masterpiece<br /><br />but technically it is<br /><br />the visuals, the sound effects, the music.<br /><br />If you want to show off to your friends what a great hi-def system you got, purchase this movie.<br /><br />If you want a classic, but lord of the rings or gone with the wind instead.
product/productId: B0001G6PZC
review/userId: A3F3B6HY9RJI04
review/profileName: James Duckett
review/helpfulness: 3/3
review/score: 5.0
review/time: 1192060800
review/summary: Great Movie, Fantastic HD Quality
review/text: After picking up my HD DVD player I've had troubles watching regular DVD movies. I had heard some good things about this movie but couldn't pass it up once it was in high definition.<br /><br />The story is pretty good. This is the story of Captain Algren who has been sent to Japan in the late 1800's in order to help them modernize the Japanese army as they go from fighting with swords and arrows to machine guns and cannons.<br /><br />After the "modern" Japanese army prematurely attacks the Samurai and lose horribly, Captain Algren is taken captive by the Samurai and introduced to their way of life and refusal to lay down the sword in the name of compliance. In time, Captain Algren finds himself wanting to become one of the Samurai and learning more of their way of life.<br /><br />The story is pretty good but what raises this up to the level of being outstanding is the high definition quality of the movie. It was fantastic, especially seeing the colorful Japanese landscape in all of its magnificence.<br /><br />If you like Tom Cruise action movies, this is one to pick up especially in high definition (whether it be Blu-Ray or HD DVD). The violence can be extremely graphic (hey, this is war) so if you are sensitive to that you may want to look for something else. Otherwise, the pacing of the movie is pretty good. It isn't an all out gore-fest... there is action and then it breaks and lets you relax and catch up a little bit and then goes back to action and so on and so forth.
Here is my SQL:
CREATE EXTERNAL TABLE movies(id string, uId string, profileName string, helpfulness string, score float, time int, summary string, text string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH serdeproperties( "input.regex" = "[ ].*", "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s")
location '/user/hduser/moviesTest';
But Hive does not parse it correctly, and SELECT * FROM movies
gives me this result:
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
Can anyone tell me what I am doing wrong?
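One likely reason for the all-NULL rows, offered as a sketch rather than a confirmed diagnosis: RegexSerDe applies input.regex to each line separately and fills the columns from the pattern's capturing groups. The pattern "[ ].*" defines no capturing groups and requires the line to start with a space, so it cannot supply eight columns from a line such as "product/productId: B0001G6PZC". The Java snippet below uses java.util.regex directly (not Hive) to show both properties. The class name RegexCheck and the two-group alternative pattern are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    // Number of capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // True only when the pattern matches the entire line.
    static boolean matchesWholeLine(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    public static void main(String[] args) {
        String line = "product/productId: B0001G6PZC";
        String original = "[ ].*";

        System.out.println("groups: " + groupCount(original));
        System.out.println("matches: " + matchesWholeLine(original, line));

        // Hypothetical per-line pattern with two groups: field label and value.
        Pattern keyValue = Pattern.compile("([^:]+):\\s?(.*)");
        Matcher m = keyValue.matcher(line);
        if (m.matches()) {
            System.out.println(m.group(1) + " -> " + m.group(2));
        }
    }
}
```

Even a pattern with the right number of groups sees only one line at a time, so a record that spans eight lines has to be reassembled before it can be split into columns.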
Best answer
This can be done easily with a Hive UDF.
Suppose your data is in a table named temp with a single column named line:
create table temp(line String);
load data local inpath 'review.txt' into table temp;
select line from temp;
product/productId: B0001G6PZC
review/userId: A3F3THLLZXURQN
review/profileName: A. Y
review/helpfulness: 3/3
review/score: 4.0
review/time: 1199664000
review/summary: Good story, Good action. Good Drama. Good Movie
review/text: When I first heard of this movie, I didn't think it would be that great, so I never bothered to go see it in theaters. Later on, I ended up downloading the movie, and didn't think much of it.<br /><br />But now after watching the movie on BD, I think that the movie is quite outstanding. Its got a good story behind it, with some level of historical basis behind it with Samurai becoming phased out into Japan's modernization.<br /><br />It does a good job in immersing you into the conflicts that warriors must endure... and yet, find peace with the way of the Samurai as they are a warrior race and not savages.<br /><br />4/5 stars.
product/productId: B0001G6PZC
review/userId: A3J78KAIPW6KAH
review/profileName: Joan Paolo De Bastos "conde_almasy"
review/helpfulness: 3/3
review/score: 4.0
review/time: 1198540800
............
............
Create a Hive UDF in Java. The source is here:
package HiveUDF;

import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Buffers input lines until a "review/text:" line completes a record,
 * then returns the eight fields of that record joined by tabs.
 * The UDF keeps state between calls, so it relies on the lines of a
 * record arriving in file order within a single task.
 */
public class ReviewDataUdf extends UDF {
    // Field labels in the order they appear in each record.
    private static final String[] KEYS = {
        "product/productId:", "review/userId:", "review/profileName:",
        "review/helpfulness:", "review/score:", "review/time:",
        "review/summary:", "review/text:"
    };

    // Lines of the record seen so far.
    private final StringBuilder buffer = new StringBuilder();

    public String evaluate(String line) {
        if (line == null) {
            return null;
        }
        buffer.append(' ').append(line);
        // The record is complete only once its last field has arrived.
        if (!line.contains("review/text:")) {
            return null;
        }
        String record = buffer.toString();
        buffer.setLength(0);

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < KEYS.length; i++) {
            if (i > 0) {
                out.append('\t');
            }
            String next = (i + 1 < KEYS.length) ? KEYS[i + 1] : null;
            out.append(extract(record, KEYS[i], next));
        }
        return out.toString();
    }

    // Returns the trimmed text between key and nextKey, or up to the end
    // of the record when nextKey is null. Uses key.length() as the offset,
    // so no field loses its first character.
    private static String extract(String record, String key, String nextKey) {
        int start = record.indexOf(key);
        if (start < 0) {
            return "N/A";
        }
        start += key.length();
        int end = (nextKey == null) ? record.length() : record.indexOf(nextKey, start);
        if (end < 0) {
            return "";
        }
        return record.substring(start, end).trim();
    }
}
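The buffering idea behind the UDF can be tried without a Hive installation. The sketch below is a plain-Java stand-in, not the UDF itself: the class RecordAssembler and its feed method are hypothetical names, and it assumes that lines arrive in file order and that every record ends with a "review/text:" line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordAssembler {
    // Field labels in the order they appear within one record.
    private static final String[] KEYS = {
        "product/productId:", "review/userId:", "review/profileName:",
        "review/helpfulness:", "review/score:", "review/time:",
        "review/summary:", "review/text:"
    };

    // Lines collected for the record currently being assembled.
    private final List<String> pending = new ArrayList<>();

    // Returns a tab-joined record once the closing line arrives, otherwise null.
    public String feed(String line) {
        pending.add(line);
        if (!line.startsWith("review/text:")) {
            return null;
        }
        String[] values = new String[KEYS.length];
        Arrays.fill(values, "N/A"); // placeholder for fields that never appeared
        for (String l : pending) {
            for (int i = 0; i < KEYS.length; i++) {
                if (l.startsWith(KEYS[i])) {
                    values[i] = l.substring(KEYS[i].length()).trim();
                }
            }
        }
        pending.clear();
        return String.join("\t", values);
    }

    public static void main(String[] args) {
        // First record from the sample file; the review text is cut short here.
        String[] lines = {
            "product/productId: B0001G6PZC",
            "review/userId: A3F3THLLZXURQN",
            "review/profileName: A. Y",
            "review/helpfulness: 3/3",
            "review/score: 4.0",
            "review/time: 1199664000",
            "review/summary: Good story, Good action. Good Drama. Good Movie",
            "review/text: When I first heard of this movie, I didn't think it would be that great"
        };
        RecordAssembler assembler = new RecordAssembler();
        for (String line : lines) {
            String record = assembler.feed(line);
            if (record != null) {
                System.out.println(record);
            }
        }
    }
}
```

Matching each label at the start of its own line, instead of searching one concatenated string, avoids hand-counted offsets and keeps a label that happens to appear inside review text from being mistaken for a field boundary.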
Export ReviewDataUdf.jar, then register it in Hive and create the function.
hive> ADD JAR /home/Kishore/ReviewDataUdf.jar;
hive> create temporary FUNCTION structReview as 'HiveUDF.ReviewDataUdf';
Use the structReview function to get the structured data.
CREATE TABLE AmazonReview AS
SELECT split(review, "\t")[0] AS productId,
       split(review, "\t")[1] AS userId,
       split(review, "\t")[2] AS profileName,
       split(review, "\t")[3] AS helpfulness,
       split(review, "\t")[4] AS score,
       split(review, "\t")[5] AS time,
       split(review, "\t")[6] AS summary,
       split(review, "\t")[7] AS text
FROM (SELECT structReview(line) AS review FROM temp) b
WHERE review IS NOT NULL;
The data is now in structured form in the AmazonReview table:
select productId, userId, profileName from AmazonReview;
OK
B0001G6PZC A3F3THLLZXURQN A. Y
B0001G6PZC A3J78KAIPW6KAH Joan Paolo De Bastos "conde_almasy"
B0001G6PZC A3F3B6HY9RJI04 James Duckett
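The question asked for typed columns (score float, time int), while split() yields every field as a string. As a sketch of that conversion, the Java below parses one tab-separated record into typed values. The class TypedReview and its field names are assumptions for illustration, and it treats review/time as Unix epoch seconds, which the sample values suggest but the source does not state.

```java
import java.time.Instant;

public class TypedReview {
    final String productId;
    final String userId;
    final String profileName;
    final int helpfulVotes; // numerator of a value such as "3/3"
    final int totalVotes;   // denominator of a value such as "3/3"
    final float score;
    final Instant time;     // assumes epoch seconds

    TypedReview(String productId, String userId, String profileName,
                int helpfulVotes, int totalVotes, float score, Instant time) {
        this.productId = productId;
        this.userId = userId;
        this.profileName = profileName;
        this.helpfulVotes = helpfulVotes;
        this.totalVotes = totalVotes;
        this.score = score;
        this.time = time;
    }

    // Parses the first six tab-separated fields of a record; summary and
    // text (fields 7 and 8) are plain strings and need no conversion.
    static TypedReview parse(String record) {
        String[] f = record.split("\t", -1);
        if (f.length < 6) {
            throw new IllegalArgumentException("expected at least 6 fields, got " + f.length);
        }
        String[] votes = f[3].trim().split("/");
        return new TypedReview(
            f[0].trim(), f[1].trim(), f[2].trim(),
            Integer.parseInt(votes[0]), Integer.parseInt(votes[1]),
            Float.parseFloat(f[4].trim()),
            Instant.ofEpochSecond(Long.parseLong(f[5].trim())));
    }

    public static void main(String[] args) {
        String record = "B0001G6PZC\tA3F3THLLZXURQN\tA. Y\t3/3\t4.0\t1199664000\t"
            + "Good story, Good action. Good Drama. Good Movie\t(text omitted)";
        TypedReview r = parse(record);
        System.out.println(r.productId + " score=" + r.score
            + " helpful=" + r.helpfulVotes + "/" + r.totalVotes + " time=" + r.time);
    }
}
```

Within Hive itself, the same result would normally come from casting the string columns in the SELECT list. The Java version only makes the target types explicit.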
Regarding "regex - Hive external table definition for a text file with multi-line records", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30387682/