hadoop - pig : how to create a categorical variable?

Reposted by: 可可西里 | Updated: 2023-11-01 16:44:02

I am using Pig 0.12 on a large dataset, and I need to create a categorical variable, something like:

FOREACH mydata GENERATE category = 1 IF condition1
                        category = 2 IF condition2
                        category = 3 IF condition3

This syntax does not work. Is there a way to do this in Pig?

Thanks!

Best answer

Depending on how complex your conditions are, here are a few options:

bincond (ternary conditional expression):

(condition ? value_if_true : value_if_false)
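
To get three categories out of bincond, one expression can be nested inside another. The following is a minimal sketch, not the original author's code: the input path, the field names (id, amount), and the thresholds 10 and 100 are assumptions made up for illustration. Each nested bincond is wrapped in its own parentheses.

```pig
-- Hypothetical input: the path, schema, and thresholds are illustrative assumptions.
mydata = LOAD 'input/mydata.tsv' USING PigStorage('\t')
         AS (id:chararray, amount:double);

-- Nested bincond: the first condition that holds decides the category.
categorized = FOREACH mydata GENERATE
    id,
    amount,
    (amount < 10.0 ? 1 :
        (amount < 100.0 ? 2 : 3)) AS category;

DUMP categorized;
```

Both branches of each bincond return an int here, so category gets a single consistent type. This stays readable for two or three conditions; beyond that, the nesting gets hard to follow.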

CASE expression:

X = FOREACH A GENERATE f2, (
  CASE f2 % 2
    WHEN 0 THEN 'even'
    WHEN 1 THEN 'odd'
  END
);
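
The example above compares one expression against fixed values. When each category has its own condition, as in the question, the Pig documentation also describes a form of CASE with no leading expression, where every WHEN takes a full condition. Here is a sketch under the same made-up assumptions (hypothetical path, fields, and thresholds), and assuming your Pig version accepts this form:

```pig
-- Hypothetical input: the path, schema, and thresholds are illustrative assumptions.
mydata = LOAD 'input/mydata.tsv' USING PigStorage('\t')
         AS (id:chararray, amount:double);

-- CASE with one condition per WHEN; ELSE covers rows that match none of them.
categorized = FOREACH mydata GENERATE
    id,
    (CASE
        WHEN amount < 10.0  THEN 1
        WHEN amount < 100.0 THEN 2
        ELSE 3
     END) AS category;

DUMP categorized;
```

The WHEN branches are checked in order, so amount < 100.0 is only reached by rows that failed amount < 10.0.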

UDF (user-defined function):

FOREACH mydata GENERATE category_udf(field_2b_checked)

Regarding "hadoop - pig : how to create a categorical variable?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38076809/
