
hadoop - Need the user to select/enter a quarter to get the desired output in PIG-0.12.0

Reposted. Author: 行者123. Updated: 2023-12-02 21:48:33

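I am working with the NYSE dataset, for which I wrote the following problem statement:

Earnings growth in recent quarters compared with the previous rate of change.
Take the data one quarter at a time, compare it with a comparable time frame (for example, quarter on quarter), and provide a report as a performance comparison.

I wrote the script below: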

daily = load 'NYSE_daily.txt' using PigStorage(',') as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float , low:float, close:float, volume:float, adj_close:float);
--describe daily;
qtr1_filter = FILTER daily by date >= '2009-04-01' and date <= '2009-06-30';
qtr1_result = foreach qtr1_filter generate symbol, close-open as change;
qtr1_grp_stock = group qtr1_result by symbol;
qtr1_stocks = foreach qtr1_grp_stock generate group as symbol, SUM(qtr1_result.change) as change;

qtr2_filter = FILTER daily by date >= '2009-07-01' and date <= '2009-09-30';
qtr2_result = foreach qtr2_filter generate symbol, close-open as change;
qtr2_grp_stock = group qtr2_result by symbol;
qtr2_stocks = foreach qtr2_grp_stock generate group as symbol, SUM(qtr2_result.change) as change;

qtr_join = JOIN qtr1_stocks by symbol, qtr2_stocks by symbol;
stocks_ord = order qtr_join by qtr2_stocks::change desc;
store stocks_ord into 'earninggrowth';

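This works for a fixed pair of quarters, but I need to be able to enter or specify the quarters dynamically to get the desired output.

The second thing I want to achieve is to get yearly performance data and compare it. The period could be a fiscal year, i.e. from April 1 to March 31.

I have been searching, but I could not find anything that helps with handling dates dynamically.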

Best answer

How about using macros:

growth.pig:

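-- Sum the daily (close - open) change per symbol between two dates, inclusive.
-- Inside a macro body, parameters are referenced with a leading $.
DEFINE findGrowth(data, startDate, endDate) RETURNS stocks {
    filtered = FILTER $data by date >= '$startDate' and date <= '$endDate';
    res = foreach filtered generate symbol, close-open as change;
    grp_stock = group res by symbol;
    $stocks = foreach grp_stock generate group as symbol, SUM(res.change) as change;
};

-- Compare two date ranges and store the result, ordered by the second range's change.
DEFINE changesBetween(start1, end1, start2, end2) RETURNS void {
    -- PigStorage() defaults to tab; use PigStorage(',') for a comma-separated file.
    data = load '/path/to/NYSE_daily' using PigStorage() as (exchange:chararray,
        symbol:chararray, date:chararray, open:float, high:float,
        low:float, close:float, volume:float, adj_close:float);
    stocks1 = findGrowth(data, '$start1', '$end1');
    stocks2 = findGrowth(data, '$start2', '$end2');
    joined = JOIN stocks1 by symbol, stocks2 by symbol;
    ord = order joined by stocks2::change desc;
    -- Store the ordered relation, not the unordered join.
    store ord into 'changes_$start1-$end2';
};

-- The -param values from the command line are substituted here, outside the macros.
changesBetween('$startDate1', '$endDate1', '$startDate2', '$endDate2');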

Then create a shell script:
findChanges.sh:
#!/bin/bash
# Usage ./findChanges.sh [startDate1 endDate1 startDate2 endDate2]
pig -f growth.pig -param startDate1="$1" -param endDate1="$2" \
-param startDate2="$3" -param endDate2="$4"
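
Because the Pig filter compares dates as chararray values, the comparison is lexicographic, and that only matches chronological order when every date is zero-padded in YYYY-MM-DD form. A minimal sketch of an argument check that could run before pig is invoked follows; is_iso_date and check_dates are hypothetical helper names, not part of the original answer, and the check covers only the shape of the string, not whether the date exists on the calendar.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original answer.

# is_iso_date STR -> succeeds only if STR is shaped like YYYY-MM-DD.
# Shape check only: it does not reject impossible dates such as 2009-02-31.
is_iso_date() {
    [[ $1 =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]
}

# check_dates D1 D2 ... -> returns non-zero at the first malformed argument.
check_dates() {
    local d
    for d in "$@"; do
        if ! is_iso_date "$d"; then
            echo "not a YYYY-MM-DD date: $d" >&2
            return 1
        fi
    done
}
```

One way to use it, assuming the functions are pasted into findChanges.sh above the pig line, is to add:
check_dates "$@" || exit 1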

You can then run it like this:
./findChanges.sh 2009-04-01 2009-06-30 2009-07-01 2009-09-30

This creates the results under 'changes_2009-04-01-2009-09-30'.
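
The question also asks for picking a quarter rather than typing dates, and for fiscal years running from April 1 to March 31. The answer does not cover that, but one possible approach is to derive the date ranges in the shell and pass them to findChanges.sh. The sketch below assumes calendar quarters (Q1 = January to March), which differs from the question's script, where the first range starts in April. The function names quarter_range and fiscal_year_range are hypothetical, not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original answer.

# quarter_range YEAR Q -> prints "START END" for calendar quarter Q (1-4).
quarter_range() {
    local year=$1 q=$2
    case "$q" in
        1) echo "$year-01-01 $year-03-31" ;;
        2) echo "$year-04-01 $year-06-30" ;;
        3) echo "$year-07-01 $year-09-30" ;;
        4) echo "$year-10-01 $year-12-31" ;;
        *) echo "quarter must be 1-4, got: $q" >&2; return 1 ;;
    esac
}

# fiscal_year_range YEAR -> prints "START END" for the fiscal year that
# begins on April 1 of YEAR and ends on March 31 of YEAR+1.
fiscal_year_range() {
    local year=$1
    echo "$year-04-01 $((year + 1))-03-31"
}
```

With these functions defined in the calling shell, the earlier command could be written as follows. The command substitutions are left unquoted on purpose so that each one splits into two arguments:
./findChanges.sh $(quarter_range 2009 2) $(quarter_range 2009 3)
and a year-on-year comparison as:
./findChanges.sh $(fiscal_year_range 2008) $(fiscal_year_range 2009)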

Regarding "hadoop - Need the user to select/enter a quarter to get the desired output in PIG-0.12.0", the original question is on Stack Overflow: https://stackoverflow.com/questions/23064819/
