hadoop - Hive loads only the STRING columns; the INT and DOUBLE columns come back as NULL

Reposted. Author: 行者123. Updated: 2023-12-02 19:25:48

After loading a file into Hive, only the columns defined as STRING contain data; every column defined as INT or DOUBLE is NULL.

Create table command

create table A(
id STRING,
member_id STRING,
loan_amnt DOUBLE,
funded_amnt DOUBLE,
`funded_amnt_inv` DOUBLE,
`term` STRING,
`int_rate` STRING,
`installment` DOUBLE,
`grade` STRING,
`sub_grade` STRING,
`emp_title` STRING,
`emp_length` STRING,
`home_ownership` STRING,
`annual_inc` INT,
`verification_status` STRING,
`issue_d` STRING,
`loan_status` STRING,
`pymnt_plan` STRING,
`url` STRING,
`desc` STRING,
`purpose` STRING,
`title` STRING,
`zip_code` STRING,
`addr_state` STRING,
`dti` DOUBLE,
`delinq_2yrs` INT,
`earliest_cr_line` STRING,
`inq_last_6mths` STRING,
`mths_since_last_delinq` STRING,
`mths_since_last_record` STRING,
`open_acc` INT,
`pub_rec` INT,
`revol_bal` INT,
`revol_util` STRING,
`total_acc` INT,
`initial_list_status` STRING,
`out_prncp` DOUBLE,
`out_prncp_inv` DOUBLE,
`total_pymnt` DOUBLE,
`total_pymnt_inv` DOUBLE,
`total_rec_prncp` DOUBLE,
`total_rec_int` DOUBLE,
`total_rec_late_fee` DOUBLE,
`recoveries` DOUBLE,
`collection_recovery_fee` DOUBLE,
`last_pymnt_d` STRING,
`last_pymnt_amnt` DOUBLE,
`next_pymnt_d` STRING,
`last_credit_pull_d` STRING,
`collections_12_mths_ex_med` INT,
`mths_since_last_major_derog` STRING,
`policy_code` STRING,
`application_type` STRING,
`annual_inc_joint` STRING,
`dti_joint` STRING,
`verification_status_joint` STRING,
`acc_now_delinq` STRING,
`tot_coll_amt` STRING,
`tot_cur_bal` STRING,
`open_acc_6m` STRING,
`open_il_6m` STRING,
`open_il_12m` STRING,
`open_il_24m` STRING,
`mths_since_rcnt_il` STRING,
`total_bal_il` STRING,
`il_util` STRING,
`open_rv_12m` STRING,
`open_rv_24m` STRING,
`max_bal_bc` STRING,
`all_util` STRING,
`total_credit_rv` STRING,
`inq_fi` STRING,
`total_fi_tl` STRING,
`inq_last_12m` STRING
)

ROW FORMAT delimited
fields terminated by ','

STORED AS TEXTFILE;

Load the data into table A
load data local inpath '/home/cloudera/Desktop/Project-3/1/LoanStats3a.txt' into table A;

Select the data
hive> SELECT * FROM A LIMIT 1;

Output

"1077501" "1296599" NULL NULL NULL " 36 months" " 10.65%" NULL "B" "B2" "" "10+ years" "RENT" NULL "Verified" "Dec-2011" "Fully Paid" "n" "https://www.lendingclub.com/browse/loanDetail.action?loan_id=1077501" " Borrower added on 12/22/11 > I need to upgrade my business technologies.
" "credit_card" "Computer" "860xx" "AZ" NULL NULL "Jan-1985" "1" "" "" NULL NULL NULL "83.7%"NULL "f" NULL NULL NULL NULL NULL NULL NULL NULL NULL "Jan-2015" NULL "" "Dec-2015" NULL "" "1" "INDIVIDUAL"

"" "" "" "0" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""

Best answer

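Your CSV appears to have quotes around the individual fields. Hive's delimited text format (ROW FORMAT DELIMITED) does not strip enclosing quotes, so they become part of the field. For a STRING column, the quotes simply become part of the string. For a numeric column, the quotes make the value an invalid number, and Hive returns NULL.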
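The effect can be reproduced without the data file. The following is a minimal sketch, assuming a Hive version that accepts SELECT without a FROM clause (0.13 or later); the value 5000 is a made-up illustration, not taken from the file.

```sql
-- A numeric value that still carries its CSV quotes is not a valid number,
-- so the conversion to DOUBLE yields NULL.
SELECT CAST('"5000"' AS DOUBLE);

-- The same value without the surrounding quotes converts normally.
SELECT CAST('5000' AS DOUBLE);

-- Removing the quote characters first makes the value castable again.
SELECT CAST(regexp_replace('"5000"', '"', '') AS DOUBLE);
```

Note that stripping quotes at query time only helps if the column was declared STRING. In table A the numeric columns are already typed INT or DOUBLE, so the quoted text is turned into NULL when the row is read.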

For a SerDe that supports quoted fields in CSV files, see csv-serde.
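As a rough sketch of that approach: Hive 0.14 and later ship a built-in CSV SerDe, org.apache.hadoop.hive.serde2.OpenCSVSerde, which may be easier to use than installing a separate jar. The table name a_raw and the shortened column list below are assumptions made for illustration; a real table would list every column of the file in order. This SerDe exposes every column as STRING regardless of the declared type, so numeric values have to be cast in a query or view.

```sql
-- Hypothetical staging table; only the first four columns are shown.
-- All columns are STRING because OpenCSVSerde returns strings only.
CREATE TABLE a_raw (
  id          STRING,
  member_id   STRING,
  loan_amnt   STRING,
  funded_amnt STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

-- Same file and path as in the question.
LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/Project-3/1/LoanStats3a.txt'
INTO TABLE a_raw;

-- Convert to numeric types at query time, after the quotes have been removed.
SELECT id,
       CAST(loan_amnt   AS DOUBLE) AS loan_amnt,
       CAST(funded_amnt AS DOUBLE) AS funded_amnt
FROM a_raw
LIMIT 1;
```

One caveat: the output in the question suggests that the desc field contains a line break. A TEXTFILE table is read line by line, so rows with embedded newlines are likely to be split even with a CSV SerDe, and the file may need cleaning before it is loaded.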

This question, "hadoop - Hive loads only the STRING columns; the INT and DOUBLE columns come back as NULL", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/39438373/
