gpt4 book ai didi

hadoop - 比较 2 个配置单元表以查找没有任何唯一列/时间戳的更新/插入/删除记录并将其附加到 Hadoop 中的基表

转载 作者:可可西里 更新时间:2023-11-01 15:26:51 27 4
gpt4 key购买 nike

Base_table (Day 01 load from source)

 **Id    Name    City      Country** 
7682 Stuart Frankfurt Germany
8723 Micke Paris France
2355 Niki New york USA
2097 Deny Italy Rome

new_table (Day 02 load from source)

 **Id    Name    City      Country** 
7682 Stuart *Darmstadt* Germany
8723 Micke Paris France
2355 Niki New york USA
*9057 Bony Prague Prague*

比较以上 2 个表格时,可以看到以下 3 个变化。

  1. Record Id 7682's City name changed to Darmstadt in Day 02 load
  2. Record Id 2097 is deleted in Day 02 load and it was present in Day 01 load
  3. New Record inserted with Id 9057 on Day 02 load

需要捕获所有上述 3 个更改并将其附加到 Base_table

Below 3 records should be captured out of comparision

7682   Stuart  Darmstadt   Germany
2097 Deny Italy Rome
9057 Bony Prague Prague

Base_table output after appending Day 02 changes

**Id    Name    City      Country** 
7682 Stuart Frankfurt Germany
8723 Micke Paris France
2355 Niki New york USA
2097 Deny Italy Rome
*7682 Stuart Darmstadt Germany*
*2097 Deny Italy Rome*
*9057 Bony Prague Prague*

我能够使用 SQL 连接获取插入和删除的记录,但无法获取更新的记录。为了获取更新的记录,我在本地将文件复制到 linux 并进行比较,但它不适用于大量数据。任何人都可以分享您处理此类情况的经验吗?

最佳答案

select      inline
(
array
(
case
when n.id is null then struct(b.*)
else struct (n.*)
end
)
)

from base_table as b
full join new_table as n
on n.id = b.id

where b.id is null
or n.id is null
or struct(b.*) not in (struct(n.*))

+------+--------+-----------+---------+
| col1 | col2 | col3 | col4 |
+------+--------+-----------+---------+
| 2097 | Deny | Italy | Rome |
| 7682 | Stuart | Darmstadt | Germany |
| 9057 | Bony | Prague | Prague |
+------+--------+-----------+---------+

关于hadoop - 比较 2 个配置单元表以查找没有任何唯一列/时间戳的更新/插入/删除记录并将其附加到 Hadoop 中的基表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/43802507/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com