gpt4 book ai didi

sql - 需要帮助识别表中的重复项

转载 作者:行者123 更新时间:2023-11-29 12:11:52 25 4
gpt4 key购买 nike

我有什么:

  1. data_source_1
  2. data_source_2
  3. data_sources_view 查看

关于表格:

data_source_1:

没有重复:

db=# select count(*) from (select distinct * from data_source_1);
count
--------
543243
(1 row)

db=# select count(*) from (select * from data_source_1);
count
--------
543243
(1 row)

data_source_2:

没有重复:

db=# select count(*) from (select * from data_source_2);
count
-------
5304
(1 row)

db=# select count(*) from (select distinct * from data_source_2);
count
-------
5304
(1 row)

data_sources_view:

有重复:

db=# select count(*) from (select distinct * from data_sources_vie);
count
--------
538714
(1 row)

db=# select count(*) from (select * from data_sources_view);
count
--------
548547
(1 row)

View 很简单:

CREATE VIEW data_sources_view
AS SELECT *
FROM (
(
SELECT a, b, 'data_source_1' as source
FROM data_source_1
)
UNION ALL
(
SELECT a, b, 'data_source_2' as source
FROM data_source_2
)
);

我想知道的:

  • 在源表没有重复项的 View 中怎么可能有重复项 + 'data_source_x' 作为源 消除了重叠数据的可能性。
  • 如何识别复制品?

我尝试过的:

db# create table t1 as select * from data_sources_view;
SELECT
db=#
db=# create table t2 as select distinct * from data_sources_view;
SELECT
db=# create table t3 as select * from t1 minus select * from t2;
SELECT
db=# select 't1' as table_name, count(*) from t1 UNION ALL
db-# select 't2' as table_name, count(*) from t2 UNION ALL
db-# select 't3' as table_name, count(*) from t3;
table_name | count
------------+--------
t1 | 548547
t3 | 0
t2 | 538714
(3 rows)

数据库:

Redshift (PostgreSQL)

最佳答案

原因是因为你的数据源有多于两列。如果您进行这些计数:

select count(*) from (select distinct a, b from data_source_1);

select count(*) from (select distinct a, b from data_source_2);

你应该会发现它们和你在同一张 table 上得到的count(*)不一样。

关于sql - 需要帮助识别表中的重复项,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29708621/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com