gpt4 book ai didi

sql - 将(非 CSV)文本数据导入 PostgreSQL,以空格和一个大写字母分隔

转载 作者:行者123 更新时间:2023-11-29 11:43:29 26 4
gpt4 key购买 nike

这是我第一次使用 SQL。我在 Windows 7 64 位上使用 PostgreSQL。

我有以下(大).txt 文件,其中包含这样构建的推文:

T   2009-06-07 02:07:41
U http://twitter.com/cyberplumber
W SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw

如您所见,所有三个“列”都按以下方式分隔:T\t(U 和 W 也是如此)而不是传统的逗号 (,).我想将整个文件导入一个 SQL 表,其中包含名为 dateusertext_msg 的列。

我猜我可能不得不以某种方式解析它。任何想法如何以最简单和最有效的方式将数据放入表中?另请注意,有问题的 .txt 文件相当大 (>4GB),因此我没有简单的方法手动编辑它们。

最佳答案

快速&肮脏的黑客攻击:

DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;

CREATE TABLE lutser
( id SERIAL NOT NULL PRIMARY KEY
, ztxt text
);

CREATE TABLE tweetdeck
( id SERIAL NOT NULL PRIMARY KEY
, stamp timestamp NOT NULL
, zurl text
, ztxt text
);

COPY lutser(ztxt)
FROM '/tmp/tweet.dat'
;

INSERT INTO tweetdeck (stamp, zurl, ztxt)
SELECT regexp_replace( t.ztxt, E'^[A-Z][ \t]*', '')::timestamp
, regexp_replace( u.ztxt, E'^[A-Z][ \t]*', '')
, regexp_replace( w.ztxt, E'^[A-Z][ \t]*', '')
FROM lutser t
JOIN lutser u ON u.id = t.id+1
JOIN lutser w ON w.id = t.id+2
WHERE t.id %3 = 1
AND LEFT(t.ztxt,1) = 'T' -- Should be redundant, Won't harm
AND LEFT(u.ztxt,1) = 'U'
AND LEFT(w.ztxt,1) = 'W'
;


SELECT * FROM lutser;
SELECT * FROM tweetdeck;

结果:

COPY 9
INSERT 0 3
id | ztxt
----+--------------------------------------------------------------------------------------------------------------------------------------------------
1 | T 2009-06-07 02:07:31
2 | U http://twitter.com/cyberplumber
3 | W SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
4 | T 2009-06-07 02:07:41
5 | U http://twitter.com/cyberplumber
6 | W SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
7 | T 2009-06-07 02:07:51
8 | U http://twitter.com/cyberplumber
9 | W SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
(9 rows)


id | stamp | zurl | ztxt
----+---------------------+---------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------
1 | 2009-06-07 02:07:31 | http://twitter.com/cyberplumber | SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
2 | 2009-06-07 02:07:41 | http://twitter.com/cyberplumber | SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
3 | 2009-06-07 02:07:51 | http://twitter.com/cyberplumber | SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
(3 rows)

关于sql - 将(非 CSV)文本数据导入 PostgreSQL,以空格和一个大写字母分隔,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/16263263/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com