gpt4 book ai didi

mysql - 从使用双引号作为转义字符的 CSV 文件加载数据

转载 作者:IT王子 更新时间:2023-10-29 00:31:09 27 4
gpt4 key购买 nike

我有一堆 CSV 数据需要加载到 MySQL 数据库中。好吧,也许是 CSV-ish。 (编辑:actually, it looks like the stuff described in RFC 4180)

每一行都是逗号分隔的双引号字符串列表。要转义列值中出现的任何双引号,请使用双引号。允许反斜杠代表自己。

例如,行:

"", "\wave\", ""hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants"""

如果解析成JSON应该是:

[ "", "\\wave\\", "\"hello,\" said the vicar", "what are \"scare-quotes\" good for?", "I'm reading \"Bossypants\"" ]

我正在尝试使用 LOAD DATA 读取 CSV,但我遇到了一些奇怪的行为。


举个例子,假设我有一个简单的两列表

shell% mysql exampledb -e "describe person"
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| ID | int(11) | YES | | NULL | |
| UID | char(255) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
shell%

如果我的输入文件的第一个非标题行以 "" 结尾:

shell% cat temp-1.csv
"ID","UID"
"9",""
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"10",""
"5","Simon"
"6","Sonny"
"7","Wat\"

我可以加载除第一行以外的所有非标题行:

mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA
LOCAL INFILE 'temp-1.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 1 LINES
;
Query OK, 9 rows affected (0.00 sec)
Records: 9 Deleted: 0 Skipped: 0 Warnings: 0

mysql> SELECT * FROM person;
+------+------------------------+
| ID | UID |
+------+------------------------+
| 0 | Steve the Pirate |
| 10 | |
| 1 | \Alpha |
| 2 | Hoban "Wash" Washburne |
| 3 | Pastor Veal |
| 4 | Tucker |
| 5 | Simon |
| 6 | Sonny |
| 7 | Wat\ |
+------+------------------------+
9 rows in set (0.00 sec)

或者我可以加载所有行,包括标题:

mysql> DELETE FROM person;
Query OK, 9 rows affected (0.00 sec)

mysql> LOAD DATA
LOCAL INFILE 'temp-1.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 0 LINES
;
Query OK, 11 rows affected, 1 warning (0.01 sec)
Records: 11 Deleted: 0 Skipped: 0 Warnings: 1

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM person;
+------+------------------------+
| ID | UID |
+------+------------------------+
| 0 | UID |
| 9 | |
| 0 | Steve the Pirate |
| 10 | |
| 1 | \Alpha |
| 2 | Hoban "Wash" Washburne |
| 3 | Pastor Veal |
| 4 | Tucker |
| 5 | Simon |
| 6 | Sonny |
| 7 | Wat\ |
+------+------------------------+
11 rows in set (0.00 sec)

如果我的输入文件中没有任何行以 "" 结尾:

shell% cat temp-2.csv
"ID","UID"
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"5","Simon"
"6","Sonny"
"7","Wat\"

然后我可以不加载任何行:

mysql> DELETE FROM person;
Query OK, 11 rows affected (0.00 sec)

mysql> LOAD DATA
LOCAL INFILE 'temp-2.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 1 LINES
;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Deleted: 0 Skipped: 0 Warnings: 0

mysql> SELECT * FROM person;
Empty set (0.00 sec)

或者我可以加载所有行,包括标题:

mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA
LOCAL INFILE 'temp-2.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 0 LINES
;
Query OK, 9 rows affected, 1 warning (0.03 sec)
Records: 9 Deleted: 0 Skipped: 0 Warnings: 1

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM person;
+------+------------------------+
| ID | UID |
+------+------------------------+
| 0 | UID |
| 0 | Steve the Pirate |
| 1 | \Alpha |
| 2 | Hoban "Wash" Washburne |
| 3 | Pastor Veal |
| 4 | Tucker |
| 5 | Simon |
| 6 | Sonny |
| 7 | Wat\ |
+------+------------------------+
9 rows in set (0.00 sec)

现在我发现了很多错误的方法,我如何使用 LOAD DATA 将这些文件中的数据导入我的数据库?

最佳答案

根据the documentation for LOAD DATA, treating doubled double quotes as a double quote is the default :

If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY sequence. To avoid ambiguity, occurrences of the ENCLOSED BY character within a field value can be doubled and are interpreted as a single instance of the character. For example, if ENCLOSED BY '"' is specified, quotation marks are handled as shown here:

"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss -> The "BIG" boss
The ""BIG"" boss -> The ""BIG"" boss

所以我需要做的就是禁用将 \ 解释为转义字符,方法是使用 ESCAPED BY ''

LOAD DATA
LOCAL INFILE 'temp-1.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY ''
LINES
TERMINATED BY '\n'
IGNORE 1 LINES
;

关于mysql - 从使用双引号作为转义字符的 CSV 文件加载数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/17053530/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com