gpt4 book ai didi

sql - BULK INSERT 格式文件中的正确排序规则

转载 作者:行者123 更新时间:2023-12-02 21:01:41 34 4
gpt4 key购买 nike

我正在尝试使用带有格式文件的 BULK INSERT 将 .CSV 文件导入到 SQL Server 表中。我可以导入它,但任何拉丁字符都会作为奇怪的字符导入。我为自己完成这个个人项目感到非常自豪,但我已经到了需要帮助的地步。我可以在导入数据后通过执行一些困惑的 UPDATE 和 REPLACE 语句来更改字符,但我真的希望能够一步导入 .CSV 文件中出现的拉丁字符。这是我创建的数据库和表:

CREATE DATABASE Test;

CREATE TABLE dbo.rawData
([Position] nvarchar(500) NULL,
[Const] nvarchar(500) NULL,
[Created] nvarchar(500) NULL,
[Modified] nvarchar(500) NULL,
[Description] nvarchar(500) NULL,
[Title] nvarchar(500) NOT NULL,
[TitleType] nvarchar(500) NULL,
[Directors] nvarchar(500) NULL,
[YouRated] nvarchar(500) NULL,
[IMDbRating] nvarchar(500) NULL,
[Runtime] nvarchar(500) NULL,
[Year] nvarchar(500) NULL,
[Genres] nvarchar(500) NULL,
[NumVotes] nvarchar(500) NULL,
[ReleaseDate] nvarchar(500) NULL,
[URL] nvarchar(500) NULL,
)
GO

这是我正在使用的一些数据,取自 .CSV 文件(另存为 ratings.csv)。我使用Notepad++,它是用UTF-8编码的。请注意“达拉斯买家具乐部”的最后一行有一位名字中带有拉丁字符的董事:

"position","const","created","modified","description","Title","Title type","Directors","You rated","IMDb Rating","Runtime (mins)","Year","Genres","Num. Votes","Release Date (month/day/year)","URL"
"1","tt0437863","Tue Feb 16 00:00:00 2016","","","The Benchwarmers","Feature Film","Dennis Dugan","5","5.6","80","2006","comedy, romance, sport","39413","2006-04-07","http://www.imdb.com/title/tt0437863/"
"2","tt0085334","Tue Feb 16 00:00:00 2016","","","A Christmas Story","Feature Film","Bob Clark","6","8.1","94","1983","comedy, family","103770","1983-11-18","http://www.imdb.com/title/tt0085334/"
"3","tt2403029","Tue Feb 16 00:00:00 2016","","","The Starving Games","Feature Film","Jason Friedberg, Aaron Seltzer","2","3.3","83","2013","comedy","13719","2013-10-31","http://www.imdb.com/title/tt2403029/"
"4","tt0316465","Tue Feb 16 00:00:00 2016","","","Radio","Feature Film","Michael Tollin","6","6.9","109","2003","biography, drama, sport","31692","2003-10-24","http://www.imdb.com/title/tt0316465/"
"5","tt0141369","Tue Feb 16 00:00:00 2016","","","Inspector Gadget","Feature Film","David Kellogg","4","4.1","78","1999","action, adventure, comedy, family, sci_fi","35340","1999-07-18","http://www.imdb.com/title/tt0141369/"
"6","tt0033563","Tue Feb 16 00:00:00 2016","","","Dumbo","Feature Film","Sam Armstrong, Norman Ferguson","6","7.3","64","1941","animation, family, musical","80737","1941-10-23","http://www.imdb.com/title/tt0033563/"
"7","tt0384642","Tue Feb 16 00:00:00 2016","","","Kicking & Screaming","Feature Film","Jesse Dylan","5","5.5","95","2005","comedy, family, romance, sport","29539","2005-05-01","http://www.imdb.com/title/tt0384642/"
"8","tt0116705","Tue Feb 16 00:00:00 2016","","","Jingle All the Way","Feature Film","Brian Levant","7","5.4","89","1996","comedy, family","66879","1996-11-16","http://www.imdb.com/title/tt0116705/"
"9","tt1981677","Tue Feb 16 00:00:00 2016","","","Pitch Perfect","Feature Film","Jason Moore","7","7.2","112","2012","comedy, music, romance","203205","2012-09-28","http://www.imdb.com/title/tt1981677/"
"10","tt0409459","Tue Feb 16 00:00:00 2016","","","Watchmen","Feature Film","Zack Snyder","7","7.6","162","2009","action, mystery, sci_fi","368137","2009-02-23","http://www.imdb.com/title/tt0409459/"
"11","tt1343092","Tue Feb 16 00:00:00 2016","","","The Great Gatsby","Feature Film","Baz Luhrmann","5","7.3","143","2013","drama, romance","345664","2013-05-01","http://www.imdb.com/title/tt1343092/"
"12","tt0332379","Tue Feb 16 00:00:00 2016","","","School of Rock","Feature Film","Richard Linklater","5","7.1","108","2003","comedy, music","202083","2003-09-09","http://www.imdb.com/title/tt0332379/"
"13","tt0120783","Tue Feb 16 00:00:00 2016","","","The Parent Trap","Feature Film","Nancy Meyers","6","6.4","128","1998","adventure, comedy, drama, family, romance","82087","1998-07-20","http://www.imdb.com/title/tt0120783/"
"14","tt0790636","Tue Feb 16 00:00:00 2016","","","Dallas Buyers Club","Feature Film","Jean-Marc Vallée","7","8.0","117","2013","biography, drama","308118","2013-09-07","http://www.imdb.com/title/tt0790636/"

我有一个格式文件(另存为 format.fmt),在 Notepad++ 中打开时如下所示:

11.0
16
1 SQLCHAR 0 1000 "\",\"" 1 Position SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 1000 "\",\"" 2 Const SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 1000 "\",\"" 3 Created SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 1000 "\",\"" 4 Modified SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 1000 "\",\"" 5 Description SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 1000 "\",\"" 6 Title SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 1000 "\",\"" 7 TitleType SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 0 1000 "\",\"" 8 Directors SQL_Latin1_General_CP1_CI_AS
9 SQLCHAR 0 1000 "\",\"" 9 YouRated SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 0 1000 "\",\"" 10 IMDbRating SQL_Latin1_General_CP1_CI_AS
11 SQLCHAR 0 1000 "\",\"" 11 Runtime SQL_Latin1_General_CP1_CI_AS
12 SQLCHAR 0 1000 "\",\"" 12 Year SQL_Latin1_General_CP1_CI_AS
13 SQLCHAR 0 1000 "\",\"" 13 Genres SQL_Latin1_General_CP1_CI_AS
14 SQLCHAR 0 1000 "\",\"" 14 NumVotes SQL_Latin1_General_CP1_CI_AS
15 SQLCHAR 0 1000 "\",\"" 15 ReleaseDate SQL_Latin1_General_CP1_CI_AS
16 SQLCHAR 0 1000 "\"" 16 URL SQL_Latin1_General_CP1_CI_AS

当我运行以下代码时,所有内容都会导入,但是拉丁字符会被一系列奇怪的字符替换。这是我正在运行的代码:

BULK INSERT [Test].[dbo].[rawData]
FROM 'C:\IMDbRatings\Files\ratings.csv' WITH (FIRSTROW = 2, FORMATFILE= 'C:\IMDbRatings\format.fmt');

我尝试过的一些事情是将.CSV文件更改为UCS-2 BE,在BULK INSERT的WITH子句中添加不同的条件,并将格式文件中的变量类型更改为SQLNCHAR而不是SQLCHAR,但没有任何效果。在这些情况下通常会发生“0 行受到影响”,而不是错误。任何帮助将不胜感激。

最佳答案

我回答这个老问题是希望它能帮助别人解决我最近经历的麻烦。

简单地说:当使用代码页 65001 从 UTF-8 编码文件插入时,您应该在格式文件中使用 "" 排序规则。您必须拥有 SQL Server 2016,才能将代码页 65001 插入到格式文件中。可用。


执行以下操作:

  1. 在批量插入语句中使用 CODEPAGE = 65001 指定批量插入表采用 UTF-8 编码
  2. 在格式文件中,将字符列类型指定为 SQLCHAR
  3. 在格式文件中,对所有列使用“”排序规则

批量插入语句:

BULK INSERT [Test].[dbo].[rawData]
FROM 'C:\IMDbRatings\Files\ratings.csv'
WITH (CODEPAGE = 65001, FIRSTROW = 2, FORMATFILE= 'C:\IMDbRatings\format.fmt');

格式化文件:

13.0
16
1 SQLCHAR 0 1000 "\",\"" 1 Position ""
2 SQLCHAR 0 1000 "\",\"" 2 Const ""
3 SQLCHAR 0 1000 "\",\"" 3 Created ""
4 SQLCHAR 0 1000 "\",\"" 4 Modified ""
5 SQLCHAR 0 1000 "\",\"" 5 Description ""
6 SQLCHAR 0 1000 "\",\"" 6 Title ""
7 SQLCHAR 0 1000 "\",\"" 7 TitleType ""
8 SQLCHAR 0 1000 "\",\"" 8 Directors ""
9 SQLCHAR 0 1000 "\",\"" 9 YouRated ""
10 SQLCHAR 0 1000 "\",\"" 10 IMDbRating ""
11 SQLCHAR 0 1000 "\",\"" 11 Runtime ""
12 SQLCHAR 0 1000 "\",\"" 12 Year ""
13 SQLCHAR 0 1000 "\",\"" 13 Genres ""
14 SQLCHAR 0 1000 "\",\"" 14 NumVotes ""
15 SQLCHAR 0 1000 "\",\"" 15 ReleaseDate ""
16 SQLCHAR 0 1000 "\"" 16 URL ""

基于 https://technet.microsoft.com/en-us/library/ms190657(v=sql.105).aspx""RAW 排序规则:

Specifies that the data is stored in the code page that is specified in a code-page option in the command or the bcp_control BCPFILECP hint. If none of these is specified, the collation of the data file is that of the OEM code page of the client computer.

关于sql - BULK INSERT 格式文件中的正确排序规则,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37954380/

34 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com