
python - Reading from SQL Server using parameters: pandas (or pyodbc) not functioning properly


I am using a query in SQL Server that needs a set of values to check a number against (for example, below I check whether DemographicGroupDimID is 1, 2, or 3). After some googling, the only way I found to do this is the following:

SQL

DECLARE @adults table (Id int)
INSERT INTO @adults VALUES (1), (2), (3)

SELECT [date], [station], [impression] = SUM([impressions]) / COUNT(DISTINCT [datetime] )
FROM
(SELECT [datetime] = DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), [date] = ddt.DateKey, [station] = nd.Name, [impressions] = SUM(naf.Impression)
FROM [Nielsen].[dbo].[NielsenAnalyticsFact] as naf
LEFT JOIN [dbo].[DateDim] AS ddt
ON naf.StartDateDimID = ddt.DateDimID
LEFT JOIN [dbo].NetworkDim as nd
ON naf.NetworkDimID = nd.NetworkDimID
LEFT JOIN [dbo].TimeDim as td
ON naf.QuarterHourDimID = td.TimeDimID
WHERE (naf.NielsenMarketDimID = 1
AND naf.RecordTypeDimID = 2
AND naf.AudienceEstimateTypeDimID = 1
AND naf.DailyOrWeeklyDimID = 1
AND naf.RecordSequenceCodeDimID = 5
AND naf.ViewingTypeDimID = 4
AND naf.QuarterHourDimID IS NOT NULL
AND naf.DemographicGroupDimID < 31
AND nd.Affiliation = 'Cable'
AND naf.NetworkDimID != 1278
AND naf.DemographicGroupDimID in (SELECT Id FROM @adults))
GROUP BY DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), nd.Name, ddt.DateKey)
AS grouped_table
GROUP BY [date], [station]
ORDER BY [date], [station]

This fails when I need to do it dynamically with a different set of values, as shown below:

Pandas query

from queries import DB_CREDENTIALS
import pyodbc
import pandas as pd

sql_ = """DECLARE @adults table (Id int)
INSERT INTO @adults VALUES ?

SELECT [date], [station], [impression] = SUM([impressions]) / COUNT(DISTINCT [datetime] )
FROM
(SELECT [datetime] = DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), [date] = ddt.DateKey, [station] = nd.Name, [impressions] = SUM(naf.Impression)
FROM [Nielsen].[dbo].[NielsenAnalyticsFact] as naf
LEFT JOIN [dbo].[DateDim] AS ddt
ON naf.StartDateDimID = ddt.DateDimID
LEFT JOIN [dbo].NetworkDim as nd
ON naf.NetworkDimID = nd.NetworkDimID
LEFT JOIN [dbo].TimeDim as td
ON naf.QuarterHourDimID = td.TimeDimID
WHERE (naf.NielsenMarketDimID = 1
AND naf.RecordTypeDimID = 2
AND naf.AudienceEstimateTypeDimID = 1
AND naf.DailyOrWeeklyDimID = 1
AND naf.RecordSequenceCodeDimID = 5
AND naf.ViewingTypeDimID = 4
AND naf.QuarterHourDimID IS NOT NULL
AND naf.DemographicGroupDimID < 31
AND nd.Affiliation = 'Cable'
AND naf.NetworkDimID != 1278
AND naf.DemographicGroupDimID in (SELECT Id FROM @adults))
GROUP BY DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), nd.Name, ddt.DateKey)
AS grouped_table
GROUP BY [date], [station]
ORDER BY [date], [station]"""

with pyodbc.connect(DB_CREDENTIALS) as cnxn:
    df = pd.read_sql(sql=sql_, con=cnxn, params=['(30)'])

Error:

---------------------------------------------------------------------------
DatabaseError Traceback (most recent call last)
<ipython-input-5-4b63847d007f> in <module>()
1 with pyodbc.connect(DB_CREDENTIALS) as cnxn:
----> 2 df = pd.read_sql(sql=sql_, con=cnxn, params=['(30)'])

C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
497 sql, index_col=index_col, params=params,
498 coerce_float=coerce_float, parse_dates=parse_dates,
--> 499 chunksize=chunksize)
500
501 try:

C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
1593
1594 args = _convert_params(sql, params)
-> 1595 cursor = self.execute(*args)
1596 columns = [col_desc[0] for col_desc in cursor.description]
1597

C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in execute(self, *args, **kwargs)
1570 ex = DatabaseError(
1571 "Execution failed on sql '%s': %s" % (args[0], exc))
-> 1572 raise_with_traceback(ex)
1573
1574 @staticmethod

C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in execute(self, *args, **kwargs)
1558 cur.execute(*args, **kwargs)
1559 else:
-> 1560 cur.execute(*args)
1561 return cur
1562 except Exception as exc:

DatabaseError: Execution failed on sql 'DECLARE @adults table (Id int)
INSERT INTO @adults VALUES ?

SELECT [date], [station], [impression] = SUM([impressions]) / COUNT(DISTINCT [datetime] )
FROM
(SELECT [datetime] = DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), [date] = ddt.DateKey, [station] = nd.Name, [impressions] = SUM(naf.Impression)
FROM [Nielsen].[dbo].[NielsenAnalyticsFact] as naf
LEFT JOIN [dbo].[DateDim] AS ddt
ON naf.StartDateDimID = ddt.DateDimID
LEFT JOIN [dbo].NetworkDim as nd
ON naf.NetworkDimID = nd.NetworkDimID
LEFT JOIN [dbo].TimeDim as td
ON naf.QuarterHourDimID = td.TimeDimID
WHERE (naf.NielsenMarketDimID = 1
AND naf.RecordTypeDimID = 2
AND naf.AudienceEstimateTypeDimID = 1
AND naf.DailyOrWeeklyDimID = 1
AND naf.RecordSequenceCodeDimID = 5
AND naf.ViewingTypeDimID = 4
AND naf.QuarterHourDimID IS NOT NULL
AND naf.DemographicGroupDimID < 31
AND nd.Affiliation = 'Cable'
AND naf.NetworkDimID != 1278
AND naf.DemographicGroupDimID in (SELECT Id FROM @adults))
GROUP BY DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), nd.Name, ddt.DateKey)
AS grouped_table
GROUP BY [date], [station]
ORDER BY [date], [station]': ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '@P1'. (102) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")

Is this because the DECLARE statement needs to be in scope of the SELECT statement itself? I'm not sure how pandas handles the pyodbc cursor object, so I'm not sure where this error comes from.

Edit: Note that the parameter I pass in this instance, (30), is just the simple failing case of a single number in the set. It also fails, of course, for more complex strings such as (1), (2), (3), as in the example above.

Best Answer

If you use prepared statements in your SQL, you cannot put multiple values into a single placeholder/parameter/bind variable!

Beyond that, a placeholder/parameter/bind variable can only stand in for a literal; you cannot use it for a part of the SQL statement that is not a literal.

In your case, you tried to pass (30) as a parameter, but the parentheses are part of the SQL syntax, not part of the literal.

Using parameters/prepared statements/bind variables also protects you from some SQL injection.
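
To make the distinction concrete, here is a small sketch (not part of the original answer) of how pyodbc binds parameters, assuming cnxn is an open connection to the same database: each ? stands in for exactly one literal value, so a string such as '(1278), (1279)' reaches SQL Server as a single varchar literal, never as SQL syntax.

cursor = cnxn.cursor()  # cnxn: an open pyodbc connection, as in the question's code

# OK: the placeholder is replaced by one bound literal value.
cursor.execute("SELECT Name FROM dbo.NetworkDim WHERE NetworkDimID = ?", 1278)

# Not what you want: the whole string is bound as ONE varchar literal, so the
# server compares NetworkDimID to the string '(1278), (1279)' instead of
# treating it as SQL syntax (and will typically raise a conversion error).
cursor.execute("SELECT Name FROM dbo.NetworkDim WHERE NetworkDimID IN (?)",
               "(1278), (1279)")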

That said, try changing your code as follows:

Change

INSERT INTO @adults VALUES ?

to

INSERT INTO @adults VALUES (?)

and

df = pd.read_sql(sql=sql_, con=cnxn, params=['(30)'])

to

df = pd.read_sql(sql=sql_, con=cnxn, params=['30'])

Update:

You can prepare your SQL like this:

In [9]: vals = [20,30,40]

In [32]: vals
Out[32]: [20, 30, 40]

In [33]: ', '.join(['(?)'] * len(vals))
Out[33]: '(?), (?), (?)'

Then:

In [14]: sql_ = """DECLARE @adults table (Id int)
....: INSERT INTO @adults VALUES {}
....:
....: SELECT [date],
....: """

In [15]: sql_.format(', '.join(['(?)'] * len(vals)))
Out[15]: 'DECLARE @adults table (Id int)\nINSERT INTO @adults VALUES (?), (?), (?)\n\nSELECT [date],\n'

Note the generated (?), (?), (?) placeholders.

Finally, run your SQL:

df = pd.read_sql(sql=sql_.format(', '.join(['(?)'] * len(vals))), con=cnxn, params=vals)
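
Putting it all together, a minimal end-to-end sketch could look like the following. This is not the poster's exact query: the SELECT body is shortened for readability, vals is an arbitrary example, and SET NOCOUNT ON is added so the INSERT's row count does not appear as a separate result set ahead of the SELECT. DB_CREDENTIALS is the connection string imported from the question's queries module.

import pandas as pd
import pyodbc

from queries import DB_CREDENTIALS  # connection string, as in the question

vals = [1, 2, 3]  # example DemographicGroupDimIDs to filter on

# One "(?)" per value, comma separated -> "(?), (?), (?)"
placeholders = ', '.join(['(?)'] * len(vals))

# {} receives only the placeholder text; the values themselves are bound by pyodbc.
sql_ = """SET NOCOUNT ON
DECLARE @adults table (Id int)
INSERT INTO @adults VALUES {}

SELECT nd.Name AS station, SUM(naf.Impression) AS impressions
FROM Nielsen.dbo.NielsenAnalyticsFact AS naf
LEFT JOIN dbo.NetworkDim AS nd
    ON naf.NetworkDimID = nd.NetworkDimID
WHERE naf.DemographicGroupDimID IN (SELECT Id FROM @adults)
GROUP BY nd.Name""".format(placeholders)

with pyodbc.connect(DB_CREDENTIALS) as cnxn:
    df = pd.read_sql(sql=sql_, con=cnxn, params=vals)

print(df.head())

pyodbc binds each element of vals to its own ?, so the values never need to be spliced into the SQL text.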

For python - Reading from SQL Server using parameters: pandas (or pyodbc) not functioning properly, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37051297/
