gpt4 book ai didi

sql-server - RODBC和Microsoft SQL Server : Truncating Long Character Strings

转载 作者:行者123 更新时间:2023-12-04 11:47:56 27 4
gpt4 key购买 nike

我正在尝试使用R/RODBC从Microsoft SQL Server数据库查询变量。 RODBC正在将字符串截断为8000个字符。

原始代码:截断为255个字符(根据RODBC文档)
library(RODBC)
con_string <- odbcConnect("DSN")
query_string <- "SELECT text_var FROM table_name"
dat <- sqlQuery(con_string, query_string, stringsAsFactors=FALSE)

部分解决方案:修改查询字符串以截断7999个字符后的文本。
library(RODBC)
con_string <- odbcConnect("DSN")
query_string <- "SELECT [text_var]=CAST(text_var AS VARCHAR(8000)) FROM table_name"
dat <- sqlQuery(con_string, query_string, stringsAsFactors=FALSE)

该表/变量包含长达250,000个字符的文本字符串。我真的很想使用R中的所有文本。这可能吗?

@BrianRipley在以下文档的第18页上讨论了该问题(但没有解决方案):
https://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf

@nutterb讨论了GitHub上RODBCext包的类似问题:
https://github.com/zozlak/RODBCext/issues/6

在SO上看到了类似的讨论,但没有使用VARCHAR> 8000的RODBC解决方案。

RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX)

RODBC string getting truncated

笔记:

  • R 3.3.2
  • Microsoft SQL Server 2012年
  • Linux RHEL 7.1的
  • 用于SQL Server的Microsoft ODBC驱动程序
  • 最佳答案

    由于这是Microsoft提供的ODBC驱动程序的限制,因此在更改驱动程序之前,几乎没有什么可做的。 @zozlak解释了为什么在您链接的GitHub问题中。

    我倾向于在需要时使用存储过程来解决此问题,但这通常需要为每个特定实例编写一个存储过程。在某个时候,我可能会想出一种更通用的在存储过程中执行此操作的方法,但是我发现在存储过程中构造查询的过程既繁琐又令人沮丧。

    因此,我花了一些时间来构建一个函数,该函数将执行涉及VARCHAR(MAX)变量的有限查询。这是一种蛮力方法,对于17000个字符的变量,将其导出为三个变量并将其粘贴到R中。这很粗糙,可能不是很有效,但是我想出了最好的解决方案。

    另一个限制是它不允许您在查询中重命名变量。您将被数据库中命名的变量所困扰。如果您只涉及几个表,那可能不是问题。在非常复杂的数据库中,这可能会出现问题。但是,至少与此相关,您可以仅查询带有少数必要ID的VARCHAR(MAX)变量,以在R中执行合并。

    正如GitHub问题中所讨论的那样,最好尽可能避免VARCHAR(MAX)。如果确实需要未知的长度,则VARBINARY(MAX)会更容易查询。

    例子

    来源(“https://gist.githubusercontent.com/nutterb/d2e050dada608bb6213e61d0f8471b65/raw/be8717f318b3e3087e7c26c9a8f9d0a582a5daef/query_varchar_max

    channel <- odbcDriverConnect(...)

    query_varchar_max(channel = channel,
    id = c("idvar"),
    varchar_max = c("varchar_max_var", "varchar_max_var2"),
    from = "FROM dbo.table_name WHERE group = ?",
    data = list(group = "A"))

    功能码
    #' @name query_varchar_max
    #' @title Query a VARCHAR(MAX) Variable from SQL Server
    #'
    #' @description The RODBC driver to SQL Server (SQL Server Native Client 11.0)
    #' reports the lenght of a VARCHAR(MAX) variable to be zero. This presents
    #' difficulties in extracting long text values from the database. Often, the
    #' ODBC will assume a length of 255 characters and truncate the text to that
    #' many characters. The approach taken here searches the VARCHAR(MAX) variables
    #' for the longest length, and extracts the data in segments to be pasted
    #' together in R.
    #'
    #' @param channel A valid ODBC channel to a SQL Server database.
    #' @param id A character vector of ID variables that may be used to merge the
    #' data from this query into another dataset.
    #' @param varchar_max a character vector of variable names that are to be
    #' treated as if they are VARCHAR(MAX) variables.
    #' @param from A single character string providing the remainder of the query
    #' to be run, beginning with the \code{FROM} statement.
    #' @param stringsAsFactors \code{logical(1)}. Should character strings returned
    #' from the database be converted to factors?
    #' @param ... Additional arguments to \code{sqlExecute} when running the full
    #' query.
    #'
    #' @details \code{query_varchar_max} operates by determining how many columns of up to
    #' 8000 characters each are required to export a complete VARCHAR(MAX) variable.
    #' It then creates the necessary number of intermediate variables and queries the
    #' data using the SQL Server \code{SUBSTRING} command, extracting the VARCHAR(MAX)
    #' variable in increments of 8000 characters. After completing the query,
    #' the intemediary variables are concatenated and removed from the data.
    #'
    #' The function makes accommodation for multi-part queries as far as [TABLE].[VARIABLE]
    #' formats are concerned. It is not intended for use in [SCHEMA].[TABLE].[VARIABLE]
    #' formats. This at least allows \code{from} to include joins for more complex
    #' queries. Parameterized queries are also supported through \code{sqlExecute}.
    #'
    #' @export

    query_varchar_max <- function(channel, id, varchar_max, from,
    stringsAsFactors = FALSE, ...)
    {
    coll <- checkmate::makeAssertCollection()

    checkmate::assert_class(x = channel,
    classes = "RODBC",
    add = coll)

    checkmate::assert_character(x = id,
    add = coll)

    checkmate::assert_character(x = varchar_max,
    add = coll)

    checkmate::assert_character(x = from,
    len = 1,
    add = coll)

    checkmate::assert_logical(x = stringsAsFactors,
    len = 1,
    add = coll)

    checkmate::reportAssertions(coll)

    varchar_max_len <-
    paste0(
    sprintf("MAX(LEN(%s)) AS len_%s",
    varchar_max,
    sub("[.]", "_", varchar_max)),
    collapse = ", "
    )

    varchar_len <-
    unlist(
    RODBCext::sqlExecute(
    channel = channel,
    query = sprintf("SELECT %s %s",
    varchar_max_len,
    from),
    fetch = TRUE
    )
    )

    varchar_max_cols <-
    unlist(
    mapply(expand_varchar_max,
    varchar_max,
    varchar_len,
    SIMPLIFY = FALSE)
    )

    Prelim <-
    RODBCext::sqlExecute(
    channel = channel,
    query = sprintf("SELECT %s, %s %s",
    paste0(id, collapse = ", "),
    paste0(varchar_max_cols, collapse = ", "),
    from),
    fetch = TRUE,
    stringsAsFactors = stringsAsFactors,
    ...
    )

    var_stub_to_combine <-
    unique(
    sub(
    "(part)(\\d{1,3})",
    "\\1",
    sub(".+AS ", "", varchar_max_cols)
    )
    )

    col_to_combine <-
    lapply(var_stub_to_combine,
    grep,
    names(Prelim))

    Prelim[sub(".+[.]", "", varchar_max)] <-
    lapply(col_to_combine,
    function(col) apply(Prelim[col], 1, paste0, collapse = ""))

    Prelim[-unlist(col_to_combine)]

    }


    expand_varchar_max <- function(varchar_max, varchar_len)
    {
    nvar <- varchar_len %/% 8000 + 1

    var_list <- vector("character", length = nvar)

    for (i in seq_along(var_list))
    {
    var_list[i] <-
    sprintf("SUBSTRING(%s, %s, %s) AS %s_part%s",
    varchar_max,
    1 + (i - 1) * 8000,
    8000,
    paste0(sub("[.]", "_", varchar_max)),
    i)
    }
    var_list
    }

    关于sql-server - RODBC和Microsoft SQL Server : Truncating Long Character Strings,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44249788/

    27 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com