
python - Pandas randomly inserts delimiters that do not exist

Reposted. Author: 行者123. Last updated: 2023-12-01 04:29:38
I'm really scratching my head here, because this makes no sense to me. I'm using pandas in a very simple way to read in a TSV. Here is the minimal code:

import pandas as pd

source = pd.read_csv("neimanmarcus.csv", sep="\t")
images = source["image_link"]

Every row in this file has exactly 53 tabs. For some reason, pandas thinks that about 2% of the rows have exactly 72 tabs. This leads to the following error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 54 fields in line x, saw 73

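That said, on manual inspection I can't find any difference in the affected rows. Skipping rows would be very problematic in this case, so I'm trying to fix the underlying issue, but I'm at my wit's end. Apologies if this turns out to be something silly, but here are examples of a "correct" and an "incorrect" row.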

Correct:

sku157001669    Tango Dancer-Print A-Line Dress, Size: 4, TANGO - Carolina Herrera  Carolina Herrera Tango Dancer-Print A-Line Dress Details Carolina Herrera tango dancer-print woven dress. Approx. measurements: 35.5"L center back to hem, 35.5"L center front to hem. V'd jewel neckline. Cap sleeves. Self-tie belt at natural waist; ties at left. Inverted center pleat at A-line skirt. Straight hem. Fit and flare silhouette. Hidden back zip. Cotton/spandex; dry clean. Made in Italy. Model's measurements: Height 5'10"/177cm, bust 34"/86cm, waist 26"/66cm, hips 35.5"/90cm, dress size US 2. Designer About Carolina Herrera: The empress of classically refined looks for both day and evening, Carolina Herrera launched her eponymous line in 1980 after encouragement from her friend, legendary Vogue editor Diana Vreeland. Over the years she has collected a number of fashion's highest accolades as well as a star-studded client list. With both a global focus and adoration for the sum of all things beautiful, Carolina Herrera has been hailed as "Fashion's First Lady." Size: 4. Color: TANGO. Age Group: Adult. Material: 97% COTTON, 3% ELASTANE. Apparel & Accessories > Clothing > Dresses  Women's Apparel > Mid-Length > Daytime Dresses > Mid    1390.00 USD 1390.00 USD     http://www.neimanmarcus.com/en-us/Carolina-Herrera-Tango-Dancer-Print-A-Line-Dress/prod177890243/p.prod     http://images.neimanmarcus.com/product_assets/B/2/W/Y/K/NMB2WYK_mz.jpg  http://images.neimanmarcus.com/product_assets/B/2/W/Y/K/NMB2WYK_az.jpg  Carolina Herrera    07667702164817  prod177890243       new in stock        prod177890243   TANGO   97% COTTON, 3% ELASTANE     4           female  Adult       US::Ground:0.00 USD                                                                                             

Incorrect:

sku158601482    Sleeveless Faux-Wrap Jersey Dress, Women's, Size: 2X, BLACK - Eileen Fisher Eileen Fisher Sleeveless Faux-Wrap Jersey Dress, Women's Details Eileen Fisher jersey dress in your choice of color. Round neckline; sleeveless. Faux-wrap style. Shift silhouette. Viscose/spandex; machine wash. Made in USA of imported materials. Model's measurements: Height 5'10.5"/179cm, bust 32"/81cm, waist 24"/61cm, hips 35.5"/90cm, dress size US 2/4. Necklace not included. Designer Please note: Apparel may be available in more sizes: Shop Eileen Fisher Petite Shop Eileen Fisher Women's About Eileen Fisher: Former interior and graphic designer Eileen Fisher launched her self-named collection in 1984. The acclaimed designer made her mark with clean lines, simple shapes, and a timeless, functional style. Size: 2X. Color: BLACK. Age Group: Adult. Material: " 92% Viscose/8% Spandex F4VF-D3502 / D2502X: Body: 92% Viscose, 8% Spandex Hem: 80% Recycled Polyester, 20% Lycra? F4VF-S1496: Body: 92% Viscose, 8% Spandex Hem Panel: 80% Recycled Polyester, 20% Lycra?. Apparel & Accessories > Clothing > Dresses  Women's Apparel > Women's > Special Sizes > Mid 198.00 USD  198.00 USD      http://www.neimanmarcus.com/en-us/Eileen-Fisher-Sleeveless-Faux-Wrap-Jersey-Dress-Women-s/prod179830418/p.prod      http://images.neimanmarcus.com/product_assets/T/A/6/X/8/NMTA6X8_mz.jpg  http://images.neimanmarcus.com/product_assets/T/A/6/X/8/NMTA6X8_az.jpg  Eileen Fisher   00713259663697  prod179830418       new in stock        prod179830418   BLACK   " 92% Viscose/8% Spandex F4VF-D3502 / D2502X: Body: 92% Viscose, 8% Spandex Hem: 80% Recycled Polyester, 20% Lycra? F4VF-S1496: Body: 92% Viscose, 8% Spandex Hem Panel: 80% Recycled Polyester, 20 Graphic 2X          female  Adult       US::Ground:0.00 USD                                 

In this case, simply calling line.split('\t') works as expected; pandas seems to break for some reason.
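The manual check described here could be sketched roughly as follows. This is only an illustration: the helper name field_count_histogram and the three-column sample lines are made up for the example and are not taken from the real feed.

```python
from collections import Counter


def field_count_histogram(lines):
    """Count tab-separated fields per line, ignoring any quoting rules."""
    return Counter(len(line.rstrip("\n").split("\t")) for line in lines)


# Hypothetical miniature of the feed: the second data row contains a field
# that starts with an unmatched double quote.
sample_lines = [
    "sku\tmaterial\tsize\n",
    'sku1\t" 92% Viscose\t2X\n',
    "sku2\tCotton\t4\n",
]

histogram = field_count_histogram(sample_lines)
print(histogram)
```

A plain split never looks at quote characters, so every line reports the same number of fields. Any disagreement with pandas therefore has to come from how the parser interprets the line, not from the tabs themselves.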

Best answer

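Your data contains unmatched quote characters (it looks like " is being used to denote inches, as in Height 5'10.5"). This makes the parser think a quoted field has started, and because the quote is never closed properly, the tabs that follow are read as part of that field rather than as delimiters. The result is corrupted data and a wrong field count.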
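The effect can be reproduced in miniature with Python's standard csv module, which follows the same quoting convention by default. This is an analogy under that assumption, not a trace of the pandas C parser, and the sample data is invented for the illustration.

```python
import csv
import io

# Hypothetical three-column TSV; the "material" field of the first data row
# begins with a double quote that is never closed.
data = 'id\tmaterial\tsize\n1\t" 92% Viscose\t2X\n2\tCotton\t4\n'

# Default quoting: the opening quote starts a quoted field, so the tabs and
# newlines that follow are absorbed into that field.
rows_default = list(csv.reader(io.StringIO(data), delimiter="\t"))

# QUOTE_NONE: the quote is an ordinary character and every tab is a delimiter.
rows_none = list(
    csv.reader(io.StringIO(data), delimiter="\t", quoting=csv.QUOTE_NONE)
)

print(len(rows_default), [len(row) for row in rows_default])
print(len(rows_none), [len(row) for row in rows_none])
```

With default quoting the reader returns fewer rows than the input has lines, because the stray quote swallows everything after it. With quoting disabled, each line yields three fields.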

Try passing quoting=csv.QUOTE_NONE as an additional argument to read_csv. (You need to import csv first, or you can simply pass quoting=3.)
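A minimal sketch of that suggestion follows. It uses an in-memory sample instead of the real neimanmarcus.csv, and the column names and values are assumptions made for the example.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for the feed: one field starts with an unmatched quote.
sample = 'sku\tmaterial\tsize\nsku1\t" 92% Viscose\t2X\nsku2\tCotton\t4\n'

# quoting=csv.QUOTE_NONE tells the parser to treat quote characters as plain
# text, so only tabs separate fields.
df = pd.read_csv(io.StringIO(sample), sep="\t", quoting=csv.QUOTE_NONE)

print(df.shape)
print(df.loc[0, "material"])
```

For the file in the question, the equivalent call would be pd.read_csv("neimanmarcus.csv", sep="\t", quoting=csv.QUOTE_NONE). Note that the quote characters stay in the data as literal text.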

Regarding "python - Pandas randomly inserts delimiters that do not exist", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32552866/
