
Python iterparse is skipping values

Reposted. Author: 行者123. Last updated: 2023-12-01 08:35:55

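I'm using iterparse to parse a large XML file (1.8 GB) and write all the data to a CSV file. The script I wrote works well, but for some reason it randomly skips values in some rows. Here is my script: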

```python
import xml.etree.cElementTree as ET
import csv
xml_data_to_csv = open('Out2.csv', 'w', newline='', encoding='utf8')
Csv_writer = csv.writer(xml_data_to_csv, delimiter=';')

file_path = "Products_50_producten.xml"
context = ET.iterparse(file_path, events=("start", "end"))

EcommerceProductGuid = ""
ProductNumber = ""
Description = ""
ShopSalesPriceInc = ""
Barcode = ""
AvailabilityStatus = ""
Brand = ""
# turn it into an iterator
#context = iter(context)
product_tag = False
for event, elem in context:
    tag = elem.tag

    if event == 'start':
        if tag == "Product":
            product_tag = True

        elif tag == 'EcommerceProductGuid':
            EcommerceProductGuid = elem.text

        elif tag == 'ProductNumber':
            ProductNumber = elem.text

        elif tag == 'Description':
            Description = elem.text

        elif tag == 'SalesPriceInc':
            ShopSalesPriceInc = elem.text

        elif tag == 'Barcode':
            Barcode = elem.text

        elif tag == 'AvailabilityStatus':
            AvailabilityStatus = elem.text

        elif tag == 'Brand':
            Brand = elem.text

    if event == 'end' and tag == 'Product':
        product_tag = False
        List_nodes = []
        List_nodes.append(EcommerceProductGuid)
        List_nodes.append(ProductNumber)
        List_nodes.append(Description)
        List_nodes.append(ShopSalesPriceInc)
        List_nodes.append(Barcode)
        List_nodes.append(AvailabilityStatus)
        List_nodes.append(Brand)
        Csv_writer.writerow(List_nodes)
        print(EcommerceProductGuid)
        List_nodes.clear()
        EcommerceProductGuid = ""
        ProductNumber = ""
        Description = ""
        ShopSalesPriceInc = ""
        Barcode = ""
        AvailabilityStatus = ""
        Brand = ""

    elem.clear()


xml_data_to_csv.close()
```

The file "Products_50_producten.xml" has the following layout:

```xml
<?xml version="1.0" encoding="utf-16" ?>
<ProductExport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <ExportInfo>
    <ExportDateTime>2018-11-07T00:01:03+01:00</ExportDateTime>
    <Type>Incremental</Type>
    <ExportStarted>Automatic</ExportStarted>
  </ExportInfo>
  <Products>
    <Product><EcommerceProductGuid>4FB8A271-D33E-4501-9EB4-17CFEBDA4177</EcommerceProductGuid><ProductNumber>982301017</ProductNumber><Description>Ducati Jas Radiaal Zwart Xxl Heren Tekst - 982301017</Description><Brand>DUCATI</Brand><ProductVariations><ProductVariation><SalesPriceInc>302.2338</SalesPriceInc><Barcodes><Barcode BarcodeOrder="1">982301017</Barcode></Barcodes></ProductVariation></ProductVariations></Product>
    <Product><EcommerceProductGuid>4FB8A271-D33E-4501-9EB4-17CFEBDA4177</EcommerceProductGuid><ProductNumber>982301017</ProductNumber><Description>Ducati Jas Radiaal Zwart Xxl Heren Tekst - 982301017</Description><Brand>DUCATI</Brand><ProductVariations><ProductVariation><SalesPriceInc>302.2338</SalesPriceInc><Barcodes><Barcode BarcodeOrder="1">982301017</Barcode></Barcodes></ProductVariation></ProductVariations></Product>
  </Products>
</ProductExport>
```

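For example, if I copy the "Product" element 300 times, the "EcommerceProductGuid" value on line 155 of the CSV file is left empty. If I copy "Product" 400 times, it leaves empty values on lines 155, 310 and 368. How is this possible?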

Best answer

I think the problem is `if event == 'start'`.

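According to other questions/answers, the contents of the `text` attribute are not guaranteed to be defined when a 'start' event is emitted. The Python documentation for `iterparse()` says the same thing. On a 'start' event, only the '>' of the start tag is guaranteed to have been seen, so `text` and `tail` are undefined at that point. If you need a fully populated element, you should look for 'end' events instead.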
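
To see what "not guaranteed" means in practice, here is a small sketch using `ET.XMLPullParser`. This is the incremental parser that CPython's `iterparse()` uses internally while it reads the file in fixed-size chunks. The chunks below are made up: the second call simulates a read boundary that falls in the middle of an element's text, and `text_seen_per_event` is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET


def text_seen_per_event(chunks, tag):
    """Feed XML to a pull parser piece by piece and record (event, elem.text)
    for every start/end event of `tag`, as seen at the moment it is read."""
    parser = ET.XMLPullParser(events=("start", "end"))
    seen = []

    def drain():
        # Collect whatever events the parser has produced so far.
        for event, elem in parser.read_events():
            if elem.tag == tag:
                seen.append((event, elem.text))

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()
    return seen


# Whole element inside one chunk: even the 'start' event already sees the text.
print(text_seen_per_event(["<root><b>hello</b></root>"], "b"))
# Chunk boundary inside the text: the 'start' event sees no text yet.
print(text_seen_per_event(["<root><b>hel", "lo</b></root>"], "b"))
```

In the first call, both events see 'hello'. In the second, the 'start' event sees `None` and only the 'end' event sees the text. This makes the bug intermittent. With a 1.8 GB file, only the products whose text straddles a chunk boundary would come out empty. That is a plausible explanation for the seemingly random rows (155, 310, 368), though it was not verified against the original file.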

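It doesn't seem to be as simple as changing it to `if event == 'end'`, though. When I tried that myself, I got more empty fields than populated ones. (Update: using `event == 'end'` does work if I remove `events=("start", "end")` from `iterparse`.)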
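
Below is a minimal sketch of the approach from the update, assuming the tag names used in the question. The plain 'end' check likely failed because, while 'start' events were still requested, `elem.clear()` also ran on every 'start' event. That can wipe text that is already attached before the 'end' branch reads it. The sketch lets `iterparse` report only 'end' events (its default) and clears each `<Product>` only after its values have been collected. `iter_products` and `FIELDS` are illustrative names, not part of the original answer.

```python
import io
import xml.etree.ElementTree as ET

# Tags the question's script extracts (illustrative; adjust to the real export).
FIELDS = ("EcommerceProductGuid", "ProductNumber", "Description",
          "SalesPriceInc", "Barcode", "AvailabilityStatus", "Brand")


def iter_products(source):
    """Yield one dict per <Product>, reading text only on 'end' events."""
    values = dict.fromkeys(FIELDS, "")
    # Without an events argument, iterparse reports only 'end' events.
    for _event, elem in ET.iterparse(source):
        if elem.tag in values:
            # By the time the end tag has been seen, elem.text is complete.
            if elem.text:
                values[elem.tag] = elem.text
        elif elem.tag == "Product":
            yield values
            values = dict.fromkeys(FIELDS, "")
            # Drop the finished product's children only after reading them.
            elem.clear()


if __name__ == "__main__":
    sample = (b"<ProductExport><Products>"
              b"<Product><EcommerceProductGuid>G1</EcommerceProductGuid>"
              b"<Brand>DUCATI</Brand></Product>"
              b"<Product><ProductNumber>7</ProductNumber></Product>"
              b"</Products></ProductExport>")
    for row in iter_products(io.BytesIO(sample)):
        print(row)
```

For the real file, each `row` from `iter_products("Products_50_producten.xml")` could be written with something like `writer.writerow([row[f] for f in FIELDS])`. Clearing each finished `<Product>` stops its children from piling up. The emptied `<Product>` elements themselves still stay attached to `<Products>`.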

What I ended up with was ignoring the event entirely and just testing whether `text` is populated.

Updated code...

```python
import xml.etree.ElementTree as ET  # xml.etree.cElementTree was removed in Python 3.9
import csv

xml_data_to_csv = open('Out2.csv', 'w', newline='', encoding='utf8')
Csv_writer = csv.writer(xml_data_to_csv, delimiter=';')

file_path = "Products_50_producten.xml"
context = ET.iterparse(file_path, events=("start", "end"))

EcommerceProductGuid = ""
ProductNumber = ""
Description = ""
ShopSalesPriceInc = ""
Barcode = ""
AvailabilityStatus = ""
Brand = ""
for event, elem in context:
    tag = elem.tag
    text = elem.text

    # Look at both 'start' and 'end' events, but only keep non-empty text:
    # if the text is not available yet on 'start', it is picked up on 'end'.
    if tag == 'EcommerceProductGuid' and text:
        EcommerceProductGuid = text

    elif tag == 'ProductNumber' and text:
        ProductNumber = text

    elif tag == 'Description' and text:
        Description = text

    elif tag == 'SalesPriceInc' and text:
        ShopSalesPriceInc = text

    elif tag == 'Barcode' and text:
        Barcode = text

    elif tag == 'AvailabilityStatus' and text:
        AvailabilityStatus = text

    elif tag == 'Brand' and text:
        Brand = text

    # A complete product has been read: write one CSV row and reset the values.
    if event == 'end' and tag == "Product":
        List_nodes = []
        List_nodes.append(EcommerceProductGuid)
        List_nodes.append(ProductNumber)
        List_nodes.append(Description)
        List_nodes.append(ShopSalesPriceInc)
        List_nodes.append(Barcode)
        List_nodes.append(AvailabilityStatus)
        List_nodes.append(Brand)
        Csv_writer.writerow(List_nodes)
        print(EcommerceProductGuid)
        List_nodes.clear()
        EcommerceProductGuid = ""
        ProductNumber = ""
        Description = ""
        ShopSalesPriceInc = ""
        Barcode = ""
        AvailabilityStatus = ""
        Brand = ""

    elem.clear()

xml_data_to_csv.close()
```

This seems to work fine with my test file of 300 Product elements.

Also, I think you could simplify the code by using a dictionary and `csv.DictWriter`.

Example (produces the same output as the code above)...

```python
import xml.etree.ElementTree as ET  # xml.etree.cElementTree was removed in Python 3.9
import csv
from copy import deepcopy

field_names = ['EcommerceProductGuid', 'ProductNumber', 'Description',
               'SalesPriceInc', 'Barcode', 'AvailabilityStatus', 'Brand']

values_template = {'EcommerceProductGuid': "",
                   'ProductNumber': "",
                   'Description': "",
                   'SalesPriceInc': "",
                   'Barcode': "",
                   'AvailabilityStatus': "",
                   'Brand': ""}

with open('Out2.csv', 'w', newline='', encoding='utf8') as xml_data_to_csv:

    csv_writer = csv.DictWriter(xml_data_to_csv, delimiter=';', fieldnames=field_names)

    file_path = "Products_50_producten.xml"
    context = ET.iterparse(file_path, events=("start", "end"))

    values = deepcopy(values_template)

    for event, elem in context:
        tag = elem.tag
        text = elem.text

        # Keep only non-empty text, whichever event it arrives on.
        if tag in field_names and text:
            values[tag] = text

        # End of a product: write the row and start over with empty values.
        if event == 'end' and tag == "Product":
            csv_writer.writerow(values)
            print(values.get('EcommerceProductGuid'))
            values = deepcopy(values_template)

        elem.clear()
```
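
If a header row is wanted as well, `csv.DictWriter` can also take over the resetting. Its `restval` parameter (which defaults to `''`) fills in any field missing from a row dict. So each product could start from an empty dict instead of a deep copy of `values_template`. Even with the template, a plain `dict(values_template)` copy would be enough, since the values are immutable strings. Here is a small sketch, with `write_products` as a hypothetical helper:

```python
import csv
import io

FIELD_NAMES = ["EcommerceProductGuid", "ProductNumber", "Description",
               "SalesPriceInc", "Barcode", "AvailabilityStatus", "Brand"]


def write_products(rows, out):
    """Write product dicts as ';'-separated CSV, preceded by a header row."""
    # restval="" fills in any field a row dict lacks, so no template is needed.
    writer = csv.DictWriter(out, fieldnames=FIELD_NAMES, delimiter=";", restval="")
    writer.writeheader()
    writer.writerows(rows)


out = io.StringIO()
write_products([{"EcommerceProductGuid": "G1", "Brand": "DUCATI"}], out)
print(out.getvalue())
```

`writeheader()` writes the field names as the first line. The output therefore differs from the answer's version by that one extra line.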

A similar question about Python iterparse skipping values can be found on Stack Overflow: https://stackoverflow.com/questions/53729583/
