
python - How can I efficiently store this parsed XML document in a MySQL database using Python?

Reposted. Author: 可可西里. Updated: 2023-11-01 06:50:27

Here is the XML file: book.xml

<?xml version="1.0" ?>
<!--Sample XML Document-->
<bookstore>
  <book _id="E7854">
    <title>
      Sample XML Book
    </title>
    <author>
      <name _id="AU363">
        <first>
          Benjamin
        </first>

        <last>
          Smith
        </last>
      </name>
      <affiliation>
        A
      </affiliation>
    </author>
    <chapter number="1">
      <title>
        First Chapter
      </title>
      <para>
        B
        <count>
          783
        </count>
        .
      </para>
    </chapter>
    <chapter number="3">
      <title>
        Third Chapter
      </title>
      <para>
        B
        <count>
          59
        </count>
        .
      </para>
    </chapter>
  </book>
  <book _id="C843">
    <title>
      XML Master
    </title>
    <author>
      <name _id="AU245">
        <first>
          John
        </first>

        <last>
          Doe
        </last>
      </name>
      <affiliation>
        C
      </affiliation>
    </author>
    <chapter number="2">
      <title>
        Second Chapter
      </title>
      <para>
        K
        <count>
          54
        </count>
        .
      </para>
    </chapter>
    <chapter number="3">
      <title>
        Third Chapter
      </title>
      <para>
        K
        <count>
          328
        </count>
        .
      </para>
    </chapter>
    <chapter number="7">
      <title>
        Seventh Chapter
      </title>
      <para>
        K
        <count>
          265
        </count>
        .
      </para>
    </chapter>
    <chapter number="9">
      <title>
        Ninth Chapter
      </title>
      <para>
        K
        <count>
          356
        </count>
        .
      </para>
    </chapter>
  </book>
</bookstore>

The Python code is as follows: book_dom.py

from xml.dom import minidom, Node
import re
import textwrap

class SampleScanner:
    """Walk the parsed DOM of book.xml and print its contents."""

    def __init__(self, doc):
        for child in doc.childNodes:
            if child.nodeType == Node.ELEMENT_NODE and child.tagName == 'bookstore':
                self.handleBookStore(child)

    def gettext(self, nodelist):
        # Recursively collect all text below the given nodes, then collapse whitespace.
        retlist = []
        for node in nodelist:
            if node.nodeType == Node.TEXT_NODE:
                retlist.append(node.wholeText)
            elif node.hasChildNodes():
                retlist.append(self.gettext(node.childNodes))

        return re.sub(r'\s+', ' ', ''.join(retlist))

    def handleBookStore(self, node):
        for child in node.childNodes:
            if child.nodeType != Node.ELEMENT_NODE:
                continue
            if child.tagName == 'book':
                self.handleBook(child)

    def handleBook(self, node):
        # The sample output shows the book ID, so print the _id attribute here.
        print("Book ID :", node.getAttribute('_id'))
        for child in node.childNodes:
            if child.nodeType != Node.ELEMENT_NODE:
                continue
            if child.tagName == 'title':
                print("Book title is:", self.gettext(child.childNodes))
            if child.tagName == 'author':
                self.handleAuthor(child)
            if child.tagName == 'chapter':
                self.handleChapter(child)

    def handleAuthor(self, node):
        for child in node.childNodes:
            if child.nodeType != Node.ELEMENT_NODE:
                continue
            if child.tagName == 'name':
                self.handleAuthorName(child)
            elif child.tagName == 'affiliation':
                print("Author affiliation:", self.gettext([child]))

    def handleAuthorName(self, node):
        # The sample output shows the name ID, so print the _id attribute here.
        print("Name ID :", node.getAttribute('_id'))
        surname = self.gettext(node.getElementsByTagName("last"))
        givenname = self.gettext(node.getElementsByTagName("first"))
        print("Author Name: %s, %s" % (surname, givenname))

    def handleChapter(self, node):
        print(" *** Start of Chapter %s: %s" % (node.getAttribute('number'),
              self.gettext(node.getElementsByTagName('title'))))
        for child in node.childNodes:
            if child.nodeType != Node.ELEMENT_NODE:
                continue
            if child.tagName == 'para':
                self.handlePara(child)

    def handlePara(self, node):
        # Print the paragraph text wrapped to the default width, then a blank line.
        partext = self.gettext([node])
        partext = textwrap.fill(partext)
        print(partext)
        print()

doc = minidom.parse('book.xml')
SampleScanner(doc)

Output: ~/$ python book_dom.py

Book ID :  E7854
Book title is: Sample XML Book
Name ID : AU363
Author Name: Smith , Benjamin
Author affiliation: A
*** Start of Chapter 1: First Chapter
B 783 .

*** Start of Chapter 3: Third Chapter
B 59 .

Book ID : C843
Book title is: XML Master
Name ID : AU245
Author Name: Doe , John
Author affiliation: C
*** Start of Chapter 2: Second Chapter
K 54 .

*** Start of Chapter 3: Third Chapter
K 328 .

*** Start of Chapter 7: Seventh Chapter
K 265 .

*** Start of Chapter 9: Ninth Chapter
K 356 .

My goal is to store the books in a "Book" table and the author information in an "Author" table (preserving the book -> author relationship) [MySQL DB].

**Book table:**
id    | title
E7854 | Sample XML Book
....

**Chapter table:**
book_id | chapter_number | title         | para
E7854   | 1              | First Chapter | B 783 .
E7854   | 3              | Third Chapter | B 59 .
....

**Author table:**
id    | book_id | name           | affiliation
AU363 | E7854   | Smith Benjamin | A
....
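
For illustration, the three tables above could be set up roughly as follows. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server, and the column types, the key choices, and the `create_schema` name are assumptions rather than anything stated in the question. With a MySQL driver the DDL may need small adjustments, and the parameter placeholder is typically `%s` instead of `?`.

```python
import sqlite3

# Assumed schema mirroring the three target tables; types and keys are guesses.
SCHEMA = """
CREATE TABLE book (
    id    VARCHAR(16) PRIMARY KEY,
    title VARCHAR(255) NOT NULL
);
CREATE TABLE chapter (
    book_id        VARCHAR(16) NOT NULL REFERENCES book(id),
    chapter_number INTEGER NOT NULL,
    title          VARCHAR(255),
    para           TEXT,
    PRIMARY KEY (book_id, chapter_number)
);
CREATE TABLE author (
    id          VARCHAR(16) NOT NULL,
    book_id     VARCHAR(16) NOT NULL REFERENCES book(id),
    name        VARCHAR(255),
    affiliation VARCHAR(255),
    PRIMARY KEY (id, book_id)
);
"""

def create_schema(conn):
    """Create the three tables (executescript is a sqlite3-specific helper)."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)

# Sample rows copied from the target layout in the question.
conn.execute("INSERT INTO book VALUES (?, ?)", ("E7854", "Sample XML Book"))
conn.executemany(
    "INSERT INTO chapter VALUES (?, ?, ?, ?)",
    [("E7854", 1, "First Chapter", "B 783 ."),
     ("E7854", 3, "Third Chapter", "B 59 .")],
)
conn.execute("INSERT INTO author VALUES (?, ?, ?, ?)",
             ("AU363", "E7854", "Smith Benjamin", "A"))
conn.commit()

# Look up the chapters of one book through its ID.
chapter_rows = conn.execute(
    "SELECT chapter_number, title FROM chapter "
    "WHERE book_id = ? ORDER BY chapter_number",
    ("E7854",),
).fetchall()
```

The composite key on the author table is one way to let the same author ID appear under several books while keeping the layout from the question; a separate book/author link table would be another option.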

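If I have several thousand books and authors (and chapters), how should I store the data in the database? I am having trouble uniquely identifying the data set for each book/author. I could use the IDs and pass them to the functions to preserve the relationships, but I am not sure that is the best approach. Any pointers are much appreciated.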
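
One possible reading of "use the IDs and pass them along" is sketched here. The `_id` attributes in book.xml already identify each book and author, so the book's ID can simply be carried into every author and chapter row. The sketch uses `xml.etree.ElementTree` instead of the minidom scanner above, and the names `text_of`, `extract_rows`, and `SAMPLE` are hypothetical helpers introduced only for this example. It assumes every `<author>` has a `<name>` with `<first>` and `<last>` children, as in the sample file.

```python
import re
import xml.etree.ElementTree as ET

def text_of(elem):
    """Return all text below elem as one whitespace-normalised string."""
    if elem is None:
        return ""
    return re.sub(r"\s+", " ", "".join(elem.itertext())).strip()

def extract_rows(xml_text):
    """Return (books, authors, chapters) as lists of tuples, one tuple per table row."""
    root = ET.fromstring(xml_text)
    books, authors, chapters = [], [], []
    for book in root.iter("book"):
        book_id = book.get("_id")  # carried into every child row
        books.append((book_id, text_of(book.find("title"))))
        for author in book.findall("author"):
            name = author.find("name")
            full_name = "%s %s" % (text_of(name.find("last")),
                                   text_of(name.find("first")))
            authors.append((name.get("_id"), book_id, full_name,
                            text_of(author.find("affiliation"))))
        for chapter in book.findall("chapter"):
            chapters.append((book_id, int(chapter.get("number")),
                             text_of(chapter.find("title")),
                             text_of(chapter.find("para"))))
    return books, authors, chapters

# A shortened copy of the first book from book.xml.
SAMPLE = """<bookstore>
  <book _id="E7854">
    <title> Sample XML Book </title>
    <author>
      <name _id="AU363"><first> Benjamin </first><last> Smith </last></name>
      <affiliation> A </affiliation>
    </author>
    <chapter number="1">
      <title> First Chapter </title>
      <para> B <count> 783 </count> . </para>
    </chapter>
  </book>
</bookstore>"""

books, authors, chapters = extract_rows(SAMPLE)
```

Each of the three lists can then be handed to a DB-API cursor's `executemany()` in one call per table, which tends to scale better to a few thousand books than issuing one INSERT per print statement.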

P.S.: I am working on the SQL part of the script and will update once I have tested it. Feel free to post your thoughts and code samples. Thanks!

Best answer

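Based on your comments above, I would simply create a Book class, an Author class, an AuthorList, and a Chapter class. Store a book's chapters as a list of Chapter objects on the Book itself. Maintain the AuthorList as a dictionary keyed by author ID, pointing to the actual Author objects. Use data members of the Book object to hold the IDs; for convenience, you could provide a method that pulls the authors out of the AuthorList dictionary.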
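
A rough sketch of what that class layout might look like, using dataclasses. The field names, the `authors()` helper, and the use of a plain dict for the AuthorList are assumptions made for illustration; the answer itself does not spell them out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Author:
    id: str
    name: str
    affiliation: str

@dataclass
class Chapter:
    number: int
    title: str
    para: str

@dataclass
class Book:
    id: str
    title: str
    author_ids: List[str] = field(default_factory=list)   # IDs only, not Author objects
    chapters: List[Chapter] = field(default_factory=list)

    def authors(self, author_list: Dict[str, Author]) -> List[Author]:
        """Convenience lookup: resolve this book's author IDs via the AuthorList."""
        return [author_list[author_id] for author_id in self.author_ids]

# The AuthorList: a dictionary from author ID to the actual Author object.
author_list: Dict[str, Author] = {}
author_list["AU363"] = Author("AU363", "Smith Benjamin", "A")

book = Book("E7854", "Sample XML Book", author_ids=["AU363"])
book.chapters.append(Chapter(1, "First Chapter", "B 783 ."))
book.chapters.append(Chapter(3, "Third Chapter", "B 59 ."))

first_author = book.authors(author_list)[0]
```

Keeping only IDs on the Book means an author who appears in several books is stored once in the dictionary, which maps naturally onto a separate author table with a foreign key back to the book.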

Regarding "python - How can I efficiently store this parsed XML document in a MySQL database using Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7327924/
