gpt4 book ai didi

python - 以编程方式将 Pandas 数据框转换为 Markdown 表

转载 作者:IT老高 更新时间:2023-10-28 22:02:08 27 4
gpt4 key购买 nike

我有一个从数据库生成的 Pandas Dataframe,其中包含混合编码的数据。例如:

+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| ID | path | language | date | longest_sentence | shortest_sentence | number_words | readability_consensus |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 0 | data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not... | 306 | 11th and 12th grade |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 31 | data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253 | 15th and 16th grade |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+

正如所见,英语和挪威语混合在一起(我认为在数据库中编码为 ISO-8859-1)。我需要将此 Dataframe 输出的内容作为 Markdown 表获取,但不会出现编码问题。我关注了this answer (来自问题 Generate Markdown tables? )并得到以下信息:

import sys, sqlite3

db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()

rows = []
for index, row in df.iterrows():
items = (row['date'],
row['path'],
row['language'],
row['shortest_sentence'],
row['longest_sentence'],
row['number_words'],
row['readability_consensus'])
rows.append(items)

headings = ['Date',
'Path',
'Language',
'Shortest Sentence',
'Longest Sentence since',
'Words',
'Grade level']

fields = [0, 1, 2, 3, 4, 5, 6]
align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
('^','^'), ('^','^')]

table(sys.stdout, rows, fields, headings, align)

但是,这会产生 UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128) 错误。如何将 Dataframe 输出为 Markdown 表?也就是说,为了将此代码存储在文件中以用于编写 Markdown 文档。我需要输出如下所示:

| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
|----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
| 0 | data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not... | 306 | 11th and 12th grade |
| 31 | data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253 | 15th and 16th grade |

最佳答案

Pandas 1.0 于 2020 年 1 月 29 日发布,支持 markdown 转换,现在可以直接做啦!

示例取自 docs :

df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())
|    |   A |   B |
|:---|----:|----:|
| a | 1 | 1 |
| a | 2 | 2 |
| b | 3 | 3 |

或者没有索引:

print(df.to_markdown(index=False)) # use 'showindex' for pandas < 1.1
|   A |   B |
|----:|----:|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |

关于python - 以编程方式将 Pandas 数据框转换为 Markdown 表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33181846/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com