python - Executing a Jupyter notebook with inline Markdown using nbconvert

Reposted. Author: 行者123. Last updated: 2023-12-01 00:44:11

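I have a Jupyter notebook that includes Python variables in Markdown cells, like this:

Code cell:

x = 10

Markdown cell:
The value of x is {{x}}.

The IPython-notebook-extension Python Markdown lets me display these variables dynamically when I execute the Markdown cell with shift-enter in the notebook.

Markdown cell:
The value of x is 10.

I would like to execute all cells in the notebook programmatically and save them to a new notebook, using something like this: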
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

with open('report.ipynb') as f:
    nb = nbformat.read(f, as_version=4)
ep = ExecutePreprocessor(timeout=600, kernel_name='python3')
ep.preprocess(nb, {})
with open('report_executed.ipynb', 'wt') as f:
    nbformat.write(nb, f)

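This executes the code cells but not the Markdown cells. They still look like this:
The value of x is {{x}}.

I think the problem is that the notebook is not trusted. Is there a way to tell the ExecutePreprocessor to trust the notebook? Is there another way to programmatically execute a notebook, including the Python variables in Markdown cells?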

Best answer

The ExecutePreprocessor only looks at code cells, so your Markdown cells are left completely untouched. As you say, to process the Markdown you need the Python Markdown preprocessor.
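
To make the point concrete, here is a toy illustration using plain dicts shaped like notebook JSON. It is not nbconvert's implementation; `execute_code_cells_only` and the `run` callback are hypothetical names introduced only for this sketch.

```python
def execute_code_cells_only(cells, run):
    """Toy stand-in for a code-only pass: `run` is applied to code cells, others pass through."""
    for cell in cells:
        if cell["cell_type"] == "code":
            cell["outputs"] = run(cell["source"])
    return cells


cells = [
    {"cell_type": "code", "source": "x = 10", "outputs": []},
    {"cell_type": "markdown", "source": "The value of x is {{x}}."},
]

# `run` is a hypothetical placeholder that only records what it was asked to execute.
executed = []
execute_code_cells_only(cells, lambda src: executed.append(src) or [])

print(executed)            # ['x = 10']
print(cells[1]["source"])  # The value of x is {{x}}.
```

The Markdown cell comes out exactly as it went in, which matches the behaviour described in the question.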

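Unfortunately, the Python Markdown preprocessor system only executes code in a live notebook, which it does by modifying the JavaScript involved with rendering cells. The modification stores the results of executing the code snippets in the cell metadata.
The PyMarkdownPreprocessor class (in pre_pymarkdown.py) is designed to be used with nbconvert on notebooks that have first been rendered in a live notebook setting. It processes Markdown cells, replacing the {{}} patterns with the values stored in the metadata.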
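
The substitution step can be sketched roughly as follows. This is a minimal sketch and not the extension's actual code: `substitute_placeholders` is a hypothetical helper, and the sketch assumes the stored values live under a `variables` key in the cell metadata, keyed by the text between the braces.

```python
import re

# Matches {{ expression }} placeholders; non-greedy so adjacent placeholders stay separate.
PLACEHOLDER = re.compile(r"\{\{(.*?)\}\}")


def substitute_placeholders(source, variables):
    """Replace each {{expr}} in `source` with variables[expr]; leave unknown ones untouched."""
    def replace(match):
        expr = match.group(1)
        return variables.get(expr, match.group(0))
    return PLACEHOLDER.sub(replace, source)


# Hypothetical cell, shaped like notebook JSON with results already stored in metadata.
cell = {
    "cell_type": "markdown",
    "source": "The value of x is {{x}}.",
    "metadata": {"variables": {"x": "10"}},
}

rendered = substitute_placeholders(cell["source"], cell["metadata"]["variables"])
print(rendered)  # The value of x is 10.
```

If the metadata has no entry for a placeholder, which is the situation when the notebook was never rendered live, the placeholder is simply left in place.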

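However, in your case you do not have the live notebook metadata. I had a similar problem and solved it by writing my own execute preprocessor that also includes logic for handling Markdown cells: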

import re
from textwrap import dedent

import nbconvert
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

# Note: written against the nbconvert release current when this answer was posted,
# where ExecutePreprocessor.run_cell(cell) returns the list of outputs.
# Later releases changed this API, so the calls below may need adapting.


class ExecuteCodeMarkdownPreprocessor(ExecutePreprocessor):

    def __init__(self, **kw):
        super().__init__(**kw)
        self.sections = {'default': True}  # maps section ID to true or false
        self.EmptyCell = nbformat.v4.nbbase.new_raw_cell("")

    def preprocess_cell(self, cell, resources, cell_index):
        """
        Executes a single code or markdown cell. See base.py for details.
        To execute all cells see :meth:`preprocess`.
        """

        if cell.cell_type not in ['code', 'markdown']:
            return cell, resources

        if cell.cell_type == 'code':
            # Do code stuff
            return self.preprocess_code_cell(cell, resources, cell_index)

        elif cell.cell_type == 'markdown':
            # Do markdown stuff
            return self.preprocess_markdown_cell(cell, resources, cell_index)
        else:
            # Don't do anything
            return cell, resources

    def preprocess_code_cell(self, cell, resources, cell_index):
        ''' Process code cell.
        '''
        outputs = self.run_cell(cell)
        cell.outputs = outputs

        if not self.allow_errors:
            for out in outputs:
                if out.output_type == 'error':
                    pattern = u"""\
                        An error occurred while executing the following cell:
                        ------------------
                        {cell.source}
                        ------------------
                        {out.ename}: {out.evalue}
                        """
                    msg = dedent(pattern).format(out=out, cell=cell)
                    raise nbconvert.preprocessors.execute.CellExecutionError(msg)

        return cell, resources

    def preprocess_markdown_cell(self, cell, resources, cell_index):
        # Find and execute snippets of code
        cell['metadata']['variables'] = {}
        for m in re.finditer(r"\{\{(.*?)\}\}", cell.source):
            # Execute code
            fakecell = nbformat.v4.nbbase.new_code_cell(m.group(1))
            fakecell, resources = self.preprocess_code_cell(fakecell, resources, cell_index)

            # Output found in cell.outputs
            # Put output in cell['metadata']['variables']
            for output in fakecell.outputs:
                html = self.convert_output_to_html(output)
                if html is not None:
                    cell['metadata']['variables'][fakecell.source] = html
                    break
        return cell, resources

    def convert_output_to_html(self, output):
        '''Convert IOpub output to HTML

        See https://github.com/ipython-contrib/IPython-notebook-extensions/blob/master/nbextensions/usability/python-markdown/main.js
        '''
        if output['output_type'] == 'error':
            text = '**' + output.ename + '**: ' + output.evalue
            return text
        elif output.output_type == 'execute_result' or output.output_type == 'display_data':
            data = output.data
            if 'text/latex' in data:
                html = data['text/latex']
                return html
            elif 'image/svg+xml' in data:
                # Not supported
                #var svg = ul['image/svg+xml'];
                #/* embed SVG in an <img> tag, still get eaten by sanitizer... */
                #svg = btoa(svg);
                #html = '<img src="data:image/svg+xml;base64,' + svg + '"/>';
                return None
            elif 'image/jpeg' in data:
                jpeg = data['image/jpeg']
                html = '<img src="data:image/jpeg;base64,' + jpeg + '"/>'
                return html
            elif 'image/png' in data:
                png = data['image/png']
                html = '<img src="data:image/png;base64,' + png + '"/>'
                return html
            elif 'text/markdown' in data:
                text = data['text/markdown']
                return text
            elif 'text/html' in data:
                html = data['text/html']
                return html
            elif 'text/plain' in data:
                text = data['text/plain']
                # Strip <p> and </p> tags
                # Strip quotes
                # html.match(/<p>([\s\S]*?)<\/p>/)[1]
                text = re.sub(r'<p>([\s\S]*?)<\/p>', r'\1', text)
                text = re.sub(r"'([\s\S]*?)'", r'\1', text)
                return text
            else:
                # Some tag we don't support
                return None
        else:
            return None
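
Two steps of this class can be tried without a kernel: finding the {{}} snippets and cleaning up a text/plain result. The following is a small standalone sketch; `clean_plain_text` is a hypothetical helper that mirrors the text/plain branch above, and the sample strings are made up for illustration.

```python
import re

source = "Mean: {{df.mean()}}, label: {{name}}"

# Same non-greedy pattern as the Markdown-cell loop, with the braces escaped explicitly.
snippets = [m.group(1) for m in re.finditer(r"\{\{(.*?)\}\}", source)]
print(snippets)  # ['df.mean()', 'name']


def clean_plain_text(text):
    """Mirror the text/plain branch: unwrap <p>...</p> and drop single quotes around a repr."""
    text = re.sub(r"<p>([\s\S]*?)</p>", r"\1", text)
    text = re.sub(r"'([\s\S]*?)'", r"\1", text)
    return text


print(clean_plain_text("'hello'"))  # hello
print(clean_plain_text("10"))       # 10
```

The quote stripping matters because the text/plain output of a string expression is its repr, so without it a string variable would appear in the Markdown wrapped in quotes.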

You can then process your notebook with logic similar to the code you posted:
import nbformat

# Hypothetical module name: import the class from wherever you put it
from execute_code_markdown import ExecuteCodeMarkdownPreprocessor
from pre_pymarkdown import PyMarkdownPreprocessor  # from pre_pymarkdown.py

with open('report.ipynb') as f:
    nb = nbformat.read(f, as_version=4)

# First run the code cells and the {{}} snippets, storing results in the cell metadata
ep = ExecuteCodeMarkdownPreprocessor(timeout=600, kernel_name='python3')
ep.preprocess(nb, {})

# Then replace the {{}} patterns with the stored values
pymk = PyMarkdownPreprocessor()
pymk.preprocess(nb, {})

with open('report_executed.ipynb', 'wt') as f:
    nbformat.write(nb, f)

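Note that by including the Python Markdown preprocessing, your resulting notebook file will no longer have the {{}} syntax in the Markdown cells; the Markdown will have static content. If the recipient of the resulting notebook changes the code and executes it again, the Markdown will not be updated. However, if you are exporting to a different format (such as HTML), then you do want to replace the {{}} syntax with static content.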

Regarding python - Executing a Jupyter notebook with inline Markdown using nbconvert, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35805121/
