
python - What is a safe way to extract Python code blocks from docx files and run them in a sandbox?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:36:43

I have about 6000~6500 Microsoft Word .docx files containing variously formatted answer scripts, in this sequence:

Python Programming Question in Bold

Answer in form of complete, correctly-indented, single-spaced, self-sufficient code

Unfortunately, there seems to be no fixed pattern separating the code blocks from normal text. Some examples from the first 50 or so files:

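  1. Entire question in bold, after which the code starts abruptly, in bold/italics

  2. Question placed in comments, after which the code continues

  3. Question completely missing, just code with numbered lists indicating the start

  4. Question completely missing, with C/Python style comments indicating the start

etc.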

For now, I'm extracting the entire unformatted text through python-docx like this:

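from docx import Document  # python-docx

doc = Document(infil)  # infil: path to the input .docx file

# For Unicode handling.
new_paragraphs = []
for paragraph in doc.paragraphs:
    new_paragraphs.append(paragraph.text.encode("utf-8"))

# convert() is the asker's own helper; its definition is not shown in the question.
new_paragraphs = [convert(p) for p in new_paragraphs]

with open(outfil, 'w', encoding='utf-8') as f:  # outfil: path to the output text file
    print('\n'.join(new_paragraphs), file=f)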
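Since the first pattern in the list relies on bold text, one option might be to keep the run formatting instead of discarding it. The following is only a sketch: it relies on python-docx paragraphs exposing `runs`, and on each run having `text` and a tri-state `bold` (True, False, or None when inherited from the style). The helper names `is_bold_paragraph`, `tag_paragraphs` and `fake_paragraph` are made up for illustration, and the stand-in objects exist only so the snippet runs without a .docx file. It will mislabel files where the code itself is bold.

```python
from types import SimpleNamespace


def is_bold_paragraph(paragraph):
    """Return True if every non-blank run in the paragraph is explicitly bold.

    Works on any object with a .runs list whose items expose .text and .bold,
    which is the shape of python-docx Paragraph and Run objects.
    Bold inherited from a style shows up as None and is NOT detected here.
    """
    runs = [run for run in paragraph.runs if run.text.strip()]
    return bool(runs) and all(run.bold is True for run in runs)


def tag_paragraphs(paragraphs):
    """Pair each paragraph's text with a rough 'question' or 'other' label."""
    return [
        ("question" if is_bold_paragraph(p) else "other", p.text)
        for p in paragraphs
    ]


def fake_paragraph(*runs):
    """Hypothetical stand-in for a python-docx paragraph, for demonstration."""
    run_objs = [SimpleNamespace(text=t, bold=b) for t, b in runs]
    return SimpleNamespace(runs=run_objs, text="".join(t for t, _ in runs))


sample = [
    fake_paragraph(("Write a function to add two numbers.", True)),
    fake_paragraph(("def add(a, b):", None)),
    fake_paragraph(("    return a + b", None)),
]
labels = tag_paragraphs(sample)
```

With a real file, `tag_paragraphs(Document(infil).paragraphs)` would be the equivalent call.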

After extraction, I will run them using the PyPy Sandboxing feature, which I understand is safe, and then assign points as if in a contest.
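For the scoring step, a rough harness could look like the sketch below. To be clear, this is not the PyPy sandbox and not a security boundary: it only runs each submission in a separate interpreter process with a timeout, in a throwaway directory, using the standard `subprocess` module. The function name `run_with_timeout` and the result dictionary layout are assumptions made for illustration; untrusted code would still need a real sandbox around it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_timeout(source, stdin_text="", timeout_s=5):
    """Run `source` in a separate Python process and capture its output.

    NOTE: this only isolates the process and bounds its run time.
    It does NOT restrict file, network, or memory access.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "submission.py"
        script.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr}


result = run_with_timeout("print(sum(range(5)))")
looping = run_with_timeout("while True:\n    pass", timeout_s=1)
```

Comparing `result["stdout"]` against an expected answer would then give the per-question score.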

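What I am completely stuck on is how to detect the start and end of the code programmatically. Most language detection APIs are unnecessary, since I already know the language. This question: How to detect source code in a text? suggests using linters and syntax highlighters such as Google Code Prettifier, but they don't solve the problem of detecting separate programs.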
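Because the language is already known, one cheap thing to try before anything statistical is asking Python's own parser. The sketch below is an assumption on my part, not something from the linked question: it greedily takes the longest run of lines that `ast.parse` accepts, and ignores runs that are nothing but bare names or literals (a single English word is a valid Python expression). The names `parses_as_code` and `find_code_blocks` are hypothetical. It will merge two programs that follow each other with no prose in between, so it addresses the start/end problem only partly.

```python
import ast


def parses_as_code(text):
    """True if `text` is valid Python and more than bare names/literals."""
    try:
        tree = ast.parse(text)
    except (SyntaxError, ValueError):
        return False
    trivial = (ast.Name, ast.Constant)
    return any(
        not (isinstance(node, ast.Expr) and isinstance(node.value, trivial))
        for node in tree.body
    )


def find_code_blocks(lines):
    """Greedily split `lines` into the longest runs that parse as Python.

    Returns a list of (start, end) index pairs, end exclusive.
    """
    blocks = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1
            continue
        best_end = None
        # Try the longest candidate first and shrink until something parses.
        for j in range(len(lines), i, -1):
            if lines[j - 1].strip() and parses_as_code("\n".join(lines[i:j])):
                best_end = j
                break
        if best_end is None:
            i += 1  # this line cannot start a block; treat it as prose
        else:
            blocks.append((i, best_end))
            i = best_end
    return blocks


sample_lines = [
    "Write a function that adds two numbers.",
    "def add(a, b):",
    "    return a + b",
    "",
    "print(add(1, 2))",
    "Question 2: print hello",
    "print('hello')",
]
blocks = find_code_blocks(sample_lines)
```

Each `(start, end)` pair can be sliced out of the paragraph list and written to its own file.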

A suitable solution, from this programmers.se question, seems to be training Markov chains, but I would like some second opinions before embarking on such a vast project.
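To get a feel for how big the Markov chain route really is, a toy version fits in a few dozen lines. This is only a sketch under heavy assumptions: it models transitions between four coarse character classes rather than real tokens, uses add-one smoothing, and is trained on a handful of invented lines (the training lists below are made up, not taken from the actual files). The names `char_class`, `MarkovModel` and `classify` are hypothetical.

```python
import math
from collections import defaultdict

CLASSES = "LDSP"  # letter, digit, whitespace, other (punctuation/operators)


def char_class(ch):
    """Collapse a character into one of four coarse classes."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    if ch.isspace():
        return "S"
    return "P"


class MarkovModel:
    """First-order Markov chain over character classes, add-one smoothed."""

    def __init__(self, lines):
        self.counts = defaultdict(lambda: defaultdict(int))
        for line in lines:
            prev = "^"  # start-of-line state
            for ch in line:
                cur = char_class(ch)
                self.counts[prev][cur] += 1
                prev = cur

    def log_likelihood(self, line):
        """Sum of log transition probabilities for the whole line."""
        total = 0.0
        prev = "^"
        for ch in line:
            cur = char_class(ch)
            row = self.counts.get(prev, {})
            numerator = row.get(cur, 0) + 1
            denominator = sum(row.values()) + len(CLASSES)
            total += math.log(numerator / denominator)
            prev = cur
        return total


def classify(line, code_model, prose_model):
    """Label a line by whichever model assigns it the higher likelihood."""
    if code_model.log_likelihood(line) > prose_model.log_likelihood(line):
        return "code"
    return "prose"


# Invented training data, for illustration only
code_model = MarkovModel([
    "def add(a, b):",
    "    return a + b",
    "print(add(1, 2))",
    "for i in range(10):",
])
prose_model = MarkovModel([
    "write a program that prints the sum of two numbers",
    "explain what the function returns",
])

label_code = classify("total = add(3, 4)", code_model, prose_model)
label_prose = classify("describe the output of this loop", code_model, prose_model)
```

A real version would presumably train on a few hundred hand-labelled lines from the actual files and smooth the per-line labels, since a single line is often ambiguous.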

This extraction code will also be provided to all students after evaluation.

I apologize if the question is too broad or the answer too obvious.

Best answer

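Hmm, so you're looking for some kind of formatting pattern? That sounds a bit odd to me. Is there any kind of text or string pattern you can exploit? I'm not sure this will help, but the VBA script below searches all Word documents in a folder and puts a marker (the script as posted writes "Yes") in any field that matches the search criteria you specify in Row1. It also adds a hyperlink in ColA, so you can click the link and open the file without searching for it everywhere.

[Screenshot not available]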

Script:

Sub OpenAndReadWordDoc()
    ' Requires a reference to the Microsoft Word Object Library
    ' (Tools > References in the VBA editor) for the Word.* types below.

    ' Clear the results of any previous run (everything below the header row)
    Rows("2:1000000").Select
    Range(Selection, Selection.End(xlDown)).Select
    Selection.ClearContents
    Range("A1").Select

    ' assumes that the previous procedure has been executed
    Dim oWordApp As Word.Application
    Dim oWordDoc As Word.Document
    Dim blnStart As Boolean
    Dim r As Long
    Dim sFolder As String
    Dim strFilePattern As String
    Dim strFileName As String
    Dim sFileName As String
    Dim ws As Worksheet
    Dim c As Long
    Dim n As Long

    '~~> Establish an Word application object
    On Error Resume Next
    Set oWordApp = GetObject(, "Word.Application")
    If Err.Number <> 0 Then
        Set oWordApp = CreateObject("Word.Application")
        ' We started Word for this macro
        blnStart = True
    End If
    On Error GoTo ErrHandler

    Set ws = ActiveSheet
    r = 1 ' startrow for the copied text from the Word document
    ' Last column
    n = ws.Range("A1").End(xlToRight).Column

    sFolder = "C:\Users\your_path_here\"

    '~~> This is the extension you want to go in for
    strFilePattern = "*.doc*"
    '~~> Loop through the folder to get the word files
    strFileName = Dir(sFolder & strFilePattern)
    Do Until strFileName = ""
        sFileName = sFolder & strFileName

        '~~> Open the word doc
        Set oWordDoc = oWordApp.Documents.Open(sFileName)
        ' Increase row number
        r = r + 1
        ' Enter file name in column A
        ws.Cells(r, 1).Value = sFileName

        ActiveCell.Offset(1, 0).Select
        ActiveSheet.Hyperlinks.Add Anchor:=Sheets("Sheet1").Range("A" & r), Address:=sFileName, _
            SubAddress:="A" & r, TextToDisplay:=sFileName

        ' Loop through the columns
        For c = 2 To n
            If oWordDoc.Content.Find.Execute(FindText:=Trim(ws.Cells(1, c).Value), _
                    MatchWholeWord:=True, MatchCase:=False) Then
                ' If text found, enter Yes in column number c
                ws.Cells(r, c).Value = "Yes"
            End If
        Next c
        oWordDoc.Close SaveChanges:=False

        '~~> Find next file
        strFileName = Dir()
    Loop

ExitHandler:
    On Error Resume Next
    ' close the Word application
    Set oWordDoc = Nothing
    If blnStart Then
        ' We started Word, so we close it
        oWordApp.Quit
    End If
    Set oWordApp = Nothing
    Exit Sub

ErrHandler:
    MsgBox Err.Description, vbExclamation
    Resume ExitHandler
End Sub

Function GetDirectory(path)
    ' Return the folder part of a full path, including the trailing backslash
    GetDirectory = Left(path, InStrRev(path, "\"))
End Function

Regarding "python - What is a safe way to extract Python code blocks from docx files and run them in a sandbox?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/42468134/
