What is a safe way to extract Python code blocks from docx files and run them in a sandbox?

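I have roughly 6000 to 6500 Microsoft Word .docx files containing answer scripts with assorted formatting, in this sequence:

Python programming question, in bold

Answer as complete, properly indented, single-spaced, self-contained code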

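Unfortunately, there seems to be no fixed pattern that separates the code blocks from normal text. Some examples from the first 50 or so files:

Entire question in bold, after which the code starts abruptly, in bold/italics
Question put in comments, after which the code continues
Question missing entirely, just the code, with a numbered list marking the start
Question missing entirely, with C/Python-style comments marking the start

etc.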

For now, I extract the entire unformatted text with python-docx like this:

from docx import Document

doc = Document(infil)

# Collect the text of every paragraph, encoded as UTF-8 for Unicode handling.
new_paragraphs = [paragraph.text.encode("utf-8") for paragraph in doc.paragraphs]

# convert() is a helper that is not shown here.
new_paragraphs = [convert(x) for x in new_paragraphs]

with open(outfil, 'w', encoding='utf-8') as f:
    print('\n'.join(new_paragraphs), file=f)
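Since the question text is usually bold, the run-level formatting that `paragraph.text` discards may be a usable signal. The following is a minimal sketch, assuming python-docx's `Paragraph.runs` and the tri-state `Run.bold` attribute (`True`, `False`, or `None` when the value is inherited from a style). The helper names `is_bold_paragraph` and `split_on_bold` are hypothetical and not part of the library.

```python
def is_bold_paragraph(paragraph):
    """Return True when every non-blank run in the paragraph is explicitly bold.

    Works on python-docx Paragraph objects (or anything with .runs whose
    items expose .text and .bold). Bold inherited from a style shows up as
    None and is treated as not bold here, which is an assumption.
    """
    runs = [run for run in paragraph.runs if run.text.strip()]
    return bool(runs) and all(run.bold is True for run in runs)


def split_on_bold(paragraphs):
    """Group paragraphs into (question, body_lines) pairs.

    A bold paragraph starts a new group; the non-bold paragraphs that
    follow it are collected as that group's body. Text before the first
    bold paragraph ends up in a group whose question is None.
    """
    groups = []
    question, body = None, []
    for paragraph in paragraphs:
        if is_bold_paragraph(paragraph):
            if question is not None or body:
                groups.append((question, body))
            question, body = paragraph.text, []
        else:
            body.append(paragraph.text)
    if question is not None or body:
        groups.append((question, body))
    return groups
```

With python-docx this would be called as `split_on_bold(Document(infil).paragraphs)`. It only addresses files where the question really is bold and the code is not, so it is a first filter rather than a complete answer.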

Once extracted, I will run them using the PyPy sandboxing feature, which I understand is safe, and then assign points as in a contest.
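The scoring step could be driven by something like the sketch below. It is not PyPy's sandbox: it only starts each extracted script in a separate CPython process with a time limit and captures its output, which guards against infinite loops but is not a security boundary on its own. The names `run_script` and `RunResult` and the 5-second default are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    returncode: Optional[int]  # None when the script timed out
    stdout: str
    stderr: str
    timed_out: bool


def run_script(path, stdin_text="", timeout=5.0):
    """Run one extracted script in a child interpreter and capture its output."""
    try:
        proc = subprocess.run(
            # -I is isolated mode: environment variables and the user
            # site-packages directory are ignored.
            [sys.executable, "-I", path],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return RunResult(None, "", "", True)
    return RunResult(proc.returncode, proc.stdout, proc.stderr, False)
```

A grader would compare `RunResult.stdout` against the expected output for each question. Untrusted submissions would still need a real isolation layer around this call, such as the sandbox mentioned above or an operating-system-level container.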

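What I am completely stuck on is how to detect the start and end of the code programmatically. Most language-detection APIs are of no use, since I already know the language. This question: How to detect source code in a text? suggests using linters and syntax highlighters like Google Code Prettifier, but they don't solve the problem of detecting separate programs.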
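Because the language is already known, one possible heuristic is to let Python's own parser decide where code starts and ends. The sketch below is an assumption, not a method taken from the linked question: starting at each line, it looks for the longest run of consecutive lines that `ast.parse` accepts. The function names `parses_as_code` and `find_code_blocks` are hypothetical, and Python 3.8 or later is assumed for `ast.Constant`.

```python
import ast
import textwrap


def parses_as_code(lines):
    """True when the lines parse as Python and are more than bare words."""
    source = textwrap.dedent("\n".join(lines)) + "\n"
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    # A single word such as "etc" is a valid expression, so spans made up
    # only of bare names or literals are rejected as prose.
    return any(
        not (isinstance(node, ast.Expr)
             and isinstance(node.value, (ast.Name, ast.Constant)))
        for node in tree.body
    )


def find_code_blocks(lines):
    """Return (start, end) index pairs, end exclusive, of spans that parse."""
    blocks = []
    i, n = 0, len(lines)
    while i < n:
        if not lines[i].strip():
            i += 1
            continue
        best_end = None
        # Keep extending: "def f():" alone fails, but the longer span works.
        for j in range(i + 1, n + 1):
            if parses_as_code(lines[i:j]):
                best_end = j
        if best_end is None:
            i += 1
            continue
        # Drop trailing blank lines from the block.
        while not lines[best_end - 1].strip():
            best_end -= 1
        blocks.append((i, best_end))
        i = best_end
    return blocks
```

This is cubic in the number of lines in the worst case, which is probably tolerable for short answer scripts. It will misfire on prose that happens to be valid Python (for example `Note: important` parses as an annotated name), so it is likely best combined with the formatting cues from the .docx file.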

A suitable solution, from this programmers.se question, seems to be training a Markov chain, but I wanted some second opinions before starting such a huge project.
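To give a sense of scale, a Markov-chain classifier does not have to be large. The sketch below is a minimal illustration under stated assumptions, not the approach from the linked question: it trains one first-order character chain on lines known to be code and another on lines known to be prose, then labels a new line by whichever model gives it the higher average log-probability. The class `CharMarkov`, the function `looks_like_code`, and the add-one smoothing are all choices made for this example; real training lines would have to be labelled by hand from a sample of the files.

```python
import math
from collections import Counter, defaultdict


class CharMarkov:
    """First-order character Markov chain with add-one smoothing."""

    START = "\x02"  # sentinel standing for the start of a line

    def __init__(self, samples):
        self.counts = defaultdict(Counter)
        self.alphabet = set()
        for line in samples:
            prev = self.START
            for ch in line:
                self.counts[prev][ch] += 1
                self.alphabet.add(ch)
                prev = ch
        # One extra slot stands in for any character never seen in training.
        self.vocab_size = len(self.alphabet) + 1

    def prob(self, prev, ch):
        """Smoothed probability of seeing ch right after prev."""
        row = self.counts.get(prev, Counter())
        return (row[ch] + 1) / (sum(row.values()) + self.vocab_size)

    def log_likelihood(self, line):
        """Average log-probability per character of the line."""
        if not line:
            return 0.0
        prev, total = self.START, 0.0
        for ch in line:
            total += math.log(self.prob(prev, ch))
            prev = ch
        return total / len(line)


def looks_like_code(line, code_model, prose_model):
    """True when the code model explains the line better than the prose model."""
    return code_model.log_likelihood(line) > prose_model.log_likelihood(line)
```

Averaging per character keeps long and short lines comparable. Whether character-level statistics separate student code from question text well enough is something only a labelled sample of the documents can show.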

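This extracted code will also be provided to all the students after evaluation.

I apologize if the question is too broad or the answer too obvious.

Source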

2017-02-26 RaunakS

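Hmm, so you are looking for some kind of formatting pattern? That sounds odd to me. Is there some kind of text or string pattern you can take advantage of? I'm not sure whether this helps, but the VBA script below, run from Excel, searches all the Word documents in a folder and enters "Yes" in any column whose search term, specified in row 1, is found in that document. It also adds hyperlinks in column A, so you can click a link to open the file instead of searching for it.

Script: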

Sub OpenAndReadWordDoc() 

    Rows("2:1000000").Select 
    Range(Selection, Selection.End(xlDown)).Select 
    Selection.ClearContents 
    Range("A1").Select 

    ' assumes that the previous procedure has been executed 
    Dim oWordApp As Word.Application 
    Dim oWordDoc As Word.Document 
    Dim blnStart As Boolean 
    Dim r As Long 
    Dim sFolder As String 
    Dim strFilePattern As String 
    Dim strFileName As String 
    Dim sFileName As String 
    Dim ws As Worksheet 
    Dim c As Long 
    Dim n As Long 

    '~~> Establish an Word application object 
    On Error Resume Next 
    Set oWordApp = GetObject(, "Word.Application") 
    If Err() Then 
     Set oWordApp = CreateObject("Word.Application") 
     ' We started Word for this macro 
     blnStart = True 
    End If 
    On Error GoTo ErrHandler 

    Set ws = ActiveSheet 
    r = 1 ' startrow for the copied text from the Word document 
    ' Last column 
    n = ws.Range("A1").End(xlToRight).Column 

    sFolder = "C:\Users\your_path_here\" 

    '~~> This is the extension you want to go in for 
    strFilePattern = "*.doc*" 
    '~~> Loop through the folder to get the word files 
    strFileName = Dir(sFolder & strFilePattern) 
    Do Until strFileName = "" 
     sFileName = sFolder & strFileName 

     '~~> Open the word doc 
     Set oWordDoc = oWordApp.Documents.Open(sFileName) 
     ' Increase row number 
     r = r + 1 
     ' Enter file name in column A 
     ws.Cells(r, 1).Value = sFileName 

     ActiveCell.Offset(1, 0).Select 
     ActiveSheet.Hyperlinks.Add Anchor:=Sheets("Sheet1").Range("A" & r), Address:=sFileName, _
     SubAddress:="A" & r, TextToDisplay:=sFileName

     ' Loop through the columns 
     For c = 2 To n 
      If oWordDoc.Content.Find.Execute(FindText:=Trim(ws.Cells(1, c).Value), _
        MatchWholeWord:=True, MatchCase:=False) Then
       ' If text found, enter Yes in column number c 
       ws.Cells(r, c).Value = "Yes" 
      End If 
     Next c 
     oWordDoc.Close SaveChanges:=False 

     '~~> Find next file 
     strFileName = Dir() 
    Loop 

ExitHandler: 
    On Error Resume Next 
    ' close the Word application 
    Set oWordDoc = Nothing 
    If blnStart Then 
     ' We started Word, so we close it 
     oWordApp.Quit 
    End If 
    Set oWordApp = Nothing 
    Exit Sub 

ErrHandler: 
    MsgBox Err.Description, vbExclamation 
    Resume ExitHandler 
End Sub 

Function GetDirectory(path) 
    GetDirectory = Left(path, InStrRev(path, "\")) 
End Function

Source

2017-03-13 03:31:32 ryguy72
