2011-01-09 70 views
1

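Translating a play in HTML to Python

So, I want to represent one of Shakespeare's plays, Hamlet, as objects like the following (maybe this isn't the best representation; if so, please tell me):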

class Play(): 
    acts = [] 
    ... 
    def add_act(self, act): acts.append(act) 

class Act(): 
    scenes = [] 
    ... 
    def add_scene(self, scene): scenes.append(scene) 

class Scene(): 
    elems = [] 
    def __init__(self, title, setting=""): ... 
    def add_elem(self, elem): elems.append(elem) 
    ... 

class StageDirection(): # elem 
    def __init__(self, text): ... 

class Line(): # elem 
    def __init__(self, id, text, character = None): ... 
    # A None character represents a continuation from the previous line 
    # id could be, for example, 1.1.1 

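There would be other methods too, of course, for printing and so on in each class.

The question is: how do I get a structure based on these classes (or something like them) from HTML 4 code that looks like this: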

<H3>ACT I</h3> 
<h3>SCENE I. Elsinore. A platform before the castle.</h3> 
<p><blockquote> 
<i>FRANCISCO at his post. Enter to him BERNARDO</i> 
</blockquote> 

<A NAME=speech1><b>BERNARDO</b></a> 
<blockquote> 
<A NAME=1.1.1>Who's there?</A><br> 
</blockquote> 

<A NAME=speech2><b>FRANCISCO</b></a> 
<blockquote> 
<A NAME=1.1.2>Nay, answer me: stand, and unfold yourself.</A><br> 
</blockquote> 

<A NAME=speech3><b>BERNARDO</b></a> 
<blockquote> 
<A NAME=1.1.3>Long live the king!</A><br> 
</blockquote> 

<A NAME=speech4><b>FRANCISCO</b></a> 
<blockquote> 
<A NAME=1.1.4>Bernardo?</A><br> 
</blockquote> 

<A NAME=speech5><b>BERNARDO</b></a> 
<blockquote> 
<A NAME=1.1.5>He.</A><br> 
</blockquote> <!-- for more, see the source of shakespeare.mit.edu/hamlet/full.html --> 

That is, translate it into something like this:

play = Play() 
actI = Act() 
sceneI = Scene("Scene I", "Elsinore. A platform before the castle.") 
sceneI.add_elem(StageDirection("Francisco at his post. Enter to him Bernardo.")) 
sceneI.add_elem(Line("1.1.1", "Who's there?", "Bernardo")) 
... 

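Of course, I don't expect all of the code. But which libraries should I use, and where no library covers it, what logic should I follow?

Thanks.

(This is for a future open-source project and for the fun of learning Python, not homework.)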

Answers

4

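Use lxml or a similar parser. It will read your HTML (XML?) into a document tree, which is basically a more general version of the data structure you have already written.

You can then walk the resulting tree and prune it, or rebuild another tree in memory that looks the way you want. But the HTML -> data structure step is a solved problem.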
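As a rough illustration of the "rebuild another tree in memory" step, here is a minimal sketch. It deliberately swaps lxml for the standard library's `html.parser` (event-based rather than tree-based) so that it runs without extra installs. `PlayBuilder`, the `title` argument on `Act`, and the rule that a scene heading splits at its first ". " are assumptions made for this example, and it has only been reasoned through against the short excerpt in the question, not the full play.

```python
from html.parser import HTMLParser


class Play:
    def __init__(self):
        self.acts = []


class Act:
    def __init__(self, title):  # `title` is an assumption, not in the question
        self.title = title
        self.scenes = []


class Scene:
    def __init__(self, title, setting=""):
        self.title = title
        self.setting = setting
        self.elems = []


class StageDirection:
    def __init__(self, text):
        self.text = text


class Line:
    def __init__(self, id, text, character=None):
        self.id = id
        self.text = text
        self.character = character  # None marks a continuation line


class PlayBuilder(HTMLParser):
    """Hypothetical helper that turns the tag stream into Play objects."""

    def __init__(self):
        super().__init__()
        self.play = Play()
        self.buffer = None   # text of the element currently being read
        self.line_id = None  # NAME of the current line anchor, e.g. "1.1.1"
        self.speaker = None  # speaker waiting to be attached to a line

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag in ("h3", "i", "b"):
            self.buffer = []
        elif tag == "a":
            name = dict(attrs).get("name") or ""
            if not name.startswith("speech"):
                self.line_id = name
                self.buffer = []

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.buffer is None:
            return
        text = "".join(self.buffer).strip()
        if tag == "h3":
            if text.upper().startswith("ACT"):
                self.play.acts.append(Act(text))
            else:
                # Assumed heading shape: "SCENE I. <setting>"
                title, _, setting = text.partition(". ")
                self.play.acts[-1].scenes.append(Scene(title, setting))
        elif tag == "i":
            self.play.acts[-1].scenes[-1].elems.append(StageDirection(text))
        elif tag == "b":
            self.speaker = text
        elif tag == "a" and self.line_id is not None:
            line = Line(self.line_id, text, self.speaker)
            self.play.acts[-1].scenes[-1].elems.append(line)
            self.line_id = None
            self.speaker = None  # later lines of the speech are continuations
        else:
            return  # unrelated closing tag: keep collecting text
        self.buffer = None


SAMPLE = """<H3>ACT I</h3>
<h3>SCENE I. Elsinore. A platform before the castle.</h3>
<p><blockquote>
<i>FRANCISCO at his post. Enter to him BERNARDO</i>
</blockquote>

<A NAME=speech1><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.1>Who's there?</A><br>
</blockquote>
"""

builder = PlayBuilder()
builder.feed(SAMPLE)
builder.close()
scene = builder.play.acts[0].scenes[0]
```

With lxml the same mapping would be written as a walk over the parsed tree instead of a series of callbacks; the decisions about which tag becomes which object stay the same.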


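Wait, do you want to generate actual Python code? Why would you want to do that?

+0

No, that's just sample code that would achieve the same effect :) – 2011-01-09 03:18:22

3

As an aside, you don't want your code to do this:

class Play(): 
    acts = [] 
    ... 
    def add_act(self, act): acts.append(act) 

Here acts is a class attribute, so a single list is shared by every Play instance, and the bare name acts inside add_act is not in scope, so the call raises a NameError. Try this:

class Play(): 
    def __init__(self): 
        self.acts = [] 
    ... 
    def add_act(self, act): 
        self.acts.append(act)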
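
To see why the list belongs in `__init__`, here is a minimal sketch comparing the two layouts. `SharedPlay` is a hypothetical name used only for the class-attribute variant, and the act values are plain strings to keep the example short.

```python
class SharedPlay:
    acts = []  # class attribute: one list shared by every instance

    def add_act(self, act):
        self.acts.append(act)  # mutates the shared class-level list


class Play:
    def __init__(self):
        self.acts = []  # instance attribute: a fresh list per play

    def add_act(self, act):
        self.acts.append(act)


hamlet, macbeth = SharedPlay(), SharedPlay()
hamlet.add_act("Act I")
leaked = macbeth.acts  # macbeth sees hamlet's act as well

hamlet2, macbeth2 = Play(), Play()
hamlet2.add_act("Act I")
separate = macbeth2.acts  # macbeth2 still has its own empty list
```

The same reasoning applies to `scenes` in `Act` and `elems` in `Scene`.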