2014-10-22 75 views


Grammar:

S -> NP 
NP -> PN|PRO|D[NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM 
PP -> P NP 
D[NUM=sg] -> 'a' 
D -> 'the' 
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair' 
N[NUM=pl] -> 'dogs'|'cats' 
PN -> 'saumya'|'dinesh' 
PRO -> 'she'|'he'|'we' 
A -> 'tall'|'naughty'|'long'|'three'|'black' 
P -> 'with'|'in'|'from'|'at' 
QP -> 'some' 
NOM -> A NOM|N[NUM=?n] 

Code:

import nltk 

grammar = nltk.data.load('file:english_grammer.cfg') 
rdparser = nltk.RecursiveDescentParser(grammar) 
sent = "a dogs".split() 
trees = rdparser.parse(sent) 

for tree in trees: print (tree) 

Error:

ValueError: Expected a nonterminal, found: [NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM


Please also post the full error traceback from your code. – alvas 2014-10-22 13:54:13



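I don't think the NLTK CFG grammar reader can read non-terminals written with square brackets. A grammar without bracketed non-terminals loads fine: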


from nltk.grammar import CFG 

grammar_string = ''' 
S -> NP 
PP -> P NP 
D -> 'the' 
PN -> 'saumya'|'dinesh' 
PRO -> 'she'|'he'|'we' 
A -> 'tall'|'naughty'|'long'|'three'|'black' 
P -> 'with'|'in'|'from'|'at' 
QP -> 'some' 
'''

grammar = CFG.fromstring(grammar_string) 
print(grammar) 


Grammar with 18 productions (start state = S) 
    S -> NP 
    PP -> P NP 
    D -> 'the' 
    PN -> 'saumya' 
    PN -> 'dinesh' 
    PRO -> 'she' 
    PRO -> 'he' 
    PRO -> 'we' 
    A -> 'tall' 
    A -> 'naughty' 
    A -> 'long' 
    A -> 'three' 
    A -> 'black' 
    P -> 'with' 
    P -> 'in' 
    P -> 'from' 
    P -> 'at' 
    QP -> 'some' 


Adding the bracketed non-terminals makes the reader fail:

from nltk.grammar import CFG 

grammar_string = ''' 
S -> NP 
PP -> P NP 
D -> 'the' 
PN -> 'saumya'|'dinesh' 
PRO -> 'she'|'he'|'we' 
A -> 'tall'|'naughty'|'long'|'three'|'black' 
P -> 'with'|'in'|'from'|'at' 
QP -> 'some' 
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair' 
N[NUM=pl] -> 'dogs'|'cats' 
'''

grammar = CFG.fromstring(grammar_string) 
print(grammar) 


Traceback (most recent call last): 
    File "test.py", line 33, in <module> 
    grammar = CFG.fromstring(grammar_string) 
    File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 519, in fromstring 
    File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 1273, in read_grammar 
    (linenum+1, line, e)) 
ValueError: Unable to parse line 10: N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair' 
Expected an arrow 
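To see the difference between the two grammar readers in isolation, here is a minimal sketch that feeds the same bracketed rule to `CFG.fromstring` and to `FeatureGrammar.fromstring`. The variable names `rule`, `cfg_accepts` and `feature_grammar` are made up for this illustration, and the snippet assumes NLTK 3.

```python
from nltk.grammar import CFG, FeatureGrammar

# One rule written with a bracketed feature annotation.
rule = "N[NUM=sg] -> 'boy' | 'girl'"

# The plain CFG reader does not understand the bracket and raises ValueError.
try:
    CFG.fromstring(rule)
    cfg_accepts = True
except ValueError as err:
    cfg_accepts = False
    print("CFG reader rejected the rule:", err)

# The feature grammar reader treats the bracket as a feature structure.
feature_grammar = FeatureGrammar.fromstring(rule)
for production in feature_grammar.productions():
    print(production)
```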


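If you want the extra constraints, you will have to:

  • use underscores to name the constrained non-terminals (e.g. N_sg, N_pl), and
  • add unconstrained non-terminals (e.g. N) that expand to them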


from nltk.parse import RecursiveDescentParser 
from nltk.grammar import CFG 

grammar_string = ''' 
S -> NP 
NP -> PN | PRO | D N | D A N | D N PP | QP N | A N | D NOM PP | D NOM 

PP -> P NP 
PN -> 'saumya'|'dinesh' 
PRO -> 'she'|'he'|'we' 
A -> 'tall'|'naughty'|'long'|'three'|'black' 
P -> 'with'|'in'|'from'|'at' 
QP -> 'some' 

D -> D_def | D_sg 
D_def -> 'the' 
D_sg -> 'a' 

N -> N_sg | N_pl 
N_sg -> 'boy'|'girl'|'room'|'garden'|'hair' 
N_pl -> 'dogs'|'cats' 
'''

grammar = CFG.fromstring(grammar_string) 

rdparser = RecursiveDescentParser(grammar) 
sent = "a dogs".split() 
trees = rdparser.parse(sent) 

for tree in trees: 
    print (tree) 


(S (NP (D (D_sg a)) (N (N_pl dogs)))) 
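Note that the grammar above loads, but it still accepts "a dogs", because `D` and `N` expand to either number. One possible way to enforce agreement in a plain CFG is to write the NP rules directly over the underscore-named non-terminals, so that `D_sg` can only combine with `N_sg`. The following is a reduced sketch of that idea, not the full grammar from the question; the name `agreement_grammar` and the helper `parses` are hypothetical.

```python
from nltk.grammar import CFG
from nltk.parse import RecursiveDescentParser

# Number agreement is encoded in the non-terminal names:
# 'a' (D_sg) may only precede a singular noun, 'the' (D_def) precedes either.
agreement_grammar = CFG.fromstring('''
S -> NP
NP -> D_sg N_sg | D_def N_sg | D_def N_pl
D_def -> 'the'
D_sg -> 'a'
N_sg -> 'boy' | 'girl' | 'room' | 'garden' | 'hair'
N_pl -> 'dogs' | 'cats'
''')

parser = RecursiveDescentParser(agreement_grammar)

def parses(sentence):
    """Return the list of parse trees for a whitespace-separated sentence."""
    return list(parser.parse(sentence.split()))

for sentence in ["a dogs", "the dogs", "a boy"]:
    print(sentence, "->", len(parses(sentence)), "parse(s)")
```

The cost of this approach is that every rule involving agreement has to be duplicated per feature value, which grows quickly once more features are involved.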

Thanks for your response. What I actually want is to exclude the following ill-formed phrases from my grammar: (i) a dogs (ii) three girl (iii) he cats – 2014-10-22 17:35:29


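This post resolves the error you posted. So I guess the rest is your homework, which you can do, keeping in mind that brackets are not allowed in non-terminals and that the CFG in the NLTK API has no constraints. Have fun! – alvas 2014-10-22 19:33:18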


Thanks for the update. I'll give it a try. – 2014-10-23 10:55:19


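It looks like you are trying to use NLTK's feature grammars, which do use square-bracket syntax to express features and feature agreement. The parser NLTK uses for feature grammars is FeatureEarleyChartParser (as opposed to RecursiveDescentParser).

From the NLTK documentation: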

>>> from __future__ import print_function 
>>> import nltk 
>>> from nltk import grammar, parse 
>>> g = """ 
... % start DP 
... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a] 
... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that' 
... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those' 
... D[AGR=[NUM='pl', PERS=1]] -> 'we' 
... D[AGR=[PERS=2]] -> 'you' 
... N[AGR=[NUM='sg', GND='m']] -> 'boy' 
... N[AGR=[NUM='pl', GND='m']] -> 'boys' 
... N[AGR=[NUM='sg', GND='f']] -> 'girl' 
... N[AGR=[NUM='pl', GND='f']] -> 'girls' 
... N[AGR=[NUM='sg']] -> 'student' 
... N[AGR=[NUM='pl']] -> 'students' 
... """ 
>>> grammar = grammar.FeatureGrammar.fromstring(g) 
>>> tokens = 'these girls'.split() 
>>> parser = parse.FeatureEarleyChartParser(grammar) 
>>> trees = parser.parse(tokens) 
>>> for tree in trees: print(tree) 
(DP[AGR=[GND='f', NUM='pl', PERS=3]] 
    (D[AGR=[NUM='pl', PERS=3]] these) 
    (N[AGR=[GND='f', NUM='pl']] girls)) 
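Applied to the grammar from the question, the same idea might look like the sketch below. It uses only a reduced subset of the original rules, and the helper `is_grammatical` is a hypothetical name introduced for this illustration. `D -> 'the'` carries no `NUM` feature, so it unifies with either number.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Reduced subset of the question's grammar, read as a feature grammar.
grammar_string = """
% start S
S -> NP
NP -> D[NUM=?n] N[NUM=?n]
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy' | 'girl' | 'room' | 'garden' | 'hair'
N[NUM=pl] -> 'dogs' | 'cats'
"""

feature_grammar = FeatureGrammar.fromstring(grammar_string)
parser = FeatureEarleyChartParser(feature_grammar)

def is_grammatical(sentence):
    """Return True if the feature grammar yields at least one parse."""
    return len(list(parser.parse(sentence.split()))) > 0

for sentence in ["a dogs", "the dogs", "a boy"]:
    print(sentence, "->", is_grammatical(sentence))
```

Here the variable `?n` forces the determiner and the noun to share the same `NUM` value, so "a dogs" gets no parse while "the dogs" and "a boy" do.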

Thanks for the suggestion. I managed to solve this by changing grammar = nltk.data.load('file:english_grammer.cfg') to grammar = nltk.data.load('file:english_grammer.fcfg'). – 2015-03-03 09:33:05



Save the grammar with the .fcfg extension, for example english_grammer.fcfg:


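from nltk import load_parser 

# load_parser builds a feature chart parser for .fcfg grammar files
chart = load_parser('file:english_grammer.fcfg') 
# every word in the sentence must be covered by the grammar
sent = 'the girl gave the dog a bone'.split() 
# NLTK 3 replaced nbest_parse() with parse(), which returns an iterator of trees
trees = chart.parse(sent) 
for tree in trees: 
    print(tree) 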
