2017-05-08 51 views
0

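Looking for an algorithm to parse medication notes from an EHR

I want to use Python and NLTK to parse some doctors' notes that describe medication prescriptions. I am looking for a way to determine the numeric quantity of items and how often the items are taken.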

1 TABLET DAILY 
TAKE 1 TABLET DAILY 
ONE TABLET TWICE DAILY 
2 DAILY 
TWO TABLETS DAILY 
ONE PILL AT BEDTIME 
1/2 PILL TWICE DAILY 
ROLLING WALKER WITH SEAT ATTACHMENT AND HAND BRAKES 
ONE PILL DAILY 
1 TAB PO DAILY 
ONE PILL TWICE A DAY WITH MEALS AS NEEDED 
1 TABLET TWICE DAILY 
300 MG BID 
ONE DAILY 
1 TABLET 3 TIMES DAILY AS NEEDED 
1 DAILY 
TAKE 1 CAPSULE BY MOUTH 4 (FOUR) TIMES A DAY. 
1 TABLET EVERY 4 TO 6 HOURS AS NEEDED 
1 TABLET BY MOUTH TWICE DAILY 
INJECT 34 U TWICE A DAY 

Any suggestions?

+1

This might help you along the right path: http://stackoverflow.com/questions/33337410/nltk-reading-in-word-numbers-to-float-numbers – tatlar

+1

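You could also look at this project. I was not able to get the Earley parser Python code to run, but the author seems to have been working on the same problem. http://www.mit.edu/~6.863/spring2009/projects/project16.html – griffinc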

Answers

0

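Doctors usually write these instructions in multiple variations in clinical notes. For example:

1 TABLET DAILY 

can also be written as

1 tab qd 

If you are looking for a quick solution, writing regular expressions in a Python script might help. But if you want something more long-term, you can look at the data and submissions from the i2b2 Medication Information Extraction Challenge.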
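As a rough illustration of the quick regular-expression route, here is a minimal sketch that pulls a leading quantity and a doses-per-day figure out of lines like the ones in the question. It uses only the standard library rather than NLTK, and every name in it (`NUMBER_WORDS`, `FREQUENCIES`, `parse_sig`) is hypothetical. The lookup tables are assumptions that cover only a handful of patterns and would need extending for real notes.

```python
import re
from fractions import Fraction

# Assumed lookup table: spelled-out quantities seen in the sample notes.
NUMBER_WORDS = {"ONE": 1.0, "TWO": 2.0, "THREE": 3.0, "FOUR": 4.0}

# Assumed fixed phrases mapped to doses per day; checked in this order.
FREQUENCIES = [
    (r"\bTWICE (?:DAILY|A DAY)\b", 2.0),
    (r"\bBID\b", 2.0),
    (r"\bTID\b", 3.0),
    (r"\bQID\b", 4.0),
    (r"\b(?:DAILY|QD|AT BEDTIME)\b", 1.0),
]

# A quantity is a fraction (1/2), a plain or decimal number, or a number word.
NUM = r"\d+/\d+|\d+(?:\.\d+)?|" + "|".join(NUMBER_WORDS)

# "3 TIMES DAILY" or "4 (FOUR) TIMES A DAY"
TIMES_RE = re.compile(rf"\b({NUM})\s*(?:\([A-Z]+\)\s*)?TIMES\s+(?:DAILY|A DAY)\b")

# Quantity at the start of the line, after an optional verb.
DOSE_RE = re.compile(rf"^(?:TAKE|INJECT)?\s*({NUM})\b")


def to_number(token):
    """Convert a matched quantity token to a float."""
    if token in NUMBER_WORDS:
        return NUMBER_WORDS[token]
    return float(Fraction(token))  # handles "2", "1.5" and "1/2"


def parse_sig(text):
    """Return (quantity, doses_per_day); either may be None if not recognised."""
    text = text.upper().strip()

    dose_match = DOSE_RE.match(text)
    dose = to_number(dose_match.group(1)) if dose_match else None

    per_day = None
    times_match = TIMES_RE.search(text)
    if times_match:
        per_day = to_number(times_match.group(1))
    else:
        for pattern, value in FREQUENCIES:
            if re.search(pattern, text):
                per_day = value
                break
    return dose, per_day


if __name__ == "__main__":
    for line in ["1/2 PILL TWICE DAILY",
                 "TAKE 1 CAPSULE BY MOUTH 4 (FOUR) TIMES A DAY.",
                 "ROLLING WALKER WITH SEAT ATTACHMENT AND HAND BRAKES"]:
        print(line, "->", parse_sig(line))
```

The sketch deliberately returns `None` for anything it does not recognise, so non-medication lines such as the walker entry and interval-style instructions such as "EVERY 4 TO 6 HOURS" get no frequency instead of a guess. It also ignores units (a leading 300 in "300 MG BID" is returned as the quantity) and qualifiers like "AS NEEDED", which a fuller solution would have to model.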