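NLP - Information Extraction in Python (spaCy)

I am trying to extract the following type of information from the paragraph structure shown below: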
women_ran  men_ran  kids_ran  walked
1          2        1         3
2          4        3         1
3          6        5         2
text = ["On Tuesday, one women ran on the street while 2 men ran and 1 child ran on the sidewalk. Also, there were 3 people walking.", "One person was walking yesterday, but there were 2 women running as well as 4 men and 3 kids running.", "The other day, there were three women running and also 6 men and 5 kids running on the sidewalk. Also, there were 2 people walking in the park."]
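I am using Python's spaCy as my NLP library. I am fairly new to NLP work and would appreciate some guidance on the best way to extract this tabular information from sentences like these.
If it were only a matter of identifying whether individuals were running or walking, I would just use sklearn to fit a classification model, but the information I need to extract is clearly more granular than that (I am trying to retrieve each subcategory and its value). Any guidance would be greatly appreciated.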
I have never written an XPath query or a DOM selector. Could you explain? – kathystehl
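@kathystehl XPath specifies a location within an XML (HTML) document, so an XPath query is a way to find particular elements within XML or HTML. See [wikipedia](https://en.wikipedia.org/wiki/XPath). A DOM selector is any CSS element `id` or `class` in an HTML document (the DOM is the data structure for the HTML/XML document tree that you work with in JavaScript, etc.), so you can filter elements by id and class. In NLP, a dependency parser converts unstructured text into a tree data structure similar to HTML, whose tags can be queried much like DOM selector filters and XPath queries. – hobs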
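To make the analogy in the comment above concrete, here is a minimal sketch that uses only the standard library. The XML is a hand-written, hypothetical parse of "2 men ran and 1 child ran"; a real dependency parser such as spaCy would build the tree for you, and the `token` tag, the attribute names, and the `count_subjects` helper are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical dependency tree for "2 men ran and 1 child ran".
# Each token is nested under its syntactic head, the way a parser would attach it.
PARSE_XML = """
<sentence>
  <token text="ran" lemma="run" dep="ROOT">
    <token text="men" lemma="man" dep="nsubj">
      <token text="2" lemma="2" dep="nummod"/>
    </token>
    <token text="and" lemma="and" dep="cc"/>
    <token text="ran" lemma="run" dep="conj">
      <token text="child" lemma="child" dep="nsubj">
        <token text="1" lemma="1" dep="nummod"/>
      </token>
    </token>
  </token>
</sentence>
"""


def count_subjects(xml_text):
    """Return {(subject_lemma, verb_lemma): count} for each numbered subject."""
    root = ET.fromstring(xml_text)
    counts = {}
    # Visit every token and treat it as a candidate verb (head).
    for head in root.iter("token"):
        # XPath-style filter: direct children attached as nominal subjects.
        for subj in head.findall("token[@dep='nsubj']"):
            # Numeric modifier hanging off the subject, if any.
            num = subj.find("token[@dep='nummod']")
            if num is not None:
                key = (subj.get("lemma"), head.get("lemma"))
                counts[key] = int(num.get("text"))
    return counts


counts = count_subjects(PARSE_XML)
print(counts)
```

The same head/child filtering is what you would do on a real parse, where each token exposes its head, its children, and a dependency label. Spelled-out numbers such as "three" would still need to be converted to integers separately.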