Classifying sentences by type (interrogative/affirmative)

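I have a program that splits text into sentences, splits each sentence into words, counts how many times each part of speech occurs, and writes the data to a CSV file. The problem is that I also need to classify the sentences by type. The input is a set of sentences, and I want to determine each sentence's type from the punctuation mark at its end. If the sentence is affirmative, the flag in the CSV should be 0; if it is a question, the flag should be 1. How can I do this?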

Here is the code:

# -*- coding: utf-8 -*- 
import json 
import pymorphy2 
import csv 
from nltk.tokenize import sent_tokenize 
from nltk.tokenize import word_tokenize 
import re 

# with open('kuprin.txt', 'r') as myfile: 
#  text = myfile.read().replace('\n', '') 
text="Hi!How are you?My name is Jack.What is your name?" 
sentences = sent_tokenize(text) 
morph = pymorphy2.MorphAnalyzer(); 
s = set(sentences) 

for sentence in s: 
    # print('-'+sentence) 
    words = word_tokenize(sentence) 
    print(words) 

json_data = [] 
i = 0 
for item in s: 
    if item == '': 
     continue 
    word_list = item.split(' ') 
    data = { 
     "id": i, 
     "sentences": item, 
     "ADJF": 0, 
     "NOUN": 0, 
     "INTJ": 0, 
     "ADJS": 0, 
     "COMP": 0, 
     "VERB": 0, 
     "INFN": 0, 
     "PRTF": 0, 
     "PRTS": 0, 
     "GRND": 0, 
     "NUMR": 0, 
     "ADVB": 0, 
     "NPRO": 0, 
     "PRED": 0, 
     "PREP": 0, 
     "CONJ": 0, 
     "PRCL": 0, 
     "FLAG": 0 
    } 

    for word in word_list: 
     res = morph.parse(word) 
     pos = res[0].tag.POS 
     if pos is None: 
      continue 
     print(word + "---" + str(pos)) 
     data[pos] += 1 
    json_data.append(data) 
    i = i+1 

for el in json_data: 
    print(el) 

with open('test.json', 'w') as f: 
    json.dump(json_data, f, ensure_ascii=False, sort_keys=False, indent=4, 
separators=(',', ': ')) 

txt_file = r"test.json" 
csv_file = r"test.csv" 

in_txt = csv.reader(open(txt_file, "rt")) 
out_csv = csv.writer(open(csv_file, 'w')) 

out_csv.writerow(
    ["id", "sentences", "ADJF", "NOUN", "INTJ", "ADJS", "COMP", "VERB", 
    "INFN", "PRTF", "PRTS", "GRND", "NUMR", 
    "ADVB", "NPRO", "PRED", "PREP", "CONJ", "PRCL", "FLAG"]) 

for el in json_data: 
    csv_str =[] 
    for value in el.values(): 
     csv_str += [value] 
    print(csv_str) 
    out_csv.writerow(csv_str)

2017-05-05 Human

After defining data, you can add a simple check on the last character of the last word in word_list and set data["FLAG"] accordingly:

... 
     "PRCL": 0, 
     "FLAG": 0 
    } 

    # endswith avoids an IndexError when the last word is an empty string 
    if word_list[-1].endswith("?"): 
     data["FLAG"] = 1 

    for word in word_list: 
     res = morph.parse(word) 
...

There may be more robust ways to do this, but this is simple and should do what you need.
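As one possible way to make the check a little more robust, here is a minimal sketch that is not part of the original code. It assumes you also want to tolerate trailing whitespace, closing quotes or brackets, and mixed endings such as "?!". The names sentence_flag, flags_to_csv and TRAILING_CLOSERS are hypothetical helpers introduced only for illustration, and the CSV here has just three columns instead of the full part-of-speech set.

```python
import csv
import io

# Closing quotes/brackets that may follow the final punctuation mark (assumed set).
TRAILING_CLOSERS = "\"')]}»”’"


def sentence_flag(sentence):
    """Return 1 if the sentence ends like a question, otherwise 0."""
    # Drop surrounding whitespace, then any closing quotes or brackets.
    stripped = sentence.strip().rstrip(TRAILING_CLOSERS).rstrip()
    # Isolate the run of terminal punctuation, e.g. "?", "!", "?!", "...".
    core = stripped.rstrip("?!.")
    ending = stripped[len(core):]
    return 1 if "?" in ending else 0


def flags_to_csv(sentences):
    """Build CSV text with an id, the sentence, and its FLAG value."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "sentences", "FLAG"])
    writer.writeheader()
    for i, sentence in enumerate(sentences):
        writer.writerow(
            {"id": i, "sentences": sentence, "FLAG": sentence_flag(sentence)}
        )
    return buffer.getvalue()


if __name__ == "__main__":
    examples = ["Hi!", "How are you?", "My name is Jack.", "What is your name?"]
    print(flags_to_csv(examples))
```

In the question's loop, the same idea would amount to setting data["FLAG"] = sentence_flag(item) right after data is defined. Using csv.DictWriter also keeps the header and the row values in the same order, since both come from the one fieldnames list.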

2017-05-11 18:37:38 Marcy
