Splitting a text paragraph into sentences


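I want to split up a text file. It comes as one big paragraph. I want to break it into smaller sentences, with each sentence stored as a list. From there I can find out which lists contain a particular word.

Here is my code as it currently stands: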

import string

Done = False
while not Done:
    try:
        File = input("Enter your file: ")
        Open_File = open(File, "r")
        Info = Open_File.readline()
        print(Info)
        Open_File.close()
        Done = True
    except FileNotFoundError:
        print("Sorry that file doesn't exist!")


Info_Str = str(Info)
Info_Str = Info_Str.lower()
Info_Str = Info_Str.replace("'", "")
Info_Str = Info_Str.replace("-", "")
Info_Str = Info_Str.split()
Info_List = Info_Str
Info_List = [''.join(c for c in s if c not in string.punctuation) for s in Info_List]
New_List = [item for item in Info_List if not item.isdigit()]
for word in New_List[:]:
    if len(word) < 3:
        New_List.remove(word)
print(New_List)

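If I give it a text file, it only returns a list of the words in the first line of the file.

How do I convert each individual sentence into a separate list of words? Thanks in advance.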

Source

2017-04-10 Amaranthus

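What exactly is your requirement? If you just want a list of the words in the file, you can read all the lines and split them on whitespace. – Geetanjali

I basically have to find out on which line number a particular word appears. Each line is a separate sentence. – Amaranthus

Check the code snippet I posted. It should help. – Geetanjali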

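The code you have written is a bit long. You can do this in fewer lines of code. Let's first look at how to go about it:

Open the file with a with statement. The benefit of with is that you don't have to close the file explicitly.
The paragraph can be split into sentences on "." or "?".
Each sentence can be split into a list on single spaces.
You can then search that list for the word you want.

Code: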

import re

# Open the file; the with statement closes it automatically.
with open("a.txt") as fh:
    for line in fh:
        # Split the paragraph on '.', '?' or '!'.
        for sentence in re.split(r"\.|\?|\!", line):
            # Split the sentence into a list of words on single spaces.
            tmp_list = sentence.split(" ")
            # If the word is found, print that sentence.
            if "Dinesh" in tmp_list:
                print(sentence)

Note: my code could be optimized further. Since you are just starting out, I think this version will be good for you.
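As a rough sketch of the same steps, the logic can be wrapped in a function that takes a string, which makes it easy to try without a file on disk. The function name sentences_containing and the sample text are made up for illustration and are not part of the answer above.

```python
import re

def sentences_containing(text, word):
    """Return the sentences of `text` that contain `word` as a whole token."""
    matches = []
    # Split on '.', '?' or '!' (the same terminators used in the answer).
    for sentence in re.split(r"[.?!]", text):
        # split() with no argument also copes with repeated spaces and newlines.
        tokens = sentence.split()
        if word in tokens:
            matches.append(sentence.strip())
    return matches

# Hypothetical sample text, only for demonstration.
sample = "Dinesh wrote this. Nobody else did! Is Dinesh here?"
print(sentences_containing(sample, "Dinesh"))
```

The comparison is case-sensitive, so searching for "dinesh" would find nothing here. Lowercase both the text and the word first if that matters.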

Source

2017-04-10 05:20:26

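I took a whack at this and then realized: not all sentences necessarily end with a period (some end with ?, !, etc.). I think the original bug behind "it only returns a list of the words in the first line of the file" is this line: 'Info = Open_File.readline()' – JacobIRR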
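To illustrate the point about readline(), here is a small sketch comparing it with read() and readlines(). io.StringIO stands in for an opened text file, and the three-line sample content is an assumption made only for this demonstration.

```python
import io

# io.StringIO behaves like an open text file, so no file on disk is needed.
fake_file = io.StringIO("First sentence.\nSecond sentence.\nThird sentence.\n")

first_line = fake_file.readline()   # Reads up to and including the first newline only.
fake_file.seek(0)                   # Rewind to the start.
whole_text = fake_file.read()       # Reads everything as one string.
fake_file.seek(0)
all_lines = fake_file.readlines()   # Reads everything as a list of lines.

print(repr(first_line))
print(len(all_lines))
```

With readline() only the first line ever reaches the word-cleaning code, which matches the behaviour described in the question.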

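In your case, a line in the file is not necessarily a sentence delimited by '.'. Suppose I have 'Hello.new line \n same line.': 'new line' and 'same line' will end up in different lists. – Geetanjali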

I tried replacing it with 'Info = Open_File.read()', but that just returns the whole paragraph as one big list of words instead of splitting it at each new sentence. – Amaranthus
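One possible way to get a separate word list per sentence is to split the text into sentences first and only then apply the cleaning steps from the question. The sketch below assumes that sentences end with '.', '?', '!' or a newline. The function name sentence_word_lists and the sample paragraph are hypothetical.

```python
import re
import string

def sentence_word_lists(text):
    """Split `text` into sentences, then clean each sentence into a word list."""
    result = []
    # Assumption: a sentence ends at '.', '?', '!' or a line break.
    for sentence in re.split(r"[.?!\n]+", text):
        words = []
        for token in sentence.lower().split():
            # Remove punctuation, as the code in the question does.
            cleaned = "".join(c for c in token if c not in string.punctuation)
            # Skip numbers and words shorter than three characters.
            if len(cleaned) >= 3 and not cleaned.isdigit():
                words.append(cleaned)
        if words:
            result.append(words)
    return result

# Hypothetical sample paragraph, only for demonstration.
paragraph = "The quick fox can't stop. It ran 42 miles! Why so far?"
for number, words in enumerate(sentence_word_lists(paragraph), start=1):
    print(number, words)
```

Sentences that end up with no words after cleaning are dropped, so the numbering counts only the non-empty ones.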

This prints the sentence number (0-indexed).

with open("sample.txt") as f:
    content = f.read()  # Read the whole file
    lines = content.split('.')  # A list of all sentences
    for num, line in enumerate(lines):  # For each sentence
        if 'word' in line:
            print(num)
        else:
            print("Not present")
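Note that 'word' in line is a substring test, so it also matches inside longer words such as "password". A hedged variant that matches whole words only and returns 1-based sentence numbers might look like the sketch below. The function name sentence_numbers_with_word and the sample text are assumptions for illustration.

```python
import re

def sentence_numbers_with_word(text, word):
    """Return 1-based numbers of the sentences containing `word` as a whole word."""
    # The \b anchors keep 'word' from matching inside 'password' or 'words'.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    numbers = []
    # Split on '.', as in the answer above.
    for number, sentence in enumerate(text.split("."), start=1):
        if pattern.search(sentence):
            numbers.append(number)
    return numbers

# Hypothetical sample text, only for demonstration.
text = "My password is long. This word matters. Words are cheap."
print(sentence_numbers_with_word(text, "word"))
```

Returning a list instead of printing inside the loop also avoids a "Not present" message for every sentence that lacks the word. An empty list means the word was not found at all.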

Source

2017-04-10 05:39:42 Geetanjali
