2016-01-13

I have a text corpus that is divided into paragraphs by `\n\n`. How can I use Python's `string.find()` to find the boundaries of a paragraph?

\n\n"Well done, Mrs. Martin!" thought Emma. "You know what you are about."\n\n"And when she had come away, Mrs. Martin was so very kind as to send\nMrs. Goddard a beautiful goose--the finest goose Mrs. Goddard had\never seen. Mrs. Goddard had dressed it on a Sunday, and asked all\nthe three teachers, Miss Nash, and Miss Prince, and Miss Richardson,\nto sup with her."\n\n"Mr. Martin, I suppose, is not a man of information beyond the line\nof his own business? He does not read?"\n\n"Oh yes!--that is, no--I do not know--but I believe he has\nread a good deal--but not what you would think any thing of.\nHe reads the Agricultural Reports, and some other books that lay\nin one of the window seats--but he reads all _them_ to himself.\nBut sometimes of an evening, before we went to cards, he would read\nsomething aloud out of the Elegant Extracts, very entertaining.\nAnd I know he has read the Vicar of Wakefield. He never read the\nRomance of the Forest, nor The Children of the Abbey. He had never\nheard of such books before I mentioned them, but he is determined\nto get them now as soon as ever he can."\n\nThe next question was--\n\n"What sort of looking man is Mr. Martin?" 

Or, when printed:

"Well done, Mrs. Martin!" thought Emma. "You know what you are about." 

"And when she had come away, Mrs. Martin was so very kind as to send 
Mrs. Goddard a beautiful goose--the finest goose Mrs. Goddard had 
ever seen. Mrs. Goddard had dressed it on a Sunday, and asked all 
the three teachers, Miss Nash, and Miss Prince, and Miss Richardson, 
to sup with her." 

"Mr. Martin, I suppose, is not a man of information beyond the line 
of his own business? He does not read?" 

"Oh yes!--that is, no--I do not know--but I believe he has 
read a good deal--but not what you would think any thing of. 
He reads the Agricultural Reports, and some other books that lay 
in one of the window seats--but he reads all _them_ to himself. 
But sometimes of an evening, before we went to cards, he would read 
something aloud out of the Elegant Extracts, very entertaining. 
And I know he has read the Vicar of Wakefield. He never read the 
Romance of the Forest, nor The Children of the Abbey. He had never 
heard of such books before I mentioned them, but he is determined 
to get them now as soon as ever he can." 

The next question was-- 

"What sort of looking man is Mr. Martin?" 

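Given a particular paragraph, I want to know where its boundaries are. That is, I want to locate the paragraph by the `\n\n` newline pairs around it.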

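My goal is that when I click on a paragraph with the cursor, I can work out that paragraph's boundaries from the positions of the `\n\n` separators.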

# text holds the whole corpus as a single str
text.find("\n\n")

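This gives the position of a `\n\n` in the string (only the first one). But what about a particular paragraph? If I "click" inside the fourth paragraph (the one mentioning the Vicar of Wakefield), how can I search for the first `\n\n` above that point and the first `\n\n` below it?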
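If the user may click many times on the same text, one alternative to searching on every click is to compute the offsets of every paragraph once and then look up the clicked offset with a binary search from the standard `bisect` module. This is a minimal sketch, not the method from the answer below; `build_index` and `paragraph_at` are hypothetical names, and it assumes any run of two or more newlines counts as a single separator, which matches the `\n\n` convention of the corpus above.

```python
import bisect
import re


def build_index(text):
    """Precompute start/end offsets of every paragraph.

    Assumption: paragraphs are separated by runs of two or more newlines.
    Returns two parallel lists, starts and ends (end is exclusive).
    """
    starts, ends = [], []
    pos = 0
    for m in re.finditer(r"\n{2,}", text):
        if m.start() > pos:          # skip the empty "paragraph" before a leading separator
            starts.append(pos)
            ends.append(m.start())
        pos = m.end()
    if pos < len(text):              # trailing paragraph with no separator after it
        starts.append(pos)
        ends.append(len(text))
    return starts, ends


def paragraph_at(starts, ends, pos):
    """Return (start, end) of the paragraph containing offset pos, or None
    if pos falls on a separator (or outside the text)."""
    i = bisect.bisect_right(starts, pos) - 1   # last paragraph starting at or before pos
    if i >= 0 and pos < ends[i]:
        return starts[i], ends[i]
    return None
```

Building the index costs one pass over the text; each lookup after that is O(log n) in the number of paragraphs.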

Answer

Assuming you know the position `pos` in the long text string where you "clicked", you can solve your problem with `str.find()` and `str.rfind()`.

To look forward, you would do:

string.find("\n\n", pos) # searches for "\n\n" starting from position `pos`, returning the first match 

And to look backward, you would do:

string.rfind("\n\n", 0, pos) # searches for "\n\n" from the beginning up-to `pos` but will return you the last match 
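Putting the two calls together gives the paragraph around `pos`. Both methods return -1 when there is no match, which happens in the first paragraph (nothing before it) and the last one (nothing after it), so a minimal sketch falls back to the start or end of the text in those cases. The name `paragraph_bounds` and the `sep` parameter are hypothetical, not part of the answer.

```python
def paragraph_bounds(text, pos, sep="\n\n"):
    """Return (start, end) offsets of the paragraph around offset pos.

    end is exclusive, so text[start:end] is the paragraph itself.
    """
    # Last separator lying entirely inside text[:pos]; -1 means first paragraph.
    before = text.rfind(sep, 0, pos)
    start = 0 if before == -1 else before + len(sep)
    # First separator at or after pos; -1 means last paragraph.
    after = text.find(sep, pos)
    end = len(text) if after == -1 else after
    return start, end


sample = ('\n\n"Well done, Mrs. Martin!" thought Emma.'
          '\n\n"Oh yes!--that is, no\nAnd I know he has read the Vicar of Wakefield."'
          '\n\nThe next question was--')
start, end = paragraph_bounds(sample, sample.index("Vicar"))
print(repr(sample[start:end]))
```

A single `\n` inside a paragraph does not match `"\n\n"`, so the wrapped lines of the "Oh yes!" paragraph stay together.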

For the documentation of both methods, see https://docs.python.org/2/library/string.html

Any ideas on how to find out the position of a "click"? – EB2127

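That would require you to give more context about your "system". How are you displaying the paragraph? How are you handling the input device? I would post that as a separate question. – sal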

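I will be using Tkinter for the interactive GUI. Basically, the text is entered into a Text widget, and the user is allowed to "click" on a paragraph. – EB2127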
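For the Tkinter case, a Text widget reports positions as `"line.column"` index strings (line numbers start at 1, columns at 0), and `widget.index(f"@{event.x},{event.y}")` gives the index of the character nearest the mouse pointer. The sketch below assumes the corpus has been inserted into a `tkinter.Text` widget; `tk_index_to_offset` and `on_click` are hypothetical names. It converts the Tk index into an offset into the string returned by `widget.get("1.0", "end-1c")`, so the `rfind`/`find` approach from the answer applies unchanged.

```python
def tk_index_to_offset(text, index):
    """Convert a Tk Text index "line.column" (line 1-based, column 0-based)
    into a 0-based offset into text (the widget's contents as one str)."""
    line_str, col_str = index.split(".")
    line, col = int(line_str), int(col_str)
    lines = text.split("\n")
    # Each earlier line contributes its characters plus the "\n" that ended it.
    return sum(len(l) + 1 for l in lines[:line - 1]) + col


def on_click(event):
    """Hypothetical handler, bound with text_widget.bind("<Button-1>", on_click)."""
    widget = event.widget
    index = widget.index(f"@{event.x},{event.y}")  # character nearest the pointer
    text = widget.get("1.0", "end-1c")             # contents without Tk's trailing newline
    pos = tk_index_to_offset(text, index)
    before = text.rfind("\n\n", 0, pos)
    start = 0 if before == -1 else before + 2
    after = text.find("\n\n", pos)
    end = len(text) if after == -1 else after
    print(repr(text[start:end]))
```

A shorter alternative for the conversion is `len(widget.get("1.0", index))`, which counts the characters before the clicked index directly. Tk's Text widget also has its own `search()` method with a `backwards` option, which could perform the same search on indices without converting to offsets.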