2014-06-30 58 views

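Split a sentence into paragraphs based on word count

I want to split a sentence into paragraphs, with each paragraph limited to roughly a given number of words. For example: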

Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. 

Paragraph 1: 
Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. 

Paragraph 2: 
Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. 

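In the example above, the words up to about 20 are in paragraph 1 and the rest are in paragraph 2.

Is there any way to achieve this using PHP?

I tried $abc = explode(' ', $str, 20); this stores 20 words in the array, with the rest in the last element, $abc['21']. How can I take the data from the first 20 elements as the first paragraph and the remaining data as the second paragraph?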

Your last paragraph, 'I tried...', is completely wrong; please rewrite it. – Athafoud

You can try converting the string to an array, then storing the first 20 words in one string and the rest in another string. – Aradhna

After exploding the sentence, just use 'implode'. http://stackoverflow.com/questions/5956610/how-to-select-first-10-words-of-a-sentence – TribalChief
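
What the comments suggest could look roughly like the following minimal sketch (it is not code from the thread): split the text on whitespace, take the first 20 words with `array_slice`, and join each part back together with `implode`. The variable names and the limit of 20 are assumptions for illustration. Note that this cuts at a fixed word count, so it can break a sentence in the middle.

```php
<?php
// Illustrative sketch: cut a text into "the first N words" and "the rest".
$str = 'Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.';
$limit = 20; // assumed word limit for the first paragraph

// Split on runs of whitespace so repeated spaces do not create empty words.
$words = preg_split('/\s+/', trim($str));

// The first $limit words form paragraph 1; everything after them forms paragraph 2.
$first = implode(' ', array_slice($words, 0, $limit));
$rest  = implode(' ', array_slice($words, $limit));

echo $first, "\n\n", $rest, "\n";
```

Unlike `explode(' ', $str, 20)`, which returns at most 20 elements and leaves the whole remainder in the last one, this keeps every word as its own array element before slicing.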

Answer

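First split the string into sentences. Then loop over the array of sentences: append each sentence to the current element of the paragraphs array, count the words in that element, and if there are more than 19, increment the paragraph counter.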

$string = 'Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source.';

// Split after sentence-ending punctuation when the next sentence starts with an uppercase letter.
$sentences = preg_split('/(?<=[.?!;])\s+(?=\p{Lu})/', $string);

$ii = 0; // index of the paragraph currently being filled
$paragraphs = array();
foreach ($sentences as $value) {
    if (isset($paragraphs[$ii])) {
        // Put back the space that preg_split() consumed between sentences.
        $paragraphs[$ii] .= ' ' . $value;
    } else {
        $paragraphs[$ii] = $value;
    }
    // Move on to a new paragraph once the current one holds more than 19 words.
    if (19 < str_word_count($paragraphs[$ii])) {
        $ii++;
    }
}
print_r($paragraphs);

Output:

Array
(
    [0] => Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.
    [1] => Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source.
)

Sentence splitter: Splitting paragraphs into sentences with regexp and PHP
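
One detail worth keeping in mind: `str_word_count` only counts alphabetic words, so tokens such as `45` and `2000` are skipped. The sketch below is one possible variant, not the answer's own code: it wraps the same loop in a function and counts whitespace-separated tokens instead. The function name `splitIntoParagraphs` and the `$minWords` parameter are hypothetical, chosen only for illustration.

```php
<?php
/**
 * Hypothetical helper: group sentences into paragraphs, closing a paragraph
 * once it holds at least $minWords whitespace-separated tokens.
 */
function splitIntoParagraphs(string $text, int $minWords = 20): array
{
    // Same sentence-splitting regex as in the answer above.
    $sentences = preg_split('/(?<=[.?!;])\s+(?=\p{Lu})/', trim($text));

    $paragraphs = [];
    $current = '';
    foreach ($sentences as $sentence) {
        // Append the sentence, separating it from earlier ones with a space.
        $current = ($current === '') ? $sentence : $current . ' ' . $sentence;
        // Count tokens separated by whitespace, so numbers count as words too.
        if (count(preg_split('/\s+/', $current)) >= $minWords) {
            $paragraphs[] = $current;
            $current = '';
        }
    }
    // Keep any trailing sentences that never reached the threshold.
    if ($current !== '') {
        $paragraphs[] = $current;
    }
    return $paragraphs;
}

$text = 'One two three. Four five six seven. Eight nine ten.';
print_r(splitIntoParagraphs($text, 5));
```

With a threshold of 5, the first two sentences end up together in one paragraph and the last sentence forms a second, shorter paragraph, because leftover sentences are kept rather than dropped.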