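I haven't tested this code, but I think a non-regex approach might work better for you. Basically, you split the string on spaces and then parse each piece. With this approach, the order of the parts doesn't matter.
It's a little tricky because the content and the project name can each span multiple pieces, but I think my code handles that. It also assumes each tweet has only one hashtag, user, project, and priority. If there could be multiple hashtags, for example, just collect them in an array instead of a string. Finally, it has no error handling to detect or prevent weird input, such as a project name that is never closed with ']'.
Here is my untested code: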
$data = array(
    'hash' => '',
    'user' => '',
    'priority' => '',
    'project' => '',
    'content' => ''
);
$parsingProjectName = false;
foreach(explode(' ', $tweet) as $piece)
{
    switch(substr($piece, 0, 1))
    {
        case '#':
            $data['hash'] = substr($piece, 1);
            break;
        case '@':
            $data['user'] = substr($piece, 1);
            break;
        case '!':
            $data['priority'] = substr($piece, 1);
            break;
        case '[':
            // Check if the project name is longer than 1 word
            // (substr with a negative offset returns the last character)
            if(substr($piece, -1) == ']')
            {
                $data['project'] = substr($piece, 1, -1);
            }
            else
            {
                // There will be more to parse in the next piece(s)
                $parsingProjectName = true;
                $data['project'] = substr($piece, 1) . ' ';
            }
            break;
        default:
            if($parsingProjectName)
            {
                // Are we at the end yet?
                if(substr($piece, -1) == ']')
                {
                    // Yes we are: keep the whole word, drop only the closing ']'
                    $data['project'] .= substr($piece, 0, -1);
                    $parsingProjectName = false;
                }
                else
                {
                    // Nope, there is more
                    $data['project'] .= $piece . ' ';
                }
            }
            else
            {
                // We aren't in the middle of parsing the project name, and this piece doesn't start with one of the special chars, so assume it is content
                $data['content'] .= $piece . ' ';
            }
    }
}
// There will be an extra space on the end; remove it
$data['content'] = rtrim($data['content']);
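If a tweet can carry several hashtags, the same piece-by-piece idea can collect them in an array. The following is only a sketch under assumptions: the function name `parseTweet`, the `hashes` key, and the sample tweet are made up for illustration, and an unclosed `[` simply leaves `project` empty.

```php
<?php
// Hypothetical helper: parseTweet() and the 'hashes' key are illustrative
// names, not part of the code above.
function parseTweet(string $tweet): array
{
    $data = [
        'hashes'   => [],   // every #tag found, in order
        'user'     => '',
        'priority' => '',
        'project'  => '',
        'content'  => '',
    ];
    $projectWords = null;   // null means we are not inside a [project name]
    $contentWords = [];

    // Split on any run of whitespace and skip empty pieces
    foreach (preg_split('/\s+/', trim($tweet), -1, PREG_SPLIT_NO_EMPTY) as $piece) {
        if ($projectWords !== null) {
            // Still inside a multi-word project name
            if (substr($piece, -1) === ']') {
                $projectWords[] = substr($piece, 0, -1);
                $data['project'] = implode(' ', $projectWords);
                $projectWords = null;
            } else {
                $projectWords[] = $piece;
            }
            continue;
        }

        switch ($piece[0]) {
            case '#':
                $data['hashes'][] = substr($piece, 1);
                break;
            case '@':
                $data['user'] = substr($piece, 1);
                break;
            case '!':
                $data['priority'] = substr($piece, 1);
                break;
            case '[':
                if (substr($piece, -1) === ']') {
                    // Single-word project name such as [Site]
                    $data['project'] = substr($piece, 1, -1);
                } else {
                    $projectWords = [substr($piece, 1)];
                }
                break;
            default:
                $contentWords[] = $piece;
        }
    }

    $data['content'] = implode(' ', $contentWords);
    return $data;
}

// Example with an assumed tweet format
$result = parseTweet('#bug #ui Fix the login page @alice !high [Web Site]');
```

Collecting words in arrays and joining them with `implode` avoids the trailing space that string concatenation leaves behind.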
What do you think '\w' does? It's almost the same as '[a-zA-Z]'. – Vyktor 2012-03-03 21:30:32
Just loop through all the matches and build a string out of every match that does not start with #, @, ! or [. – Yaniro 2012-03-03 21:44:47
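A rough sketch of what that last comment seems to describe, assuming a tokenizing pattern of my own (the regex from the question is not shown here) and a made-up sample tweet:

```php
<?php
// Assumed sample input; the real tweet format may differ.
$tweet = 'Fix the login page #bug @alice !high [Web Site]';

// Assumed pattern: a whole [bracketed group] counts as one match,
// otherwise any run of non-whitespace characters does.
preg_match_all('/\[[^\]]*\]|\S+/', $tweet, $matches);

$contentWords = [];
foreach ($matches[0] as $match) {
    // Keep only matches that do not start with one of the special characters
    if (strpos('#@![', $match[0]) === false) {
        $contentWords[] = $match;
    }
}
$content = implode(' ', $contentWords);
```

With this pattern the bracketed project name is captured as a single match, so no extra state is needed to skip its inner words.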