2012-08-07 37 views

Symfony 2 DOM Crawler: getting text that is not inside a tag

I am crawling a page that contains this markup:

<br/> 

<td class="PropertyBody"> 
<b>Category:</b> 
Miscellanea: Soft Skill 
<br> 
<b>Owner:</b> 
<a href="mailto:">blabla</a> 
<br> 
<b>Location:</b> 
bla bla 
<br> 
<b>Duration:</b> 
6:00 
<br> 
<b>Max attendees:</b> 
15 
<br> 
<b>Start at:</b> 
7/19/2012 10:00:00 AM 
<br> 
<b>Your status:</b> 
<br> 
</td> 

How can I extract, for example, '7/19/2012 10:00:00 AM' from this markup with the Symfony crawler? $crawler->filter('.PropertyBody > b')->eq(5)->text(); returns only 'Start at:'.

Thanks, I solved it like this:

// Flattened text of the whole cell; each value sits between two known labels.
$bigPiece = $crawler->filter('.PropertyBody')->text();

// Category: from the first colon up to 'Owner:'
$pos = strpos($bigPiece, ':') + 1;
$pos2 = strpos($bigPiece, 'Owner:');
$category = trim(substr($bigPiece, $pos, $pos2 - $pos));
$this->category = $category;

// Owner: between 'Owner:' and 'Location:'
$pos = strpos($bigPiece, 'Owner:') + 6;
$pos2 = strpos($bigPiece, 'Location:');
$owner = trim(substr($bigPiece, $pos, $pos2 - $pos));
$training->setOwner($owner);

// Location: between 'Location:' and 'Duration:'
$pos = strpos($bigPiece, 'Location:') + 9;
$pos2 = strpos($bigPiece, 'Duration:');
$location = trim(substr($bigPiece, $pos, $pos2 - $pos));
$training->setLocation($location);

// Duration: between 'Duration:' and 'Max attendees:'
$pos = strpos($bigPiece, 'Duration:') + 9;
$pos2 = strpos($bigPiece, 'Max attendees:');
$duration = trim(substr($bigPiece, $pos, $pos2 - $pos));
$training->setDuration($duration);

// Max attendees: between 'Max attendees:' and 'Start at:'
$pos = strpos($bigPiece, 'Max attendees:') + 14;
$pos2 = strpos($bigPiece, 'Start at:');
$maxattendees = trim(substr($bigPiece, $pos, $pos2 - $pos));
$training->setMaxattendies($maxattendees);

// Start at: between 'Start at:' and 'Your status:'
$pos = strpos($bigPiece, 'Start at:') + 9;
$pos2 = strpos($bigPiece, 'Your status:');
$start = trim(substr($bigPiece, $pos, $pos2 - $pos));
$training->setStarts($start);
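
The repeated strpos/substr pairs above could probably be folded into one loop over the labels. The following is only a sketch: extractFields() is a hypothetical helper (not part of Symfony), and the $bigPiece string is a hand-written stand-in for what $crawler->filter('.PropertyBody')->text() might return for the markup in the question.

```php
<?php
// Hypothetical helper (not part of Symfony): splits the flattened cell text
// into label => value pairs, given the labels in their order of appearance.
function extractFields(string $text, array $labels): array
{
    $fields = [];
    foreach ($labels as $i => $label) {
        $start = strpos($text, $label);
        if ($start === false) {
            continue; // label missing: skip it rather than guess
        }
        $start += strlen($label);
        // The value runs up to the next label, or to the end of the text.
        $end = isset($labels[$i + 1]) ? strpos($text, $labels[$i + 1], $start) : false;
        $value = $end === false
            ? substr($text, $start)
            : substr($text, $start, $end - $start);
        $fields[rtrim($label, ':')] = trim((string) $value);
    }
    return $fields;
}

// Assumed stand-in for $crawler->filter('.PropertyBody')->text()
$bigPiece = "Category: Miscellanea: Soft Skill Owner: blabla Location: bla bla "
    . "Duration: 6:00 Max attendees: 15 Start at: 7/19/2012 10:00:00 AM Your status:";

$labels = ['Category:', 'Owner:', 'Location:', 'Duration:',
           'Max attendees:', 'Start at:', 'Your status:'];

$fields = extractFields($bigPiece, $labels);
echo $fields['Start at'], "\n";
```

Because each value is searched for only after its own label, colons inside a value (as in 'Miscellanea: Soft Skill' or '6:00') should not confuse the split, as long as the labels appear in the listed order.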

Answers

1

Add a span tag, like this:

<b>Start at:</b> 
<span class="wantthis">7/19/2012 10:00:00 AM</span> 

Then select it with:

$crawler->filter('.wantthis')->text(); 

I am crawling this markup from another site, so I cannot add a class. – AlOpal19 2012-08-07 10:46:15
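
When the markup cannot be changed, one option might be to read the text node that directly follows the <b> label. The sketch below uses PHP's built-in DOMDocument and DOMXPath rather than the Symfony crawler, so that it runs on its own; the $html string is a trimmed copy of the cell from the question, wrapped in an assumed <table><tr> so that it parses as a table cell. With the crawler itself, the same node could presumably be reached through the underlying DOMElement and its nextSibling, but that part is an assumption and is not shown here.

```php
<?php
// Trimmed copy of the cell from the question, standing in for the fetched page.
$html = '<table><tr><td class="PropertyBody">
<b>Max attendees:</b>
15
<br>
<b>Start at:</b>
7/19/2012 10:00:00 AM
<br>
<b>Your status:</b>
<br>
</td></tr></table>';

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ silences parser warnings about imperfect markup
$xpath = new DOMXPath($doc);

// First text node that follows the <b> whose text is exactly "Start at:".
$nodes = $xpath->query(
    '//td[@class="PropertyBody"]/b[normalize-space()="Start at:"]'
    . '/following-sibling::text()[1]'
);

// null when the label is not present in the cell
$start = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
echo $start, "\n";
```

Matching on the label text instead of a position such as eq(5) should keep working if the site adds or removes other fields in the cell.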

1

If you need to test this particular case, where you cannot add enclosing tags, then you should probably consider using PHPUnit's assertContains():

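// Take the text of the whole cell: '.PropertyBody > b' would only return the first label, 'Category:'.
$text = $crawler->filter('.PropertyBody')->text();
$this->assertContains('7/19/2012 10:00:00 AM', $text);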