2009-04-13

Answer


Somewhat related questions: "How can I get the page orientation of a PDF page?" and "How do I get character offset information from a pdf document?"

Starting from the solution to the latter question, I came up with this recipe:

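use strict;
use warnings;
use CAM::PDF;

my $pdf = CAM::PDF->new('my.pdf') or die $CAM::PDF::errstr;
for my $pagenum (1 .. $pdf->numPages) {
    # Parse the page's content stream; skip pages that cannot be parsed.
    my $pagetree = $pdf->getPageContentTree($pagenum) or next;
    # Walk the content with the custom renderer below, then collect
    # the text fragments it recorded.
    my @text = $pagetree->traverse('MyRenderer')->getTextBlocks;
    for my $textblock (@text) {
        print "text '$textblock->{str}' at ",
            "($textblock->{left},$textblock->{bottom}), angle $textblock->{angle}\n";
    }
}

package MyRenderer;
use base 'CAM::PDF::GS';

sub new {
    my ($pkg, @args) = @_;
    my $self = $pkg->SUPER::new(@args);
    # Storage for the text fragments collected during the traversal.
    $self->{refs}->{text} = [];
    return $self;
}

sub getTextBlocks {
    my ($self) = @_;
    return @{$self->{refs}->{text}};
}

# Called for each string drawn on the page.
sub renderText {
    my ($self, $string, $width) = @_;
    # Map the text origin and a point one unit along the baseline into
    # device space; the vector between them is the text direction.
    my ($x, $y)   = $self->textToDevice(0, 0);
    my ($x1, $y1) = $self->textToDevice(1, 0);
    push @{$self->{refs}->{text}}, {
        str    => $string,
        left   => $x,
        bottom => $y,
        angle  => atan2($y1 - $y, $x1 - $x),   # radians, counterclockwise
    };
    return;
}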

For page 565 of PDFReference15_v5.pdf, it produces this output:

text 'ab' at (371.324,583.7249), angle -1.5707963267949 
text 'c' at (371.324,576.63365), angle -1.5707963267949 
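The angle comes from renderText mapping the text-space origin and the point one unit along the baseline into device space, then taking atan2 of the difference. The self-contained sketch below reproduces only that step in plain Perl so it runs without CAM::PDF. apply_matrix and baseline_angle are hypothetical helpers, and a single PDF-style matrix [a b c d e f] stands in for the matrices that textToDevice actually combines.

```perl
use strict;
use warnings;

# Hypothetical stand-in for textToDevice(): apply a PDF-style affine
# matrix [a b c d e f] to a text-space point (x, y).
sub apply_matrix {
    my ($m, $x, $y) = @_;
    return ($m->[0] * $x + $m->[2] * $y + $m->[4],
            $m->[1] * $x + $m->[3] * $y + $m->[5]);
}

# Direction of the text baseline, in radians counterclockwise from +x.
sub baseline_angle {
    my ($m) = @_;
    my ($x0, $y0) = apply_matrix($m, 0, 0);   # where the text starts
    my ($x1, $y1) = apply_matrix($m, 1, 0);   # one unit along the baseline
    return atan2($y1 - $y0, $x1 - $x0);
}

# [0 -1 1 0 e f] rotates text space by -90 degrees: +x maps to -y.
printf "%.13f\n", baseline_angle([0, -1, 1, 0, 371.324, 583.7249]);   # -1.5707963267949
```

Upright text with the identity matrix [1 0 0 1 0 0] gives an angle of 0.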

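Note that the angle is in radians; divide by pi and multiply by 180 to convert it to degrees. So -1.5707963267949 radians is -90 degrees, which is the same direction as 270 degrees and is consistent with page 565.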
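A minimal conversion sketch follows; rad_to_deg is a hypothetical helper name. It wraps the result into [0, 360) with POSIX::floor rather than Perl's % operator, because % truncates its operands to integers.

```perl
use strict;
use warnings;
use POSIX qw(floor);

use constant PI => 4 * atan2(1, 1);

# Convert radians to degrees and wrap the result into [0, 360).
sub rad_to_deg {
    my ($rad) = @_;
    my $deg = $rad * 180 / PI;
    return $deg - 360 * floor($deg / 360);
}

printf "%.1f\n", rad_to_deg(-1.5707963267949);   # 270.0
```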

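Note that the printed angle is relative to the page content. If the page itself is additionally rotated (see the page-orientation question above), you may need to combine the two rotations.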
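One way to combine them, sketched under assumptions: the content angle has already been converted to degrees (counterclockwise, as atan2 returns it), and the page's /Rotate value has been looked up separately, for example as in the page-orientation question (it may be inherited from a parent Pages node). The PDF specification defines /Rotate as clockwise degrees, so it is subtracted. displayed_angle is a hypothetical helper, not part of CAM::PDF.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Combine a counterclockwise content angle (degrees) with the page's
# clockwise /Rotate value, wrapping the result into [0, 360).
sub displayed_angle {
    my ($content_deg, $page_rotate) = @_;
    my $deg = $content_deg - $page_rotate;
    return $deg - 360 * floor($deg / 360);
}

print displayed_angle(270, 0), "\n";     # 270
print displayed_angle(270, 90), "\n";    # 180
print displayed_angle(270, 270), "\n";   # 0
```

For example, downward-reading text (270) on a page with /Rotate 90 appears upside down (180) once displayed.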