2008-12-16 54 views
1

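How can I strip HTML and attachments from emails?

I'm using the program below to sort email messages and eventually print them. Some messages may contain attachments or HTML, which is bad for printing. Is there an easy way to strip the attachments and remove the HTML markup while keeping the text that the HTML formats?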

#!/usr/bin/perl 
use warnings; 
use strict; 
use Mail::Box::Manager; 

open(MYFILE, '>>', 'data.txt') or die "data.txt: Unable to open: $!\n";
binmode(MYFILE, ':encoding(UTF-8)'); 


my $file = shift || $ENV{MAIL}; 
my $mgr = Mail::Box::Manager->new(
    access   => 'r', 
); 

my $folder = $mgr->open(folder => $file) 
or die "$file: Unable to open: $!\n"; 

for my $msg (sort { $a->timestamp <=> $b->timestamp } $folder->messages) 
{ 
    my $to   = join(', ', map { $_->format } $msg->to); 
    my $from  = join(', ', map { $_->format } $msg->from); 
    my $date  = localtime($msg->timestamp); 
    my $subject  = $msg->subject; 
    my $body  = $msg->decoded->string; 

    # Strip all quoted text 
    $body =~ s/^>.*$//msg; 

    print MYFILE <<""; 
From: $from 
To: $to 
Date: $date 
Subject: $subject 
\n 
$body 

} 

Answers

3

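Mail::Message::isMultipart will tell you whether a given message is multipart, which is the case for any message that carries attachments. Mail::Message::parts will give you a list of the message's parts.

Like this: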

if ($msg->isMultipart) {
    foreach my $part ($msg->parts) {
        if ($part->contentType eq 'text/html') {
            # deal with html here.
        }
        elsif ($part->contentType eq 'text/plain') {
            # deal with text here.
        }
        else {
            # well?
        }
    }
}
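
To make the loop above a little more concrete, here is a minimal sketch of one way to collect only the text parts and skip attachments. The helper name `text_parts` is hypothetical (it is not part of Mail::Box), and the sketch assumes that attachments are marked with a `Content-Disposition: attachment` header, which is common but not guaranteed. It relies on `parts('RECURSE')` to flatten nested multiparts.

```perl
use strict;
use warnings;
use Mail::Message;

# text_parts() is a hypothetical helper, not part of Mail::Box.
# It returns a hashref with the decoded text/plain and text/html parts.
sub text_parts {
    my ($msg) = @_;
    my %found = (plain => [], html => []);

    # 'RECURSE' walks nested multiparts and yields only the leaf parts;
    # for a non-multipart message it yields the message itself.
    for my $part ($msg->parts('RECURSE')) {
        # Assumption: attachments carry "Content-Disposition: attachment".
        my $disposition = $part->get('Content-Disposition') // '';
        next if $disposition =~ /^\s*attachment/i;

        my $type = lc $part->contentType;
        if ($type eq 'text/plain') {
            push @{ $found{plain} }, $part->decoded->string;
        }
        elsif ($type eq 'text/html') {
            push @{ $found{html} }, $part->decoded->string;
        }
        # Anything else (images, PDFs, ...) is ignored for printing.
    }
    return \%found;
}
```

In the question's loop, `$msg->decoded->string` could then be replaced by something like `join("\n", @{ text_parts($msg)->{plain} })`, falling back to the HTML parts only when no plain text part exists.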
1

Stripping the HTML is explained in perlfaq9 (or in the first entry returned by perldoc -q html). In short, the relevant modules are HTML::Parser and HTML::FormatText.
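
As a rough illustration of that approach, the sketch below converts an HTML string to plain text with HTML::FormatText. It uses HTML::TreeBuilder (built on top of HTML::Parser) to produce the tree the formatter needs. The helper name `html_to_text` and the 72-column right margin are assumptions chosen for this example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# html_to_text() is a hypothetical helper name used for illustration.
sub html_to_text {
    my ($html) = @_;

    # Parse the markup into a tree; HTML::TreeBuilder uses HTML::Parser internally.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Margins are arbitrary choices for printing; adjust as needed.
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
    my $text = $formatter->format($tree);

    $tree->delete;    # release the parse tree explicitly
    return $text;
}
```

Calling `html_to_text($part->decoded->string)` on a text/html part would then give text that is suitable for printing, with tags removed and paragraphs wrapped.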

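As for attachments, emails with attachments are sent as MIME. From this example, you can see that the format is quite simple, so you could easily come up with a solution yourself, or check the MIME modules at CPAN.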

0

It looks like someone has already solved this on the linuxquestions forum.

From the forum:

# $connection is a Mail::POP3Client object, $parser a MIME::Parser object.
# Get the headers and body of the POP3 message in question
$body = $connection->HeadAndBody($i);
# Parse the message with MIME::Parser into an entity
$msg = $parser->parse_data($body);
# Find out whether this is a multipart MIME message or just plain text
$num_parts = $msg->parts;
# Zero parts means the message is plain text
if ($num_parts == 0) {
    # Get the message body via POP3Client
    $message = $connection->Body($i);
    # Escape angle brackets and drop single quotes so the text is safe for MySQL
    $message =~ s/</&lt;/g;
    $message =~ s/>/&gt;/g;
    $message =~ s/'//g;
}
else {
    # If it is MIME, read the first part (assumed to be the plain text) into a string
    $message = $msg->parts(0)->bodyhandle->as_string;
}
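
The forum fragment depends on variables defined elsewhere and assumes the first part is always the plain text. A self-contained variant, offered only as a sketch, might search every part for the first text/plain body instead. The helper name `first_plain_text` is hypothetical, and the raw message is assumed to be already available as a string.

```perl
use strict;
use warnings;
use MIME::Parser;

# first_plain_text() is a hypothetical helper name used for illustration.
sub first_plain_text {
    my ($raw) = @_;

    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);    # keep decoded bodies in memory, not in temp files

    my $entity = $parser->parse_data($raw);

    # parts_DFS yields the entity itself followed by all nested parts, depth first.
    for my $part ($entity->parts_DFS) {
        next unless $part->bodyhandle;                 # multipart containers have no body
        next unless $part->mime_type eq 'text/plain';
        return $part->bodyhandle->as_string;
    }
    return '';    # no plain text part found
}
```

This also covers messages that are not multipart at all, since the top-level entity is the first item that `parts_DFS` returns.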
+0

Can you fix the link to linuxquestions.org? – innaM 2008-12-16 12:49:45