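Perl - Empty rows when writing CSV from Excel

I want to convert an Excel file to a CSV file using Perl. For convenience I like to use the module File::Slurp for read/write operations. I need it in a subroutine.

While printing to the screen produces the desired output, the generated CSV file unfortunately contains only a single row of semicolons, and the fields are empty.

Here is the code: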

#!/usr/bin/perl 

use File::Copy; 
use v5.14; 
use Cwd; 
use File::Slurp; 
use Spreadsheet::ParseExcel; 


sub xls2csv { 
    my $currentPath = getcwd(); 
    my @files  = <$currentPath/stage0/*.xls>; 

    for my $sourcename (@files) { 
     print "Now working on $sourcename\n"; 
     my $outFile = $sourcename; 
     $outFile =~ s/xls/csv/g; 
     print "Output CSV-File: ".$outFile."\n"; 
     my $source_excel = new Spreadsheet::ParseExcel; 
     my $source_book = $source_excel->Parse($sourcename) 
      or die "Could not open source Excel file $sourcename: $!"; 

     foreach my $source_sheet_number (0 .. $source_book->{SheetCount} - 1) 
     { 
      my $source_sheet = $source_book->{Worksheet}[$source_sheet_number]; 

      next unless defined $source_sheet->{MaxRow}; 
      next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow}; 
      next unless defined $source_sheet->{MaxCol}; 
      next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol}; 

      foreach my $row_index (
       $source_sheet->{MinRow} .. $source_sheet->{MaxRow}) 
      { 
       foreach my $col_index (
        $source_sheet->{MinCol} .. $source_sheet->{MaxCol}) 
       { 
        my $source_cell = 
         $source_sheet->{Cells}[$row_index][$col_index]; 
        if ($source_cell) { 

         print $source_cell->Value, ";"; # correct output! 

         write_file($outFile, { binmode => ':utf8' }, $source_cell->Value, ";"); # only one row of semicolons with empty fields! 
        } 
       } 
       print "\n"; 
      } 
     } 

    } 
} 

xls2csv();

I know it has to do with the parameters passed to the write_file function, but I can't manage to fix it.

Does anybody have an idea?

Many thanks in advance.

Source

2013-08-31 royskatt

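Add 'use strict; use warnings;' and report the errors/warnings you get. I think 'use v5.14' enables strict but not warnings... but it is fine to use both. – TLP

You should be aware that you risk overwriting the original file with the line '$outFile =~ s/xls/csv/g'. On Windows, '<*.xls>' will match something like 'foo.XLS', but your regex is case-sensitive and will not perform the substitution, so your input and output file names will be identical. Use '/i' to ignore case. – TLP

Thanks for the hint about case sensitivity. I added use strict; use warnings; The only warning I get is "Wide character in print at etl.pl line 45.". But that is because my file contains characters like "ö". – royskatt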

write_file will overwrite the file unless the append => 1 option is given. So this:

write_file($outFile, { binmode => ':utf8' }, $source_cell->Value, ";");

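will write a new file for each new cell value. However, that does not match your description of "only one row of semicolons with empty fields", since the file should then contain just a single value and a single semicolon.

I am skeptical of this statement of yours: "For convenience I like to use the module File::Slurp". The print statement works as is, whereas File::Slurp does not. So is it really convenient?

If you still want to use write_file, what you should do is collect all the lines to be printed and then write them all at once after the loop has finished. For example: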

$line .= $source_cell->Value . ";"; # use concatenation to build the line 
... 
push @out, "$line\n";     # store in array 
... 
write_file(...., \@out);    # print the array

Another simple option is to use join, or the Text::CSV module.
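As a rough illustration of the join approach, here is a minimal sketch. The sample data in @rows is hypothetical and only stands in for the cell values that the Spreadsheet::ParseExcel loops would collect; only core Perl is used so it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the cell values of one sheet.
my @rows = (
    [ 'id', 'name',  'city' ],
    [ 1,    'Alice', 'Berlin' ],
    [ 2,    'Bob',   'Hamburg' ],
);

# Build one CSV line per row; join places the separator only between fields.
my @out = map { join(';', @$_) . "\n" } @rows;

# All lines are now in one array and can be written in a single call.
print @out;
```

With File::Slurp, the finished array could then be written once per file, along the lines of write_file($outFile, { binmode => ':utf8' }, \@out). Note that join does no quoting, so a value that itself contains a semicolon would break the row; Text::CSV takes care of that case.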

Source

2013-08-31 12:53:59 TLP

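As I mention below, the first problem is solved, although not very elegantly. Unfortunately another one popped up: when the Excel file has blank columns, no corresponding field is generated in the CSV file (the semicolon is missing). :-( – royskatt

Another question about the above: why use a reference in write_file, \@out, instead of just @out? What is the difference? – royskatt

@royskatt When I skimmed the documentation, it appeared to be a (possible) small optimization, since the data does not have to be copied. When you pass an array, it is expanded into a list and the elements are copied into '@_'. As for the blank fields... you have to make sure a value is printed even when the cell is empty. For example, the check 'if ($source_cell)' is probably incorrect, since it means blank cells are skipped. You can use the defined-or operator, e.g. 'my $value = $source_cell->value // ""' – TLP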
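The blank-field advice can be sketched as follows. This is an illustration only: the @cells array is hypothetical, with undef standing in for a cell that the parser did not return.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical row: undef stands in for a blank cell in the spreadsheet.
my @cells = ('a', undef, 'c', undef);

# Map every column to a string, defined or not, so that no field is dropped.
# The defined-or operator (//) replaces undef with an empty string.
my $line = join(';', map { $_ // '' } @cells);

print "$line\n";
```

In the converter itself the cell object may be undefined, so the check probably has to happen before the method call, for example 'my $value = $source_cell ? $source_cell->Value : ""', and the value is then appended for every column rather than only inside 'if ($source_cell)'.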

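Well, in this particular case File::Slurp did complicate things for me. I just wanted to avoid repeating myself, which is exactly what I ended up doing in my clumsy but working solution below: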

#!/usr/bin/perl 

use warnings; 
use strict; 
use File::Copy; 
use v5.14; 
use Cwd; 
use File::Basename; 
use File::Slurp; 
use Tie::File; 
use Spreadsheet::ParseExcel; 
use open qw/:std :utf8/; 

# ... other functions 

sub xls2csv { 
    my $currentPath = getcwd(); 
    my @files  = <$currentPath/stage0/*.xls>; 
    my $fh; 

    for my $sourcename (@files) { 
     say "Now working on $sourcename"; 
     my $outFile = $sourcename; 
     $outFile =~ s/xls/csv/gi; 
     if (-e $outFile) { 
      unlink($outFile) or die "Error: $!"; 
      print "Old $outFile deleted."; 
     } 
     my $source_excel = new Spreadsheet::ParseExcel; 
     my $source_book = $source_excel->Parse($sourcename) 
      or die "Could not open source Excel file $sourcename: $!"; 

     foreach my $source_sheet_number (0 .. $source_book->{SheetCount} - 1) 
     { 
      my $source_sheet = $source_book->{Worksheet}[$source_sheet_number]; 

      next unless defined $source_sheet->{MaxRow}; 
      next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow}; 
      next unless defined $source_sheet->{MaxCol}; 
      next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol}; 

      foreach my $row_index (
       $source_sheet->{MinRow} .. $source_sheet->{MaxRow}) 
      { 
       foreach my $col_index (
        $source_sheet->{MinCol} .. $source_sheet->{MaxCol}) 
       { 
        my $source_cell = 
         $source_sheet->{Cells}[$row_index][$col_index]; 
        if ($source_cell) { 
         print $source_cell->Value, ";"; 
         open($fh, '>>', $outFile) or die "Error: $!"; 
         print $fh $source_cell->Value, ";"; 
         close $fh; 
        } 
       } 
       print "\n"; 
       open($fh, '>>', $outFile) or die "Error: $!"; 
       print $fh "\n"; 
       close $fh; 
      } 
     } 

    } 
} 

xls2csv();

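Actually, I am not happy with it, because I open and close the file so often (I have many files with many lines). That is not very smart in terms of performance.

At the moment I still don't know how to use split or Text::CSV in this case to put everything into an array and open, write and close each file only once.

Thanks for your answer, TLP.

Source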

2013-08-31 23:13:45 royskatt

You don't have to open the file in append mode ('>>') once for every value. Just open each file once, using '>' mode. – TLP
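A minimal sketch of that suggestion, assuming the rows have already been collected: the file is opened once, every row is written through the same handle, and the file is closed once. The @rows data is hypothetical, and a temporary file stands in for $outFile so the sketch can be run anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical rows; in the converter these would come from the sheet loops.
my @rows = ( [ 'a', 'b' ], [ 'c', 'd' ] );

# A temporary file stands in for $outFile (removed again at program exit).
my (undef, $outFile) = tempfile(SUFFIX => '.csv', UNLINK => 1);

# Open once in '>' mode; this truncates any existing file of that name.
open(my $fh, '>:encoding(UTF-8)', $outFile)
    or die "Cannot open $outFile: $!";

# Write every row through the same handle.
for my $row (@rows) {
    print $fh join(';', @$row), "\n";
}

# Close once, after all rows have been written.
close($fh) or die "Cannot close $outFile: $!";
```

Because '>' truncates an existing file, the separate unlink step before writing is no longer needed, and the encoding layer on the handle should also avoid the "Wide character in print" warning for that handle.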
