2013-05-02 126 views
2

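perl: persist set of strings with commit support

I have a set of strings that is modified inside a loop of 25k iterations. It is empty at the start, and in each cycle 0-200 strings are randomly added to or removed from it. At the end, the set contains about 80k strings.
I want to make the process resumable. The set should be saved to disk after each cycle and loaded again on resume.
What library can I use? The raw data amounts to about 16M, but the changes are usually small, so I don't want the whole store rewritten on every iteration.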

Since the strings are paths, I am thinking of storing them in a log file like this:

+a 
+b 
commit 
-b 
+d 
commit 

At startup, the file is loaded into a hash and then compacted. If the file does not end with a commit line, the last block is ignored.
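The replay rule described above can be sketched as follows. This is only an illustration under stated assumptions: `replay_log` is a hypothetical helper name, the log is passed in as a string rather than read from disk, and trailing whitespace is stripped from each line because the sample log has trailing spaces (real paths ending in whitespace would need different handling).

```perl
use strict;
use warnings;

# Replay a log given as a string and return the committed set as a hash ref.
# Lines: "+x" adds x, "-x" removes x, "commit" seals the block read so far.
sub replay_log {
    my ($text) = @_;
    my %set;
    my @pending;    # operations seen since the last commit line
    for my $line (split /\n/, $text) {
        $line =~ s/\s+\z//;    # assumption: trailing whitespace is not data
        if ($line eq "commit") {
            for my $op (@pending) {
                my ($sign, $item) = @$op;
                if ($sign eq "+") { $set{$item} = 1 }
                else              { delete $set{$item} }
            }
            @pending = ();
        } elsif ($line =~ /\A([+-])(.*)\z/) {
            push @pending, [$1, $2];
        }
    }
    return \%set;    # whatever is left in @pending was never committed
}

my $set = replay_log("+a\n+b\ncommit\n-b\n+d\ncommit\n+e\n");
print join(",", sort keys %$set), "\n";    # prints "a,d"
```

Buffering each block until its commit line is seen means an interrupted final block never touches the hash, so no undo step is needed.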

Answers

1

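The Storable package brings persistence to your Perl data structures (SCALAR, ARRAY, HASH or REF objects), i.e. anything that can be conveniently stored to disk and retrieved at a later time.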
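For reference, a minimal sketch of what saving and reloading the set with Storable could look like. The file name and the sample paths are made up for illustration; `nstore` and `retrieve` are the module's own functions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Storable qw(nstore retrieve);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/paths.sto";    # hypothetical file name

my %set = map { $_ => 1 } qw(/tmp/a /tmp/b);
nstore(\%set, $file);           # serializes the entire hash on every call

my $loaded = retrieve($file);   # returns a reference to a fresh copy
print join(",", sort keys %$loaded), "\n";    # prints "/tmp/a,/tmp/b"
```

Each `nstore` call writes the complete structure again, so with a set of about 80k strings every cycle would rewrite the whole store.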

+0

According to its description, it cannot write only the changes. I need something more DB-like, with INSERT, DELETE and COMMIT – basin 2013-05-02 09:06:14

+0

How about using a database, then? Or why don't you want to use one? http://search.cpan.org/~rurban/DBD-SQLite2-0.36/lib/DBD/SQLite2.pm is self-contained. – Matthias 2013-05-02 09:39:14
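A rough sketch of the database route, with one commit per cycle. It assumes the DBI and DBD::SQLite modules from CPAN are installed (DBD::SQLite rather than the DBD::SQLite2 distribution linked above), and the table and column names are invented for the example.

```perl
use strict;
use warnings;
use DBI;

# In-memory database for the sketch; a file name would replace ":memory:".
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1, AutoCommit => 0 });
$dbh->do("CREATE TABLE paths (path TEXT PRIMARY KEY)");
$dbh->commit;

my $ins = $dbh->prepare("INSERT OR IGNORE INTO paths (path) VALUES (?)");
my $del = $dbh->prepare("DELETE FROM paths WHERE path = ?");

$ins->execute($_) for qw(a b);
$dbh->commit;       # end of cycle 1

$del->execute("b");
$ins->execute("d");
$dbh->commit;       # end of cycle 2

$ins->execute("e");
$dbh->rollback;     # an interrupted cycle leaves no trace

my $rows = $dbh->selectcol_arrayref("SELECT path FROM paths ORDER BY path");
print join(",", @$rows), "\n";    # prints "a,d"
$dbh->disconnect;
```

With `AutoCommit` off, the inserts and deletes of one cycle become visible together at `commit`, which matches the INSERT, DELETE and COMMIT semantics asked for in the comment.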

0

I decided to put away the heavy artillery and write something very simple:

package LoL::IMadeADb;

use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy qw(move);
use File::Temp qw(tempfile);
use IO::Handle;    # provides $fd->flush

# Log format: "+path" adds a path, "-path" removes it, and a line starting
# with "c" commits everything written since the previous commit line.

sub new {
    my ($class, $dbname) = @_;
    my $self = { dbname => $dbname, paths => {}, nlines => 0, changed => 0 };

    # Open for reading and appending; create the file if it does not exist.
    open(my $fd, "+>>", $dbname) or die "cannot open $dbname: $!";
    seek($fd, 0, 0);
    $self->{fd} = $fd;

    my $href       = $self->{paths};
    my $lastcommit = 0;    # file offset just past the last commit line
    my @pending;           # operations read since the last commit line
    while (defined(my $line = <$fd>)) {
        last unless $line =~ s/\n\z//;    # torn final line: stop reading
        next unless $line =~ /\A([+\-c])(.*)\z/s;
        my ($c, $rest) = ($1, $2);
        if ($c eq "c") {
            # Apply a block only once its commit line has been seen, so an
            # uncommitted tail never touches the hash.
            for my $op (@pending) {
                if ($op->[0] eq "+") { $href->{ $op->[1] } = undef }
                else                 { delete $href->{ $op->[1] } }
            }
            $self->{nlines} += @pending + 1;
            @pending    = ();
            $lastcommit = tell($fd);
        } else {
            push @pending, [$c, $rest];
        }
    }
    if (tell($fd) > $lastcommit) {
        # The tail was never committed (e.g. the run was interrupted): drop it.
        print STDERR "rolling back incomplete file: $dbname\n";
        truncate($fd, $lastcommit) or die "cannot truncate $dbname: $!";
    }
    seek($fd, 0, 2);    # reposition before switching from reading to writing
    return bless $self, $class;
}

sub add {
    my ($self, $path) = @_;
    if (!exists $self->{paths}{$path}) {
        $self->{paths}{$path} = undef;
        print { $self->{fd} } "+$path\n" or die "cannot write $self->{dbname}: $!";
        $self->{nlines}++;
        $self->{changed} = 1;
    }
    return;
}

sub remove {
    my ($self, $path) = @_;
    if (exists $self->{paths}{$path}) {
        delete $self->{paths}{$path};
        print { $self->{fd} } "-$path\n" or die "cannot write $self->{dbname}: $!";
        $self->{nlines}++;
        $self->{changed} = 1;
    }
    return;
}

sub save {
    my ($self) = @_;
    return unless $self->{changed};
    my $fd   = $self->{fd};
    my @keys = keys %{ $self->{paths} };
    if ($self->{nlines} - @keys > 5000) {
        # More than 5000 redundant lines: compact by writing the current set
        # to a temp file in the same directory and moving it over the log.
        close($fd);
        my $bkpdir = dirname($self->{dbname});
        ($fd, my $bkpname) = tempfile(DIR => $bkpdir, SUFFIX => ".tmp");    # croaks on failure
        for (@keys) {
            print {$fd} "+$_\n" or die "cannot write backup file: $!";
        }
        print {$fd} "c\n" or die "cannot write backup file: $!";
        close($fd) or die "cannot close backup file: $!";
        $self->{nlines} = @keys + 1;
        move($bkpname, $self->{dbname})
            or die "cannot rename $bkpname => $self->{dbname}: $!";
        open(my $newfd, ">>", $self->{dbname}) or die "cannot open $self->{dbname}: $!";
        $self->{fd} = $newfd;
    } else {
        print {$fd} "c\n" or die "cannot write $self->{dbname}: $!";
        $self->{nlines}++;
        # Push Perl's buffer to the OS; this does not force a sync to disk.
        $fd->flush or die "cannot flush $self->{dbname}: $!";
    }
    $self->{changed} = 0;
    return;
}

1;
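The compaction step relies on writing a complete replacement file first and only then moving it over the log. A standalone sketch of that pattern, assuming a POSIX-like filesystem where renaming within one directory is atomic; `compact_log` is a hypothetical helper name, not part of the package above, and the keys are sorted only to make the output predictable.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile tempdir);

# Rewrite $file so it holds exactly the keys of %$set plus one commit line.
sub compact_log {
    my ($file, $set) = @_;
    # Same directory as the target, so rename() stays on one filesystem.
    my ($fh, $tmp) = tempfile(DIR => dirname($file), SUFFIX => ".tmp");
    for my $key (sort keys %$set) {
        print {$fh} "+$key\n" or die "write $tmp: $!";
    }
    print {$fh} "c\n"   or die "write $tmp: $!";
    close($fh)          or die "close $tmp: $!";
    rename($tmp, $file) or die "rename $tmp => $file: $!";
    return;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/paths.log";    # hypothetical log location
compact_log($file, { a => undef, d => undef });

open(my $in, "<", $file) or die "open $file: $!";
print <$in>;    # prints the lines "+a", "+d" and "c"
close($in);
```

If the process dies before the rename, the old log is still intact and only a stray `.tmp` file is left behind.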