2016-12-05 63 views

Bash script to add a date threshold to the S3 `cp` function

I'd like to be able to use `aws s3 cp` with a date threshold, but it has no switch for that feature.

So I want to write a Bash script. Calling `aws s3 ls` with the `--recursive` switch gives me a listing that includes dates and times, which I think I can use to reach my goal. Here is a sample of the output:

2016-12-01 18:06:40          0 sftp/
2016-12-01 20:35:39       1024 sftp/.ssh/.id_rsa.swp
2016-12-01 20:35:39       1679 sftp/.ssh/id_rsa
2016-12-01 20:35:39        405 sftp/.ssh/id_rsa.pub

What is the most efficient way to iterate over all the files but copy only those more recent than the indicated date?

Here is the (incomplete) script I have so far:

#!/bin/bash 

while [[ $# -gt 0 ]] 
do 
    key="$1" 

    case $key in 
     -m|--mtime) 
      MTIME="$2" 
      shift 2;; 
     -s|--source) 
      SRCDIR="$2" 
      shift 2;; 
     -d|--dest) 
      DSTDIR="$2" 
      shift 2;; 
     *) 
      #echo "Unknown argument: \"$key\""; exit 1;; 
      break;; 
    esac 
done 

if [ ! -d "$DSTDIR" ]; then 
    echo "the directory does not exist!"; 
    exit 1; 
fi 

GTDATE="$(date "+%Y%m%d%H%M%S" -d "$MTIME days ago")" 
#echo "Threshold: $GTDATE" 

for f in $(aws s3 ls $SRCDIR --recursive | awk '{ ??? }'); do 
    #aws s3 cp 
done 
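For reference, one hedged way to fill in the `awk` filter (the `???` above) is to strip the punctuation from the listing's date and time so the result can be compared lexically against `$GTDATE`, which uses the same `YYYYMMDDHHMMSS` layout. A minimal sketch against a fake listing (the threshold value and file names are placeholders):

```shell
# Fake "aws s3 ls --recursive" output; the real script would pipe the
# actual listing into the same awk program.
GTDATE="20161201190000"   # threshold in the script's %Y%m%d%H%M%S format
printf '%s\n' \
  '2016-12-01 18:06:40          0 sftp/' \
  '2016-12-01 20:35:39       1679 sftp/.ssh/id_rsa' |
awk -v d="$GTDATE" '{
  ts = $1 " " $2          # "2016-12-01 18:06:40"
  gsub(/[-: ]/, "", ts)   # -> "20161201180640"
  if (ts > d) print $4    # print the key only if newer than the threshold
}'
```

For this input it prints only `sftp/.ssh/id_rsa`, since only that row is newer than 19:00:00 on the threshold day.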

Answers

1

It is important to know whether the timestamp is local or UTC.
If the local timezone is America/Los_Angeles, `date` can interpret the time incorrectly (note the difference: 03 vs 11):

$ date -d '20161201T18:06:40' +'%Y%m%dT%H:%M:%S' 
20161201T03:06:40 

$ date -ud '20161201T18:06:40' +'%Y%m%dT%H:%M:%S' 
20161201T11:06:40  

Using `-u` also avoids problems with DST and local time changes.

If dates are recorded in UTC, the `date` command can read them back, and a format with no spaces is easy to parse with `awk` or similar tools. For example:

$ date -ud '2016-12-01 18:06:40' +'%Y-%m-%dT%H:%M:%S' 
2016-12-01T18:06:40 

That is easier both for the computer's `date` and for the humans reading it.
But you have a different timestamp format.
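Since the `s3 ls` timestamp contains a space, another option is to convert both sides to UTC epoch seconds with GNU `date` and compare integers; a minimal sketch (both dates here are made-up sample values):

```shell
# Convert the listing's "day time" pair to UTC epoch seconds (GNU date).
filetime=$(date -ud '2016-12-01 18:06:40' +%s)
limittime=$(date -ud '2016-11-30 00:00:00' +%s)   # assumed threshold

# Integer comparison sidesteps all string-format pitfalls.
if [ "$filetime" -gt "$limittime" ]; then
  echo "newer than threshold"   # taken for this pair of dates
fi
```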

Assuming no file name contains a newline,
the script after option processing could look like this:

#!/bin/bash 

SayError(){ local a=$1; shift; printf '%s\n' "$0: $@" >&2; exit "$a"; } 

[[ ! -d $dstdir ]] && SayError 1 "The directory $dstdir does not exist!" 
[[ ! -d $srcdir ]] && SayError 2 "The source directory $srcdir doesn't exist" 
[[ -z $mtime ]] && SayError 3 "The value of mtime has not been set." 

gtdate="$(date -ud "$mtime days ago" "+%Y-%m-%dT%H:%M:%S")" 
#echo "Threshold: $gtdate" 

readarray -t files < <(aws s3 ls "$srcdir" --recursive) 
limittime=$(date -ud "$gtdate" +"%s") 

for f in "${files[@]}"; do 
    IFS=' ' read day time size name <<<"$f" 
    filetime=$(date -ud "${day}T${time}" +"%s") 
    if [[ $filetime -gt $limittime ]]; then 
     aws s3 cp "$srcdir/$name" "$dstdir/" 
    fi 
done 

Warning: untested code, please review it carefully.


Is there a missing double quote in `filetime=$(date -ud "${day}T${time}%s)`? – alphadogg


@alphadogg Yes, thanks. – sorontar

0

One way is to sort the `s3 ls` results together with the cutoff date, and then stop processing the S3 listing when you hit the cutoff (for this to work, `$GTDATE` must use the same `YYYY-MM-DD HH:MM:SS` layout as the listing, so that it sorts in line with the other rows and `CUTOFF` lands in field 3):

cat <(echo "$GTDATE CUTOFF") <(aws s3 ls "$SRCDIR" --recursive) | 
    sort -dr | 
    awk '{ if ($3 == "CUTOFF") exit; print $4 }' | 
    xargs -I{} echo aws s3 cp "$SRCDIR/{}" "$DSTDIR/{}" 

(I left an `echo` in the last line so you can test and see what commands would run; remove the `echo` to actually execute the `s3 cp` commands.)

(You could also use `s3 sync` instead of `cp` to avoid re-downloading files that are already up to date.)
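The cutoff trick can be tried locally without AWS; a minimal sketch using a fake listing (the threshold and file names are placeholders):

```shell
cutoff='2016-12-01 19:00:00'   # assumed threshold, same layout as the listing

# Prepend the cutoff marker to a fake listing, sort newest-first, and stop
# printing keys once the CUTOFF row is reached.
{ echo "$cutoff CUTOFF"
  printf '%s\n' \
    '2016-12-01 18:06:40 0 sftp/' \
    '2016-12-01 20:35:39 1679 sftp/.ssh/id_rsa'
} |
sort -dr |
awk '{ if ($3 == "CUTOFF") exit; print $4 }'
```

This prints only `sftp/.ssh/id_rsa`: after the reverse sort, the 20:35:39 row comes before the cutoff row and the 18:06:40 row comes after it.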

1

For posterity when searching later, here is the final draft of the script (we are still reviewing it; in practice you would comment out all but the last two `echo` calls):

#!/bin/bash 
# 
# S3ToDirSync 
# 
# This script is a custom SFTP-S3 synchronizer utilizing the AWS CLI. 
# 
# Usage: S3ToDirSync.sh -m 7 -d "/home/user" -b "user-data-files" -s "sftp" 
# 
# Notes: 
# The script is hardcoded to exclude syncing of any folder or file on a path containing "~archive" 
# See http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#available-commands for AWS CLI documentation on commands 
# 
# Updated: 12/05/2016 

while [[ $# -gt 0 ]] 
do 
    key="$1" 

    case $key in 
     -m|--mtime) 
      MTIME="$2" # nb of days (from now) to copy 
      shift 2;; 
     -b|--bucket) 
      BUCKET="$2" # the S3 bucket, no trailing slashes 
      shift 2;; 
     -s|--source) # the S3 prefix/path, slashes at start and end of string will be added if not present 
      SRCDIR="$2" 
      shift 2;; 
     -d|--dest) # the root destination folder 
      DSTDIR="$2" 
      shift 2;; 
     *) 
      #echo "Unknown argument: \"$key\""; exit 1;; 
      break;; 
    esac 
done 

# validations 
if [ ! -d "$DSTDIR" ]; then 
    echo "The destination directory does not exist."; 
    exit 1; 
fi 
if [[ $DSTDIR != *"/" ]]; then 
    DSTDIR="$DSTDIR/" 
fi 
echo "DSTDIR: $DSTDIR" 

if [ -z "$BUCKET" ]; then 
    echo "The bucket value has not been set."; 
    exit 1; 
fi 

if [[ $BUCKET == *"/"* ]]; then 
    echo "No slashes (/) in bucket arg."; 
    exit 1; 
fi 
# add trailing slash 
BUCKET="$BUCKET/" 
echo "BUCKET: $BUCKET" 

if [ -z "$MTIME" ]; then 
    echo "The mtime value has not been set."; 
    exit 1; 
fi 

# $SRCDIR may be empty, to copy everything in a bucket, but add a trailing slash if missing 
if [ -n "$SRCDIR" ] && [[ $SRCDIR != *"/" ]]; then 
    SRCDIR="$SRCDIR/" 
fi 
echo "SRCDIR: $SRCDIR" 

SRCPATH=s3://$BUCKET$SRCDIR 
echo "SRCPATH: $SRCPATH" 

LIMITTIME=$(date -ud "$MTIME days ago" "+%s") 
#echo "Threshold UTC Epoch: $LIMITTIME" 

readarray -t files < <(aws s3 ls "$SRCPATH" --recursive) # TODO: ls will return up to a limit of 1000 rows, which could timeout or not be enough 
for f in "${files[@]}"; do 
    IFS=' ' read day time size name <<<"$f" 
    FILETIME=$(date -ud "${day}T${time}" "+%s") 
    # if S3 file more recent than threshold AND if not in an "~archive" folder 
    if [[ $FILETIME -gt $LIMITTIME ]] && [[ $name != *"~archive"* ]]; then 
     name="${name/$SRCDIR/}" # truncate ls returned name by $SRCDIR, since S3 returns full path 
     echo "$SRCPATH $name" 
     destpath=$DSTDIR$name 
     echo "to $destpath" 
     # if a directory (trailing slash), mkdir in destination if necessary 
     if [[ $name == *"/" ]] && [ ! -d "$destpath" ]; then 
      echo "mkdir -p $destpath" 
      mkdir -p "$destpath" 
     # else a file, use aws s3 sync to benefit from rsync-like checks 
     else 
      echo "aws s3 sync $SRCPATH$name $destpath" 
      aws s3 sync "$SRCPATH$name" "$destpath" 
      # NOTE: if a file was on S3 then deleted inside MTIME window, do we delete on SFTP server? If so, add --delete 
     fi 
    fi 
done
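The `~archive` exclusion and the directory test in the loop above are plain Bash pattern matches; a minimal standalone sketch with hypothetical key names:

```shell
#!/bin/bash
# Hypothetical S3 key names, for illustration only.
for name in 'sftp/~archive/old.txt' 'sftp/.ssh/' 'sftp/.ssh/id_rsa'; do
  if [[ $name == *"~archive"* ]]; then
    echo "skip $name"            # excluded path
  elif [[ $name == *"/" ]]; then
    echo "dir  $name"            # trailing slash: treat as a directory
  else
    echo "file $name"            # otherwise a regular file key
  fi
done
```

This classifies the three sample keys as skip, dir, and file respectively.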