2012-03-27 44 views

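Handling long edit lists in XMLStarlet

The XMLStarlet versions shipped with current Linux distributions limit each xmlstarlet ed invocation to 128 operations, and every version is bound by the operating system's maximum command-line length. How can this be worked around?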

Is this limit actually a problem for you in practice? – npostavs 2012-03-28 14:17:38

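@npostavs Yes. See my answer to http://stackoverflow.com/questions/9880808/shell-script-to-parse-csv-to-an-xml-query/9882015 for a case where more than a handful of input lines need to be processed. I have also hit this in commercial, production code (although that latter case was rewritten to do the relevant processing in XQuery rather than bash + xmlstarlet). – 2012-03-28 15:42:03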

Answers

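The following breaks a long list of xmlstarlet edits into a pipeline of shorter operations. Note that xmlstarlet_ed (with an underscore) is a recursive shell function that wraps the real xmlstarlet ed command: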

xmlstarlet_max_commands=100 # max per instance; see http://sourceforge.net/tracker/?func=detail&aid=3488240&group_id=66612&atid=515106
shopt -s extglob # enable +([0-9]) as an equivalent to the regex ^[[:digit:]]+$

xmlstarlet_ed() {
    declare -a global_parameters=()
    declare -a parameters=()
    declare -i num_commands=0
    declare -i cmd_len
    declare -i global_parameters_remaining

    # first argument: the number of global parameters that follow
    global_parameters_remaining=$1; shift

    while ((global_parameters_remaining)); do
        global_parameters+=("$1"); shift
        ((global_parameters_remaining--))
    done

    # remaining arguments: edits, each prefixed by its own argument count
    while (($#)); do
        # validate before assigning: cmd_len is declared -i, so a non-numeric
        # value would be evaluated as arithmetic rather than rejected
        if ! [[ $1 = +([0-9]) ]]; then
            echo "ERROR: xmlstarlet_ed commands must be prefixed by run length" >&2
            return 1
        fi
        cmd_len=$1; shift

        if ((num_commands < xmlstarlet_max_commands)); then
            parameters+=("${@:1:$cmd_len}")
            num_commands+=1
            shift "$cmd_len"
        else
            # batch is full: run it with the real "xmlstarlet ed", then pipe the
            # result into a recursive call that handles the remaining edits
            xmlstarlet ed "${global_parameters[@]}" "${parameters[@]}" \
                | xmlstarlet_ed "${#global_parameters[@]}" "${global_parameters[@]}" "$cmd_len" "$@"
            return 0
        fi
    done

    if ((${#parameters[@]} > 0)); then
        xmlstarlet ed "${global_parameters[@]}" "${parameters[@]}"
    else
        cat # no edits left: pass the input through unchanged
    fi
}

It can be invoked like this:

# first list passed is global parameters; first the count, then the values
# pass only a 0 if no global parameters are desired
global_parameters=(2 -N "xhtml=http://www.w3.org/1999/xhtml")

# build up the parameter list as length/command pairs; the lengths are used
# to determine the potential split points between subprocesses
parameters=()
while IFS= read -r; do # IFS= and -r keep whitespace and backslashes intact
    parameters+=(8 -s /xhtml:html/xhtml:body -t elem -n line -v "$REPLY")
done

# ...and actually invoke:
xmlstarlet_ed "${global_parameters[@]}" "${parameters[@]}" \
    <<<"<html xmlns='http://www.w3.org/1999/xhtml'><body/></html>"
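
Counting the arguments of each edit by hand (the literal 8 above) is easy to get wrong. A minimal sketch of one way to avoid that, assuming the same global parameters array as in the example; add_edit is a hypothetical helper name and is not part of the answer's code:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original answer): append one edit to
# the global "parameters" array, prefixed with its own argument count.
parameters=()
add_edit() {
    parameters+=("$#" "$@")
}

# Each call records the length prefix automatically.
add_edit -s /xhtml:html/xhtml:body -t elem -n line -v "first line"
add_edit -d /xhtml:html/xhtml:head

# Show the resulting list, one element per line.
printf '%s\n' "${parameters[@]}"
```

The resulting array has the same length/command layout that xmlstarlet_ed expects, so it could be passed as "${parameters[@]}" exactly as in the invocation above.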

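+1. On first read I did not notice the difference between 'xmlstarlet_ed' and 'xmlstarlet ed'. I feel a short note saying that 'xmlstarlet_ed' is a recursive function would improve readability. – 2014-08-05 14:13:55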
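
To see where the split points fall without having xmlstarlet installed, the batching can be sketched as a dry run. This is only an illustration of the splitting idea, written iteratively rather than recursively; plan_batches is a hypothetical name, and it prints the commands that would be chained instead of running them:

```shell
#!/usr/bin/env bash
# Dry-run sketch: group length-prefixed edits into batches of at most $1
# edits and print one "xmlstarlet ed" command line per batch.
# plan_batches is a hypothetical name, not part of the original answer.
plan_batches() {
    local -i max=$1 count=0
    local len
    local -a batch=()
    shift
    while (($#)); do
        len=$1; shift
        if ((count == max)); then
            # current batch is full: emit it and start a new one
            echo "xmlstarlet ed ${batch[*]}"
            batch=()
            count=0
        fi
        batch+=("${@:1:$len}")
        shift "$len"
        count+=1
    done
    # emit the final, possibly partial, batch
    if ((${#batch[@]})); then
        echo "xmlstarlet ed ${batch[*]}"
    fi
    return 0
}

# Three two-argument edits with a limit of two edits per batch.
plan_batches 2  2 -d /a  2 -d /b  2 -d /c
```

With a limit of two, the three edits are printed as two lines: the first holds the deletions of /a and /b, the second the deletion of /c. In the real function, each such line corresponds to one stage of the pipeline.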