2017-02-15

Parse and grep HTML in Ansible

I have an HTML page on my product website. I want to parse the document and get the product versions from that page.

The HTML page looks like this:

<html> 
....... 
....... 
<body> 
....... 
....... 
<div id='version_info'> 
    <div class="product-version"> 
     <div class="product-title">Name of the product 1:</div><div class="product-value">ver_123</div> 
    </div> 
    <div class="product-version"> 
     <div class="product-title">Name of the product 2:</div><div class="product-value">ver_456</div> 
    </div> 
    <div class="product-version"> 
     <div class="product-title">Name of the product 3:</div><div class="product-value">ver_845</div> 
    </div> 
    <div class="product-version"> 
     <div class="product-title">Name of the product 4:</div><div class="product-value">ver_146</div> 
    </div> 
</div> 
....... 
....... 
</body> 
....... 
....... 
</html> 

How can I grep the document and build a string like the following? productname1 = ver_123, productname2 = ver_456, productname3 = ver_845, and so on.

Do you need an answer for this specific form of HTML, or can the HTML vary?

An answer for this HTML would be great. But if you have a similar example, that would also help a lot.

Grepping XML/HTML: now you have two problems. – tedder42

Answer

I worked with this particular HTML file and populated the variable result with a dictionary of the desired values.

Notes:

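1. Please change the path to the HTML file in the playbook.

2. This particular playbook works for this HTML example. For further requirements and improvements, provide the HTML.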

---
- hosts: localhost
  name: "Getting variables from HTML"
  vars:
    result: {}
  tasks:
    # Read the HTML file; its lines end up in search.stdout_lines.
    - name: "Getting content of the file"
      command: cat /path/to/html/file
      register: search

    # For every line that holds a product-title div, strip the tags to get
    # the title (key) and the version (value), then merge them into result.
    - name: "Creating dictionary while looping over file"
      ignore_errors: true
      vars:
        key: "{{item | replace('<div class=\"product-title\">','') | replace('</div>','') | regex_replace('<div.*','') | regex_replace('^\\s*','')}}"
        value: "{{item | replace('<div class=\"product-title\">','') | replace('</div>','') | regex_replace('^[\\w\\s\\:]*','') | replace('<div class=\"product-value\">','') | regex_replace('\\s*$','')}}"
      set_fact:
        result: "{{ result | combine({ key: value }) }}"
      when: "'product-title' in item"
      with_items: "{{search.stdout_lines}}"

    # Print the resulting dictionary.
    - name: "Getting register"
      debug:
        msg: "{{result}}"
...
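To make the filter chain in the playbook easier to follow and to test outside Ansible, here is a rough Python sketch of the same line-based logic. The function name extract_versions and the two constants are made up for this illustration and are not part of the playbook. Like the playbook, it assumes each title/value pair sits on a single line.

```python
import re

# Hypothetical names for this sketch; they mirror the literals in the playbook.
TITLE_OPEN = '<div class="product-title">'
VALUE_OPEN = '<div class="product-value">'


def extract_versions(lines):
    """Build {title: version} from lines that contain a product-title div."""
    result = {}
    for line in lines:
        # Same test as the `when:` clause in the playbook.
        if "product-title" not in line:
            continue
        # Drop the opening title tag and every closing </div>.
        stripped = line.replace(TITLE_OPEN, "").replace("</div>", "")
        # Key: text before the remaining <div ...>, without leading whitespace.
        key = re.sub(r"^\s*", "", re.sub(r"<div.*", "", stripped))
        # Value: drop the leading title text, the value tag, and trailing whitespace.
        value = re.sub(r"^[\w\s:]*", "", stripped)
        value = re.sub(r"\s*$", "", value.replace(VALUE_OPEN, ""))
        result[key] = value
    return result
```

Calling extract_versions on the lines of the HTML file should give the same kind of dictionary that the set_fact task accumulates in result.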

Output of the playbook:

ok: [localhost] => { 
    "msg": { 
     "Name of the product 1:": "ver_123", 
     "Name of the product 2:": "ver_456", 
     "Name of the product 3:": "ver_845", 
     "Name of the product 4:": "ver_146" 
    } 
} 
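
The question asked for a single string of name = version pairs, and line-based matching breaks as soon as a title and its value land on different lines. As a possible alternative, here is a minimal sketch that uses Python's standard html.parser module instead of regular expressions. The names VersionParser and versions_as_string are hypothetical, and the sketch assumes the class names product-title and product-value from the HTML in the question. It runs on its own, not inside Ansible.

```python
from html.parser import HTMLParser


class VersionParser(HTMLParser):
    """Collect product-title / product-value pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._field = None   # "title" or "value" while inside such a div
        self._title = None   # last title seen, waiting for its value
        self.versions = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if tag == "div" and css_class == "product-title":
            self._field = "title"
        elif tag == "div" and css_class == "product-value":
            self._field = "value"

    def handle_endtag(self, tag):
        # The title and value divs hold only text, so any end tag closes them.
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._field == "title":
            self._title = text.rstrip(":")
        elif self._field == "value" and self._title is not None:
            self.versions[self._title] = text
            self._title = None


def versions_as_string(html):
    """Return 'name = version' pairs joined by ', ', in document order."""
    parser = VersionParser()
    parser.feed(html)
    return ", ".join(f"{name} = {ver}" for name, ver in parser.versions.items())
```

For the sample page, versions_as_string should yield pairs such as Name of the product 1 = ver_123, separated by commas. Shortening the names to productname1 and so on would need an extra mapping step that is not shown here.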

Thank you. I will check your code today and let you know :)

You're welcome :)

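@SRNathan I see it has been almost 7 months since this post, but it looks like Sahil solved your problem; you should accept the answer. If you solved the problem yourself and his answer helped, please consider accepting it and adding your solution to the original post as an edit.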