2017-08-15 74 views
0

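Getting key/value pairs from HTML storage

I want to extract all key/value pairs from a text file in HTML storage format. I need to get every value of the user key and save them. The storage format looks like this example: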

ame="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="2c9289304dbbc5b3014dbd91f1070003" /></ac:parameter></ac:structured-macro></td><td><ac:link><ri:user ri:userkey="2c9289304dbbc5b3014dbd91f1070003" /></ac:link></td><td><p>Framework Team</p></td><td>+ 4 New York 04</td></tr><tr><td colspan="1"><ac:structured-macro ac:macro-id="a5a4315a-b070-4af5-bf4b-6785b6ae50e4" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="2c9289304dd05f3e014dd3ed18470027" /></ac:parameter></ac:structured-macro></td><td colspan="1"><ac:link><ri:user ri:userkey="2c9289304dd05f3e014dd3ed18470027" /></ac:link></td><td colspan="1">Framework Team</td><td colspan="1">+ 4 New York 02</td></tr><tr><td colspan="1"><ac:structured-macro ac:macro-id="a700d77f-0fb0-4288-9a0b-198a35e75f05" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="2c9289304dd5858a014dd74be3a80008" /></ac:parameter></ac:structured-macro></td><td colspan="1"><ac:link><ri:user ri:userkey="2c9289304dd5858a014dd74be3a80008" /></ac:link></td><td colspan="1">Framework Team</td><td colspan="1">+ 4 New York 02</td></tr><tr><td colspan="1"><ac:structured-macro ac:macro-id="291fc9f1-db1c-48af-8897-3ac294b6e608" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="2c9289304dd05f3e014dd3e24b1a0021" /></ac:parameter></ac:structured-macro></td><td colspan="1"><ac:link><ri:user ri:userkey="2c9289304dd05f3e014dd3e24b1a0021" /></ac:link></td><td colspan="1">Framework Team</td><td colspan="1">+ 4 New York 02</td></tr><tr><td colspan="1"><ac:structured-macro ac:macro-id="0c453e0b-f441-408f-8784-7545192d8d0a" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="2c9289304dd5858a014dd751a512000b" /></ac:parameter></ac:structured-macro></td><td colspan="1"><ac:link><ri:user ri:userkey="2c9289304dd5858a014dd751a512000b" 
/><ac:plain-text-link-body><![CDATA[Stephan]]></ac:plain-text-link-body></ac:link><ac:link><ri:user ri:userkey="2c9289304dd5858a014dd751a512000b" /><ac:plain-text-link-body><![CDATA[ Ngoie Kapenda]]></ac:plain-text-link-body></ac:link></td><td colspan="1">Framework Team</td><td colspan="1">+ 4 New York 02</td></tr><tr><td><ac:structured-macro ac:macro-id="518fb9ae-e1e5-4d71-9147-e5e5c6a4ffe2" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="8a45f48d52ae76760152b1dfc49b0019" /></ac:parameter></ac:structured-macro></td><td><ac:link><ri:user ri:userkey="8a45f48d52ae76760152b1dfc49b0019" /></ac:link></td><td>Framework Team</td><td>+ 4 New York 04</td></tr><tr><td colspan="1"><ac:structured-macro ac:macro-id="bf34bc8c-3803-44b7-a4db-0d0181207103" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="8a45f48d5209ab0601520d2585020014" /></ac:parameter></ac:structured-macro></td><td colspan="1"><ac:link><ri:user ri:userkey="8a45f48d5209ab0601520d2585020014" /></ac:link></td><td colspan="1">Framework Team</td><td colspan="1">+ 4 New York 04</td></tr><tr><td colspan="1"><ac:structured-macro ac:macro-id="cf90f25b-b5cc-4fd0-8ecb-0ee6d59cc46b" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="8a45f48d5b7394ed015b8594ebf90017" /></ac:parameter></ac:structured-macro></td><td colspan="1"><ac:link><ri:user ri:userkey="8a45f48d5b7394ed015b8594ebf90017" /></ac:link></td><td colspan="1"><span>Framework Team</span></td><td colspan="1"><span>+ 4 New York 02</span></td></tr><tr><td colspan="1"><ac:structured-macro ac:macro-id="b763a704-a016-47db-b2cd-a49c465e0772" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="8a45f48d4ef532a5014ef855cd06000a" /></ac:parameter></ac:structured-macro></td><td colspan="1"><ac:link><ri:user ri:userkey="8a45f48d4ef532a5014ef855cd06000a" /></ac:link></td><td 
colspan="1">Change Manager</td><td colspan="1">+ 4 New York 04</td></tr><tr><td colspan="1"><ac:structured-macro ac:macro-id="b04b7168-01d8-4247-ac54-d61f12bc3d7d" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="8a45f48d59d7cf310159db1df7d70009" /></ac:parameter></ac:structured-macro></td><td colspan="1"><ac:link><ri:user ri:userkey="8a45f48d59d7cf310159db1df7d70009" /></ac:link></td><td colspan="1"><span>Data Governance&nbsp;</span></td><td colspan="1"><span>+ 4 New York 08</span></td></tr><tr><td colspan="1"><ac:structured-macro ac:macro-id="2c34834f-b6e6-4720-b4b0-960c12681271" ac:name="profile-picture" ac:schema-version="1"><ac:parameter ac:name="User"><ri:user ri:userkey="8a45f48d5bbbae0d015bc807c0850009" /></ac:parameter></ac:structured-macro></td><td colspan="1"><ac:link><ri:user ri:userkey="8a45f48d5bbbae0d015bc807c0850009" /></ac:link></td><td colspan="1"><span>Data Governance&nbsp;</span></td><td colspan="1"><span>+ 4 New York 08</span></td></tr></tbody></table></ac:layout-cell></ac:layout-section></ac:layout> 

I use the following regular expression to get the values:

sed -n 's/.*userkey="\(.*\)"/\1/p' | cut -f1 -d ' ' | tr -d '"' 

But I only get one value back, and I want all of them. So what is wrong with this regular expression?

Thanks in advance!

Answers

1

Since you did not show us the expected output, I am only printing the values here; let me know if you have any questions.

awk -v RS='[ :]' '/userkey/' Input_file 

Also, I can see that your Input_file is a single line, so this code is written for the Input_file as shown.
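
As for why the original sed command returns only one value: `.*` is greedy and the `s` command rewrites each line once, so on single-line input the leading `.*` swallows everything up to the last `userkey="` and only that final key survives. A minimal sketch of an alternative follows, assuming a grep that supports `-o` (GNU and BSD grep do; it is not in POSIX). The function name `extract_userkeys` is hypothetical, and the `sort -u` step is only my assumption that duplicates are unwanted, since each key appears more than once in the sample.

```shell
# Hypothetical helper: read markup on stdin, print each distinct userkey value.
extract_userkeys() {
  # grep -o prints every match on its own line, e.g. userkey="2c92..."
  # cut keeps the text between the first pair of double quotes
  # sort -u drops repeated keys (assumption: duplicates are not wanted)
  grep -o 'userkey="[^"]*"' | cut -d '"' -f 2 | sort -u
}

# Usage (Input_file is the file name used in this answer):
#   extract_userkeys < Input_file > userkeys.txt
```

The pattern `[^"]*` stops at the first closing quote, so it cannot run past one attribute into the next the way `.*` does. Drop `sort -u` if you want every occurrence in document order.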

+1

Thanks. This is exactly what I needed, and it works!

+0

Glad it helped you; see https://stackoverflow.com/help/someone-answers – RavinderSingh13