2017-05-25 125 views
1

In PyParsing, how do I ignore lines that may start with whitespace?

I am trying to parse data from a file like the following (which I have named foo_badging.txt):

package: name='com.sec.android.app.camera.shootingmode.dual' versionCode='6' versionName='1.003' platformBuildVersionName='5.0.1-1624448' 
sdkVersion:'17' 
uses-permission: name='android.permission.CAMERA' 
application-icon-640:'res/mipmap-xxhdpi-v4/application_manager_camera_mode_ic_dual_camera.png' 
application: label='Dual camera' icon='res/mipmap-hdpi-v4/application_manager_camera_mode_ic_dual_camera.png' 
feature-group: label='' 
    uses-feature: name='android.hardware.camera' 
    uses-implied-feature: name='android.hardware.camera' reason='requested android.permission.CAMERA permission' 
    uses-feature: name='android.hardware.touchscreen' 
    uses-implied-feature: name='android.hardware.touchscreen' reason='default feature for all apps' 
other-activities 
supports-screens: 'small' 'normal' 'large' 'xlarge' 
supports-any-density: 'true' 
locales: '--_--' 'ca' 'da' 'fa' 'ga' 'ja' 'pa' 'nb' 'be' 'de' 'ne' 'bg' 'mg' 'tg' 'th' 'xh' 'fi' 'hi' 'si' 'vi' 'sk' 'tk' 'uk' 'el' 'nl' 'pl' 'sl' 'tl' 'bn' 'in' 'ko' 'ro' 'sq' 'ar' 'fr' 'hr' 'or' 'sr' 'tr' 'as' 'cs' 'it' 'lt' 'gu' 'hu' 'ru' 'zu' 'lv' 'sv' 'iw' 'fr-CA' 'lo-LA' 'bn-BD' 'et-EE' 'ka-GE' 'ky-KG' 'my-ZG' 'km-KH' 'en-PH' 'zh-HK' 'mk-MK' 'ur-PK' 'hy-AM' 'my-MM' 'zh-CN' 'ta-IN' 'te-IN' 'ml-IN' 'bn-IN' 'kn-IN' 'mr-IN' 'mn-MN' 'pl-SP' 'pt-BR' 'gl-ES' 'es-ES' 'eu-ES' 'is-IS' 'en-US' 'es-US' 'pt-PT' 'zh-TW' 'ms-MY' 'az-AZ' 'kk-KZ' 'uz-UZ' 
densities: '160' '240' '320' '480' '640' 

I want to parse the first few lines (package, sdkVersion) and then skip lines until I reach the supports-screens line. Here is what I have so far:

from pyparsing import Literal, QuotedString, LineEnd, Optional, OneOrMore, LineStart, Regex, White 

with open('foo_badging.txt') as fp: 
    badging = fp.read() 

package_name = "name=" + QuotedString(quoteChar="'")("name") 
versionCode = "versionCode=" + QuotedString(quoteChar="'")("versionCode") 
versionName = "versionName=" + QuotedString(quoteChar="'")("versionName") 
platformBuildVersionName = "platformBuildVersionName=" + QuotedString(quoteChar="'")("platformBuildVersionName") 
sdkVersion = "sdkVersion:" + QuotedString(quoteChar="'")("sdkVersion") 
targetSdkVersion = "targetSdkVersion:" + QuotedString(quoteChar="'")("targetSdkVersion") 

not_supports_screens_line = LineStart() + Regex(r"(?!supports-screens:).*")  # Negative lookahead assertion for a line starting with "supports-screens:" 

supports_screens = "supports-screens:" + QuotedString(quoteChar="'")("supports_screens") 

expression = Literal("package:") + package_name + versionCode + versionName + platformBuildVersionName + LineEnd() \ 
       + Optional(sdkVersion + LineEnd()) \ 
       + Optional(targetSdkVersion + LineEnd()) \ 
       + OneOrMore(not_supports_screens_line) \ 
       + supports_screens + LineEnd() 

tokens = expression.parseString(badging) 

The problem is that I get a ParseException at the indented uses-feature line:

Traceback (most recent call last): 
    File "/home/kurt/Documents/Scratch/apk_checker/apk_check.py", line 82, in <module> 
    tokens = expression.parseString(badging) 
    File "/usr/local/lib/python2.7/dist-packages/pyparsing.py", line 1632, in parseString 
    raise exc 
pyparsing.ParseException: Expected "supports-screens:" (at char 435), (line:7, col:3) 

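Apparently this indented line does not count as a not_supports_screens_line, presumably because, unlike the others, it starts with two spaces. I tried changing the Regex to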

not_supports_screens_line = LineStart() + Regex(r"\s*(?!supports-screens:).*") 

with a leading \s*, as well as

not_supports_screens_line = LineStart() + Optional(White()) + Regex(r"(?!supports-screens:).*") 

but in both cases I still get the same error message. How can I make not_supports_screens_line match these indented lines as well?

+0

I think the correct regular expression is '\\s+' –

+1

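Try rethinking your grammar so that it matches only the pieces of data you want, such as 'pieceA | pieceB | pieceC', and then use 'searchString' instead of 'parseString'. Or use 'SkipTo' to skip over the intervening bits: 'pieceA + SkipTo(pieceB) + pieceB + SkipTo(pieceC) + pieceC'. – PaulMcG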
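The first suggestion in the comment above (match only the wanted pieces and scan with searchString) might look roughly like the sketch below. The shortened sample text, the com.example.app package name, and the names quoted, wanted, and result are illustrative assumptions, not part of the original post; Group is used here so that every quoted value on the supports-screens line is kept.

```python
from pyparsing import Group, Literal, OneOrMore, QuotedString

# Shortened, made-up sample in the same shape as foo_badging.txt.
sample = """package: name='com.example.app' versionCode='6' versionName='1.003'
sdkVersion:'17'
feature-group: label=''
  uses-feature: name='android.hardware.camera'
supports-screens: 'small' 'normal' 'large' 'xlarge'
supports-any-density: 'true'
"""

quoted = QuotedString(quoteChar="'")

# Describe only the pieces of interest; everything else is never matched.
package = (Literal("package:") + "name=" + quoted("name")
           + "versionCode=" + quoted("versionCode"))
sdk_version = Literal("sdkVersion:") + quoted("sdkVersion")
supports_screens = (Literal("supports-screens:")
                    + Group(OneOrMore(quoted))("supports_screens"))

wanted = package | sdk_version | supports_screens

# searchString scans the whole text and yields one result per match,
# so indented or otherwise uninteresting lines are simply passed over.
result = {}
for match in wanted.searchString(sample):
    result.update(match.asDict())

print(result)
```

With this approach there is no need to describe the lines to be skipped at all, which sidesteps the question of how a skipped line may begin.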

Answer

0

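Following Paul McGuire's comment, I used SkipTo to avoid having to write a complicated negative-lookahead expression for the lines I am not interested in. Here is the final code: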

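from pyparsing import Literal, QuotedString, LineEnd, Optional, LineStart, SkipTo 

# Parse action: convert the single matched token from a string to an int 
def convert_to_int(tokens): 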
    return int(tokens[0]) 

with open('foo_badging.txt') as fp: 
    badging = fp.read() 

package_name = "name=" + QuotedString(quoteChar="'")("name") 
versionCode = "versionCode=" + QuotedString(quoteChar="'")("versionCode").setParseAction(convert_to_int) 
versionName = "versionName=" + QuotedString(quoteChar="'")("versionName") 
platformBuildVersionName = "platformBuildVersionName=" + QuotedString(quoteChar="'")("platformBuildVersionName") 
sdkVersion = "sdkVersion:" + QuotedString(quoteChar="'")("sdkVersion").setParseAction(convert_to_int) 
targetSdkVersion = "targetSdkVersion:" + QuotedString(quoteChar="'")("targetSdkVersion").setParseAction(convert_to_int) 

supports_screens = LineStart() + "supports-screens:" + QuotedString(quoteChar="'")("supports_screens") 

expression = Literal("package:") + package_name + versionCode + versionName + platformBuildVersionName + LineEnd() \ 
       + Optional(sdkVersion + LineEnd()) \ 
       + Optional(targetSdkVersion + LineEnd()) \ 
       + SkipTo("supports-screens:") + supports_screens 

tokens = expression.parseString(badging) 

print(tokens.asDict()) 

This prints

{'sdkVersion': 17, 'name': 'com.sec.android.app.camera.shootingmode.dual', 'platformBuildVersionName': '5.0.1-1624448', 'supports_screens': 'small', 'versionName': '1.003', 'versionCode': 6} 

which includes the supports_screens field, as desired.
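Note that the output above holds only the first value, 'small', for supports_screens. If all of the quoted values on that line are wanted, one possible variation is to wrap a repeated QuotedString in a Group. The following is only a sketch on a shortened, made-up sample; the names quoted, header, and screens and the com.example.app package name are assumptions for illustration.

```python
from pyparsing import Group, Literal, OneOrMore, QuotedString, SkipTo

# Shortened, made-up sample in the same shape as foo_badging.txt.
sample = """package: name='com.example.app' versionCode='6'
sdkVersion:'17'
feature-group: label=''
  uses-feature: name='android.hardware.camera'
supports-screens: 'small' 'normal' 'large' 'xlarge'
supports-any-density: 'true'
"""

quoted = QuotedString(quoteChar="'")

header = Literal("package:") + "name=" + quoted("name")

# Group(OneOrMore(...)) collects every quoted value under one results name.
screens = (Literal("supports-screens:")
           + Group(OneOrMore(quoted))("supports_screens"))

# SkipTo jumps over everything, indented lines included, up to the target.
expression = header + SkipTo("supports-screens:") + screens

tokens = expression.parseString(sample)
all_screens = list(tokens["supports_screens"])
print(all_screens)
```

The repetition stops by itself at the next line, because supports-any-density: does not begin with a quote.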