2015-07-10 97 views

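How to extract information from a page using regular expressions

I am having trouble capturing the contents of "name": on other pages it often appears before "pluralName". What is a better way to do this (ideally the best one in terms of performance)? Thanks for your help!

Note: I am using Python.

Here is the block of the page that contains the information I need: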

{"count":0,"items":[]},"shortUrl":"http:\/\/4sq.com\/11nP13T","likes":{"count":22,"groups":[{"type":"others","count":22,"items":[]}],"summary":"22 Likes"},"ratingColor":"FF9600","id":"5172311be4b0ecc0a12a9953","canonicalPath":"\/v\/kee-hiong-klang-bak-kut-teh\/5172311be4b0ecc0a12a9953","canonicalUrl":"https:\/\/foursquare.com\/v\/kee-hiong-klang-bak-kut-teh\/5172311be4b0ecc0a12a9953","rating":5.3,"categories":[**{"pluralName":"Chinese Restaurants","name":"Chinese Restaurant",**"icon":{"prefix":"https:\/\/ss3.4sqi.net\/img\/categories_v2\/food\/asian_","mapPrefix":"https:\/\/ss3.4sqi.net\/img\/categories_map\/food\/chinese","suffix":".png"},"id":"4bf58dd8d48988d145941735","shortName":"Chinese","primary":true},{"pluralName":"Asian Restaurants","name":"Asian Restaurant","icon":{"prefix":"https:\/\/ss3.4sqi.net\/img\/categories_v2\/food\/asian_","mapPrefix":"https:\/\/ss3.4sqi.net\/img\/categories_map\/food\/asian","suffix":".png"},"id":"4bf58dd8d48988d142941735","shortName":"Asian"}],"createdAt":1366438171,"tips":{"count":25,"groups":[{"count":25,"items":[{"logView":true,"text":"Portion is quite small and expensive. Service attitude is so so. The BKT taste is not my preference.One of the up car restaurants in SS2 which I'll never go back again. 👎","likes":{"count":1,"groups":[{"type":"others","count":1,"items":[{"photo":{"prefix":"https:\/\/irs0.4sqi.net\/img\/user\/","suffix":"\/43964080-5LYADRF2EEP2RWPL.jpg"},"lastName":".w","firstName":"Jackie","id":"43964080","canonicalPath":"\/user\/43964080","canonicalUrl":"https:\/\/foursquare.com\/user\/43964080","gender":"female"}]}],"summary":"1 like"},"id":"541c2b73498eb0cfe1f76b9e","canonicalPath":"\/item\/541c2b73498eb0cfe1f76b9e","canonicalUrl":"https:\/\/foursquare.com\/item\/541c2b73498eb0cfe1f76b9e","createdAt":1.411132275E9,"todo":{"count":0},"user":{"photo":{"prefix":"https:\/\/irs1.4sqi.net\/img\/user\/","suffix":"\/5765949-NW4BAJWFBCVLRR1M.jpg"} 
What exactly do you need to match? –

Can you provide the expected output, for example the "Asian Restaurant" you want to match in this example? – The6thSense

However, I will run into other pages where the "name" key has different values. – user2905427

Answers

1
(?:"pluralName":"[^"]*","name":"([^"]*))|(?:"name":"([^"]*)","pluralName") 

Try this with re.findall. See the demo:

https://regex101.com/r/hR7tH4/4

import re  # test_str is assumed to hold the page text
print(re.findall(r'(?:"pluralName":"[^"]*","name":"([^"]*))|(?:"name":"([^"]*)","pluralName")', test_str))
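
Because the pattern has two capture groups, re.findall returns one 2-tuple per match, with an empty string in the slot of the alternative that did not match. A minimal sketch of flattening that into a plain list of names follows; the helper name `category_names` and the shortened `sample` string (with the second entry reordered by hand) are assumptions made for illustration, not part of the original answer.

```python
import re

# Same pattern as above: "name" may come right after or right before "pluralName".
PATTERN = re.compile(
    r'(?:"pluralName":"[^"]*","name":"([^"]*))'
    r'|(?:"name":"([^"]*)","pluralName")'
)

def category_names(text):
    """Return the captured names, whichever alternative matched."""
    # findall yields 2-tuples here; at most one slot is non-empty per match.
    return [first or second for first, second in PATTERN.findall(text)]

# Hypothetical sample, shortened from the block in the question.
sample = (
    '"categories":[{"pluralName":"Chinese Restaurants","name":"Chinese Restaurant"},'
    '{"name":"Asian Restaurant","pluralName":"Asian Restaurants"}]'
)
print(category_names(sample))
```

Compiling the pattern once is a small saving when the same expression is applied to many pages.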
Thank you. There are more results in the output than I expected, but these results are easier to work with. – user2905427

1

Don't use a regex.

Instead, use a JSON parser and access the resulting object. This is more robust.

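import json  # part of the Python standard library
# json_text: a string holding the complete JSON document (not named `str`, which would shadow the built-in)
o = json.loads(json_text)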
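
As a rough sketch of how the parsed object `o` might then be used: the block in the question is cut off at both ends, so the example assumes a complete document with a top-level "categories" list. The `json_text` value below is hypothetical and only mirrors the shape of that block.

```python
import json

# Hypothetical, complete JSON document shaped like the fragment in the question.
json_text = '''
{"id": "5172311be4b0ecc0a12a9953",
 "categories": [
   {"pluralName": "Chinese Restaurants", "name": "Chinese Restaurant", "primary": true},
   {"name": "Asian Restaurant", "pluralName": "Asian Restaurants"}
 ]}
'''

o = json.loads(json_text)

# Key order inside each object no longer matters once the text is parsed.
names = [category["name"] for category in o["categories"]]
print(names)

# Only the categories flagged as primary; .get() tolerates a missing key.
primary = [c["name"] for c in o["categories"] if c.get("primary")]
print(primary)
```

Unlike the regex, this does not depend on whether "name" comes before or after "pluralName", and it cannot pick up a "name" key that belongs to some other object.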
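Great answer! It would be even better if you showed an example of how to use 'o' and explained why this answer is better, perhaps even in the context of the posted question. – tsroten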

The JSON fragment he shared is FUBAR and cannot be repaired. –
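
If only a truncated fragment like the one in the question is available, json.loads on the whole text will fail. One possible middle ground, sketched here under assumptions, is to locate the key and let json.JSONDecoder.raw_decode parse just the value that follows it. The helper name `extract_value` and the shortened `fragment` are hypothetical, and the simple text search would be fooled if the marker also occurred inside a string value.

```python
import json

def extract_value(text, key):
    """Parse the JSON value that follows the first occurrence of "key":."""
    marker = '"%s":' % key
    start = text.find(marker)
    if start == -1:
        return None
    # raw_decode parses one JSON value at the start of the string
    # and ignores whatever (possibly broken) text comes after it.
    rest = text[start + len(marker):].lstrip()
    value, _end = json.JSONDecoder().raw_decode(rest)
    return value

# Hypothetical fragment, truncated at both ends like the block in the question.
fragment = (
    '{"count":0,"items":[]},"rating":5.3,"categories":['
    '{"pluralName":"Chinese Restaurants","name":"Chinese Restaurant"},'
    '{"pluralName":"Asian Restaurants","name":"Asian Restaurant"}],'
    '"createdAt":1366438171,"tips":{"count":25,"groups":[{"cou'
)

categories = extract_value(fragment, "categories")
print([c["name"] for c in categories])
```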