2016-12-29 59 views
0

I have a file named CrimeReport.txt that contains information in the format below. How can I parse certain data points out of this .txt file using Python 3.0?

{"lang": "en", "favorited": false, "truncated": false, "text": "Active crime scene on I-59/20 near Jeff/Tusc Co line. One dead, one injured; shooting involved. Police search in the area; traffic stopped", "created_at": "Fri Jan 31 05:51:59 +0000 2014", "retweeted": false, "source": "<a href=\"http://tapbots.com/software/tweetbot/mac\" rel=\"nofollow\">Tweetbot for Mac</a>", "place": {"country_code": "US", "url": "https://api.twitter.com/1.1/geo/id/cf44347a08102884.json", "country": "United States", "place_type": "city", "bounding_box": {"type": "Polygon", "coordinates": [[[-86.926154, 33.267324], [-86.598948, 33.267324], [-86.598948, 33.471006], [-86.926154, 33.471006]]]}, "contained_within": [], "full_name": "Hoover, AL", "attributes": {}, "id": "cf44347a08102884", "name": "Hoover"}, "user": {"id": 15220806, "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "followers_count": 118021, "location": "Alabama", "profile_background_color": "C0DEED", "listed_count": 1705, "utc_offset": -21600, "statuses_count": 76381, "description": "Media meteorologist. WeatherBrains host. Weather geek.", "friends_count": 52014, "profile_link_color": "0084B4", "profile_image_url": "https://pbs.twimg.com/profile_images/1890149584/spannwantsyou_normal.jpg", "geo_enabled": true, "profile_banner_url": "https://pbs.twimg.com/profile_banners/15220806/1381811159", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "screen_name": "spann", "lang": "en", "profile_background_tile": false, "favourites_count": 27, "name": "James Spann", "url": "", "created_at": "Tue Jun 24 16:02:10 +0000 2008", "time_zone": "Central Time (US & Canada)", "protected": false}, "retweet_count": 66, "id": 429129916446031872, "favorite_count": 4} 

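This is just one line of CrimeReport.txt. Every other line has the same format as this one. My question is how to use Python 3.0 to iterate over each line and parse the data out of "text".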

+0

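Please provide an example of what you have tried that isn't working; asking for a complete solution to a general programming problem is too broad. – Jmills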

+0

Use a version newer than 3.0. It had I/O problems and was quickly superseded by 3.1. –

Answers

1

Here is one way you could do it:

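import functools, json, operator
# Concatenate the "text" field of every record; fname is the path to the file.
with open(fname, encoding="utf-8") as f:
    the_text = functools.reduce(operator.add, map(operator.itemgetter("text"), map(json.loads, f)))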
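Because reducing with operator.add simply concatenates the strings with nothing between them, the same result could probably be sketched with str.join. The helper name join_texts and the sep parameter are illustrative assumptions, not part of the answer above.

```python
import json

def join_texts(path, sep=""):
    """Concatenate the "text" field of every JSON line in the file at path."""
    # Text mode with an explicit encoding, so json.loads always receives str.
    with open(path, encoding="utf-8") as f:
        # Blank lines are skipped; every other line is assumed to be a JSON object with a "text" key.
        return sep.join(json.loads(line)["text"] for line in f if line.strip())
```

Passing a separator such as sep="\n" keeps the individual tweets distinguishable in the combined string.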
+1

In Python 3, this would be 'functools.reduce' – tdelaney

2

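This looks like JSON data, so just go through it line by line. This is similar to Joran's answer, except that I keep a loop so that the "text" of each record can be processed independently.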

import json

with open("CrimeReport.txt") as f:
    for line in f:
        # Each line is one JSON object; pull out its "text" field.
        text = json.loads(line)["text"]
        # ... do your work with text here ...
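If some lines might be blank or not valid JSON, the loop above could be wrapped in a generator along these lines. This is only a sketch: the name iter_texts and the decision to skip bad records silently are assumptions, not something the question specifies.

```python
import json

def iter_texts(path):
    """Yield the "text" field of each JSON record, one record per line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except ValueError:
                # json.loads raises ValueError (JSONDecodeError on 3.5+) for bad input.
                continue  # assumption: skip malformed lines rather than abort
            if isinstance(record, dict) and "text" in record:
                yield record["text"]

# Usage (the file name is taken from the question):
# for text in iter_texts("CrimeReport.txt"):
#     print(text)
```

Yielding one text at a time keeps memory use flat even when the file holds many records.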
+0

Yours is better than mine :P –