Filtering Twitter files by location

I am trying to find lat/long information for a large number of tweets. One path to the lat/long data in a JSON tweet is:

{u'location': {u'geo': {u'coordinates': [120.0, -5.0]}}}

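I want to be able to check, for each tweet, whether this location path exists. If it does, I want to use that information in a function later on. If it does not, I want to check another location path, and finally move on to the next tweet.

Here is the code I currently have for checking whether this path exists and whether it holds data. 'data' is a list built from the Twitter files by calling data.append(json.loads(line)) on each line.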

counter = 0
for line in data:
    if u'coordinates' in data[counter][u'location'][u'geo']:
        print counter, "HAS FIELD"
        counter += 1
    else:
        counter += 1
        print counter, 'no location data'

I get a KeyError with this code. If I run only the code below, it works, but it is not specific enough to tell me what I need to know.

counter = 0
for line in data:
    if u'location' in data[counter]:
        print counter, "HAS FIELD"
        counter += 1
    else:
        counter += 1
        print counter, 'no location data'

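Does anyone have a way to do this?

Below is some more background on what I am doing overall, but the above sums up where I am stuck.

Background: I have access to 12 billion tweets, purchased through GNIP, which are divided into multiple files. I am trying to comb through these tweets one by one, find which ones have location (lat/long) data, and then see whether the corresponding coordinates fall within a certain country. If a tweet does fall within that country, I will add it to a new database that is a subset of my larger database.

I have successfully created a function that tests whether a lat/long falls within the bounding box of my target country, but I am having difficulty populating the lat/long for each tweet for two reasons. 1) There are multiple places in each JSON file where the lat/long data can exist, if it exists at all. 2) The tweets are organized as a complex dictionary of dictionaries, which I have difficulty navigating.

I need to be able to loop through each tweet and see whether a specific lat/long combination exists under the different location paths, so that I can pull it out and feed it into my function, which tests whether the tweet originated in my country of interest.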
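For reference, a bounding-box test of the kind described above could look roughly like the sketch below. The name `in_bounding_box` and its parameter order are assumptions made for illustration, not the asker's actual function, and the sketch assumes a box that does not cross the 180° meridian.

```python
def in_bounding_box(lat, lon, north, south, east, west):
    """Return True if (lat, lon) lies inside the box, edges included.

    Hypothetical helper: assumes all values are in degrees,
    south <= north, and west <= east (no antimeridian crossing).
    """
    return south <= lat <= north and west <= lon <= east
```

With illustrative bounds, `in_bounding_box(-5.0, 120.0, 6.0, -11.0, 141.0, 95.0)` is true because both the latitude and the longitude fall between their respective limits.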

2015-04-06 CAVHaupt

"I get a KeyError exception with this code"

Assuming the keys should be in double quotes because they contain a ' character:

counter = 0
for line in data:
    if "u'coordinates" in data[counter]["u'location"]["u'geo"]:
        print counter, "HAS FIELD"
        counter += 1
    else:
        counter += 1
        print counter, 'no location data'

2015-04-06 19:48:28 phts

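I checked it, but it doesn't seem to work. The actual key is 'coordinates'. When I read the file in, it becomes u'coordinates' because it is Unicode. In my second example I used the key u'location', which is a top-level key in data[counter], and that works fine. I can't seem to reach the keys of the nested dictionaries. Perhaps they are not strictly considered keys? I have tried try-except, which seems workable because it lets me get past the KeyError, but I don't know whether it behaves like an if-else statement, which seems like the best way to check multiple location paths. – CAVHaupt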

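I checked the expression u'coordinates' in data[counter][u'location'][u'geo'] and I get True, but this only works when the keys exist in that particular tweet. For tweets that do not have this path, I don't get False; I get a KeyError instead. – CAVHaupt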
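The KeyError arises because data[counter][u'location'][u'geo'] is evaluated before the `in` test runs, so a missing 'location' or 'geo' key fails before any membership check happens. One way to get a default value instead of an exception is a small helper that walks the path one step at a time. This is a minimal sketch; `get_path` is a hypothetical name, not part of any library.

```python
def get_path(obj, path, default=None):
    """Follow `path` (a sequence of dict keys and list indexes) through
    nested dicts and lists; return `default` if any step is missing."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # KeyError: dict key absent; IndexError: list too short;
            # TypeError: obj is None or cannot be indexed with this key.
            return default
    return obj


tweet = {'location': {'geo': {'coordinates': [120.0, -5.0]}}}
coords = get_path(tweet, ['location', 'geo', 'coordinates'])
missing = get_path({'id': 1}, ['location', 'geo', 'coordinates'])
```

Here `coords` holds the coordinate list and `missing` is None, so an ordinary if/else can decide whether to try the next path. Plain 'location' also matches the u'location' keys produced by json.loads in Python 2, since equal ASCII str and unicode strings hash and compare the same.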

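I found a solution that may not be the most efficient, but it works. It uses if statements nested inside try-except statements. This lets me check different location paths while passing over KeyErrors, so that I can move on to other tweets and paths. My code is below. It loops through multiple tweets and checks whether any of 3 paths has a usable lat/long combination. It works with my addTOdb function, which checks whether that lat/long combination is in my target country. It also creates a separate dictionary called Lat_Long, where I can see all the tweets that had a lat/long combination and the path I pulled it from.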

# Use try/except to see whether each location path exists in a JSON entry.

# Counter that numbers each JSON entry.
counter = 0

# Test dict recording which lat/long was selected for each tweet.
Lat_Long = {}

for line in data:
    TweetLat = 0
    TweetLong = 0
    # Indicates which path supplied the lat/long.
    CoordSource = 0
    # Stays False until coordinates are found.
    GotCoord = False

    # Check 1st path: the tweet's geo field.
    try:
        if u'coordinates' in data[counter][u'geo']:
            TweetLat = data[counter][u'geo'][u'coordinates'][0]
            TweetLong = data[counter][u'geo'][u'coordinates'][1]
            CoordSource = 1
            GotCoord = True
    except KeyError:
        pass

    # Check 2nd path: gnip profile location info.
    if not GotCoord:
        try:
            if u'coordinates' in data[counter][u'gnip'][u'profileLocations'][0][u'geo']:
                TweetLat = data[counter][u'gnip'][u'profileLocations'][0][u'geo'][u'coordinates'][1]
                TweetLong = data[counter][u'gnip'][u'profileLocations'][0][u'geo'][u'coordinates'][0]
                CoordSource = 2
                GotCoord = True
        except KeyError:
            pass

    # Check 3rd path: location polygon info (first vertex of the polygon).
    if not GotCoord:
        try:
            if u'coordinates' in data[counter][u'location'][u'geo']:
                TweetLat = data[counter][u'location'][u'geo'][u'coordinates'][0][0][1]
                TweetLong = data[counter][u'location'][u'geo'][u'coordinates'][0][0][0]
                CoordSource = 3
                GotCoord = True
        except KeyError:
            pass

    if GotCoord:
        print counter, "HAS FIELD"
        addTOdb(TweetLat, TweetLong, North, South, East, West)
        Lat_Long[counter] = [CoordSource, TweetLat, TweetLong]
    else:
        print counter, "no field"

    counter += 1
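The same three checks can also be expressed as data rather than as repeated try/except blocks. The sketch below is one possible arrangement, not the author's code: `COORD_PATHS`, `lookup` and `find_coords` are hypothetical names, and the paths and the lat/long index order are copied from the code above.

```python
# Each entry: (source number, path to the lat value, path to the long value).
# Paths and index order mirror the three checks in the code above.
COORD_PATHS = [
    (1, ['geo', 'coordinates', 0], ['geo', 'coordinates', 1]),
    (2, ['gnip', 'profileLocations', 0, 'geo', 'coordinates', 1],
        ['gnip', 'profileLocations', 0, 'geo', 'coordinates', 0]),
    (3, ['location', 'geo', 'coordinates', 0, 0, 1],
        ['location', 'geo', 'coordinates', 0, 0, 0]),
]


def lookup(obj, path):
    """Walk nested dicts/lists; return None if any step is missing."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return None
    return obj


def find_coords(tweet):
    """Return (source, lat, long) from the first path that has both
    values, or None if no path yields coordinates."""
    for source, lat_path, long_path in COORD_PATHS:
        lat = lookup(tweet, lat_path)
        lon = lookup(tweet, long_path)
        if lat is not None and lon is not None:
            return source, lat, lon
    return None
```

In the loop, a non-None result from `find_coords(tweet)` would supply the source number plus the lat/long pair to hand to addTOdb, and adding a fourth location path becomes a one-line change to `COORD_PATHS`.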

2015-04-07 20:24:14 CAVHaupt
