2013-02-08 54 views

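Can't save UTF-8 encoded data in a string

I am querying the Twitter API and receiving UTF-8 encoded responses. Now I want to save these responses in a string using the format() function. This is what I have so far (I have already tried many alternatives):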

for user in userInfos: 
    tName = user["name"] if user["name"] is not None else "" 
    tLocation = user["location"] if user["location"] is not None else "" 
    tProfileImageUrl = user["profile_image_url"] if user["profile_image_url"] is not None else "" 
    tCreatedAt = user["created_at"] 
    tFavouritesCount = user["favourites_count"] 
    tUrl = user["url"] if user["url"] is not None else "" 
    tId = user["id"] 
    tProtected = user["protected"] 
    tFollowerCount = user["followers_count"] 
    tLanguage = user["lang"] 
    tVerified = user["verified"] 
    tGeoEnabled = user["geo_enabled"] 
    tTimeZone = user["time_zone"] if user["time_zone"] is not None else "" 
    tFriendsCount = user["friends_count"] 
    tStatusesCount = user["statuses_count"] 
    tScreenName = user["screen_name"] 

    # Custom characteristics 
    age = utl.get_age_in_years(birthdayDict[str(tId)]) 

    # Follower-friend-ratio 
    if tFriendsCount > 0: 
        foRatio = float(tFollowerCount)/float(tFriendsCount) 
    else: 
        foRatio = "" 

    # Age of account in weeks 
    numWeeks = utl.get_age_in_weeks(tCreatedAt) 

    # Tweets per time 
    tweetsPerWeek = float(tStatusesCount)/numWeeks 
    tweetsPerDay = tweetsPerWeek/7.0 

    in_users.remove(str(tId)) 

    outputList = [str(tName), 
        str(tScreenName), 
        str(tProfileImageUrl), 
        str(tLocation), 
        str(tCreatedAt), 
        str(tUrl), 
        str(age), 
        str(tStatusesCount), 
        str(tFollowerCount), 
        str(tFriendsCount), 
        str(tFavouritesCount), 
        str(foRatio), 
        str(tLanguage), 
        str(tVerified), 
        str(tGeoEnabled), 
        str(tTimeZone), 
        str(tProtected), 
        str(numWeeks), 
        str(tweetsPerWeek), 
        str(tweetsPerDay)] 

    pprint.pprint(outputList) 
    fOut.write("{}{}{}{}{}{}{}\n".format(twitterUsers[str(tId)], outputDelimiter, outputDelimiter.join(outputList), outputDelimiter, utl.get_date(), outputDelimiter, utl.get_time())) 

Calls such as str(tName) and str(tLocation) raise an error whenever tName or tLocation contains a non-ASCII character such as \xe4:

ERROR:__main__:'ascii' codec can't encode character u'\xe4' in position 10: ordinal not in range(128) 
Traceback (most recent call last): 
    File "../code/userinfo_extraction_old.py", line 167, in <module> 
    outputList = [str(tName), 
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 10: ordinal not in range(128) 

I have tried to understand how this works, but I can't figure out what is wrong here. I also tried using unicode() instead of str()... no luck.

...and you are running Python 2.something? – 2013-02-08 11:55:50

Yeah, Python 2.7. I forgot to mention that, sorry. – wnstnsmth 2013-02-08 11:57:59

try str = str.decode('utf-8') – 2013-02-08 11:59:24

Answers

To convert unicode data to str, you need to specify an encoding. Use tName.encode('utf8').
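
As a rough illustration of the advice above: in Python 2, str() on a unicode object encodes implicitly with the ASCII codec, which is what raises the UnicodeEncodeError. One way to avoid that is to keep every field as text, join the fields, and encode to UTF-8 only once when writing. The sketch below is written to run on both Python 2.7 and Python 3. The helpers to_text and format_row, the sample user record, and the temporary output path are all hypothetical names made up for this example; they are not part of the original script.

```python
import io
import os
import tempfile

try:
    text_type = unicode  # Python 2: the text type is `unicode`
except NameError:
    text_type = str      # Python 3: `str` is already text


def to_text(value):
    """Convert a field to text without an implicit ASCII encode (hypothetical helper)."""
    if value is None:
        return u""
    if isinstance(value, bytes):
        # Byte strings are assumed to hold UTF-8 data.
        return value.decode("utf-8")
    return text_type(value)


def format_row(fields, delimiter=u"\t"):
    """Join fields as text; nothing is encoded at this stage (hypothetical helper)."""
    return delimiter.join(to_text(f) for f in fields)


# Made-up sample record containing non-ASCII characters.
user = {"name": u"J\xe4ger", "location": u"Z\xfcrich", "followers_count": 42}
row = format_row([user["name"], user["location"], user["followers_count"]])

# io.open encodes the text as UTF-8 while writing, so no str() call is needed.
path = os.path.join(tempfile.mkdtemp(), "users.tsv")
with io.open(path, "w", encoding="utf-8") as f_out:
    f_out.write(row + u"\n")

# Reading the file back with the same encoding returns the original text.
with io.open(path, "r", encoding="utf-8") as f_in:
    round_trip = f_in.read()
```

If the output file has to stay a plain byte-oriented file object, calling row.encode('utf8') immediately before write() should have the same effect; the key point is that the encoding is named explicitly rather than left to str().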

You may want to read up on Python and Unicode:

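Thanks a lot. Yes, I have probably read one or two of those documents before, but since the topic is so dull I always forget half of it a week later... Anyway, thanks for the links. – wnstnsmth 2013-02-08 12:09:34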