2015-01-26 71 views
2

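Putting inverted commas around strings using indexes, Python

I have a text file containing a large number of strings, and I want to put inverted commas (quotation marks) around each value, as shown below.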

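The text file contains many lines such as:

{created_at : July 07, 2014, article : The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey.}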

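and I want to insert quotation marks around the date and the article content, using Python's index-based string methods, like this:

{created_at : "July 07, 2014", article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey."}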

But the result I get is {created_at : "July 07", 2014, article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey} .. so the quotation marks end up in the wrong positions.

Here is my code, which reads the input file and writes the result to another file:

f = open("textfile.txt", "r") 
for item in f: 
    first_comma_pos = item.find(",") 
    print first_comma_pos 
    first_colon_pos = item.find(" : ") 
    print first_colon_pos 
    second_comma_pos = item.find(",", first_comma_pos) 
    second_colon_pos = item.find(" : ", second_comma_pos) 
    print second_colon_pos 
    item = (item[:first_colon_pos+3] + 
     '"' + item[first_colon_pos+3:second_comma_pos] + '"' + 
     item[second_comma_pos:second_colon_pos+3] + 
     '"' + item[second_colon_pos+3:-1] + '"\n') 
    print item 
    saveFile= open("result.txt", "a") 
    saveFile.write(item) 
    saveFile.write('\n') 
    saveFile.close() 
+2

...and the question is...? – 2015-01-26 19:22:01

+3

There are two problems with your question: 1) you haven't stated what the problem is, and 2) this may be an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – Roberto 2015-01-26 19:22:33

+0

I have updated the question. I don't get any errors, but my code puts the inverted commas in the wrong positions, as shown in the question. – 2015-01-26 19:26:56

Answers

2

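You are pretty close, but there are 2 flaws:

  • In the find call for the position of the second comma, you did not add one to the start index, so it found the first comma itself.
  • Your closing " ended up outside your }, so it had to be moved inside.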
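The first flaw comes from how `str.find` treats its optional start argument: the search begins at that index inclusively, so starting at the position of the first comma finds that same comma again. A minimal sketch on a shortened sample line (the line and the variable names are illustrative, not taken from the real data):

```python
# Shortened sample line in the same shape as the data in the question.
line = "{created_at : July 07, 2014, article : Some text.}"

first_comma = line.find(",")

# Searching from first_comma itself finds the same comma again.
same_comma = line.find(",", first_comma)

# Searching from first_comma + 1 skips it and finds the next comma.
second_comma = line.find(",", first_comma + 1)

print(first_comma, same_comma, second_comma)
```

With the unmodified call, the closing quotation mark for the date is therefore placed right after "July 07" instead of after the year.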

The edited code:

f = open("textfile.txt", "r") 
for item in f: 
    first_comma_pos = item.find(",") 
    print item 
    print first_comma_pos 
    first_colon_pos = item.find(" : ") 
    print first_colon_pos 
    second_comma_pos = item.find(",", first_comma_pos+1) # Note change 
    second_colon_pos = item.find(" : ", second_comma_pos) 
    print second_colon_pos 
    item = (item[:first_colon_pos+3] + 
     '"' + item[first_colon_pos+3:second_comma_pos] + '"' + 
     item[second_comma_pos:second_colon_pos+3] + 
     '"' + item[second_colon_pos+3:-2] + '"}\n') # Note change 
    print item 
    saveFile= open("result.txt", "a") 
    saveFile.write(item) 
    saveFile.write('\n') 
    saveFile.close() 

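Output

{created_at : "July 07, 2014", article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey."}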
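The same index arithmetic can also be wrapped in a function, so that it no longer depends on whether a line ends with a newline. This is only a sketch: the name `quote_fields` is made up for illustration, and it assumes every line has exactly the `{created_at : ..., article : ...}` shape shown in the question.

```python
def quote_fields(line):
    """Quote the two values in a '{created_at : ..., article : ...}' line."""
    # Drop surrounding whitespace (including the newline), then the braces.
    body = line.strip()[1:-1]

    first_colon = body.find(" : ")
    first_comma = body.find(",")
    # Start one past the first comma so the search finds the next comma.
    second_comma = body.find(",", first_comma + 1)
    second_colon = body.find(" : ", second_comma)

    date = body[first_colon + 3:second_comma]
    article = body[second_colon + 3:]
    return '{%s"%s"%s"%s"}' % (
        body[:first_colon + 3],               # 'created_at : '
        date,
        body[second_comma:second_colon + 3],  # ', article : '
        article,
    )


sample = "{created_at : July 07, 2014, article : Some text, with a comma.}\n"
print(quote_fields(sample))
```

Stripping the line before slicing avoids the hard-coded `-2` offset, which would cut off the final character of the article on a last line that has no trailing newline.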

+0

Nice work helping to correct his script... probably more educational than my answer (+1) – 2015-01-26 19:43:42

2

Pretty hacky, but:

fix_json.py

import re, json
s = """{created_at : July 07, 2014, article : The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey.}"""
parts0 = s.split(":")
data = {}
for i, (lhs, rhs) in enumerate(zip(parts0, parts0[1:])):
    #: assume that the word directly preceding the ":" is the key
    #: word defined by regex below
    key = re.sub("[^a-zA-Z_]", "", lhs.rsplit(",", 1)[-1])
    if i < len(parts0) - 2:
        #: every value but the last is followed by ", next_key"; drop that part
        value = rhs.rsplit(",", 1)[0]
    else:
        #: the last value runs to the closing brace, so keep its commas
        value = rhs.strip().rstrip("}")
    data[key] = value.strip()

print(json.dumps(data))

...and it makes some assumptions about your data, based on your example.

+1

This solution makes me very sad. – 2015-01-26 19:37:18

+2

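I don't disagree... but given the information provided, I think this is what the OP is looking for... the real answer is to adjust the other script so that it outputs valid data (JSON or any other serialized format), rather than writing your own serialization routine – 2015-01-26 19:38:56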

+2

@JonKiparsky don't be sad, here is [something](http://xkcd.com/) to cheer you up! – 2015-01-26 19:41:28
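The point made in the comments about emitting valid data at the source can be illustrated with the standard `json` module. A minimal sketch, assuming the upstream script holds each record as a dictionary; the `record` variable and its article text are hypothetical stand-ins:

```python
import json

# Hypothetical record, as the upstream script might hold it in memory.
record = {
    "created_at": "July 07, 2014",
    "article": "Some article text, with commas: and even a colon.",
}

# json.dumps takes care of quoting and escaping, so no index arithmetic is needed.
line = json.dumps(record)

# Any consumer can then read the line back without custom parsing.
restored = json.loads(line)
print(restored["created_at"])
```

Because the quoting is done by the serializer, commas and colons inside the article text no longer matter.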

2

If the data is always in that format, you can tokenize it piece by piece from the right, like so:

s = """{created_at : July 07, 2014, article : The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey.}""" 

created_at, a_sep, article_text = s.strip('{}').rpartition('article :') 
start, c_sep, created_date = created_at.rpartition('created_at :') 
new_string = '{{{} "{}", {} "{}"}}'.format(
    c_sep, 
    created_date.strip(' ,'), 
    a_sep, 
    article_text.strip() 
) 

# {created_at : "July 07, 2014", article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey."}
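To apply the same `rpartition` idea to many lines, it could be wrapped in a small function. This is a sketch under the same assumption that every line follows the format above; `quote_line` and `convert` are illustrative names, and the list of lines stands in for the contents of textfile.txt.

```python
def quote_line(line):
    """Quote the date and article values of one line using rpartition."""
    body = line.strip().strip('{}')
    created_at, a_sep, article_text = body.rpartition('article :')
    _, c_sep, created_date = created_at.rpartition('created_at :')
    return '{{{} "{}", {} "{}"}}'.format(
        c_sep,
        created_date.strip(' ,'),
        a_sep,
        article_text.strip(),
    )


def convert(lines):
    """Yield a quoted version of every non-blank line."""
    for line in lines:
        if line.strip():
            yield quote_line(line)


# Stand-in for the lines of textfile.txt.
lines = [
    "{created_at : July 07, 2014, article : First text, with a comma.}\n",
    "\n",
    "{created_at : July 08, 2014, article : Second text.}\n",
]
for converted in convert(lines):
    print(converted)
```

An open file object can be passed to `convert` in place of the list. Since `rpartition` splits at the last occurrence of the separator, an article whose text itself contains `article :` would be split in the wrong place.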