2017-04-17 64 views

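Append a variable string to each line of a file in Python

I have almost no experience with reading and writing files in Python, and I would like to know what the best solution is for my particular case.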

I have a tab-separated file with the following structure, in which sentences are separated by a blank line:

Roundup NN 
: : 
Muslim NNP 
Brotherhood NNP 
vows VBZ 
daily JJ 
protests NNS 
in IN 
Egypt NNP 

Families NNS 
with IN 
no DT 
information NN 
on IN 
the DT 
whereabouts NN 
of IN 
loved VBN 
ones NNS 
are VBP 
grief JJ 
- : 
stricken JJ 
. . 

The DT 
provincial JJ 
departments NNS 
of IN 
supervision NN 
and CC 
environmental JJ 
protection NN 
jointly RB 
announced VBN 
on IN 
May NNP 
9 CD 
that IN 
the DT 
supervisory JJ 
department NN 
will MD 
question VB 
and CC 
criticize VB 
mayors NNS 
who WP 
fail VBP 
to TO 
curb VB 
pollution NN 
. . 

(...) 

To each non-empty line of this file I want to append first a tab and then a given string.
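The per-line operation described here can be sketched as follows. This is only a minimal illustration: `append_field` is a hypothetical helper name, and it assumes each line ends with a single "\n".

```python
def append_field(line, value, sep="\t"):
    """Return `line` with `sep` and `value` added before its trailing newline."""
    # Strip only the newline so that existing tabs in the line are preserved.
    body = line.rstrip("\n")
    return f"{body}{sep}{value}\n"


print(repr(append_field("Muslim\tNNP\n", "B-ORG")))  # 'Muslim\tNNP\tB-ORG\n'
```

Blank lines should not go through this helper, since they only mark sentence boundaries.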

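For each line, the string to append depends on the value stored in lab_pred_tags in the code below. On each iteration of the for loop, lab_pred_tags has the same length as the number of lines of the corresponding sentence in the text file. That is, in this example, the lengths of lab_pred_tags over the 3 for loop iterations are 9, 15, and 28.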

For the first for loop iteration, lab_pred_tags contains the list ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'B-GPE'].
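To check that each lab_pred_tags really has one element per line of its sentence, the sentence lengths can be counted directly from the file. The sketch below is one way to do that; `sentence_lengths` is a hypothetical name, and it assumes that any whitespace-only line is a sentence separator.

```python
def sentence_lengths(lines):
    """Return the number of non-blank lines in each blank-line-separated block."""
    lengths = []
    current = 0
    for line in lines:
        if line.strip():
            current += 1
        elif current:
            # A blank line closes the sentence that was being counted.
            lengths.append(current)
            current = 0
    if current:
        # The last sentence may not be followed by a blank line.
        lengths.append(current)
    return lengths


sample = ["Roundup\tNN\n", ":\t:\n", "\n",
          "Families\tNNS\n", "with\tIN\n", "no\tDT\n"]
print(sentence_lengths(sample))  # [2, 3]
```

An open file object can be passed in place of `sample`, since iterating over a file yields its lines.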

# (...) code to calculate lab_pred 
for lab, lab_pred, length in zip(labels, labels_pred, sequence_lengths): 
    lab = lab[:length] 
    lab_pred = lab_pred[:length] 
    # Convert lab_pred from a sequence of numbers to a sequence of strings 
    lab_pred_tags = d_u.label_idxs_to_tags(lab_pred, tags) 
    # Now what is the best solution to append each element of `lab_pred_tags` to each line in the file? 
    # Keep in mind that I will need to skip a line every time a new for loop iteration is started 

For example, the desired output file is:

Roundup NN O 
: : O 
Muslim NNP B-ORG 
Brotherhood NNP I-ORG 
vows VBZ O 
daily JJ O 
protests NNS O 
in IN O 
Egypt NNP B-GPE 

Families NNS O 
with IN O 
no DT O 
information NN O 
on IN O 
the DT O 
whereabouts NN O 
of IN O 
loved VBN O 
ones NNS O 
are VBP O 
grief JJ O 
- : O 
stricken JJ O 
. . O 

The DT O 
provincial JJ O 
departments NNS O 
of IN O 
supervision NN O 
and CC O 
environmental JJ O 
protection NN O 
jointly RB O 
announced VBN O 
on IN O 
May NNP O 
9 CD O 
that IN O 
the DT O 
supervisory JJ O 
department NN O 
will MD O 
question VB O 
and CC O 
criticize VB O 
mayors NNS O 
who WP O 
fail VBP O 
to TO O 
curb VB O 
pollution NN O 
. . O 

What is the best solution for this?


Could you be more specific about `lab_pred_tags`? I mean, do you want to add the tags to the lines by line number, or based on the second string in each line?


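I don't want to do any matching at all. I just want to append the elements of `lab_pred_tags` to consecutive lines in the file. For example, in the first iteration `lab_pred_tags` has length 9, and I want to append those 9 elements to the first 9 lines. Then skip a line. In the second iteration `lab_pred_tags` has length 15, and I want to append its elements to the 15 lines that follow the previous blank line. And so on.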

Answer

1

For testing purposes I modified the lab_pred_tags list. Here is my solution:

# Test list of 28 tags, enough to cover the longest sentence in the sample.
# The index restarts at every blank line, so each sentence is tagged from
# the start of this same list.
lab_pred_tags = ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O',
                 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O',
                 'O', 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O',
                 'O', 'O', 'B-GPE', 'O']
index = 0  # position of the current line within its sentence

with open("PATH_TO_YOUR_FILE", "r") as lab_file, \
        open("PATH_TO_NEW_FILE", "w") as lab_file_2:
    for line in lab_file:
        if line.strip():
            # Non-blank line: append the tag for this position.
            # rstrip("\n") also handles a last line with no newline.
            lab_file_2.write(line.rstrip("\n") + " " + lab_pred_tags[index] + "\n")
            index += 1
        else:
            # Blank line: a new sentence starts, so restart the index.
            index = 0
            lab_file_2.write("\n")
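
The solution above reuses one test list for every sentence. If a separate tag list is available per sentence, as in the question's loop, a variant along the following lines may fit better. It is a sketch under assumptions, not a definitive implementation: `append_tags` and `tags_per_sentence` are hypothetical names, the separator defaults to a tab as the question asks, and the tag lists are assumed to arrive in the same order as the sentences in the file.

```python
from itertools import groupby


def append_tags(lines, tags_per_sentence, sep="\t"):
    """Yield the input lines with one tag appended to every non-blank line.

    `tags_per_sentence` yields one list of tags per sentence, in file order
    (a stand-in for the successive `lab_pred_tags` values).
    """
    tag_lists = iter(tags_per_sentence)
    # groupby splits the input into alternating runs of blank and non-blank lines.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if is_blank:
            # Keep sentence separators unchanged.
            for _ in group:
                yield "\n"
            continue
        sentence = [line.rstrip("\n") for line in group]
        tags = next(tag_lists, None)
        if tags is None or len(tags) != len(sentence):
            raise ValueError("tag list does not match sentence length")
        for line, tag in zip(sentence, tags):
            yield f"{line}{sep}{tag}\n"


sample = ["Roundup\tNN\n", ":\t:\n", "\n", "Egypt\tNNP\n"]
tags = [["O", "O"], ["B-GPE"]]
print("".join(append_tags(sample, tags)), end="")
```

Because it is a generator, the function can be fed an open input file and its result passed to `writelines` on the output file, so the whole file never has to be held in memory. The length check turns a misaligned tag list into an immediate error instead of silently shifted tags.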