Opening and editing multiple files in a folder with Python

I am trying to modify my .fasta files from this:

>YP_009208724.1 hypothetical protein ADP65_00072 [Achromobacter phage phiAxp-3] 
MSNVLLKQ... 

>YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1] 
MRTPSKSE... 

>YP_009226430.1 DNA packaging protein [Achromobacter phage phiAxp-2] 
MMNSDAVI...

to this:

>Achromobacter phage phiAxp-3 
MSNVLLKQ... 

>Achromobacter phage phiAxp-1 
MRTPSKSE... 

>Achromobacter phage phiAxp-2 
MMNSDAVI...

I already have a script that does this for a single file:

with open('Achromobacter.fasta', 'r') as fasta_file:
    out_file = open('./fastas3/Achromobacter.fasta', 'w')
    for line in fasta_file:
        line = line.rstrip()
        if '[' in line:
            line = line.split('[')[-1]
            out_file.write('>' + line[:-1] + "\n")
        else:
            out_file.write(str(line) + "\n")

But I cannot automate the process for all 120 files in my folder.

I tried using glob.glob, but I can't seem to make it work:

import glob

for fasta_file in glob.glob('*.fasta'):
    outfile = open('./fastas3/'+fasta_file, 'w')
    with open(fasta_file, 'r'):
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")

It gives me this output:

A 
c 
i 
n 
e 
t 
o 
b 
a 
c 
t 
e 
r 
. 
f 
a 
s 
t 
a

I managed to get a list of all the files in the folder, but I can't open any of the files using the objects in the list.

import os

file_list = []
for file in os.listdir("./fastas2/"):
    if file.endswith(".fasta"):
        file_list.append(file)

Source

2017-08-01 tahunami

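In the second code snippet, you are iterating over the filename, not the file: see the 'in fasta_file' line. You need to give the file object a name in the 'with' statement.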
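The difference described in this comment can be shown with a minimal sketch. The file name and contents below are made up for the demonstration and are not part of the question:

```python
import os
import tempfile

# Hypothetical file, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.fasta')
with open(path, 'w') as handle:
    handle.write('>header\nMSNV\n')

# Iterating over the name (a str) yields its characters one at a time.
chars = [item for item in os.path.basename(path)]

# Iterating over the file object bound by "as" yields the lines of the file.
with open(path, 'r') as fasta_file:
    lines = [line.rstrip() for line in fasta_file]

print(chars[:4])  # ['d', 'e', 'm', 'o']
print(lines)      # ['>header', 'MSNV']
```

This is why the output in the question spells out the file name, one character per line.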

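Given that you can already change the contents of a single file, you now need to automate the process. We turn the single-file code into a function that takes a filename and rewrites that file in place, dropping the separate output file handle.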

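def file_changer(filename):
    """Rewrite the headers of one FASTA file in place."""
    data_to_put = ''
    with open(filename, 'r+') as fasta_file:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                # Keep only the text after the last '[' and drop the closing ']'.
                line = line.split('[')[-1]
                data_to_put += '>' + str(line[:-1]) + "\n"
            else:
                data_to_put += str(line) + "\n"
        # Rewind and truncate so the new text replaces the old content
        # instead of being appended after it.
        fasta_file.seek(0)
        fasta_file.write(data_to_put)
        fasta_file.truncate()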

Now we need to iterate over all the files, so let's use the glob module for that:

import glob 
for file in glob.glob('*.fasta'): 
    file_changer(file)
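If the files live in another folder, as with the "./fastas2/" folder in the question's os.listdir attempt, note that glob returns paths that already include the directory part of the pattern, while os.listdir returns bare names that have to be joined with the directory before they can be opened. A small sketch, using a temporary folder and made-up file names as stand-ins:

```python
import glob
import os
import tempfile

# Hypothetical input folder with two FASTA files and one unrelated file.
in_dir = tempfile.mkdtemp()
for name in ('a.fasta', 'b.fasta', 'notes.txt'):
    with open(os.path.join(in_dir, name), 'w') as handle:
        handle.write('>x [y]\nAAA\n')

# glob returns paths that include the directory part of the pattern,
# so they can be passed to open() directly.
matches = sorted(glob.glob(os.path.join(in_dir, '*.fasta')))

# Use only the base name when building a path in another folder.
names = [os.path.basename(path) for path in matches]
print(names)
```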

Source

2017-08-01 10:04:35

I get this error: 'TypeError: coercing to Unicode: need string or buffer, type found' – tahunami

@tahunami On which line?

'line 20, in file_changer(file)' and 'line 5, in file_changer with open(filename, 'w') as fasta_file:' – tahunami

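You are iterating over the filename, which gives you all the characters of the name rather than the lines of the file. Here is a corrected version of the code: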

import glob

for fasta_file_name in glob.glob('*.fasta'):
    # Open the input and the output together; both are closed automatically.
    with open(fasta_file_name, 'r') as fasta_file, \
            open('./fastas3/' + fasta_file_name, 'w') as outfile:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                # Keep only the text after the last '[' and drop the closing ']'.
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")

As an alternative to a Python script, you can simply use sed from the command line:

sed -i 's/^>.*\[\(.*\)\].*$/>\1/' *.fasta

This will modify all the files in place, so consider copying them first.
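For readers who prefer to stay in Python, the same substitution can be sketched with the re module. The function name shorten_header is made up for this illustration. Unlike the loop above, which rewrites any line containing '[', this pattern only touches lines that start with '>':

```python
import re

# Same pattern as the sed command, in Python's regex syntax
# (unescaped parentheses form the capture group).
HEADER = re.compile(r'^>.*\[(.*)\].*$')

def shorten_header(line):
    """Reduce a header to its last bracketed part; return other lines unchanged."""
    return HEADER.sub(r'>\1', line)

print(shorten_header('>YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1]'))
print(shorten_header('MRTPSKSE'))
```

The first call prints the shortened header and the second prints the sequence line unchanged.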

Source

2017-08-01 09:56:49

Can you tell me more about the syntax of this line: 'with open(fasta_file_name, 'r') as fasta_file, \ open('./fastas3/' + fasta_file_name, 'w') as outfile:' – tahunami

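@tahunami All files should be opened in a 'with' statement to ensure they are closed properly. You can open several files in a single 'with' statement, and the backslash is only used for line continuation.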
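A minimal sketch of that syntax, using throwaway files in a temporary folder (the names in.txt and out.txt are placeholders, not files from the question):

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'in.txt')
dst_path = os.path.join(tmp_dir, 'out.txt')
with open(src_path, 'w') as handle:
    handle.write('one\ntwo\n')

# Two files opened in a single with statement; the backslash continues the line.
with open(src_path, 'r') as src, \
        open(dst_path, 'w') as dst:
    for line in src:
        dst.write(line.upper())

# Both files are closed once the block ends.
print(src.closed, dst.closed)  # True True
```

This behaves the same as nesting one 'with' statement inside another.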
