Split a file based on similar line headers
Can anyone help me write a script in awk, grep, sed, perl or python for the following requirement?
Input file “raw.fa”:
>CLocus_1_Sample_61_Locus_1_Allele_0 [JPKM01095229.1, 31450, +]
TGCAGGTGTGTTCTGCAGATCCAAACACAAAGAGGCAGGGATTGGAG
>CLocus_1_Sample_67_Locus_1_Allele_0 [JPKM01095229.1, 31450, +]
TGCAGGTGTGTTCTGCAGATCCAAACACAAAGAGGCAGGGATTGGAG
>CLocus_1_Sample_107_Locus_1_Allele_0 [JPKM01095229.1, 31450, +]
TGCAGGTGTGTTCTGCAGATCCAAACACAAAGAGGCAGGGATTGGAG
>CLocus_1_Sample_107_Locus_1_Allele_1 [JPKM01095229.1, 31450, +]
TGCAGGTGTGTTCTGCAGATCCAAACACAAAGAGGCAGGGGTTGAAG
>CLocus_41_Sample_158_Locus_53_Allele_0 [JPKM01105094.1, 1700, +]
TGCAGGTTATCCAGCTCTATTCTGCACTGGCCATCGTACCAAATAGCAGGAGGGT
>CLocus_41_Sample_159_Locus_31_Allele_0 [JPKM01105094.1, 1700, +]
TGCAGGTTATCCAGCTCTATTCTGCACTGGCCATCGTACCAAATAGCAGGAGGGT
>CLocus_86_Sample_161_Locus_103_Allele_0 [JPKM01106288.1, 770, -]
TGCAGGGAACCGTGCTCAGCTCTGGAGTATTCCCACACACTTGGCTCCCATTGGGATGACTCCTTT
>CLocus_86_Sample_164_Locus_98_Allele_0 [JPKM01106288.1, 770, -]
TGCAGGGAACCGTGCTCAGCTCTGGAGTATTCCCACACACTTGGCTCCCATTGGGATGACTCCTTT
>CLocus_86_Sample_166_Locus_110_Allele_0 [JPKM01106288.1, 770, -]
TGCAGGGAACCGTGCTCAGCTCTGGAGTATTCCCACTCACTTGGCTCCCATTGGGATGACTCCTTT
>CLocus_86_Sample_167_Locus_123_Allele_0 [JPKM01106288.1, 770, -]
TGCAGGGAACCGTGCTCAGCTCTGGAGTATTCCCACTCACTTGGCTCCCATTGGGATGACTCCTTT
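I want to split the above file by locus, one file per locus. Each output file should keep the DNA sequence (the second line of each record) and only the sample number from the header (the first line). For this input, that produces three separate .fa files: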
“locus1.fa”:
>Sample_61
TGCAGGTGTGTTCTGCAGATCCAAACACAAAGAGGCAGGGATTGGAG
>Sample_67
TGCAGGTGTGTTCTGCAGATCCAAACACAAAGAGGCAGGGATTGGAG
>Sample_107
TGCAGGTGTGTTCTGCAGATCCAAACACAAAGAGGCAGGGATTGGAG
>Sample_107
TGCAGGTGTGTTCTGCAGATCCAAACACAAAGAGGCAGGGGTTGAAG
“locus41.fa”:
>Sample_158
TGCAGGTTATCCAGCTCTATTCTGCACTGGCCATCGTACCAAATAGCAGGAGGGT
>Sample_159
TGCAGGTTATCCAGCTCTATTCTGCACTGGCCATCGTACCAAATAGCAGGAGGGT
“locus86.fa”:
>Sample_161
TGCAGGGAACCGTGCTCAGCTCTGGAGTATTCCCACACACTTGGCTCCCATTGGGATGACTCCTTT
>Sample_164
TGCAGGGAACCGTGCTCAGCTCTGGAGTATTCCCACACACTTGGCTCCCATTGGGATGACTCCTTT
>Sample_166
TGCAGGGAACCGTGCTCAGCTCTGGAGTATTCCCACTCACTTGGCTCCCATTGGGATGACTCCTTT
>Sample_167
TGCAGGGAACCGTGCTCAGCTCTGGAGTATTCCCACTCACTTGGCTCCCATTGGGATGACTCCTTT
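Thanks for your help! I found awk code that splits on the first occurrence of a pattern, but not how to split out a whole group of similar records (e.g., every locus86 header line together with the DNA sequence line that follows it).
Chris Martin
Have you written any code, in any programming language, to try to solve the problem yourself? – 2014-10-11 21:04:09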
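One possible approach, sketched in Python (one of the languages the question allows). It assumes every header follows the >CLocus_<N>_Sample_<M>_... pattern shown in raw.fa, and that each sequence line belongs to the header above it. The function names split_by_locus and write_locus_files and the script name split_loci.py are hypothetical. Records are collected in a dict keyed by the CLocus number, so records for the same locus do not have to be adjacent in the input. Duplicate sample names (such as the two Sample_107 alleles) are kept as separate records, matching the expected output above.

```python
import os
import re
import sys

# Assumption: every header looks like ">CLocus_<N>_Sample_<M>_...", as in raw.fa.
HEADER_RE = re.compile(r"^>CLocus_(\d+)_(Sample_\d+)_")


def split_by_locus(lines):
    """Group FASTA records by their CLocus number.

    Returns a dict mapping the locus number (as a string) to a list of
    (sample, sequence) tuples, in the order they appear in the input.
    Records for one locus do not need to be adjacent in the input.
    """
    groups = {}
    record = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"unrecognised header: {line}")
            locus, sample = match.groups()
            record = [sample, ""]
            groups.setdefault(locus, []).append(record)
        elif record is not None:
            # Concatenate sequence lines, in case a sequence is wrapped.
            record[1] += line
    return {locus: [tuple(r) for r in recs] for locus, recs in groups.items()}


def write_locus_files(groups, outdir="."):
    """Write one locus<N>.fa file per locus; return the paths written."""
    paths = []
    for locus, records in groups.items():
        path = os.path.join(outdir, f"locus{locus}.fa")
        with open(path, "w") as out:
            for sample, seq in records:
                out.write(f">{sample}\n{seq}\n")
        paths.append(path)
    return paths


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python split_loci.py raw.fa
    with open(sys.argv[1]) as fh:
        write_locus_files(split_by_locus(fh))
```

Run against raw.fa, this should write locus1.fa, locus41.fa and locus86.fa to the current directory. Grouping in memory is simple but holds the whole file in RAM. For very large inputs, writing each record straight to an open file handle per locus would avoid that.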