Python: How to extract strings from a text file to use as data

This is my first time writing a Python script, and I'm having some trouble getting started. Say I have a text file named Test.txt that contains this information.

        x   y   z  Type of atom 
ATOM 1  C1 GLN D 10  26.395  3.904  4.923 C 
ATOM 2  O1 GLN D 10  26.431  2.638  5.002 O 
ATOM 3  O2 GLN D 10  26.085  4.471  3.796 O 
ATOM 4  C2 GLN D 10  26.642  4.743  6.148 C

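What I ultimately want to do is write a script that finds the center of mass of these atoms. So basically I want to sum all of the x values in the text file, with each number multiplied by a given value that depends on the type of atom.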

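I know I need to define the position of each x value, but I'm having trouble figuring out how to get these x values represented as numbers rather than as text in a string. I also have to keep in mind that these numbers need to be multiplied by a value for the atom type, so I need a way to define that value for each type of atom. Can anyone point me in the right direction?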

Source

2012-07-27 Cammen

First of all, is this homework? – FallenAngel 2012-07-27 15:30:39

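Welcome to SO! Can you show us the code you have so far? If you have code that reads the file and gets the 'x' values as strings, that's a great start! Basically, if you show us what you have, we can help you improve it and get it to the point where you can use it. – mgilson 2012-07-27 15:30:43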

Is this a tab-delimited file exported from your software? If so, you could take a look at http://docs.python.org/library/csv.html – Jzl5325 2012-07-27 15:30:49

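Basically, you can open any file in Python with the open function. So you can do something like the following. The snippet below is not a solution to the whole problem, just an approach.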

def read_file():
    f = open("filename", 'r')
    for line in f:
        line_list = line.split()
        # ....
        # ....
    f.close()

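From this point you are well set up to do whatever you want with the values. The second line simply opens the file for reading. The third line defines a for loop that reads the file one line at a time, assigning each line to the variable line.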

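The fourth line of the snippet splits the string at each run of whitespace and puts the pieces into a list. So line_list[0] will be the value of your first column, and so on. From there, if you have any programming experience, you can use if statements and the like to build the logic you want.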

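**Remember that the values stored in that list will all be strings, so you have to be careful if you want to do any arithmetic (such as addition): convert them to numbers first, for example with float().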
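As a minimal sketch of that conversion, assuming the column layout of the sample Test.txt in the question (coordinates in fields 6 to 8, atom type in field 9). The variable names are illustrative only:

```python
# One data line copied from the sample file in the question.
line = "ATOM 1  C1 GLN D 10  26.395  3.904  4.923 C"

line_list = line.split()           # every element is still a string
x_text = line_list[6]              # '26.395' as a str, not a number

# Convert the coordinate fields to floats before doing arithmetic.
x, y, z = (float(value) for value in line_list[6:9])
atom_type = line_list[9]           # stays a string: 'C'

# Why the conversion matters: '+' concatenates strings but adds floats.
as_text = x_text + line_list[7]    # string concatenation
as_number = x + y                  # numeric addition
```

Without the float() call, the `+` operator would silently join the two coordinate strings instead of adding them.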

*Edited for syntax corrections

Source

2012-07-27 15:37:55

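You should use the term 'list' instead of 'array'. Also, you never call 'f.close()' (note that this sort of thing is exactly what the 'with' statement was designed to make easier to handle). – mgilson 2012-07-27 15:39:19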
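A sketch of the same loop written with 'with', as this comment suggests. The `path` parameter, the returned list, and the throwaway demo file are assumptions added for illustration; they are not part of the answer above.

```python
import os
import tempfile


def read_file(path):
    """Return each line of the file at `path` as a list of whitespace-separated strings."""
    rows = []
    # The with block closes the file automatically, even if an exception is raised inside it.
    with open(path, 'r') as f:
        for line in f:
            rows.append(line.split())
    return rows


# Usage demo on a throwaway file holding one data line and one blank line.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("ATOM 1  C1 GLN D 10  26.395  3.904  4.923 C \n\n")
rows = read_file(tmp.name)
os.remove(tmp.name)
```

An explicit f.close() works too; the difference is that 'with' still closes the file if the loop body raises.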

@mgilson You're right. See my edit. – 2012-07-27 15:43:58

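I'm not a fan of 'with', but you should be familiar with it. However, advice to never use 'file.close()' is bad. A lot of the time it's better to handle it that way. – ely 2012-07-27 15:49:41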

mass_dictionary = {
    'C': 12.0107,
    'O': 15.999,
    # Others...?
}

# If your files are this structured, you can just 
# hardcode some column assumptions. 
coords_idxs = [6,7,8] 
type_idx = 9 

# Open file, get lines, close file. 
# Probably prudent to add try-except here for bad file names. 
f_open = open("Test.txt",'r') 
lines = f_open.readlines() 
f_open.close() 

# Initialize a list to hold the needed intermediate data.
output_coms = []
total_mass = 0.0

# Loop through the lines of the file. 
for line in lines: 

    # Split the line on white space. 
    line_stuff = line.split() 

    # If the line is empty or fails to start with 'ATOM', skip it.
    if (not line_stuff) or (line_stuff[0] != 'ATOM'):
        pass

    # Otherwise, append the mass-weighted coordinates to a list and increment total mass.
    else:
        output_coms.append([mass_dictionary[line_stuff[type_idx]]*float(line_stuff[i]) for i in coords_idxs])
        total_mass = total_mass + mass_dictionary[line_stuff[type_idx]]

# After getting all the data, finish off the averages. 
avg_x, avg_y, avg_z = [sum(elem[i] for elem in output_coms) / total_mass for i in range(3)]


# A lot of this will be better with NumPy arrays if you'll be using this often or on 
# larger files. Python Pandas might be an even better option if you want to just 
# store the file data and play with it in Python.
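In formula form, the code above computes the mass-weighted average of the coordinates, where m_i is the mass looked up in mass_dictionary for atom i and (x_i, y_i, z_i) are its coordinates:

```latex
\[
x_{\mathrm{com}} = \frac{\sum_i m_i x_i}{\sum_i m_i}, \qquad
y_{\mathrm{com}} = \frac{\sum_i m_i y_i}{\sum_i m_i}, \qquad
z_{\mathrm{com}} = \frac{\sum_i m_i z_i}{\sum_i m_i}
\]
```

Each entry of output_coms holds the three products m_i x_i, m_i y_i, m_i z_i for one atom, and total_mass is the shared denominator.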

Source

2012-07-27 15:48:25 ely

'line_stuff = line.replace("\n", "").split()' - this is equivalent to 'line_stuff = line.split()'. – mgilson 2012-07-27 15:50:47

When I use 'split()', I often end up with a "\n" trailing on my stuff. I think it depends on the line format; I just always feel it's prudent to include it. – ely 2012-07-27 15:51:37

Are you using 'split(' ')'? That can leave a trailing newline, but 'split()' does not. – mgilson 2012-07-27 15:54:33
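A small sketch of the difference discussed in these comments, using one line of the sample file with its newline still attached (the variable names are illustrative):

```python
line = "ATOM 1  C1 GLN D 10  26.395  3.904  4.923 C\n"

no_arg = line.split()          # splits on runs of any whitespace, newline included
with_space = line.split(' ')   # splits on each single space only

last_no_arg = no_arg[-1]               # final field with the newline removed
last_with_space = with_space[-1]       # final field with the newline kept
has_empty_fields = '' in with_space    # doubled spaces produce empty strings
```

With no argument, split() also discards the empty strings that doubled spaces would otherwise produce, so the field indices stay stable.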

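If you have pandas installed, check out the read_fwf function, which reads a fixed-width file and creates a DataFrame (a 2-D tabular data structure). It can save you lines of code on import, and it also gives you plenty of data-munging functionality if you want to do any additional data manipulation.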
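A minimal sketch of that approach, assuming pandas is installed and the file looks exactly like the sample in the question. The column names are hypothetical labels chosen for this example, and the header row is skipped because it does not line up with the data columns:

```python
import io

import pandas as pd

# Sample data from the question; in practice, pass the path "Test.txt" instead.
sample = """\
        x   y   z  Type of atom
ATOM 1  C1 GLN D 10  26.395  3.904  4.923 C
ATOM 2  O1 GLN D 10  26.431  2.638  5.002 O
ATOM 3  O2 GLN D 10  26.085  4.471  3.796 O
ATOM 4  C2 GLN D 10  26.642  4.743  6.148 C
"""

# Hypothetical column labels; the file itself does not name most of these fields.
names = ["record", "serial", "atom", "residue", "chain",
         "res_num", "x", "y", "z", "element"]

# Skip the header row and let read_fwf infer the column boundaries from the data rows.
df = pd.read_fwf(io.StringIO(sample), skiprows=1, header=None, names=names)

# Look up a mass for each row, then take the mass-weighted mean of x, y and z.
masses = df["element"].map({"C": 12.0107, "O": 15.999})
center = df[["x", "y", "z"]].mul(masses, axis=0).sum() / masses.sum()
```

Here `center` is a pandas Series indexed by "x", "y" and "z". If the real file is not strictly fixed-width, the inferred boundaries may be wrong, and passing explicit column positions through the colspecs argument would be the safer choice.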

Source

2012-07-28 04:53:36
