How to skip multiple header lines using Python

I'm new to Python. I'm trying to write a script that uses the numbers from a file that also contains a header. Here is an example file:

@File_Version: 4 
PROJECTED_COORDINATE_SYSTEM 
#File_Version____________-> 4 
#Master_Project_______-> 
#Coordinate_type_________-> 1 
#Horizon_name____________-> 
sb+ 
#Horizon_attribute_______-> STRUCTURE 
474457.83994 6761013.11978 
474482.83750 6761012.77069 
474507.83506 6761012.42160 
474532.83262 6761012.07251 
474557.83018 6761011.72342 
474582.82774 6761011.37433 
474607.82530 6761011.02524

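I want to skip the header. Here is my attempt. Of course, it works if I know which characters will appear in the header, such as "#" and "@". But how can I skip every line that contains any alphabetic character?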

import re

in_file1 = open(input_file1_short, 'r')
out_file1 = open(output_file1_short, "w")
lines = in_file1.readlines()
for line in lines:
    # skips only the lines that contain the known header markers "#" or "@"
    if "#" not in line and "@" not in line:
        strip_line = line.strip()
        replace_split = re.split(r'[ ,|;"\t]+', strip_line)
        x = replace_split[0]
        y = replace_split[1]
        out_file1.write("%s\t%s\n" % (x, y))
in_file1.close()
out_file1.close()

Many thanks!

2015-11-05 emin

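Can you simply check the leading character, or does your header detection need to be more general than that? If a line could start with digits but contain words later on, I could perhaps write you a simple filtering function. – Prune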

This checks the first character of each line and skips every line that does not start with a digit:

for line in lines:
    # line[:1] is '' for an empty string, so this never raises IndexError
    if line[:1].isdigit():
        # we've got a line starting with a digit; process it here
        pass
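A minimal, self-contained sketch of the same first-character test, run against a few lines copied from the example file. It assumes every coordinate line begins with a digit; the generator name `data_lines` is hypothetical.

```python
def data_lines(lines):
    """Yield only the lines whose first character is a digit (hypothetical helper)."""
    for line in lines:
        # line[:1] is '' for an empty string, so this never raises IndexError
        if line[:1].isdigit():
            yield line


sample = [
    "@File_Version: 4\n",
    "PROJECTED_COORDINATE_SYSTEM\n",
    "#Horizon_name____________->\n",
    "sb+\n",
    "474457.83994 6761013.11978\n",
    "474482.83750 6761012.77069\n",
]

kept = list(data_lines(sample))
print(kept)
```

Note that a data line starting with a minus sign or with leading whitespace would also be dropped by this test, whereas the letter-based filters in the other answers would keep it.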

2015-11-05 20:35:52 razzak

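Use a generator pipeline to filter your input stream. It pulls lines from the raw input one at a time, but first checks the whole line and passes on only those lines that contain no letters.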

from functools import reduce
# keep a line only if none of its characters is a letter
input_stream = (line for line in lines
                if reduce(lambda x, y: (not y.isalpha()) and x, line, True))

for line in input_stream: 
    strip_line = ...
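A runnable sketch of the same pipeline idea. It swaps `reduce` for `any()`, which gives the same test (a line is kept only if none of its characters is alphabetic) and stops at the first letter. The later stages, which split each line with the question's separator pattern and convert the first two fields to floats, are an assumption about what the elided `strip_line = ...` step does; the names `no_letters` and `coordinates` are hypothetical.

```python
import re


def no_letters(line):
    """True if no character in the line is alphabetic."""
    return not any(ch.isalpha() for ch in line)


def coordinates(lines):
    # Stage 1: drop header lines (any letter) and blank lines
    data = (line for line in lines if line.strip() and no_letters(line))
    # Stage 2: split on the separator set used in the question
    fields = (re.split(r'[ ,|;"\t]+', line.strip()) for line in data)
    # Stage 3: convert the first two fields of each line to floats
    return [(float(f[0]), float(f[1])) for f in fields]


sample = [
    "@File_Version: 4\n",
    "#Coordinate_type_________-> 1\n",
    "sb+\n",
    "474457.83994 6761013.11978\n",
    "474482.83750 6761012.77069\n",
    "\n",
]
points = coordinates(sample)
print(points)
```

One caveat of any letter-based test: numbers written in scientific notation, such as `1.5e3`, contain a letter and would be discarded as header lines.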

2015-11-05 20:46:26 Prune

I think you could use some built-ins, like this:

import string
for line in lines:
    if any(letter in line for letter in string.ascii_letters):
        print("there is an ascii letter somewhere in this line")

This only looks for ASCII letters, though.
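To make the ASCII limitation concrete, here is a small comparison in Python 3, assuming a hypothetical header line made only of non-ASCII letters and a digit. The helper names are illustrative, not from the original answers.

```python
import string


def has_ascii_letter(line):
    # Only the 52 characters a-z and A-Z count as letters here
    return any(ch in string.ascii_letters for ch in line)


def has_any_letter(line):
    # str.isalpha() is Unicode-aware in Python 3
    return any(ch.isalpha() for ch in line)


header = "Ωμέγα 2\n"  # hypothetical header: Greek letters only, no ASCII letters
ascii_result = has_ascii_letter(header)
unicode_result = has_any_letter(header)
print(ascii_result, unicode_result)  # False True
```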

You could also do:

import unicodedata
for line in lines:
    # general categories starting with 'L' (Lu, Ll, Lt, Lm, Lo) are letters
    if any(unicodedata.category(letter).startswith('L') for letter in line):
        print("there is a unicode letter somewhere in this line")

...but only if I've understood the Unicode categories correctly.

Even cleaner (using a suggestion from another answer, so it works for both unicode and str lines):

for line in lines:
    if any(letter.isalpha() for letter in line):
        print("there is a letter somewhere in this line")

Interestingly, though, if you do this:

In [57]: u'\u2161'.isdecimal()
Out[57]: False

In [58]: u'\u2161'.isdigit()
Out[58]: False

In [59]: u'\u2161'.isalpha()
Out[59]: False

The Unicode character for the Roman numeral "Two" is none of these, but unicodedata.category(u'\u2161') does return 'Nl', marking it as a number (a "letter number"), and u'\u2161'.isnumeric() is True.
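The behaviour described above can be checked directly in Python 3, where every str is Unicode so the u prefix is optional. This sketch just collects the relevant properties of U+2161 in one place.

```python
import unicodedata

ch = "\u2161"  # ROMAN NUMERAL TWO
props = {
    "isdecimal": ch.isdecimal(),
    "isdigit": ch.isdigit(),
    "isalpha": ch.isalpha(),
    "isnumeric": ch.isnumeric(),
    "category": unicodedata.category(ch),
    "name": unicodedata.name(ch),
}
print(props)
```

For the header-skipping problem this means a line made only of such characters would slip past an isalpha() filter, and float() would then fail on it.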

2015-11-05 20:49:31 rkh

Thank you very much for your suggestions, much appreciated! It works by omitting the lines that contain letters, using .isalpha. – emin
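Combining the .isalpha approach with the parsing code from the question, a minimal end-to-end sketch might look like this. It assumes Python 3 and takes already-open file objects so it can be demonstrated with in-memory io.StringIO buffers; the function name `convert` and the file names in the comment are hypothetical stand-ins for input_file1_short and output_file1_short.

```python
import io
import re


def convert(in_file, out_file):
    """Write the first two fields of every non-header line as tab-separated text."""
    for line in in_file:
        stripped = line.strip()
        # Skip blank lines and any line containing a letter (the header)
        if not stripped or any(ch.isalpha() for ch in stripped):
            continue
        fields = re.split(r'[ ,|;"\t]+', stripped)
        out_file.write("%s\t%s\n" % (fields[0], fields[1]))


# Demo with in-memory files; with real files you would write
#   with open("input.txt") as src, open("output.txt", "w") as dst:
#       convert(src, dst)
src = io.StringIO("@File_Version: 4\nsb+\n#Horizon_attribute_______-> STRUCTURE\n"
                  "474457.83994 6761013.11978\n474482.83750 6761012.77069\n")
dst = io.StringIO()
convert(src, dst)
result = dst.getvalue()
print(result)
```

Using `with` for the real files guarantees both are closed even if a malformed line raises an exception partway through.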
