Unusual table parsing

How can I parse a table of this type?

https://primes.utm.edu/lists/small/10000.txt

     The First 10,000 Primes 
        (the 10,000th is 104,729) 
    For more information on primes see http://primes.utm.edu/ 

    2  3  5  7  11  13  17  19  23  29 
31  37  41  43  47  53  59  61  67  71 
73  79  83  89  97 101 103 107 109 113

These numbers are neither comma-separated nor structured as XML. Do you know of a way to read them into a list?


2016-04-25 Nesa

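In future, please include some sample data from the external site as well as the link to the original, as in the edited version. (I noticed that the 4 extra spaces at the start of each line were swallowed by SO's Markdown processing.)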

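You can parse the table knowing only where the data starts and where it ends: the slice `[4:-2]` skips the three header lines and the blank line shown in the question, and drops the last two entries of the split, which are not data. In addition, the whole table contains only integers. For example: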

    # Using the requests HTTP client library
    import requests

    # Get the data with an HTTP request
    data = requests.get("http://primes.utm.edu/lists/small/10000.txt").text
    # Nested list comprehension: split the data into lines, keep index 4 up to
    # (but not including) the last two entries, then split each line into
    # columns and convert every column to an integer.
    rows = [[int(e) for e in l.strip().split()] for l in data.split('\n')[4:-2]]

Voilà.

This works because `split()` called without arguments splits on any run of whitespace, such as tabs or groups of spaces.
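If you want one flat list rather than a list of rows, and would rather not hardcode the line offsets, here is a minimal offline sketch. It assumes that every data line consists only of whitespace-separated digit strings and that no header line does. `parse_int_table` and `SAMPLE` are names I made up for illustration, and `SAMPLE` is just the excerpt from the question, not the full file.

```python
# Excerpt copied from the question: three header lines, a blank line,
# then whitespace-aligned integers. Not the full file.
SAMPLE = """\
     The First 10,000 Primes
        (the 10,000th is 104,729)
    For more information on primes see http://primes.utm.edu/

    2  3  5  7  11  13  17  19  23  29
31  37  41  43  47  53  59  61  67  71
73  79  83  89  97 101 103 107 109 113
"""


def parse_int_table(text):
    """Return a flat list of the integers found on all-numeric lines.

    Hypothetical helper: a line counts as data only if it is non-empty
    and every whitespace-separated token is made of decimal digits.
    """
    numbers = []
    for line in text.splitlines():
        tokens = line.split()  # no argument: split on any run of whitespace
        if tokens and all(tok.isdecimal() for tok in tokens):
            numbers.extend(int(tok) for tok in tokens)
    return numbers


primes = parse_int_table(SAMPLE)
print(primes[:5], len(primes))  # prints [2, 3, 5, 7, 11] 30
```

The header lines are skipped because tokens such as `10,000` or `(the` are not purely digits. If the real file ever contained a header line made only of digits, this filter would wrongly pick it up, so the fixed slice from the answer may still be the safer choice there.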


2016-04-25 20:07:43