2017-04-07 56 views

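I have to compress a file into a list of words and a list of positions from which the original file can be recreated. My program should also be able to take the compressed file and recreate the full text of the original file, including punctuation and capitalization. I have everything correct apart from the recreation: using the map function, my program can't convert my list of positions into floats because of the '[', as it is a list. How can I use the '.join' function to convert the list to floats?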

My code is:

text = open("speech.txt") 
CharactersUnique = [] 
ListOfPositions = [] 
DownLine = False 

while True: 
    line = text.readline() 
    if not line: 
     break 

    TwoList = line.split() 
    for word in TwoList: 
     if word not in CharactersUnique: 
      CharactersUnique.append(word) 

     ListOfPositions.append(CharactersUnique.index(word)) 
    if not DownLine: 
     CharactersUnique.append("\n") 
     DownLine = True 
    ListOfPositions.append(CharactersUnique.index("\n")) 

w = open("List_WordsPos.txt", "w") 
for c in CharactersUnique: 
    w.write(c) 
w.close() 

x = open("List_WordsPos.txt", "a") 
x.write(str(ListOfPositions)) 
x.close() 

with open("List_WordsPos.txt", "r") as f: 
    NewWordsUnique = f.readline() 
    f.close() 

h = open("List_WordsPos.txt", "r") 
lines = h.readlines() 
NewListOfPositions = lines[1] 

NewListOfPositions = map(float, NewListOfPositions) 

print("Recreated Text:\n") 
recreation = " " .join(NewWordsUnique[pos] for pos in (NewListOfPositions)) 
print(recreation) 

The error I get is:

Task 3 Code.py", line 42, in <genexpr> 
recreation = " " .join(NewWordsUnique[pos] for pos in (NewListOfPositions)) 
ValueError: could not convert string to float: '[' 

I'm using Python IDLE 3.5 (32-bit). Does anyone have any idea how to fix this?


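'NewListOfPositions' is a 'map' object that converts each character of 'lines[1]' to a float. When you try to iterate over it, it fails as soon as it reaches something that can't be converted to a 'float'. 'lines[1]' evidently contains the character '[', which can't be converted to a float. – khelwood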
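To see what khelwood describes, the sketch below uses a made-up string, positions_line, as a stand-in for lines[1]. It shows that map() over a string walks it one character at a time, and that ast.literal_eval from the standard library (not something the question uses) is one way to turn the written-out list back into real integers.

```python
import ast

# Hypothetical stand-in for lines[1]: what str(ListOfPositions) writes to the file.
positions_line = "[0, 1, 2, 1]\n"

# map() iterates over a string character by character, so float() sees "[" first.
error_message = ""
try:
    list(map(float, positions_line))
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # could not convert string to float: '['

# ast.literal_eval parses the Python list literal back into a list of ints.
positions = ast.literal_eval(positions_line.strip())
print(positions)  # [0, 1, 2, 1]
```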


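Why are you writing the values and positions to a file and then immediately reading them back again? Just use your original data! Also: you can't use a float as an index into a list. – Wombatz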
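Wombatz's second point is easy to check. In this small sketch the list words and the position string are invented for illustration; a float index is rejected even when it has no fractional part, while positions parsed with int() work directly as indices.

```python
words = ["I", "have", "to"]

# A float is rejected as a list index even when its value is a whole number.
index_error = ""
try:
    words[1.0]
except TypeError as exc:
    index_error = str(exc)
print(index_error)  # list indices must be integers or slices, not float

# Converting the positions with int() instead of float() gives usable indices.
positions = list(map(int, "1 0 2".split()))
print(" ".join(words[pos] for pos in positions))  # have I to
```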


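Two things: first, 'VariableNamesLikeThis' is usually reserved for classes in Python; second, there is a mismatch between your description of the problem and your code: you say 'a list of words and positions', but your code tries to break it down into *characters*. –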

Answers


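Why do you want to turn the position values in the list into floats, since they're list indices and those must be integers? I suspect this may be an instance of the so-called XY Problem.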

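I also found your code hard to understand because you don't follow PEP 8 - Style Guide for Python Code. In particular, many (though not all) of your variable names are CamelCased, which according to the guide should be reserved for class names.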

In addition, some of your variables have misleading names, such as CharactersUnique, which actually contains [mostly] unique words.

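So one of the first things I did was convert all the CamelCased variables to lowercase, underscore-separated words, like camel_case. In several cases I also gave them better names that reflect their actual contents or role; for example, CharactersUnique became unique_words.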

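The next step was to improve the handling of files by using Python's with statement, which ensures they are closed automatically at the end of the block. In other places I combined multiple open() calls on the same file into one.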
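As a minimal illustration of the with statement behaviour described above, this sketch writes to a hypothetical scratch file (created with tempfile so it doesn't touch speech.txt), checks that the file object is closed once the block ends, and reads everything back with a single open() call.

```python
import os
import tempfile

# Hypothetical scratch file, so the example does not touch speech.txt.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

with open(path, "w") as out_file:
    out_file.write("first line\nsecond line\n")
# Leaving the block closed the file, although close() was never called.
print(out_file.closed)  # True

# One open() call is enough to read all the lines back.
with open(path) as in_file:
    lines = in_file.read().splitlines()
print(lines)  # ['first line', 'second line']
```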

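After all that I had it almost working, but that's when I discovered a problem with the approach of treating the newline characters in the input text file as separate words. It caused a problem when the file was being recreated by the expression

" ".join(NewWordsUnique[pos] for pos in (NewListOfPositions))

because it adds a space before and after every "\n" character it encounters, and those spaces were not in the original file. To fix this, I ended up writing a for loop to recreate the file instead of using a generator expression, because that allows the newline "words" to be handled properly.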

Anyway, here's the resulting rewritten (and working) code:

input_filename = "speech.txt"
compressed_filename = "List_WordsPos.txt"

# Two lists to represent contents of input file.
unique_words = ["\n"]  # preload with newline "word"
word_positions = []

with open(input_filename, "r") as input_file:
    for line in input_file:
        for word in line.split():
            if word not in unique_words:
                unique_words.append(word)
            word_positions.append(unique_words.index(word))

        word_positions.append(unique_words.index("\n"))  # add newline at end of each line

# Write representations of the two data-structures to compressed file.
with open(compressed_filename, "w") as compr_file:
    words_repr = " ".join(repr(word) for word in unique_words)
    compr_file.write(words_repr + "\n")
    positions_repr = " ".join(repr(posn) for posn in word_positions)
    compr_file.write(positions_repr + "\n")

def strip_quotes(word):
    """Strip the first and last characters from the string (assumed to be quotes)."""
    tmp = word[1:-1]
    return tmp if tmp != "\\n" else "\n"  # newline "words" are special case

# Recreate input file from data in compressed file.
with open(compressed_filename, "r") as compr_file:
    line = compr_file.readline()
    new_unique_words = list(map(strip_quotes, line.split()))
    line = compr_file.readline()
    new_word_positions = map(int, line.split())  # using int, not float here

words = []
lines = []
for posn in new_word_positions:
    word = new_unique_words[posn]
    if word != "\n":
        words.append(word)
    else:
        lines.append(" ".join(words))
        words = []

print("Recreated Text:\n")
recreation = "\n".join(lines)
print(recreation)

I created my own speech.txt test file from the first paragraph of your question and ran the script on it, with these results:

Recreated Text: 

I have to compress a file into a list of words and list of positions to recreate 
the original file. My program should also be able to take a compressed file and 
recreate the full text, including punctuation and capitalization, of the 
original file. I have everything correct apart from the recreation, using the 
map function my program can't convert my list of positions into floats because 
of the '[' as it is a list. 
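The repr()/strip_quotes pair above works for ordinary words, but repr() escapes backslashes, so a word such as C:\temp would come back from strip_quotes with its backslash doubled. One alternative, swapped in here rather than taken from the answer, is to store both lists with the standard json module, which escapes and unescapes quotes, backslashes and "\n" for you. The function names compress_text and decompress_text are hypothetical, and this sketch works on strings in memory rather than on files.

```python
import json

def compress_text(text):
    """Split text into (unique_words, positions); index 0 is the newline 'word'."""
    unique_words = ["\n"]
    index_of = {"\n": 0}  # dict lookup avoids rescanning the list with .index()
    positions = []
    for line in text.split("\n"):
        for word in line.split():
            if word not in index_of:
                index_of[word] = len(unique_words)
                unique_words.append(word)
            positions.append(index_of[word])
        positions.append(0)  # end-of-line marker
    return unique_words, positions

def decompress_text(unique_words, positions):
    """Rebuild the text line by line so no spaces appear around newlines."""
    lines, words = [], []
    for pos in positions:
        if pos == 0:
            lines.append(" ".join(words))
            words = []
        else:
            words.append(unique_words[pos])
    return "\n".join(lines)

original = 'She said "don\'t"\nSaved to C:\\temp\\speech.txt'
unique_words, positions = compress_text(original)

# Store both lists as one JSON document; json escapes quotes, backslashes and "\n".
stored = json.dumps({"words": unique_words, "positions": positions})

loaded = json.loads(stored)
recreated = decompress_text(loaded["words"], loaded["positions"])
print(recreated == original)  # True
```

Like the answer's version, this collapses runs of spaces into single spaces, so it only round-trips exactly for single-spaced text.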

Wow. That's great. Thanks! – Samer


To answer the question in your comment:

You'll want to split the input on whitespace. You may also want to use a different data structure.

# we'll map the words to a list of positions
all_words = {}
with open("speech.txt") as f:
    data = f.read()

# since we need to be able to re-create the file, we'll want
# line breaks
lines = data.split("\n")
for i, line in enumerate(lines):
    words = line.split(" ")
    for j, word in enumerate(words):
        if word in all_words:
            all_words[word].append((i, j))  # line and pos
        else:
            all_words[word] = [(i, j)]

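Note that this won't give you maximum compression, because foo and foo. count as separate words. If you want more compression, you'll have to go character by character. Hopefully you can now use a similar approach to do that as needed.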
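Since the dictionary records a (line, position) pair for every occurrence of a word, the text can be recreated by dropping each word back into its slot. In this sketch, build_index repeats the loop above as a function so the example is self-contained, rebuild_text is a hypothetical helper, and a short hand-made string stands in for speech.txt.

```python
def build_index(data):
    """Map each word to its (line number, position in line) pairs, like the loop above."""
    all_words = {}
    for i, line in enumerate(data.split("\n")):
        for j, word in enumerate(line.split(" ")):
            all_words.setdefault(word, []).append((i, j))
    return all_words

def rebuild_text(all_words):
    """Invert the index: put each word into its (line, position) slot, then rejoin."""
    lines = {}
    for word, occurrences in all_words.items():
        for i, j in occurrences:
            lines.setdefault(i, {})[j] = word
    return "\n".join(
        " ".join(lines[i][j] for j in sorted(lines[i]))
        for i in sorted(lines)
    )

text = "the cat saw the dog\nthe dog ran."
index = build_index(text)
print(index["the"])  # [(0, 0), (0, 3), (1, 0)]
print(rebuild_text(index) == text)  # True
```

Because split(" ") keeps an empty string for each extra space and for a trailing newline, this particular round trip reproduces the spacing exactly, at the cost of storing those empty "words" too.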


I don't know how this could be changed to fit mine – Samer