2014-11-02 73 views
0

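The code includes a helper function named clean_up; my code is below. I would like to know what I need to fix, add, or remove to make it work. The code runs, but it does not satisfy the precondition in the contract.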

def clean_up(s): 
    """ (str) -> str 

    Return a new string based on s in which all letters have been 
    converted to lowercase and punctuation characters have been stripped 
    from both ends. Inner punctuation is left untouched. 

    >>> clean_up('Happy Birthday!!!') 
    'happy birthday' 
    >>> clean_up("-> It's on your left-hand side.") 
    " it's on your left-hand side" 
    """ 

    punctuation = """!"',;:.-?)([]<>*#\n\t\r""" 
    result = s.lower().strip(punctuation) 
    return result 


########## Complete the following functions. ############ 

def type_token_ratio(text): 
    """ (list of str) -> float 

    Precondition: text is non-empty. Each str in text ends with \n and 
    text contains at least one word. 

    Return the Type Token Ratio (TTR) for this text. TTR is the number of 
    different words divided by the total number of words. 

    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 
     'James Gosling\n'] 
    >>> type_token_ratio(text) 
    0.8888888888888888 
    """ 

    # To do: Fill in this function's body to meet its specification. 

    distinctwords = dict() 
    words = 0 
    for line in text.splitlines(): 
     line = line.strip().split() 
     for word in line: 
      words+=1 
      if word in distinctwords: 
       distinctwords[word]+=1 
      else: 
       distinctwords[word]=1 
    TTR= len(distinctwords)/words 
    return TTR 
+0

What is the problem? – 2014-11-02 22:52:02

+0

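I asked my teacher, but he did not explain it to me. He said to write it so that it meets the precondition, but my code runs, so I am confused. – JerryMichaels 2014-11-02 22:56:41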

Answer

0

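Your code will not run when it is given the input the contract describes: for line in text.splitlines() calls splitlines() on a list, and lists have no such method, so it raises AttributeError. The precondition says text is a list of str, so you need to iterate over that list directly. Using collections.defaultdict also makes the counting simpler: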

from collections import defaultdict

def type_token_ratio(text):
    distinctwords = defaultdict(int)
    for line in text:  # text is a list of str, so iterate over it directly
        for token in line.split():  # split each line into tokens
            word = clean_up(token)  # lowercase, strip punctuation from both ends
            if word:  # skip tokens that were nothing but punctuation
                distinctwords[word] += 1  # count each occurrence of the word
    # number of different words divided by the total number of words
    TTR = len(distinctwords) / sum(distinctwords.values())
    return TTR
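
To see the failure and the fix side by side, here is a minimal, self-contained sketch. It assumes Python 3 (in Python 2 the / operator would do integer division here), copies clean_up from the question, and uses a set plus a running total instead of a dictionary. The name type_token_ratio_set is hypothetical and chosen only for this illustration. It also cleans each token separately, so that 'Paul,' and 'Paul' count as the same word.

```python
def clean_up(s):
    """Lowercase s and strip punctuation from both ends (as in the question)."""
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    return s.lower().strip(punctuation)


def type_token_ratio_set(text):
    """Hypothetical set-based variant; text is a non-empty list of str."""
    distinct = set()  # every different word seen so far
    total = 0         # total number of words
    for line in text:
        for token in line.split():
            word = clean_up(token)
            if word:  # ignore tokens that were only punctuation
                distinct.add(word)
                total += 1
    return len(distinct) / total


text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'James Gosling\n']

# A list has no splitlines() method, so the loop in the question fails at once.
try:
    text.splitlines()
except AttributeError as err:
    print('AttributeError:', err)

print(type_token_ratio_set(text))  # 8 distinct words / 9 words -> 0.8888888888888888
```

The set is enough here because TTR only needs the number of different words and the total, not the count of each individual word.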
+0

No worries, you're welcome. – 2014-11-02 23:34:07