2016-06-12 70 views
0

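Convert a list of strings to unique lowercase, preserving order (Python 2.7)

I want to convert a list of strings to lowercase and remove duplicates while preserving order. Many of the one-line Python tricks I found on StackOverflow convert a list of strings to lowercase, but the order seems to get lost.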

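I have written the code below, which does work, and I am happy to stick with it. But I would like to know whether there is a more Pythonic way to do this with less code (and probably fewer bugs if I write something similar in the future; this one took me a long time to write).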

def word_list_to_lower(words):
    """ takes a word list with a special order (e.g. frequency)
    returns a new word list all in lower case with no duplicates but preserving order"""

    print("word_list_to_lower")
    # save orders in a dict
    orders = dict()
    for i in range(len(words)):
        wl = words[i].lower()

        # save index of first occurrence of the word (prioritizing top value)
        if wl not in orders:
            orders[wl] = i

    # contains unique lower case words, but in wrong order
    words_unique = list(set(map(str.lower, words)))

    # reconstruct sparse list in correct order
    words_lower = [''] * len(words)
    for w in words_unique:
        i = orders[w]
        words_lower[i] = w

    # remove blank entries
    words_lower = [s for s in words_lower if s != '']

    return words_lower

Answers

1

A slightly modified version of the answer from "How do you remove duplicates from a list whilst preserving order?":

def f7(seq): 
    seen = set() 
    seen_add = seen.add 
    seq = (x.lower() for x in seq) 
    return [x for x in seq if not (x in seen or seen_add(x))] 
+0

Wow, thanks, that's great. seen_add is an interesting insight too. – memo

+0

Is 'seen_add(...)' really any better than 'seen.add(...)'? IMO, it's worse. – zondo
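To make the comparison in this comment concrete, here is a minimal sketch of the two spellings side by side. The function names are made up for illustration. The alias is usually explained as a micro-optimisation that avoids looking up the `add` attribute on every iteration; both versions return the same result.

```python
def dedupe_with_alias(seq):
    # Bind the method once, outside the comprehension.
    seen = set()
    seen_add = seen.add
    # set.add returns None, so the "or" branch only records x as seen.
    return [x for x in seq if not (x in seen or seen_add(x))]


def dedupe_without_alias(seq):
    # Same logic, but the attribute lookup seen.add happens per element.
    seen = set()
    return [x for x in seq if not (x in seen or seen.add(x))]


words = ['one', 'one', 'two', 'one', 'three']
with_alias = dedupe_with_alias(words)
without_alias = dedupe_without_alias(words)
```

Whether the saved lookup is worth the extra name is a readability trade-off; for short lists the difference is unlikely to matter.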

+0

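It would be a bit more efficient if you used parentheses '()' instead of brackets '[]' when defining 'seq'. That way you create a generator that yields values on demand, rather than a list that has to store every value in memory. – zondo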
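A small sketch of the difference this comment describes, using a made-up sample list: the bracketed form builds the whole lowercase list up front, while the parenthesised form only computes a value when something asks for it.

```python
words = ['ONE', 'one', 'TWO']

# List comprehension: every lowercase string is created and stored now.
as_list = [w.lower() for w in words]

# Generator expression: nothing is lowercased until a value is requested.
as_gen = (w.lower() for w in words)

first = next(as_gen)    # computes only the first value
rest = list(as_gen)     # consumes the remaining values
```

Note that a generator can only be iterated once, which is fine here because the deduplicating comprehension makes a single pass.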

0

Just do something like this:

initial_list = ['ONE','one','TWO','two'] 
unique_list = list(set(x.lower() for x in initial_list))

print unique_list 
+0

One of the key points of the question is that the order must be preserved. Your solution does not preserve order. – zondo
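For contrast with the set-based approach, here is one possible order-preserving sketch that stays close to a one-liner. It uses `collections.OrderedDict.fromkeys` from the standard library (available in Python 2.7), which keeps keys in first-insertion order. The function name is hypothetical and this is not taken from any of the answers in this thread.

```python
from collections import OrderedDict


def unique_lower_ordered(words):
    # Lowercase first, then let the ordered dict drop later duplicates
    # while remembering the position of each first occurrence.
    return list(OrderedDict.fromkeys(w.lower() for w in words))


result = unique_lower_ordered(['ONE', 'one', 'TWO', 'two'])
```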

1

You can also do this:

pip install orderedset 

and then:

from orderedset import OrderedSet 
initial_list = ['ONE','one','TWO','two','THREE','three'] 
unique_list = list(OrderedSet(x.lower() for x in initial_list))  # lowercase first, otherwise 'ONE' and 'one' both survive

print unique_list 
0
initial_list = ['ONE','one','TWO','two']
new_list = []
for s in initial_list:
    if s.lower() not in new_list:
        new_list.append(s.lower())
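The membership test against `new_list` scans the list for every word, so the running time grows quadratically with the input. A minimal sketch of the same loop with a set tracking what has been seen (the function name is an assumption made for illustration) keeps the explicit loop but makes each lookup roughly constant time:

```python
def unique_lower_loop(words):
    seen = set()     # lowercase words already emitted
    result = []      # output, in order of first occurrence
    for w in words:
        lowered = w.lower()
        if lowered not in seen:
            seen.add(lowered)
            result.append(lowered)
    return result


new_list = unique_lower_loop(['ONE', 'one', 'TWO', 'two'])
```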