2010-12-12 65 views
1

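Python sorting problem: given a list of ['url', 'tag1', 'tag2', ...] lists and a search spec ['tag3', 'tag1', ...], return a list of relevant URLs

I'm fairly new to programming, so I'm sure there is a better way to pose this problem, but I'm trying to create a personal bookmarking program. Given multiple URLs, each with a list of tags ordered by relevance, I want to be able to create a search consisting of a list of tags that returns a list of the most relevant URLs. My first solution, below, gives the first tag a value of 1, the second a value of 2, and so on, and lets Python's list sort function do the rest. Two questions:

1) Is there a more elegant/efficient way of doing this (embarrass me!)?
2) Are there any other general approaches to sorting by relevance, given the inputs above?

Much appreciated.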

# Given a list of saved urls each with a corresponding user-generated taglist 
# (ordered by relevance), the user enters a "search" list-of-tags, and is 
# returned a sorted list of urls. 

# Generate sample "content" linked-list-dictionary. The rationale is to 
# be able to add things like 'title' etc at later stages and to 
# treat each url/note as an independent entity. But a single dictionary
# approach like "note['url1']=['b','a','c','d']" might work better? 

content = [] 
note = {'url':'url1', 'taglist':['b','a','c','d']} 
content.append(note) 
note = {'url':'url2', 'taglist':['c','a','b','d']} 
content.append(note) 
note = {'url':'url3', 'taglist':['a','b','c','d']} 
content.append(note) 
note = {'url':'url4', 'taglist':['a','b','d','c']} 
content.append(note) 
note = {'url':'url5', 'taglist':['d','a','c','b']} 
content.append(note) 

# An example search term of tags, ordered by importance 
# I'm using a dictionary with an ordinal number system 
# This seems clumsy 
search = {'d':1,'a':2,'b':3} 

# Create a tagCloud with one entry for each tag that occurs 
tagCloud = [] 
for note in content:
    for tag in note['taglist']:
        if tagCloud.count(tag) == 0:
            tagCloud.append(tag)

# Create a dictionary that associates an integer value denoting 
# relevance (1 is most relevant etc) for each existing tag 

d = {}
for tag in tagCloud:
    try:
        d[tag] = search[tag]
    except KeyError:
        d[tag] = 100

# Create a [[relevance, tag],[],[],...] result list & sort 
result = []
for note in content:
    resultNote = []
    for tag in note['taglist']:
        resultNote.append([d[tag], tag])
    resultNote.append(note['url'])
    result.append(resultNote)
result.sort()

# Remove the relevance values & recreate a list containing 
# the url string followed by corresponding tags. 
# It's so hacky I've forgotten how it works!
# It's mostly for display, but suggestions on "best-practice" 
# intermediate-form data storage? 

finalResult = []
for note in result:
    temp = []
    temp.append(note.pop())
    for tag in note:
        temp.append(tag[1])
    finalResult.append(temp)

# print() calls so the script also runs under Python 3
print("Content:", content)
print("Search:", search)
print("Final Result:", finalResult)

Answers

2

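1) Is there a more elegant/efficient way of doing this (embarrass me!)

Certainly. The basic idea: don't try to tell Python what to do; just ask it for what you want.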

content = [ 
    {'url':'url1', 'taglist':['b','a','c','d']}, 
    {'url':'url2', 'taglist':['c','a','b','d']}, 
    {'url':'url3', 'taglist':['a','b','c','d']}, 
    {'url':'url4', 'taglist':['a','b','d','c']}, 
    {'url':'url5', 'taglist':['d','a','c','b']} 
] 

search = {'d' : 1, 'a' : 2, 'b' : 3} 

# We can create the tag cloud like this: 
# tagCloud = set(sum((note['taglist'] for note in content), [])) 
# But we don't actually need it: instead, we'll just use a default value 
# when looking things up in the 'search' dict. 

# Create a [[relevance, tag],[],[],...] result list & sort 
result = sorted(
    [
        [search.get(tag, 100), tag]
        for tag in note['taglist']
    ] + [[note['url']]]
    # The result will look like [ [relevance, tag],... , [url] ] 
    # Note that the url is wrapped in a list too. This makes the 
    # last processing step easier: we just take the last element of 
    # each nested list. 
    for note in content 
) 

# Remove the relevance values & recreate a list containing 
# the url string followed by corresponding tags. 
finalResult = [ 
    [x[-1] for x in note] 
    for note in result 
] 

# print() calls so the script also runs under Python 3
print("Content:", content)
print("Search:", search)
print("Final Result:", finalResult)
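
As a further variation that is not part of the answer above, here is a minimal Python 3 sketch. It assumes the search is entered as an ordered list of tags rather than a hand-numbered dictionary, and it sorts with a key function instead of building intermediate [relevance, tag] lists. The names `search_tags`, `ranks`, `UNRANKED`, and `relevance_key` are hypothetical, chosen only for illustration.

```python
content = [
    {'url': 'url1', 'taglist': ['b', 'a', 'c', 'd']},
    {'url': 'url2', 'taglist': ['c', 'a', 'b', 'd']},
    {'url': 'url3', 'taglist': ['a', 'b', 'c', 'd']},
    {'url': 'url4', 'taglist': ['a', 'b', 'd', 'c']},
    {'url': 'url5', 'taglist': ['d', 'a', 'c', 'b']},
]

# Search tags, most important first (assumed input format).
search_tags = ['d', 'a', 'b']

# Derive the 1-based ranks instead of typing them by hand.
ranks = {tag: rank for rank, tag in enumerate(search_tags, start=1)}

# Same default as the value 100 used above for tags not in the search.
UNRANKED = 100


def relevance_key(note):
    """Return the note's tags translated into search ranks, in tag order."""
    return [ranks.get(tag, UNRANKED) for tag in note['taglist']]


# sorted() compares the rank lists element by element.
ranked = sorted(content, key=relevance_key)
ranked_urls = [note['url'] for note in ranked]
print(ranked_urls)  # ['url5', 'url4', 'url3', 'url1', 'url2']
```

For the sample data this gives the same URL order as the code above. Because the key contains only integers, tag lists of different lengths can be compared without mixing integers and strings, which would raise a TypeError under Python 3.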
+0

Great, thanks. The explanatory comments helped a lot. Cheers – 2010-12-12 04:13:06

+0

@David: If the answer meets your requirements, it is considered polite to accept it. – user225312 2010-12-12 04:49:28

+0

Haha, yes. It wouldn't let me upvote, and I missed the transparent little tick box. – 2010-12-12 05:06:28

0

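I suggest you also give each tag a weight that depends on how rare it is (for example, a "tarantula" tag would weigh more than a "nature" tag¹). For a given URL, rare tags that it shares with other URLs should indicate stronger relevance, while frequently used tags that it shares with another URL should indicate weaker relevance.

The rules I described above can easily be converted into a numeric relevance calculation for every other URL.

¹ Unless all of your URLs are related to "tarantula", of course :)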
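
One possible way to turn the rarity idea into numbers is sketched below in Python 3. It uses an inverse-document-frequency style weight, log(N / number of URLs carrying the tag), which is a swapped-in technique and not something this answer specifies. The sample data and the names `tag_weights` and `relatedness` are hypothetical. The sketch covers only the first half of the rule (shared rare tags raise relatedness); it does not mark relatedness down.

```python
import math
from collections import Counter

# Hypothetical sample data, in the same shape as the question's content list.
content = [
    {'url': 'url1', 'taglist': ['tarantula', 'nature']},
    {'url': 'url2', 'taglist': ['nature', 'hiking']},
    {'url': 'url3', 'taglist': ['nature', 'tarantula', 'photos']},
    {'url': 'url4', 'taglist': ['nature']},
]


def tag_weights(notes):
    """Weight each tag by rarity: log(total urls / urls carrying the tag)."""
    counts = Counter(tag for note in notes for tag in set(note['taglist']))
    total = len(notes)
    return {tag: math.log(total / n) for tag, n in counts.items()}


def relatedness(note_a, note_b, weights):
    """Sum the rarity weights of the tags the two notes share."""
    shared = set(note_a['taglist']) & set(note_b['taglist'])
    return sum(weights[tag] for tag in shared)


weights = tag_weights(content)
target = content[0]
others = [note for note in content if note is not target]

# Most related URLs first.
ranked = sorted(others,
                key=lambda note: relatedness(target, note, weights),
                reverse=True)
print([note['url'] for note in ranked])  # ['url3', 'url2', 'url4']
```

Here "nature" appears on every URL, so its weight is log(1) = 0 and it contributes nothing, which matches the footnote's point about a tag that all URLs share.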

+0

Yes, an interesting approach. Cheers – 2010-12-14 19:53:49