Python sorting problem - given a list of ['url', 'tag1', 'tag2', ...]s and a search specification ['tag3', 'tag1', ...], return a list of relevant URLs

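I'm fairly new to programming, so I'm sure there's a better way to pose this question, but I'm trying to create a personal bookmarking program. Given multiple urls, each with a list of tags ordered by relevance, I want to be able to create a search consisting of a list of tags that returns a list of the most relevant urls. My first solution, below, is to give the first tag a value of 1, the second 2, and so on, and let the Python list sort function do the rest. 2 questions: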

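1) Is there a more elegant/efficient way of doing this (my version embarrasses me!)
2) Are there any other general approaches to sorting by relevance, given the inputs above?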

Much obliged.

# Given a list of saved urls each with a corresponding user-generated taglist 
# (ordered by relevance), the user enters a "search" list-of-tags, and is 
# returned a sorted list of urls. 

# Generate sample "content" linked-list-dictionary. The rationale is to 
# be able to add things like 'title' etc at later stages and to 
# treat each url/note as an independent entity. But a single dictionary 
# approach like "note['url1']=['b','a','c','d']" might work better? 

content = [] 
note = {'url':'url1', 'taglist':['b','a','c','d']} 
content.append(note) 
note = {'url':'url2', 'taglist':['c','a','b','d']} 
content.append(note) 
note = {'url':'url3', 'taglist':['a','b','c','d']} 
content.append(note) 
note = {'url':'url4', 'taglist':['a','b','d','c']} 
content.append(note) 
note = {'url':'url5', 'taglist':['d','a','c','b']} 
content.append(note) 

# An example search term of tags, ordered by importance 
# I'm using a dictionary with an ordinal number system 
# This seems clumsy 
search = {'d':1,'a':2,'b':3} 

# Create a tagCloud with one entry for each tag that occurs 
tagCloud = [] 
for note in content: 
    for tag in note['taglist']: 
        if tagCloud.count(tag) == 0: 
            tagCloud.append(tag) 

# Create a dictionary that associates an integer value denoting 
# relevance (1 is most relevant etc) for each existing tag 

d={}    
for tag in tagCloud: 
    try: 
        d[tag]=search[tag] 
    except KeyError: 
        d[tag]=100 

# Create a [[relevance, tag],[],[],...] result list & sort 
result=[]  
for note in content: 
    resultNote=[] 
    for tag in note['taglist']: 
        resultNote.append([d[tag],tag]) 
    resultNote.append(note['url']) 
    result.append(resultNote) 
result.sort() 

# Remove the relevance values & recreate a list containing 
# the url string followed by corresponding tags. 
# It's so hacky I've forgotten how it works! 
# It's mostly for display, but suggestions on "best-practice" 
# intermediate-form data storage? 

finalResult=[] 
for note in result: 
    temp=[] 
    temp.append(note.pop()) 
    for tag in note: 
        temp.append(tag[1]) 
    finalResult.append(temp) 

print("Content:", content)
print("Search:", search)
print("Final Result:", finalResult)

Source

2010-12-12 John Smith

1) Is there a more elegant/efficient way of doing this (my version embarrasses me!)

Sure thing. The basic idea: stop trying to tell Python what to do, and just ask it for what you want.

content = [ 
    {'url':'url1', 'taglist':['b','a','c','d']}, 
    {'url':'url2', 'taglist':['c','a','b','d']}, 
    {'url':'url3', 'taglist':['a','b','c','d']}, 
    {'url':'url4', 'taglist':['a','b','d','c']}, 
    {'url':'url5', 'taglist':['d','a','c','b']} 
] 

search = {'d' : 1, 'a' : 2, 'b' : 3} 

# We can create the tag cloud like this: 
# tagCloud = set(sum((note['taglist'] for note in content), [])) 
# But we don't actually need it: instead, we'll just use a default value 
# when looking things up in the 'search' dict. 

# Create a [[relevance, tag],[],[],...] result list & sort 
result = sorted(
    [ 
        [search.get(tag, 100), tag] 
        for tag in note['taglist'] 
    ] + [[note['url']]] 
    # The result will look like [ [relevance, tag],... , [url] ] 
    # Note that the url is wrapped in a list too. This makes the 
    # last processing step easier: we just take the last element of 
    # each nested list. 
    for note in content 
) 

# Remove the relevance values & recreate a list containing 
# the url string followed by corresponding tags. 
finalResult = [ 
    [x[-1] for x in note] 
    for note in result 
] 

print("Content:", content)
print("Search:", search)
print("Final Result:", finalResult)
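
As a further variation on the same idea (not part of the answer above), the sort could also be written with a key function, so that the notes themselves are sorted and no relevance values need to be stripped out afterwards. This is only a minimal sketch, assuming Python 3. The names `rank_notes`, `search_tags`, and `default_rank` are made up for illustration, and the search is passed as a plain ordered list instead of a hand-numbered dictionary.

```python
def rank_notes(content, search_tags, default_rank=100):
    """Return the notes sorted by the (relevance, tag) pairs of their taglists."""
    # Derive {'d': 1, 'a': 2, 'b': 3} from the ordered list ['d', 'a', 'b'].
    rank = {tag: position for position, tag in enumerate(search_tags, start=1)}

    def sort_key(note):
        # One (relevance, tag) pair per tag, in taglist order; the url only
        # acts as a tie-breaker for notes with identical taglists.
        pairs = tuple((rank.get(tag, default_rank), tag) for tag in note['taglist'])
        return (pairs, note['url'])

    # sorted() returns a new list, so the original content is left untouched.
    return sorted(content, key=sort_key)


content = [
    {'url': 'url1', 'taglist': ['b', 'a', 'c', 'd']},
    {'url': 'url2', 'taglist': ['c', 'a', 'b', 'd']},
    {'url': 'url3', 'taglist': ['a', 'b', 'c', 'd']},
    {'url': 'url4', 'taglist': ['a', 'b', 'd', 'c']},
    {'url': 'url5', 'taglist': ['d', 'a', 'c', 'b']},
]

ranked = rank_notes(content, ['d', 'a', 'b'])

# Same display form as finalResult: the url followed by its tags.
final_result = [[note['url']] + note['taglist'] for note in ranked]
print("Final Result:", final_result)
```

For the sample data this gives the order url5, url4, url3, url1, url2. Sorting the note dictionaries directly also makes it easier to add fields such as 'title' later, since nothing has to be packed into and unpacked from nested lists.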

Source

2010-12-12 01:50:01

Great, thanks. The explanatory comments help a lot. Cheers – 2010-12-12 04:13:06

@David: If the answer meets your needs, it is considered polite to accept it. – user225312 2010-12-12 04:49:28

Haha yes, it wouldn't let me upvote & I missed the transparent little tick box. – 2010-12-12 05:06:28

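I suggest you also give each tag a weight that depends on how rare it is (e.g. a "tarantula" tag would weigh more than a "nature" tag¹). For a given URL, rare tags that it shares with another URL should mark a stronger relevance, while frequently used tags of the given URL that do not exist in the other URL should mark down the relevance.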

It's easy to convert the rules I describe above into a calculation of a numeric relevance for every other URL.

¹ Unless all of your URLs are related to "tarantulas", of course :)
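
One possible reading of these rules, sketched in Python 3. The formulas are assumptions chosen for illustration and are not given in the answer: a shared tag adds log(1 + N / count), so rarer tags add more, and a tag of the given URL that the other URL lacks subtracts count / N, so more common tags subtract more. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def tag_frequencies(content):
    """Count, for each tag, how many notes use it."""
    return Counter(tag for note in content for tag in set(note['taglist']))


def relevance(note, other, content):
    """Score how relevant `other` is to `note` (higher means more relevant)."""
    total = len(content)
    freq = tag_frequencies(content)
    tags, other_tags = set(note['taglist']), set(other['taglist'])

    # Shared tags add to the score; a rare shared tag adds more.
    bonus = sum(math.log(1 + total / freq[tag]) for tag in tags & other_tags)
    # Tags of `note` missing from `other` subtract; a common tag subtracts more.
    penalty = sum(freq[tag] / total for tag in tags - other_tags)
    return bonus - penalty


# Hypothetical sample data, not taken from the question.
content = [
    {'url': 'url1', 'taglist': ['tarantula', 'nature']},
    {'url': 'url2', 'taglist': ['tarantula', 'pets']},
    {'url': 'url3', 'taglist': ['nature', 'hiking']},
    {'url': 'url4', 'taglist': ['nature', 'photography']},
    {'url': 'url5', 'taglist': ['nature', 'birds']},
]

base = content[0]
others = sorted(
    (note for note in content if note is not base),
    key=lambda other: relevance(base, other, content),
    reverse=True,
)
print([note['url'] for note in others])
```

With this data, url2 ranks first for url1 because the shared "tarantula" tag is rarer than the shared "nature" tag. The weighting functions are the part most worth tuning against real bookmarks.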

Source

2010-12-14 13:47:46 tzot

Yes, interesting approach. Cheers – 2010-12-14 19:53:49
