2016-12-02
0

Why does this SPARQL query return no data when the offset is greater than 0?

I am trying to retrieve music information from DBPedia. If I run this query at http://dbpedia.org/sparql/:

SELECT DISTINCT 
?title 
?date 
(group_concat(distinct ?label;separator=";;;") as ?labels) 
(group_concat(distinct ?genre;separator=";;;") as ?genres) 
(group_concat(distinct ?member;separator=";;;") as ?members) 
(group_concat(distinct ?oldMember;separator=";;;") as ?oldMembers) 
(group_concat(distinct ?origin;separator=";;;") as ?origins) 
(group_concat(distinct ?song;separator=";;;") as ?songs) 
(group_concat(distinct ?songOther;separator=";;;") as ?songOthers) 
(group_concat(distinct ?songOtherOther;separator=";;;") as ?songOtherOthers) 
WHERE { 
    ?title <http://purl.org/dc/terms/subject>  <http://dbpedia.org/resource/Category:American_hard_rock_musical_groups> . 
    OPTIONAL { ?title <http://dbpedia.org/ontology/bandMember> ?member . } 
    OPTIONAL { ?title <http://dbpedia.org/ontology/formerBandMember> ?oldMember . } 
    OPTIONAL { ?title <http://dbpedia.org/property/label> ?label . } 
    OPTIONAL { ?title <http://dbpedia.org/property/genre> ?genre . } 
    OPTIONAL { ?title <http://dbpedia.org/property/origin> ?origin . } 
    OPTIONAL { ?title <http://dbpedia.org/ontology/activeYearsStartYear> ?date . } 
    OPTIONAL { ?song <http://dbpedia.org/ontology/artist> ?title . } 
    OPTIONAL { ?songOther <http://dbpedia.org/property/artist> ?title . } 
    OPTIONAL { ?songOtherOther <http://dbpedia.org/ontology/musicalArtist> ?title . } 
} ORDER BY ?title ?date LIMIT 1 OFFSET 0 

I get one result, but if I change the OFFSET to 1, the result set is empty. (There is definitely more than one result available.)

Any ideas?

+1

The problem is in the "property/artist" and "ontology/musicalArtist" predicates; without them the query works fine. I don't know why, but to me it looks like a Virtuoso bug. – laughedelic

+0

Thanks @laughedelic. Yes, I see that, but without those predicates I don't get the song information, which is important for my project. I could do it programmatically, e.g. first get each band and then fetch the songs for each band; I was just wondering whether there is a more compact way. –

+0

Do you really need "song", "songOther" and "songOtherOther" separately? – laughedelic

Answers

-1

group_concat is an aggregate function.

Without a GROUP BY clause, the whole result set forms a single group and is collapsed into one result row. Hence OFFSET 0 returns that one row, and OFFSET 1 returns no rows.

You probably meant to use

GROUP BY ?title 

Example:

SELECT (count(*) AS ?C) 
WHERE 
{ ?s ?p ?o } 

One row, the count.
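
Applied to the query in the question, that means grouping by every projected non-aggregated variable, i.e. ?title and ?date; each band then becomes its own result row, so OFFSET 1 returns the second band. A trimmed-down sketch with just one of the original group_concat columns (untested against the live endpoint):

SELECT DISTINCT 
?title 
?date 
(group_concat(distinct ?member;separator=";;;") as ?members) 
WHERE { 
    ?title <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:American_hard_rock_musical_groups> . 
    OPTIONAL { ?title <http://dbpedia.org/ontology/bandMember> ?member . } 
    OPTIONAL { ?title <http://dbpedia.org/ontology/activeYearsStartYear> ?date . } 
} GROUP BY ?title ?date ORDER BY ?title ?date LIMIT 1 OFFSET 1 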

+0

Can you clarify? Why do other queries return more results? Also, if I remove the LIMIT clause I get "Virtuoso 22026 Error SR319: Max row length is exceeded when trying to store a string of 3145 chars into a temp column". –

+0

I assume Virtuoso allows omitting the explicit grouping here. I also tried adding it explicitly, but it didn't change anything. – laughedelic

0

So the solution I found was to split the query into multiple queries using Python:

  1. Create a JSON file that specifies/defines what is needed:

    [{ 
    "root": [ 
        "?title <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:American_hard_rock_musical_groups> .", 
        "?title <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> ?name " 
    ], 
    "sub_page": [ 
        "?title <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <[[X]]> .", 
        "OPTIONAL { ?title <http://dbpedia.org/ontology/bandMember> ?member . }", 
        "OPTIONAL { ?title <http://dbpedia.org/ontology/formerBandMember> ?oldMember . }", 
        "OPTIONAL { ?title <http://dbpedia.org/property/label> ?label . }", 
        "OPTIONAL { ?title <http://dbpedia.org/property/genre> ?genre . }", 
        "OPTIONAL { ?title <http://dbpedia.org/property/origin> ?origin . }", 
        "OPTIONAL { ?title <http://dbpedia.org/ontology/activeYearsStartYear> ?date . }", 
        "OPTIONAL { ?song <http://dbpedia.org/ontology/artist> ?title . }", 
        "OPTIONAL { ?songOther <http://dbpedia.org/property/artist> ?title . }", 
        "OPTIONAL { ?songOtherOther <http://dbpedia.org/ontology/musicalArtist> ?title . }", 
        "OPTIONAL { ?songOtherOtherOther <http://dbpedia.org/property/producer> ?title}" 
    
    ], 
    "service":"<http://dbpedia.org/sparql/>", 
    "select":[ 
          "title", 
          "date", 
          "label_s", 
          "genre_s", 
          "member_s", 
          "oldMember_s", 
          "origin_s", 
          "song_s", 
          "songOther_s", 
          "songOtherOther_s", 
          "songOtherOtherOther_s", 
          "name_X" 
        ], 
    "language": "en", 
    "limit": 10000, 
    "offset": 100, 
    "category": "music", 
    "description": "American Hard Rock", 
    "sub_category": "American_hard_rock_musical_groups" 
    }] 
    
  2. Then consume the JSON with this Python script:

    import sys
    import datetime
    import copy
    import json
    from api.DBPedia import DBPedia

    class ProcessStuff(DBPedia):

        def __init__(self, fn=""):
            """Initialize ProcessStuff with the path to a JSON config file."""
            self.filePath = fn

        def getConfigFile(self):
            """Read the JSON file that describes the DBPedia queries."""
            try:
                with open(self.filePath, "r") as jsonFile:
                    return json.load(jsonFile)
            except Exception as e:
                print "[getConfigFile] Error in reading file: %s" % e

        def queryMultiplier(self, data, identifier='[[X]]'):
            """Run the root query, then build and run one sub-query per root result."""
            queries = []
            q = self.createSparqlQuery(data)
            json_page = self.resolveDBPediaQuery(q=q)
            if len(data['sub_page']) > 0:
                try:
                    items = json_page['results']['bindings']
                    for item in items:
                        sub_data = copy.deepcopy(data)
                        # Only allows for one identifier: the select variable ending in '_X'
                        sub_page_identifier = [var for var in data['select']
                                               if var.endswith('_X')][0].replace('_X', '')
                        name = item[sub_page_identifier]['value']
                        # Substitute the placeholder into every sub_page pattern
                        for count in range(len(sub_data['sub_page'])):
                            if identifier in sub_data['sub_page'][count]:
                                sub_data['sub_page'][count] = sub_data['sub_page'][count].replace(identifier, name)
                        q = self.createSparqlQuery(sub_data, key='sub_page')
                        queries.append(q)
                except Exception as e:
                    print "[ProcessStuff][queryMultiplier] Error in creating queries for subpages: %s" % e
                # Execute each generated sub-query and save its results to disk
                for query in queries:
                    file_name = data['category'] + "___" + data['sub_category']
                    print "Fetching query: \n%s" % query
                    json_page = self.resolveDBPediaQuery(q=query)
                    print "Processing page and saving to: " + file_name
                    self.processPage(json_page, json_file='../../json_samples/',
                                     category=file_name, overwrite=False)

        def createConcat(self, data, separator=";;;"):
            """Build a group_concat projection for one variable."""
            return "(group_concat(distinct ?" + data + ";separator='" + separator + "') as ?" + data + "_s)"

        def createSparqlQuery(self, data, key="root"):
            """Generate a SPARQL query from the config data."""
            query = []
            orderby = []
            select = "SELECT DISTINCT"
            for prop in data['select']:
                if prop.endswith("_s"):
                    # Variables ending in '_s' are aggregated with group_concat
                    select += " " + self.createConcat(prop.split("_s")[0])
                else:
                    v = "?" + prop.replace('_X', '')
                    select += " " + v
                    orderby.append(v)
            query.append(select)
            query.append(" WHERE { ")
            closing = 1
            if 'service' in data:
                # Wrap the patterns in a SERVICE block when an endpoint is configured
                query.append("SERVICE " + data['service'] + " {")
                closing += 1
            query.append('\n'.join(data[key]))
            while closing > 0:
                query.append('}')
                closing -= 1
            query.append(" ORDER BY " + ' '.join(orderby))
            if 'limit' in data:
                query.append(" LIMIT %s" % data['limit'])

            complete_query = '\n'.join(query)
            print complete_query
            return complete_query

    if __name__ == "__main__":
        try:
            JSON_FILE_NAME = sys.argv[1]
        except IndexError:
            print "JSON file name is needed to run!"
            sys.exit(2)
        start_time = datetime.datetime.now().time().strftime('%H:%M:%S')
        hm = ProcessStuff(JSON_FILE_NAME)
        data = hm.getConfigFile()
        hm.queryMultiplier(data[0])
        end_time = datetime.datetime.now().time().strftime('%H:%M:%S')
        total_time = (datetime.datetime.strptime(end_time, '%H:%M:%S')
                      - datetime.datetime.strptime(start_time, '%H:%M:%S'))
        print "Took %s to process %s " % (total_time, JSON_FILE_NAME)
    
  3. Then run the code like this: python ProcessStuff.py input.json

  4. It took 00:31:10 to process music.json; 441 items were processed.

The code could surely be made faster... The DBPedia class that ProcessStuff inherits from simply makes HTTP requests, cleans up the results a bit, and saves them as JSON.
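
That parent class is not shown here. As a rough sketch only, it could look something like the following; apart from resolveDBPediaQuery and processPage (the two methods the script above calls), every name and detail in this sketch is an assumption:

    import json
    import urllib
    import urllib2

    class DBPedia(object):
        """Hypothetical sketch of the parent class: sends a SPARQL query to the
        DBPedia endpoint over HTTP and stores result bindings as JSON files."""

        ENDPOINT = "http://dbpedia.org/sparql/"  # assumed endpoint URL

        def resolveDBPediaQuery(self, q=""):
            """POST the query to the endpoint and return the parsed JSON results."""
            params = urllib.urlencode({'query': q,
                                       'format': 'application/sparql-results+json'})
            response = urllib2.urlopen(self.ENDPOINT, params)
            return json.load(response)

        def processPage(self, json_page, json_file='', category='', overwrite=False):
            """Save the raw result bindings to <json_file><category>.json."""
            path = json_file + category + '.json'
            mode = 'w' if overwrite else 'a'  # append unless told to overwrite
            with open(path, mode) as out:
                json.dump(json_page.get('results', {}).get('bindings', []), out)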

+0

It's great that you solved your problem, but I don't think this answers your original question (why the query doesn't work as expected), and it isn't a SPARQL solution. – laughedelic

+0

@laughedelic fair enough :) –