2014-10-31 64 views

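DBpedia local server gives strange results for different queries

I am trying to build a list of as many people as possible, with as many features as possible, for a machine learning problem.

I have set up a local DBpedia server and raised the limits on various parameters, but somehow I still cannot get the results I need.

The desired output is a CSV in the following format: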

<Person1>,<Feature1>,<Feature2>,<Feature3> .......... and so on 
<Person2>,<Feature1>,<Feature2>,<Feature3> .......... and so on 
<Person3>,<Feature1>,<Feature2>,<Feature3> .......... and so on 
...and 
...so 
...on 

Can someone point me toward the right approach?

For example, when I run this query, I get a blank result:

QUERY:

SELECT ?name ?birthDate WHERE { 
    { 
     SELECT strafter(str(?person),"http://dbpedia.org/resource/") as ?name, str(?birthDate) as ?birthDate WHERE { 
     ?person a <http://dbpedia.org/ontology/Person> . 
     ?person dbpedia-owl:birthDate ?birthDate . 

} 
     ORDER BY ASC(?name) 
    } 
} 

OFFSET 100000 
LIMIT 500 

Result: [[name]] [[birthDate]]

But when I run this query, I get only 50000 rows, which is very few:

QUERY:

SELECT strafter(str(?person),"http://dbpedia.org/resource/") as ?name, str(?birthDate) as ?birthDate,
    str(?birthName) as ?birthName, strafter(str(?occupation),"http://dbpedia.org/resource/") as ?occupation WHERE { 
     ?person a <http://dbpedia.org/ontology/Person> . 
     ?person dbpedia-owl:birthDate ?birthDate . 
     ?person dbpedia-owl:birthName ?birthName . 
     ?person dbpedia-owl:occupation ?occupation . 

    } 

Result: <<50000 rows>>

Strangely, this query seems to work (at least to some extent):

QUERY:

select ?s ?p ?o { ?s a dbpedia-owl:Person ; ?p ?o } 

Result: <<1051038 rows>>

My virtuoso.ini file:

[Database] 
DatabaseFile     = /var/lib/virtuoso/db/virtuoso.db 
ErrorLogFile     = /var/lib/virtuoso/db/virtuoso.log 
LockFile      = /var/lib/virtuoso/db/virtuoso.lck 
TransactionFile     = /var/lib/virtuoso/db/virtuoso.trx 
xa_persistent_file    = /var/lib/virtuoso/db/virtuoso.pxa 
ErrorLogLevel     = 7 
FileExtend      = 200 
;MaxCheckpointRemap    = 2000 
MaxCheckpointRemap    = 1362500 
Striping      = 0 
TempStorage      = TempDatabase 


[TempDatabase] 
DatabaseFile     = /var/lib/virtuoso/db/virtuoso-temp.db 
TransactionFile     = /var/lib/virtuoso/db/virtuoso-temp.trx 
MaxCheckpointRemap    = 2000 
Striping      = 0 

[Parameters] 
ServerPort      = 1111 
LiteMode      = 0 
DisableUnixSocket    = 1 
DisableTcpSocket    = 0 
;SSLServerPort     = 2111 
;SSLCertificate     = cert.pem 
;SSLPrivateKey     = pk.pem 
;X509ClientVerify    = 0 
;X509ClientVerifyDepth   = 0 
;X509ClientVerifyCAFile   = ca.pem 
ServerThreads     = 20 
CheckpointInterval    = 60 
O_DIRECT      = 0 
CaseMode      = 2 
MaxStaticCursorRows    = 500000000 
CheckpointAuditTrail   = 0 
AllowOSCalls     = 0 
SchedulerInterval    = 10 
DirsAllowed      = ., /usr/share/virtuoso/vad, /usr/local/data/datasets 
ThreadCleanupInterval   = 0 
ThreadThreshold     = 10 
ResourcesCleanupInterval  = 0 
FreeTextBatchSize    = 100000 
SingleCPU      = 0 
VADInstallDir     = /usr/share/virtuoso/vad/ 
PrefixResultNames    = 0 
RdfFreeTextRulesSize   = 100 
IndexTreeMaps     = 256 
MaxMemPoolSize     = 200000000 
PrefixResultNames    = 0 
MacSpotlight     = 0 
IndexTreeMaps     = 64 
MaxSortedTopRows    = 100000000 
;; 


;; Uncomment next two lines if there is 64 GB system memory free 
NumberOfBuffers   = 5450000 
MaxDirtyBuffers   = 4000000 
;; 

[HTTPServer] 
ServerPort      = 8890 
ServerRoot      = /var/lib/virtuoso/vsp 
ServerThreads     = 20 
DavRoot       = DAV 
EnabledDavVSP     = 0 
HTTPProxyEnabled    = 0 
TempASPXDir      = 0 
DefaultMailServer    = localhost:25 
ServerThreads     = 10 
MaxKeepAlives     = 10 
KeepAliveTimeout    = 10 
MaxCachedProxyConnections  = 10 
ProxyConnectionCacheTimeout  = 15 
HTTPThreadSize     = 280000 
HttpPrintWarningsInOutput  = 0 
Charset       = UTF-8 
;HTTPLogFile     = logs/http.log 

[AutoRepair] 
BadParentLinks     = 0 


[Client] 
SQL_PREFETCH_ROWS    = 100 
SQL_PREFETCH_BYTES    = 16000 
SQL_QUERY_TIMEOUT    = 0 
SQL_TXN_TIMEOUT     = 0 
;SQL_NO_CHAR_C_ESCAPE   = 1 
;SQL_UTF8_EXECS     = 0 
;SQL_NO_SYSTEM_TABLES   = 0 
;SQL_BINARY_TIMESTAMP   = 1 
;SQL_ENCRYPTION_ON_PASSWORD  = -1 

[VDB] 
ArrayOptimization    = 0 
NumArrayParameters    = 10 
VDBDisconnectTimeout   = 1000 
KeepConnectionOnFixedThread  = 0 

[Replication] 
ServerName      = db-IP-172-31-24-242 
ServerEnable     = 1 
QueueMax      = 5000000 


[Striping] 
Segment1      = 100M, db-seg1-1.db, db-seg1-2.db 
Segment2      = 100M, db-seg2-1.db 
;... 


[Zero Config] 
ServerName      = virtuoso (IP-172-31-24-242) 

[URIQA] 
DynamicLocal     = 0 
DefaultHost      = localhost:8890 


[SPARQL] 
;ExternalQuerySource   = 1 
;ExternalXsltSource    = 1 
;DefaultGraph     = http://localhost:8890/dataspace 
;ImmutableGraphs    = http://localhost:8890/dataspace 
;ResultSetMaxRows    = 10000 
ResultSetMaxRows    = 1000000000 
;MaxQueryCostEstimationTime  = 400 ; in seconds 
MaxQueryCostEstimationTime  = 4000000000000000  ; in seconds 
;MaxQueryExecutionTime   = 60 ; in seconds 
MaxQueryExecutionTime   = 600000000000000  ; in seconds 
DefaultQuery     = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit   = 0 ; controls inference rules loading 
;PingService     = http://rpc.pingthesemanticweb.com/ 
MaxSortedTopRows    = 10000000 

[Plugins] 
LoadPath      = /usr/lib/virtuoso/hosting 
Load1       = plain, wikiv 
Load2       = plain, mediawiki 
Load3       = plain, creolewiki 
Load4     = plain, im 

Please let me know if I have missed something trivial, but the query results do not make sense to me.

Answer


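Because you are running several completely different queries, it is hard to pin down your exact problem. If you want to isolate the cause, the best approach is to make small changes, one at a time.

Also: all of your queries are syntactically invalid SPARQL, which makes it hard to tell what is going wrong. In particular, the way you write your 'AS' aliases is incorrect: first, they must be enclosed in parentheses, and second, you should not alias to a variable that already exists. For example, instead of this: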

str(?birthDate) as ?birthDate 

you should write something like this:

(str(?birthDate) as ?bd) 
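To make the two alias rules concrete, here is a minimal sketch in Python that assembles a syntactically valid version of the first query. The function name `build_person_query`, the explicit `PREFIX` declaration, the fresh alias `?bd`, and the flattening of the subquery into a single SELECT are illustrative assumptions, not something taken from the original setup.

```python
# Sketch: build a valid SPARQL query string with parenthesised aliases.
# Every alias binds a NEW variable (?name, ?bd) instead of reusing ?birthDate.

RESOURCE_PREFIX = "http://dbpedia.org/resource/"

# Doubled braces are literal braces for str.format().
QUERY_TEMPLATE = """\
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT (STRAFTER(STR(?person), "{resource}") AS ?name)
       (STR(?birthDate) AS ?bd)
WHERE {{
  ?person a dbpedia-owl:Person ;
          dbpedia-owl:birthDate ?birthDate .
}}
ORDER BY ?person
LIMIT {limit}
OFFSET {offset}
"""


def build_person_query(limit: int, offset: int) -> str:
    """Return the query text for one page of (name, birth date) rows."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return QUERY_TEMPLATE.format(
        resource=RESOURCE_PREFIX, limit=limit, offset=offset
    )
```

Calling `build_person_query(500, 0)` gives the text for the first page; the string can then be sent to the endpoint with whatever client you already use.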

Apart from that, in your first query you are setting an OFFSET of 100000. Presumably, you get no answers simply because there are fewer than 100000 results.
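One way to check the offset explanation is to page from offset 0 and stop at the first empty page, rather than jumping straight to a large offset. The sketch below assumes a hypothetical callable `fetch_page(limit, offset)` that runs the query and returns a list of rows; `make_fake_fetch` is only an in-memory stand-in so the loop can be tried without a server.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def iter_pages(
    fetch_page: Callable[[int, int], List[Row]], page_size: int
) -> Iterator[Row]:
    """Yield rows page by page until a page comes back empty."""
    offset = 0
    while True:
        rows = fetch_page(page_size, offset)
        if not rows:
            # An offset at or beyond the total row count yields no rows.
            break
        yield from rows
        offset += page_size


def make_fake_fetch(total: int) -> Callable[[int, int], List[Row]]:
    """Hypothetical stand-in for a real endpoint holding `total` rows."""
    data = [{"name": f"Person{i}"} for i in range(total)]

    def fetch(limit: int, offset: int) -> List[Row]:
        # Slicing past the end returns an empty list, like OFFSET does.
        return data[offset:offset + limit]

    return fetch
```

With a stand-in of 7 rows and a page size of 3, the loop yields all 7 rows and then stops; asking the same stand-in for offset 100000 returns an empty page, which is the behaviour described above.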

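In your second query you get 50000 results, which presumably accurately reflects the actual number of persons matching your criteria. Again, the query is a bit odd in that you use 'AS' aliases to rebind variables to new values.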
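The second query only matches persons that have all three properties at once. If the goal is instead to keep persons that lack some of them (an assumption about the intent, not something stated above), wrapping each property in `OPTIONAL` is one common approach. The helper name `build_optional_query` is hypothetical; the property names are the ones used in the question.

```python
from typing import Sequence

ONTOLOGY = "http://dbpedia.org/ontology/"


def build_optional_query(properties: Sequence[str]) -> str:
    """Build a query where every listed property is optional per person."""
    select_vars = " ".join(f"?{p}" for p in properties)
    # One OPTIONAL block per property: a missing value leaves the
    # variable unbound instead of dropping the whole person.
    optionals = "\n".join(
        f"  OPTIONAL {{ ?person dbpedia-owl:{p} ?{p} . }}" for p in properties
    )
    return (
        f"PREFIX dbpedia-owl: <{ONTOLOGY}>\n"
        f"SELECT ?person {select_vars}\n"
        "WHERE {\n"
        "  ?person a dbpedia-owl:Person .\n"
        f"{optionals}\n"
        "}"
    )
```

For example, `build_optional_query(["birthDate", "birthName", "occupation"])` produces a query with three `OPTIONAL` blocks. Note that a person with several values for one property still produces several rows.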

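Finally, the last query simply retrieves all triples about resources of type Person. It is not surprising that this result is much larger, since you have no further constraints. Each row in the result is one property-value combination for a particular person.

I suggest you work through a basic SPARQL tutorial, as I think you may be missing some fundamentals. SPARQL takes some getting used to, but once you have learned the basics (such as what graph pattern matching actually means), you will find it much easier to write your own queries.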
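Since each row of that result is one (person, property, value) combination, the rows can be regrouped on the client side into the one-line-per-person CSV layout the question asks for. The following is a rough sketch under assumptions: the function names are made up for illustration, only the first value seen for a property is kept, and missing values become empty cells.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def pivot_triples(
    triples: Iterable[Triple], features: Sequence[str]
) -> List[List[str]]:
    """Group (subject, property, value) rows into one row per subject."""
    by_person: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subject, prop, value in triples:
        row = by_person[subject]  # registers the person even without features
        if prop in features:
            row.setdefault(prop, value)  # keep only the first value seen
    return [
        [person] + [by_person[person].get(f, "") for f in features]
        for person in sorted(by_person)
    ]


def to_csv(rows: Iterable[Sequence[str]], header: Sequence[str]) -> str:
    """Serialise the pivoted rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping only the first value is a simplification; multi-valued properties such as occupation may need to be joined or expanded into several columns, depending on what the learning task expects.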