2016-05-31

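OrientDB slow ETL edge creation

I'm new to OrientDB, with a little Neo4J experience, and I'm running into performance problems when loading and creating edges with the OETL.BAT tool. I need to create about 4.4 million edges between nodes (of roughly 42 million, not all of which are used at this stage). The "Customer" nodes are already loaded. The edge list I'm loading is very simple (shown below): just the source and target IDs for each edge, plus the fields listed, and it is meant to model payments between customers. According to the ETL tool, my current throughput is 23-30 per second. I'm using a CSV file rather than a JDBC connection to my RDBMS, and I'm running in "plocal" mode.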
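For scale, here is a rough estimate of the total load time. It assumes the reported rate counts one edge per CSV row and stays constant for the whole run, neither of which is confirmed by the tool output quoted above:

$$
\frac{4.4 \times 10^{6}\ \text{edges}}{30\ \text{edges/s}} \approx 1.47 \times 10^{5}\ \text{s} \approx 41\ \text{h},
\qquad
\frac{4.4 \times 10^{6}\ \text{edges}}{23\ \text{edges/s}} \approx 1.91 \times 10^{5}\ \text{s} \approx 53\ \text{h}
$$

So under those assumptions the import would take somewhere around 41 to 53 hours.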

Is there a faster way to do this? Or am I perhaps taking the wrong approach?

Customer - vertex: CISNumber, Name

PAID - edge: SourceCISNumber, DestCISNumber, Amount, TransactionCount

Thanks

{
  "source": { "file": { "path": "/datafiles/PersonalCustomers/Edges.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "merge": { "joinFieldName": "SourceCISNumber", "lookup": "Customer.CISNumber" } },
    { "vertex": { "class": "Customer", "skipDuplicates": true } },
    { "edge": {
        "class": "PAID",
        "joinFieldName": "DestCISNumber",
        "lookup": "Customer.CISNumber",
        "unresolvedLinkAction": "SKIP",
        "edgeFields": {
          "Volume": "${input.Transactioncount}",
          "Value": "${input.Amount}"
        }
      }
    },
    { "field": { "fieldNames": ["SourceCISNumber", "DestCISNumber", "Transactioncount", "Amount"], "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/orientdb/databases/Customers",
      "dbType": "graph",
      "batchCommit": 500,
      "useLightweightEdges": true,
      "classes": [
        { "name": "PAID", "extends": "E" }
      ]
    },
    "indexes": [
      { "class": "Customer", "fields": ["CISNumber:long"] }
    ]
  }
}

You can take a look at this [question](http://stackoverflow.com/questions/37053190/orientdb-fastest-batchimport/37065876#37065876)

Answer


You should set "batchCommit": 1000 in the "loader" section. Also set "parallel": true in the "config" section.
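Applied to the configuration in the question, the two settings might look like the sketch below. Only the affected sections are shown; "source", "extractor" and "transformers" would stay as they are. Placing "config" at the top level of the file is an assumption based on the usual OrientDB ETL layout, and whether these values actually improve throughput for this dataset is untested.

```json
{
  "config": {
    "parallel": true
  },
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/orientdb/databases/Customers",
      "dbType": "graph",
      "batchCommit": 1000,
      "useLightweightEdges": true,
      "classes": [
        { "name": "PAID", "extends": "E" }
      ]
    }
  }
}
```

Here "batchCommit" sets how many records are grouped into each commit, and "parallel" asks the ETL processor to run the transformer pipeline on multiple threads, so it may be worth comparing a few values against the 23-30 per second baseline.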