2017-04-13 211 views

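Efficiently parsing FIX messages in C++, with performance in mind

I need to parse a file containing financial FIX protocol messages. An example message follows (the non-printable SOH field delimiter, \x01, is shown as |):

1128=9|9=245|35=X|49=CME|75=20170409|34=824|52=20170409200705083947914|60=20170409200705080000000|5799=10000000|268=2|279=0|269=B|48=9006|55=ESM7|83=23|271=1473460|731=10000000|5796=17263|279=0|269=C|48=9006|55=ESM7|83=24|271=2861528|731=10000000|5796=17263|10=219|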

My application will load many such files, each containing millions of lines of historical data.

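I have reviewed related questions on FIX parsing and explored the QuickFIX library (specifically using FIX::Message(string) to crack messages), but I am aiming for better throughput than I was able to achieve with QuickFIX.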

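I wrote a mock-up for the most common message type (Market Data Incremental Refresh) to see what kind of speed I could reach. The best result I obtained was ~60,000 messages/sec, including file reading, on a 3-million-line file.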

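This is my first C++ application, so I expect there are plenty of flaws in my approach, and any suggestions on how to improve its performance would be greatly appreciated.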

The current flow is file -> string -> MDIncrementalRefresh. MDIncrementalRefresh has two optional repeating groups, which I store in vectors because their size varies from message to message.

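My guess is that rebuilding MDIncrementalRefresh for every update causes unnecessary overhead, compared with reusing the object by overwriting the contents of the previous MDIncrementalRefresh.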

Thanks in advance.

#include <string> 
#include <vector> 
#include <iostream> 
#include <fstream> 

using namespace std; 

std::vector<std::string> string_split(std::string s, const char delimiter) 
{ 
    size_t start=0; 
    size_t end=s.find_first_of(delimiter); 

    std::vector<std::string> output; 

    while (true) // every size_t is <= npos, so the loop only ever exits via the break below 
    { 
     output.emplace_back(s.substr(start, end-start)); 

     if (end == std::string::npos) 
      break; 

     start=end+1; 
     end = s.find_first_of(delimiter, start); 
    } 

    return output; 
} 

const char FIX_FIELD_DELIMITER = '\x01'; 
const char FIX_KEY_DELIMITER = '='; 

const int STR_TO_CHAR = 0; 
const int KEY = 0; 
const int VALUE = 1; 

const string Field_TransactTime = "60"; 
const string Field_MatchEventIndicator = "5799"; 
const string Field_NoMDEntries = "268"; 
const string Field_MDUpdateAction = "279"; 
const string Field_MDEntryType = "269"; 
const string Field_SecurityID = "48"; 
const string Field_RptSeq = "83"; 
const string Field_MDEntryPx = "270"; 
const string Field_MDEntrySize = "271"; 
const string Field_NumberOfOrders = "346"; 
const string Field_MDPriceLevel = "1023"; 
const string Field_OpenCloseSettlFlag = "286"; 
const string Field_AggressorSide = "5797"; 
const string Field_TradingReferenceDate = "5796"; 
const string Field_HighLimitPrice = "1149"; 
const string Field_LowLimitPrice = "1148"; 
const string Field_MaxPriceVariation = "1143"; 
const string Field_ApplID = "1180"; 
const string Field_NoOrderIDEntries = "37705"; 
const string Field_OrderID = "37"; 
const string Field_LastQty = "32"; 
const string Field_SettlPriceType= "731"; 

class OrderIdEntry { 
public: 
    string OrderID; 
    int LastQty = 0; 
}; 

struct MDEntry { 
    // Without default initializers these members would hold indeterminate values. 
    char MDUpdateAction = '\0'; 
    char MDEntryType = '\0'; 
    int SecurityID = 0; 
    int RptSeq = 0; 
    double MDEntryPx = 0.0; 
    int MDEntrySize = 0; 
    int NumberOfOrders = 0; 
    int MDPriceLevel = 0; 
    int OpenCloseSettlFlag = 0; 
    string SettlPriceType = ""; 
    int AggressorSide = 0; 
    string TradingReferenceDate = ""; 
    double HighLimitPrice = 0.0; 
    double LowLimitPrice = 0.0; 
    double MaxPriceVariation = 0.0; 
    int ApplID = 0; 

}; 

class MDIncrementalRefresh { 

public: 
    string TransactTime; 
    string MatchEventIndicator; 
    int NoMDEntries = 0; 
    int NoOrderIDEntries = 0; 
    vector<MDEntry> MDEntries; 
    vector<OrderIdEntry> OrderIdEntries; 

    MDIncrementalRefresh(const string& message) 
    { 

     MDEntry* currentMDEntry = nullptr; 
     OrderIdEntry* currentOrderIDEntry = nullptr; 

     for (auto fields : string_split(message, FIX_FIELD_DELIMITER)) 
     { 
      vector<string> kv = string_split(fields, FIX_KEY_DELIMITER); 

      // Header :: MDIncrementalRefresh 

      if (kv[KEY] == Field_TransactTime) this->TransactTime = kv[VALUE]; 

      else if (kv[KEY] == Field_MatchEventIndicator) this->MatchEventIndicator = kv[VALUE]; 
      else if (kv[KEY] == Field_NoMDEntries) this->NoMDEntries = stoi(kv[VALUE]); 
      else if (kv[KEY] == Field_NoOrderIDEntries) this->NoOrderIDEntries = stoi(kv[VALUE]); 

      // Repeating Group :: MDEntry 

      else if (kv[KEY] == Field_MDUpdateAction) 
      { 
       MDEntries.push_back(MDEntry()); 
       currentMDEntry = &MDEntries.back(); // use pointer for fast lookup on subsequent repeating group fields 
       currentMDEntry->MDUpdateAction = kv[VALUE][STR_TO_CHAR]; 
      } 
      else if (kv[KEY] == Field_MDEntryType) currentMDEntry->MDEntryType = kv[VALUE][STR_TO_CHAR]; 
      else if (kv[KEY] == Field_SecurityID) currentMDEntry->SecurityID = stoi(kv[VALUE]); 
      else if (kv[KEY] == Field_RptSeq) currentMDEntry->RptSeq = stoi(kv[VALUE]); 
      else if (kv[KEY] == Field_MDEntryPx) currentMDEntry->MDEntryPx = stod(kv[VALUE]); 
      else if (kv[KEY] == Field_MDEntrySize) currentMDEntry->MDEntrySize = stoi(kv[VALUE]); 
      else if (kv[KEY] == Field_NumberOfOrders) currentMDEntry->NumberOfOrders = stoi(kv[VALUE]); 
      else if (kv[KEY] == Field_MDPriceLevel) currentMDEntry->MDPriceLevel = stoi(kv[VALUE]); 
      else if (kv[KEY] == Field_OpenCloseSettlFlag) currentMDEntry->OpenCloseSettlFlag = stoi(kv[VALUE]); 
      else if (kv[KEY] == Field_SettlPriceType) currentMDEntry->SettlPriceType= kv[VALUE]; 
      else if (kv[KEY] == Field_AggressorSide) currentMDEntry->AggressorSide = stoi(kv[VALUE]); 
      else if (kv[KEY] == Field_TradingReferenceDate) currentMDEntry->TradingReferenceDate = kv[VALUE]; 
      else if (kv[KEY] == Field_HighLimitPrice) currentMDEntry->HighLimitPrice = stod(kv[VALUE]); 
      else if (kv[KEY] == Field_LowLimitPrice) currentMDEntry->LowLimitPrice = stod(kv[VALUE]); 
      else if (kv[KEY] == Field_MaxPriceVariation) currentMDEntry->MaxPriceVariation = stod(kv[VALUE]); 
      else if (kv[KEY] == Field_ApplID) currentMDEntry->ApplID = stoi(kv[VALUE]); 

      // Repeating Group :: OrderIDEntry 
      else if (kv[KEY] == Field_OrderID) { 
       OrderIdEntries.push_back(OrderIdEntry()); 
       currentOrderIDEntry = &OrderIdEntries.back(); 
       currentOrderIDEntry->OrderID = kv[VALUE]; 
      } 

      else if (kv[KEY] == Field_LastQty) currentOrderIDEntry->LastQty = stoi(kv[VALUE]); // LastQty is an int, so stoi rather than stol 
     } 
    } 


}; 

int main() { 

    std::string filename = "test/sample"; 

    std::string line; 
    std::ifstream file (filename); 

    int count = 0; 
    if (file.is_open()) 
    { 
     while (std::getline(file, line)) 
     { 
      MDIncrementalRefresh md(line); 
      if (md.TransactTime != "") { 
       count++; 
      } 
     } 
     file.close(); 
    } 
    cout << count << endl; 
    return 0; 
} 

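'This is my first C++ application', and yet you are fixated on throughput from the outset. Get code that does the job first, rather than worrying about efficiency. Without a profiler you will make mistakes when optimising. – DumbCoder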


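@DumbCoder I appreciate you taking the time to review my question. Although I mentioned that this is my first C++ application, I did not say it is my first time writing software. I am perfectly capable of getting a working solution, but was hoping for some useful guidance on how best to profile and understand potential bottlenecks (for example, the fact that repeated calls to string_split may implicitly grow heap allocations). – awaugh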

Answer


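For those who are interested, most of the processing time in the code above is spent in the string_split function. The large number of calls to string_split results in many (expensive) heap allocations.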
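
A different way to attack the same allocations, offered only as a sketch and not as something measured in this post, is to avoid creating a std::string per field at all. Assuming a C++17 compiler, a std::string_view can point into the original line, and the tag can be converted to an int once so that dispatch becomes an integer comparison rather than a chain of string comparisons. The names for_each_field and parse_tag are hypothetical and do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: parses an unsigned decimal tag such as "268".
// Returns -1 if the text is empty, too long, or contains a non-digit.
inline int parse_tag(std::string_view text)
{
    if (text.empty() || text.size() > 9) return -1; // 9 digits always fit in int
    int tag = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return -1;
        tag = tag * 10 + (c - '0');
    }
    return tag;
}

// Calls on_field(tag, value) for every "tag=value" field in msg.
// value is a view into msg, so no std::string is created per field.
template <typename Callback>
void for_each_field(std::string_view msg, char delimiter, Callback&& on_field)
{
    assert(delimiter != '='); // field and key delimiters must differ

    std::size_t start = 0;
    while (start < msg.size()) {
        std::size_t end = msg.find(delimiter, start);
        if (end == std::string_view::npos) end = msg.size();

        const std::string_view field = msg.substr(start, end - start);
        const std::size_t eq = field.find('=');
        if (eq != std::string_view::npos) {
            const int tag = parse_tag(field.substr(0, eq));
            if (tag >= 0) on_field(tag, field.substr(eq + 1));
        }
        start = end + 1; // skip past the delimiter
    }
}
```

A caller would pass a lambda that switches on the tag (for example `case 268:` for NoMDEntries) and converts the value only for the fields it needs. The views are valid only while the source line is alive and unmodified.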

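A reworked implementation, string_split_optim, reuses a pre-allocated vector. This avoids unnecessary heap allocation and growth on every call. The example below, run for 1.5 million iterations, shows a 3.4x speedup. Because vector.clear() does not by itself release the allocated memory back to the heap, subsequent calls to string_split_optim whose result is no larger than a previous one need no additional allocation for the vector.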

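#include <cstdio>   // printf 
#include <ctime>    // clock, CLOCKS_PER_SEC 
#include <string> 
#include <vector> 

// string_split() is the original function from the question and must also be in scope. 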

void string_split_optim(std::vector<std::string>& output, const std::string &s, const char delimiter) 
{ 
    output.clear(); 

    size_t start = 0; 
    size_t end = s.find_first_of(delimiter); 


    while (true) // every size_t is <= npos, so the loop only ever exits via the break below 
    { 
     output.emplace_back(s.substr(start, end - start)); 

     if (end == std::string::npos) 
      break; 

     start = end + 1; 
     end = s.find_first_of(delimiter, start); 
    } 

} 


int main() 
{ 
    const int NUM_RUNS = 1500000; 
    const std::string s = "1128=9\u00019=174\u000135=X\u000149=CME\u000175=20170403\u000134=1061\u000152=20170402211926965794928\u000160=20170402211926965423233\u00015799=10000100\u0001268=1\u0001279=1\u0001269=1\u000148=9006\u000155=ESM7\u000183=118\u0001270=236025.0\u0001271=95\u0001346=6\u00011023=9\u000110=088\u0001"; 

    std::vector<std::string> vec; 

    // standard 
    clock_t tStart = clock(); 
    for (int i = 0; i < NUM_RUNS; ++i) 
    { 
     vec = string_split(s, '='); 
    } 

    printf("Time taken: %.2fs\n", (double) (clock() - tStart)/CLOCKS_PER_SEC); 

    // reused vector 
    tStart = clock(); 
    for (int i = 0; i < NUM_RUNS; ++i) 
    { 
     string_split_optim(vec, s, '='); 
     vec.clear(); 
    } 

    printf("Time taken: %.2fs\n", (double) (clock() - tStart)/CLOCKS_PER_SEC); 
} 

On my MacBook the result is a 3.4x improvement.

Time taken: 6.60s 
Time taken: 1.94s 
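
The capacity behaviour this relies on can be checked directly. The helper below is a hypothetical illustration (clear_for_reuse is not part of the post): it clears a vector and asserts that capacity() is unchanged. Note that clear() still destroys the std::string elements, so tokens too long for the implementation's small-string buffer would typically still allocate on each emplace_back.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: empties the vector for reuse and returns the capacity
// that remains available without a new allocation.
inline std::size_t clear_for_reuse(std::vector<std::string>& v)
{
    const std::size_t before = v.capacity();
    v.clear();                       // destroys the elements, keeps the buffer
    assert(v.capacity() == before);  // clear() leaves capacity unchanged
    (void)before;                    // silences unused warnings under NDEBUG
    return v.capacity();
}
```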

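In addition, the MDIncrementalRefresh object was being constructed repeatedly (on the stack, but its vector members also grow on the heap). Following the findings about string_split above, I decided to reuse a single object and simply clear its previous state, which gave another significant performance gain.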
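
The reuse described here could look roughly like the following. This is a minimal sketch under stated assumptions, not the code that was actually benchmarked: EntrySketch and RefreshSketch are simplified stand-ins for MDEntry and MDIncrementalRefresh, and reset() is a hypothetical method name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for MDEntry; most fields omitted.
struct EntrySketch {
    char MDUpdateAction = '\0';
    int SecurityID = 0;
};

// Simplified stand-in for MDIncrementalRefresh.
struct RefreshSketch {
    std::string TransactTime;
    int NoMDEntries = 0;
    std::vector<EntrySketch> MDEntries;

    // Hypothetical method: drops the previous message's state but keeps the
    // vector's buffer so the next message can reuse it.
    void reset()
    {
        const auto capacity_before = MDEntries.capacity();
        TransactTime.clear();  // existing implementations keep the string's buffer
        NoMDEntries = 0;
        MDEntries.clear();     // elements destroyed, capacity retained
        assert(MDEntries.capacity() == capacity_before);
        (void)capacity_before; // silences unused warnings under NDEBUG
    }
};
```

With this shape, one object would be declared before the std::getline loop and reset() called at the start of each iteration, with the parsing moved from the constructor into a member function.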