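How do I populate a trie in C?

I'm trying to write a program that takes words and builds a trie, where each node of the trie is a struct containing a single character. How do I populate a trie in C?

I have a function that parses a char * into words (assume the char * contains only lowercase letters). As each word is taken from the char *, it is passed to the function addWordOccurrence(const char* word, const int wordLength, struct tNode root). addWordOccurrence() should check whether the first letter of the word is in root.branches[i], as I loop over every possible index of root.branches (0-25, for all lowercase letters of the alphabet). If the first letter is not in root.branches, a new struct tNode containing the new letter is created. It then moves on to the second letter of the word, comparing it against the branches of the newly created struct tNode, and so on...

The first word we try is "doctor". My trie takes the first letter 'd' and adds it to root.branches[0], then adds 'o' to root.branches[0].branches[0], which is correct. But then it adds the 'd' of "doctor" to the next 17 indices of its branches (so root.branches[0].branches[1] through [18]), which should not be the case. Please help!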

struct tNode{ 
    char c; 
    int occurrences; 
    struct tNode *branches; 
}; 

int addWordOccurrence(const char* word, const int wordLength, struct tNode root){
    //declare fields
    int counter, i, k, firstNull;
    counter = 0;
    while(1){
        if(counter >= wordLength){
            break;
        }
        //traverse through the word letter by letter
        for(i=0; i<wordLength; i++){
            //compare each letter to the branches of root until the letter is found or first null space
            for(k=0; k<26; k++){
                //if the letter is a branch already set root to the struct of that letter in branches
                if(root.branches[k].c == word[i]){
                    root = root.branches[k];
                    break;
                }
            }
            //the current letter of the word is not in branches
            //go through branches to find position to add the new tNode
            for(firstNull=0; firstNull<26; firstNull++){
                //set firstNull equal to the index of the first null value in branches
                if(root.branches[firstNull].c < 'a' || root.branches[firstNull].c > 'z'){
                    break;
                }
            }
            //add a new node to branches
            root.branches[firstNull].c = word[i];
            root.branches[firstNull].occurrences = 0;
            root.branches[firstNull].branches = malloc(sizeof(struct tNode) * 26);
            if(counter != wordLength){
                root = root.branches[firstNull];
            }
            counter++;
            if(counter == wordLength-2){
                root.occurrences++;
            }
        }
    }
    return 0;
}


2014-11-01 G. Robinson

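What do you think the first 'break' is doing? My strong bet is that it isn't doing that. – Gene 2014-11-02 00:54:25

Originally the root.occurrences++ at the end of the while loop was outside the while, so that after the last letter of the word had been read it would increment tNode.occurrences for the 'r' (if the word is 'doctor'), the last letter added. But when I debugged it, tNode.occurrences was 0 when it should have been 1. So the break is there to exit the while loop... I've changed it around so many times that I'm going crazy looking at it, sorry about that. – 2014-11-02 01:31:50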
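Gene's hint about `break` can be checked in isolation. The sketch below is not the asker's code: `find_or_append` and the `SLOTS` constant are hypothetical names, and a plain char array stands in for the branches. It mimics the same "search loop, then append loop" shape and shows that `break` leaves only the innermost loop, so the append code still runs unless the result of the search is tested afterwards.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 26

/* Hypothetical helper: returns the index of the slot holding `letter`,
 * storing it in the first empty slot ('\0') if it is not present yet.
 * Returns -1 when every slot is taken. */
int find_or_append(char slots[SLOTS], char letter)
{
    int k;

    assert(slots != NULL);

    /* Search phase: `break` exits this loop only, nothing further out. */
    for (k = 0; k < SLOTS; k++) {
        if (slots[k] == letter) {
            break;
        }
    }

    /* k < SLOTS means the break fired, so the letter already exists.
     * Without this check the append phase would run on every call. */
    if (k < SLOTS) {
        return k;
    }

    /* Append phase: reached only when the search found nothing. */
    for (k = 0; k < SLOTS; k++) {
        if (slots[k] == '\0') {
            slots[k] = letter;
            return k;
        }
    }
    return -1;
}
```

In the question's function there is no such check between the two inner loops, so a new node is written even when the letter was found.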

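A bunch of issues with the implementation:

This is an odd design for a trie, with the letters stored in arbitrary order. Having to do a linear search for the letter you want at each level defeats the purpose of using a trie in the first place.
When you do root = root.branches[k]; you are creating a copy of the struct. It may happen to work for you in this case because things are accessed through pointers, but it is really just asking for trouble.
When you allocate a node in the loop you never initialize it, which means it is full of garbage/unknown data and causes problems.
Your implementation is needlessly complex, such as your outer while (1) loop.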
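The point about copies is easy to miss, so here is a minimal sketch of it. The `struct node` type and all three functions are hypothetical and only mirror the shape of the question's `struct tNode`. They show that incrementing a field through a by-value copy never reaches the original, which would also explain an `occurrences` count that stays at 0 in the debugger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the question's struct tNode. */
struct node {
    int occurrences;
    struct node *branches;   /* array of children, as in the question */
};

/* By value: `n` is a copy, so the increment is lost on return. */
void bump_copy(struct node n)
{
    n.occurrences++;
}

/* By pointer: the increment lands in the caller's node. */
void bump_in_place(struct node *n)
{
    assert(n != NULL);
    n->occurrences++;
}

/* Same pitfall one level down: assigning a child to a local struct
 * copies it. Returns the count actually stored in the parent's child. */
int bump_child_via_copy(struct node *parent)
{
    struct node child;

    assert(parent != NULL && parent->branches != NULL);
    child = parent->branches[0];   /* copies the whole struct */
    child.occurrences++;           /* changes the copy only */
    return parent->branches[0].occurrences;
}
```

Walking the trie with a `struct node *` cursor instead of a struct value avoids the problem.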

For a very simple trie, I would do something like this:

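#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

struct tNode {
    bool isWord;                 /* true if a word ends at this node */
    struct tNode *branches[26];  /* one slot per letter, 'a'..'z' */
};

void addWordOccurrence(const char *word, const int wordLength, struct tNode *pRoot) {
    int i;
    int nodeIndex;
    struct tNode *pCurrentNode = pRoot;

    for (i = 0; i < wordLength; ++i)
    {
        /* Map the letter to its slot: 'a' -> 0 ... 'z' -> 25 */
        nodeIndex = tolower((unsigned char)word[i]) - 'a';

        if (nodeIndex >= 0 && nodeIndex <= 25)
        {
            if (pCurrentNode->branches[nodeIndex] == NULL)
            {
                /* calloc zero-fills the new node */
                pCurrentNode->branches[nodeIndex] = calloc(1, sizeof(struct tNode));
                if (pCurrentNode->branches[nodeIndex] == NULL)
                {
                    return; /* out of memory: stop without marking the word */
                }
            }

            pCurrentNode = pCurrentNode->branches[nodeIndex];
        }
    }

    pCurrentNode->isWord = true;
}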

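You could use struct tNode *branches;, but it really just adds an allocation step that you don't need. Using the character's ASCII value, branches[0] is assigned to 'a' and branches[25] to 'z'... there is no need to search for a "free" slot, which really kills the performance of the trie. Finally, you need a terminator such as isWord in order to know that "doctor" is a word and "docto" is not.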
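To make the role of the isWord terminator concrete, here is a hedged sketch of a lookup that goes with this layout. The names `trie_add`, `trie_contains`, and `trie_free` are my own and do not appear in the answer. The code assumes an ASCII-like character set where 'a'..'z' are contiguous, and relies on calloc's zero fill producing NULL pointers, which holds on common platforms.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

struct tNode {
    bool isWord;
    struct tNode *branches[26];
};

/* Hypothetical insert in the style of the answer; skips non-letters.
 * Returns false if an allocation fails. */
bool trie_add(struct tNode *root, const char *word)
{
    struct tNode *cur = root;

    assert(root != NULL && word != NULL);
    for (; *word != '\0'; ++word) {
        int idx = tolower((unsigned char)*word) - 'a';
        if (idx < 0 || idx > 25) {
            continue;
        }
        if (cur->branches[idx] == NULL) {
            /* Assumption: zero-filled memory reads back as NULL pointers. */
            cur->branches[idx] = calloc(1, sizeof *cur->branches[idx]);
            if (cur->branches[idx] == NULL) {
                return false;
            }
        }
        cur = cur->branches[idx];
    }
    cur->isWord = true;
    return true;
}

/* Hypothetical lookup: true only if the path exists AND ends on a node
 * marked isWord. Any non-letter in `word` makes it return false. */
bool trie_contains(const struct tNode *root, const char *word)
{
    const struct tNode *cur = root;

    assert(root != NULL && word != NULL);
    for (; *word != '\0'; ++word) {
        int idx = tolower((unsigned char)*word) - 'a';
        if (idx < 0 || idx > 25 || cur->branches[idx] == NULL) {
            return false;
        }
        cur = cur->branches[idx];
    }
    return cur->isWord;
}

/* Frees all children recursively, then the node itself. */
void trie_free(struct tNode *node)
{
    int i;

    if (node == NULL) {
        return;
    }
    for (i = 0; i < 26; i++) {
        trie_free(node->branches[i]);
    }
    free(node);
}
```

After adding "doctor" to a zero-initialized root, the path for "docto" exists but its last node is not marked, so the lookup reports it as absent.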


2014-11-02 01:52:37 uesp
