2012-08-16 111 views
<?php 

$wordFrequencyArray = array(); 

function countWordsfrequency($filename) { 
global $wordFrequencyArray; 

$contentoffile = (file_get_contents($filename)); 

// Split on anything that is not a letter or digit and drop empty pieces.
$wordArray = preg_split('/[^a-zA-Z0-9]/', $contentoffile, -1, PREG_SPLIT_NO_EMPTY); 


foreach (array_count_values($wordArray) as $word => $count) { 
     if (!isset($wordFrequencyArray[$word])) $wordFrequencyArray[$word] = 0; 
     $wordFrequencyArray[$word] += $count; 
    } 
} 


$filenames = array('file1.txt', 'file2.txt','file3.txt','file4.txt'); 
foreach ($filenames as $filename) { 
    countWordsfrequency($filename); 
} 



print_r($wordFrequencyArray); 

?> 

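Finding the intersection / frequency of words in multiple files

This is my code to find the frequency of each word across multiple files and print the results. What I want to do now is check which words occur in which files (the intersection). For example, if there is a word "stack", I want to print which files it occurs in along with its frequency, which I think I have already calculated.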

The final result should show each word's frequency together with the files in which the word occurs.

How should I approach this? Should I check for it inside the foreach loop of the countWordsfrequency function itself?

Answer


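You will have to store more information: for each word, a per-file count alongside the total. I would avoid using classes, since this doesn't seem to need anything that heavyweight.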

<?php 
$wordFrequencies = array(); 

function countWordsFrequency($filename) { 
    global $wordFrequencies; 
    $contentoffile = file_get_contents($filename); 
    // Split on anything that is not a letter or digit and drop empty pieces.
    $wordArray = preg_split('/[^a-zA-Z0-9]/', $contentoffile, -1, PREG_SPLIT_NO_EMPTY); 

    foreach (array_count_values($wordArray) as $word => $count) { 
        // First time this word is seen in any file: create its entry.
        // Write to $wordFrequencies directly; PHP arrays are copied on
        // assignment, so updating a local copy would be lost.
        if (!isset($wordFrequencies[$word])) { 
            $wordFrequencies[$word] = array('total' => 0, 'files' => array()); 
        } 

        // If this is the first occurrence of this word in the file, start a count. 
        if (!isset($wordFrequencies[$word]['files'][$filename])) { 
            $wordFrequencies[$word]['files'][$filename] = 0; 
        } 

        // Increment counts for both the total and the file. 
        $wordFrequencies[$word]['total'] += $count; 
        $wordFrequencies[$word]['files'][$filename] += $count; 
    } 
} 

$filenames = array('file1.txt', 'file2.txt','file3.txt','file4.txt'); 
foreach ($filenames as $filename) { 
    countWordsFrequency($filename); 
} 

print_r($wordFrequencies); 
?> 
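
Once the array has this shape, answering "which files contain this word?" or "which words occur in every file?" is a matter of reading the 'files' map. The following is only a sketch of one way to do that: the helper names filesContainingWord and wordsInAllFiles are hypothetical (they are not part of the code above), and the sample counts are made up for illustration.

```php
<?php
// Hypothetical helper: returns the per-file counts for one word,
// or an empty array if the word was never seen.
function filesContainingWord(array $wordFrequencies, $word) {
    return isset($wordFrequencies[$word]) ? $wordFrequencies[$word]['files'] : array();
}

// Hypothetical helper: returns the words that occur in every one of the
// given files (the "intersection"), mapped to their total count.
function wordsInAllFiles(array $wordFrequencies, array $filenames) {
    $common = array();
    foreach ($wordFrequencies as $word => $info) {
        // A word is common when no filename is missing from its 'files' map.
        if (count(array_diff($filenames, array_keys($info['files']))) === 0) {
            $common[$word] = $info['total'];
        }
    }
    return $common;
}

// Sample data in the shape built by countWordsFrequency(); counts are made up.
$sample = array(
    'stack' => array('total' => 5, 'files' => array('file1.txt' => 2, 'file2.txt' => 3)),
    'heap'  => array('total' => 1, 'files' => array('file1.txt' => 1)),
);

// Print every file that contains "stack" with its count in that file.
foreach (filesContainingWord($sample, 'stack') as $file => $count) {
    echo "stack: $file => $count\n";
}

// Print the words that appear in both sample files.
print_r(wordsInAllFiles($sample, array('file1.txt', 'file2.txt')));
```

With real data you would pass $wordFrequencies and $filenames from the script above instead of $sample. Keeping the per-file counts in a nested array means no second pass over the files is needed for either query.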