2016-09-14 65 views

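I am trying to figure out how to find the most frequent word in a text file and change that word so that it is wrapped in something else, for example freewordchoice (free + frequent word + choice), everywhere the word occurs in the text. I have been searching like crazy for something like this but cannot find it. I am very new to JavaScript, which is what I want to use for this. Uploading and displaying the text works fine; what I do not understand is how to locate the most frequent word and change it throughout the text before it is actually displayed in the browser. As I see it, I need one variable to find the word and store it somewhere, and another variable to hold whatever should be added to or wrapped around the target word. How do I find the most frequent word in a .txt file and change it?

Example text: Project Gutenberg Etext of Aladdin and the Wonderful Lamp

Info/question update: my code finds the most frequent word in the full example text above; at the moment that word is Aladdin. The problem is that I cannot get it to replace the word Aladdin correctly. It does print fooAladdinbar, as I want, but instead of only changing Aladdin to fooAladdinbar, it puts fooAladdinbar between every letter of the example text.

This is solved; it was a problem with a variable.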

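Do you have some test data? What will the text file look like? What encoding? Which language-specific special characters, and so on?

I updated my answer to cover the replacement part of your question.

This is where the demo I am working on lives: http://internetstall.nu/demo/demo.html. I just upload a simple .txt file containing text. I am not sure what you mean by special characters; I am trying to do this with JavaScript, but something tells me that is not what you meant. The problem has been solved. – user3481279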

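Answers

It is not perfect, but it works. Here is a demo:

(This demo only finds the most frequent word.)

  • It breaks the text into words with a regular expression
  • then counts the words
  • then returns the most frequent word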

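// Read the raw text from the <textarea id="data"> element.
var data = document.getElementById("data").value;

// Extract word tokens only. Splitting on /\b/ also yields whitespace and
// punctuation tokens, which would then be counted as "words".
// Lowercasing makes "The" and "the" count as the same word.
var allWords = data.toLowerCase().match(/\w+/g) || [];
var wordCountList = {};

allWords.forEach(function (word) {
    // Safe own-property check, even if the text contains "hasOwnProperty".
    if (!Object.prototype.hasOwnProperty.call(wordCountList, word)) {
        wordCountList[word] = {word: word, count: 0};
    }
    wordCountList[word].count++;
});

// Find the entry with the highest count.
var maxCountWord = {count: 0};
for (var propName in wordCountList) {
    var currentWord = wordCountList[propName];
    if (maxCountWord.count < currentWord.count) {
        maxCountWord = currentWord;
    }
}

console.info(maxCountWord);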
textarea {
    width: 100%;
    height: 100px;
}
<textarea id="data" >
<!-- start slipsum code -->

The path of the righteous man is beset on all sides by the iniquities of the selfish and the tyranny of evil men. Blessed is he who, in the name of charity and good will, shepherds the weak through the valley of darkness, for he is truly his brother's keeper and the finder of lost children. And I will strike down upon thee with great vengeance and furious anger those who would attempt to poison and destroy My brothers. And you will know My name is the Lord when I lay My vengeance upon thee. 

<!-- end slipsum code -->
</textarea>

<div id="result"></div>
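
For text that does not live in a textarea (for example the contents of an uploaded .txt file, once it has been read into a string), the same three steps can be wrapped in a function. This is only a minimal sketch: the name mostFrequentWord is hypothetical and not part of the original answer, and it assumes that words are runs of letters, digits and underscores and that case should be ignored.

```javascript
// Hypothetical helper (not from the original answer): returns the most
// frequent word of a string together with its count.
function mostFrequentWord(text) {
    // Lowercase so "The" and "the" are counted together, then keep word tokens only.
    const words = text.toLowerCase().match(/\w+/g) || [];
    const counts = new Map();
    for (const word of words) {
        counts.set(word, (counts.get(word) || 0) + 1);
    }
    let best = { word: null, count: 0 };
    for (const [word, count] of counts) {
        // Strict ">" keeps the first word seen when counts are tied.
        if (count > best.count) {
            best = { word, count };
        }
    }
    return best;
}

const sample = "Aladdin rubbed the lamp. The genie thanked Aladdin, and Aladdin smiled.";
console.log(mostFrequentWord(sample));
```

Because the function only takes a string, it does not matter where the text comes from. In the browser the string could, for instance, be the result of FileReader.readAsText on the uploaded file.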

To replace the word you can also use a regular expression:
(This demo only replaces the frequent word.)

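function freewordchoice(free, word, choice) {
    var data = document.getElementById("data").innerHTML;
    // Escape regex metacharacters so the word is matched literally.
    var escapedWord = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    var replaceExpression = new RegExp("\\b" + escapedWord + "\\b", "gi");
    console.info(replaceExpression);
    // A replacer function keeps the casing of each match ("The" stays "The")
    // and prevents "$" in free/choice from being treated specially.
    data = data.replace(replaceExpression, function (match) {
        return free + match + choice;
    });
    document.getElementById("result").innerHTML = data;
}

freewordchoice("<b>", "the", "</b>");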
<b>Before:</b>
<div id="data" >
<!-- start slipsum code -->

The path of the righteous man is beset on all sides by the iniquities of the selfish and the tyranny of evil men. Blessed is he who, in the name of charity and good will, shepherds the weak through the valley of darkness, for he is truly his brother's keeper and the finder of lost children. And I will strike down upon thee with great vengeance and furious anger those who would attempt to poison and destroy My brothers. And you will know My name is the Lord when I lay My vengeance upon thee. 

<!-- end slipsum code -->
</div>
<br/><br/>
<b>After:</b>
<div id="result" >
</div>
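
The question asks for output such as fooAladdinbar. The replacement step can also be written as a pure string function, which makes it easy to try outside the page. This is a rough sketch under stated assumptions: wrapWord is a hypothetical name, and the guard against an empty word is an addition that the original answer does not contain.

```javascript
// Hypothetical helper: wraps every whole-word occurrence of `word` in
// `text` with `prefix` and `suffix`, without touching the DOM.
function wrapWord(text, word, prefix, suffix) {
    // An empty word would produce a pattern that matches at every word
    // boundary, so return the text unchanged in that case.
    if (!word) {
        return text;
    }
    // Escape regex metacharacters so the word is matched literally.
    const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const pattern = new RegExp("\\b" + escaped + "\\b", "gi");
    // The replacer function keeps the casing of each match.
    return text.replace(pattern, (match) => prefix + match + suffix);
}

console.log(wrapWord("Aladdin found the lamp. ALADDIN rubbed it.", "aladdin", "foo", "bar"));
// prints "fooAladdinbar found the lamp. fooALADDINbar rubbed it."
```

The \b anchors are what keep "the" from also matching inside "other" or "thee".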

Update:

The problem is this assignment:

common = 'the,a,do,in,with,this,so,that,of,and,not,did,when,what,were,went,was,as,' +
    'if,who,had,at,can,you,which,while,will,to,till,then,them,their,she,' +
    'he,once,out,no,must,many,me,is,it,his,him,her,about,have,i,has,your,' +
    'would,where,whom,s,on,from,for,by,but,all,said,my,';

The problem is at the very end of the string: ,said,my,'; Remove the last comma and it should work, like this:

common = 'the,a,do,in,with,this,so,that,of,and,not,did,when,what,were,went,was,as,' +
    'if,who,had,at,can,you,which,while,will,to,till,then,them,their,she,' +
    'he,once,out,no,must,many,me,is,it,his,him,her,about,have,i,has,your,' +
    'would,where,whom,s,on,from,for,by,but,all,said,my';

Because of the trailing comma, the last word in the list is an empty string.
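
The effect can be reproduced in isolation. The sketch below assumes the list is turned into an array with split(","), which the question does not actually show, and uses a shortened list for illustration.

```javascript
// Assumption: the list is turned into an array with split(","), which the
// original question does not show.
const withTrailingComma = "all,said,my,".split(",");
console.log(withTrailingComma); // the last element is the empty string ""

// A global pattern built from "" matches at every position, so the
// replacement text ends up between all characters.
console.log("abc".replace(new RegExp("", "g"), "-")); // prints "-a-b-c-"

// Dropping empty entries avoids the problem even if a stray comma slips in.
const cleaned = "all,said,my,".split(",").filter(Boolean);
console.log(cleaned);
```

This would be consistent with the symptom in the question, where fooAladdinbar appeared between every letter; filtering out empty entries is one defensive option besides deleting the comma.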

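When I try to run it I only get errors. I can see there is a string in the data part, but how would I make it work with a .txt file uploaded in the browser, as in my demo here: http://internetstall.nu/demo/demo.html – user3481279

The script file is not being loaded; please check the browser console (press F12).

@user3481279 Did you see my last comment? The script file is not being loaded. Was my solution helpful, or do you need more help?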
