2015-02-11 303 views
1

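Stripping HTML tags before indexing into Elasticsearch

I am struggling to index HTML files using Node.js. However, even before bringing Node.js in, I tried running the manual indexing below, and it does not seem to work. What am I missing?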

Indexing a sample HTML tag using the html_strip filter:

curl -XPOST 'localhost:9200/bhs/articles/_analyzer?tokenizer=standard&char_filters=html_strip' -d ' 
{ 
    "content" : "<title>Dilip Kumar</title>" 
}' 

Searching to get all documents:

http://localhost:9200/bhs/articles/_search 

It gives the following result:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "bhs",
                "_type": "articles",
                "_id": "AUt2TGl9aadd5iLJ3mue",
                "_score": 1,
                "_source": {
                    "content": "<title>Dilip Kumar</title>"
                }
            }
        ]
    }
}

Ideally, the tags should not be indexed, since I used the html_strip filter to remove them.

+0

I am looking for an answer in the context of Elasticsearch, not JavaScript. – joy 2015-02-11 02:04:44

+0

I can see that the tags are indexed as well, so when I search for "title", the document shows up in the results. It seems I am missing the basics. – joy 2015-02-11 02:23:13

+0

What is the mapping for your articles type? Did you tell it to use a custom analyzer? – 2015-02-11 17:31:05

Answer

0

What you see in the returned search results is the stored content (_source), not the individual terms that were indexed.

Seeing what was actually indexed is more challenging: indexed terms are not designed to be returned to the user, only to be used for lookups.

However, you can access and view them with a script:

curl 'http://localhost:9200/bhs/articles/_search?pretty=true' -d '{
    "query" : {
        "match_all" : { }
    },
    "script_fields": {
        "terms" : {
            "script": "doc[field].values",
            "params": {
                "field": "content"
            }
        }
    }
}'

Source: https://stackoverflow.com/q/28460670 2015-02-11 17:29:49

+0

Thanks for explaining _source. I don't want to index the tag itself, i.e. <title>. Currently I can search using the word "title", and I don't want "title" to match just because it is part of <title>. How should I index the content without the HTML tags? – joy 2015-02-12 06:09:02

+0

What is the mapping for your articles type? Did you tell it to use a custom analyzer? – 2015-02-12 08:13:13

+0

I mistakenly created two posts because I didn't realize both were about the same issue. Could you check the post below for the mapping I used? http://stackoverflow.com/questions/28445684/why-html-tag-is-searchable-even-if-it-was-filtered-in-elastic-search/28446814?noredirect=1#comment45231786_28446814 – joy 2015-02-12 16:44:50
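For the mapping question raised in the comments, here is a minimal sketch of how html_strip is usually wired in: as a char filter inside a custom analyzer that the field's mapping refers to. It assumes an Elasticsearch 1.x-style API, matching the era of this thread (typed mappings, "string" fields, raw text accepted as the _analyze body); newer versions use different mapping syntax and request formats. The index, type, and field names come from the question. The analyzer name html_text and the ES_URL variable are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an Elasticsearch 1.x-style node and reuses the
# index/type/field names from the question (bhs, articles, content).
# "html_text" and ES_URL are hypothetical names chosen for this example.

ES_URL="${ES_URL:-http://localhost:9200}"

# Index settings: a custom analyzer that runs the html_strip char filter
# before the standard tokenizer, plus a mapping that assigns it to "content".
INDEX_BODY='{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_text": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "articles": {
      "properties": {
        "content": { "type": "string", "analyzer": "html_text" }
      }
    }
  }
}'

# The same sample document as in the question.
DOC_BODY='{ "content": "<title>Dilip Kumar</title>" }'

create_index() {
  # Create the index with analyzer and mapping before any document is indexed.
  curl -s -XPUT "$ES_URL/bhs" -d "$INDEX_BODY"
}

show_tokens() {
  # Ask the index which tokens the analyzer emits; this call stores nothing.
  curl -s -XGET "$ES_URL/bhs/_analyze?analyzer=html_text" -d '<title>Dilip Kumar</title>'
}

index_doc() {
  # Index the sample document under an auto-generated id.
  curl -s -XPOST "$ES_URL/bhs/articles" -d "$DOC_BODY"
}

# Nothing is sent unless the script is invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
  create_index; echo
  show_tokens; echo
  index_doc; echo
fi
```

Analysis happens at index time, so documents indexed before the mapping existed would have to be indexed again. The _source returned by a search will still contain the original HTML either way; only the indexed terms change.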