2014-02-20

Jsoup: how to get an email ID that is generated by a script

I have a Document object:

Document secDoc = Jsoup.connect(a.attr("abs:href")).timeout(30*1000).get(); 
String txt = secDoc.text(); 

When I debug the code above and inspect the value of secDoc, it contains the normal page-source elements:

For questions about your order, including anything shipping or billing related, please email <script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>. 

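If you look at the page itself, you can see a line that reads: For questions about your order, including anything shipping or billing related, please email [email protected] We only do email support at this time. Interestingly, the email ID on the page is generated by this script. Using Inspect Element, I get: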

<p> 
       For questions about your order, including anything shipping or billing related, please email <a href="mailto:[email protected]">[email protected]</a><script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>. 
       We only do email support at this time.<br><br> 
       Hours of operation: <strong>Monday-Friday 8am - 6pm PT.</strong> 
       <br> 
       <strong>Shipping Times</strong>: 
       We strive to fulfill the orders within 3-5 working days. When we are really busy we may take a day or two longer. 
       We ship orders Monday - Friday, so if your order is placed Friday evening we may not be able to process it until the following Monday. 
       If we are behind, it may be a few days before we respond. The Oatmeal is an extremely small operation so please be patient. 
       <br> 
       <a href="http://shop.theoatmeal.com/pages/shipping">More Shipping Info</a><br><br> 
       Questions about shirt sizes? <a href="http://shop.theoatmeal.com/pages/shipping#shirts">Shirt Sizing Info</a> 
      </p> 

So the anchor <a href="mailto:[email protected]">[email protected]</a> is being generated by the script.

Is there any way I can get this anchor using Jsoup (or by any other means)?

Answer


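For this particular site, the user and domain parts of the address are in the script tag. So select the script tag, get its contents, parse them with a regular expression, and then join the user and domain with an @ between them. Your selector could be script:containsData(write_email), assuming write_email is not used elsewhere on the page. Jsoup treats the contents of a script element as data rather than text, so :containsData is the pseudo-selector that looks inside it, and the contents are read with data() rather than text(). This only works when the address actually appears in the script source, even if it is split into two pieces.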
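A minimal sketch of the regex step, assuming (as the answer does) that write_email('user','domain') just renders user@domain. The class name WriteEmailExtractor and method extractEmails are hypothetical, and the sketch uses only the JDK so it runs on its own. In the question's code, the input string would be the data() of each element returned by secDoc.select("script"):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WriteEmailExtractor {
    // Matches write_email('user','domain'), allowing optional whitespace and
    // either single or double quotes around each argument.
    // Assumption: the site's write_email helper simply produces user@domain.
    private static final Pattern WRITE_EMAIL = Pattern.compile(
            "write_email\\(\\s*['\"]([^'\"]+)['\"]\\s*,\\s*['\"]([^'\"]+)['\"]\\s*\\)");

    /** Returns every address assembled as user + "@" + domain found in the script text. */
    public static List<String> extractEmails(String scriptText) {
        List<String> emails = new ArrayList<>();
        Matcher m = WRITE_EMAIL.matcher(scriptText);
        while (m.find()) {
            // group(1) is the user part, group(2) is the domain part
            emails.add(m.group(1) + "@" + m.group(2));
        }
        return emails;
    }

    public static void main(String[] args) {
        // The script body from the question's page source
        String script = "write_email('oatmealsupport','gmail.com')";
        System.out.println(extractEmails(script));
    }
}
```

Running main prints [oatmealsupport@gmail.com]. Returning a list rather than a single string covers pages that call write_email more than once; if the site changes how its helper builds the address, the pattern and the join have to change with it.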

In general, Jsoup is not a JavaScript engine. If you want to see the same page a human sees in a web browser, you could try a browser automation tool such as Selenium.