Executing JavaScript in Java - open a URL and get a link

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import java.io.FileReader;

public class Main {

    public static void main(String[] args) {

        ScriptEngineManager manager = new ScriptEngineManager();
        ScriptEngine engine = manager.getEngineByName("js");
        // try-with-resources closes the reader even if eval throws
        try (FileReader reader = new FileReader("C:/yourfile.js")) {
            // expose the URL to the script as the global variable urlfromjava
            engine.put("urlfromjava", "http://www.something.com/?asvb");
            engine.eval(reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Currently, yourfile.js contains the following:

function urlget(url) 
{ 
    print("URL:"+url); 
    var loc = window.open(url); 
    var link = document.getElementsByTagName('a')["61"].href; 
    return ("\nLink is: \n"+link); 

} 
var x = urlget(urlfromjava); 
print(x);

The error I get:

"javax.script.ScriptException: sun.org.mozilla.javascript.internal.EcmaError: ReferenceError: "window" is not defined"

How can I open a URL and get its links from Java?

Source

2011-05-22 harihb

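According to the documentation:

The window object represents an open window in a browser.

Since you are not executing the script in a browser, the window object is not defined.

You can use the URL/URLConnection classes to read the URL and feed its content to the ScriptEngine. There is a tutorial here.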
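A minimal sketch of that suggestion, under stated assumptions: the class name PageToScriptEngine, the helpers readAll and fetch, and the script variable pagefromjava are hypothetical names chosen for illustration, and UTF-8 is assumed as the page encoding. It downloads the page in Java and hands the text to the script as a plain string. The script still gets no window or document object, so DOM calls such as getElementsByTagName remain unavailable. On newer JDKs (Nashorn was removed in JDK 15) getEngineByName("js") may return null unless a JavaScript engine is added to the classpath, so the sketch checks for that.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class PageToScriptEngine {

    // Reads everything from the reader into one string.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    // Downloads the resource behind the URL as text (UTF-8 is an assumption).
    static String fetch(URL url) throws IOException {
        try (Reader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException, ScriptException {
        // Same placeholder URL as in the question.
        String html = fetch(URI.create("http://www.something.com/?asvb").toURL());

        ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
        if (engine == null) {
            System.err.println("No JavaScript engine is available in this JVM");
            return;
        }
        // The script sees the page only as text; there is no DOM here.
        engine.put("pagefromjava", html);
        engine.eval("print('Page length: ' + String(pagefromjava).length)");
    }
}
```

The script receives only the raw markup, so any link extraction still has to work on text.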

Source

2011-05-22 09:51:24 iruediger

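Wow! Looks like I answered the same thing :) – Tapos 2011-05-22 09:54:21

"Great minds think alike" – iruediger 2011-05-22 11:40:44

I like the answer, except that w3schools is about as much "documentation" as Wikipedia or a random web search result. So the first two lines of this answer are incorrect. – Kit10 2014-01-09 15:22:53

In JavaScript, window represents the browser window. So when you try to execute this script from Java, it cannot find a browser window and the error occurs. You can use the URL class in Java to get the content of the URL.

Source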

2011-05-22 09:53:16 Tapos

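Actually, the content of the URL has hyperlinks that I can only get by using document.getElementsByTagName('a'); so I need to load the URL in memory, do that, and get the link – harihb 2011-05-22 10:11:06

You can parse the string with a regular expression pattern. – Tapos 2011-05-22 10:36:48

The link is not in the page's source code. It is loaded by JavaScript executed on the server side. – harihb 2011-05-23 08:22:28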
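For reference, the regular-expression suggestion in the comment above could be sketched as follows. This is only an illustration: the class name HrefRegexExample, the method extractLinks, and the sample markup are hypothetical. As the reply notes, it can only find links that are present in the downloaded markup, so links injected later by a script will not show up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefRegexExample {

    // Matches href="..." or href='...' inside an <a ...> tag, ignoring case.
    // Group 1 is the quote character, group 2 is the link target.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href values of all anchor tags, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample markup made up for this example.
        String html = "<p><a href=\"http://www.something.com/a\">A</a> "
                + "<A HREF='http://www.something.com/b'>B</A></p>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

A regular expression is a rough tool for HTML: unquoted attribute values and unusual markup are not handled here.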

You can embed Env.js in Rhino to get this kind of functionality.

Source

2011-05-22 10:04:06 Grooveek

They seem to have stopped working on it 5 years ago – 2016-04-11 14:58:25

Try this:

import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL yahoo = new URL("http://www.yahoo.com/");
        URLConnection yc = yahoo.openConnection();
        // try-with-resources closes the stream even if reading fails
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(yc.getInputStream()))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                // Or collect the lines in a StringBuilder (sb.append(inputLine))
                // and pass sb.toString() to the method that extracts the links;
                // see getLinks below.
                System.out.println(inputLine);
            }
        }
    }
}



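// Imports needed by getLinks. StringUtils is assumed to be the Apache Commons Lang
// class (org.apache.commons.lang3.StringUtils, or org.apache.commons.lang.StringUtils
// for Commons Lang 2); the original snippet does not say.
import java.util.HashSet;
import java.util.Set;
import org.apache.commons.lang3.StringUtils;

private static final String CLOSING_QUOTE = "\"";
private static final String HREF_PREFIX = "href=\"";
private static final String HTTP_PREFIX = "http://";

// Collects every absolute http:// link that appears as a double-quoted href value.
public static Set<String> getLinks(String page) {
    Set<String> links = new HashSet<String>();
    // Each piece after the first starts right after an href=" occurrence
    String[] rawLinks = StringUtils.splitByWholeSeparator(page, HREF_PREFIX);
    for (String str : rawLinks) {
        if (str.startsWith(HTTP_PREFIX)) {
            links.add(StringUtils.substringBefore(str, CLOSING_QUOTE));
        }
    }
    return links;
}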

Source

2011-05-23 06:01:29 aviad

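Sorry, I can't format the code markup - browser problem... @Apache Fan - would you mind doing your thing again? – aviad 2011-05-23 06:32:01

The problem is that the links in the page are generated by JavaScript, so they only appear after the URL has loaded; i.e. they are not in the source of the HTML file. That is why, after loading the URL, I use document.getElementsByTagName('a') instead of extracting the links with the URL class in Java. – harihb 2011-05-23 08:21:04

URL.openConnection simulates the functionality of a client browser, so you get exactly the same markup as the browser does. Try it, I'm sure you'll see it works. If not, let me know what you get and we can try to solve the problem further. – aviad 2011-05-23 08:29:11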
