Java: Reading from text files in a directory, from the internet

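Does anyone know how to recursively read files from a specific directory in Java? I want to read from all the text files in this web directory: http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/

I know how to read multiple files from a folder on my computer, and how to read a single file from the internet. But how can I read multiple files from the internet without hard-coding the URLs?

Things I've tried: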

import java.io.File;

// List the files on my Desktop
final File folder = new File("/Users/crystal/Desktop");
File[] listOfFiles = folder.listFiles(); // null if the path is not a readable directory

if (listOfFiles != null) {
    for (File fileEntry : listOfFiles) {
        if (!fileEntry.isDirectory()) {
            System.out.println(fileEntry.getName());
        }
    }
}

Another thing I tried:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

// Reading data from the web
try {
    // Create a URL object
    URL url = new URL("http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/5_1_1.txt");

    // Read all of the text returned by the HTTP server.
    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
        String htmlText; // String that holds current file line

        // Read through file one line at a time. Print line
        while ((htmlText = in.readLine()) != null) {
            System.out.println(htmlText);
        }
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    // If another exception is generated, print a stack trace
    e.printStackTrace();
}

Thanks!


2011-05-29 Crystal

Parse the HTML and read the file URLs out of it. HTMLUnit might help. – Endophage 2011-05-29 03:16:12

[Looking for a simple Java spider](http://stackoverflow.com/questions/4903363/looking-for-a-simple-java-spider) – 2011-05-29 03:19:59

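"http://www.cs.ucdavis...170-S11/Female/" Wow, have chaps calling themselves 'Crystal' become so desperate that they now need to trawl the net (or rather, directories on a server) for females? ;) – 2011-05-29 03:52:00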

Since the URL you mention has directory indexing enabled, you're in luck. You have a couple of options here.

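Parse the HTML with SAX2 or any other XML parser and pick out the href attribute of each a tag. htmlunit would also work, I think.
Use a little bit of regex magic to match every string between <a href=" and "> and use those as the URLs to read from.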

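Once you have the list of all the URLs you need, your second piece of code should work fine. Just iterate over the list and construct a URL from each entry.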
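The iteration step could look roughly like the sketch below. It wraps the question's line-by-line read in a helper and calls it once per URL. The class and method names (`MultiUrlReader`, `readLines`, `readAll`) are made up for illustration, and UTF-8 is an assumption about the files' encoding, not something the thread states.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class MultiUrlReader {

    // Reads every line of the resource behind a single URL.
    // Assumes the content is UTF-8 text.
    static List<String> readLines(URL url) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Reads each URL in turn and collects all lines in order.
    // A URL that cannot be read is reported on stderr and skipped.
    static List<String> readAll(List<URL> urls) {
        List<String> all = new ArrayList<>();
        for (URL url : urls) {
            try {
                all.addAll(readLines(url));
            } catch (IOException e) {
                System.err.println("Could not read " + url + ": " + e.getMessage());
            }
        }
        return all;
    }
}
```

Skipping unreadable files rather than aborting is a design choice for this sketch; rethrowing the exception may suit you better if every file is required.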

Here's a sample regex that should match what you want. It does capture a few extra links, but you should be able to filter those out.

<a\ href="(.+?)">
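As a rough sketch of how that regex might be applied in Java: the code below pulls every href out of the index page's HTML, keeps only the ones ending in `.txt` (which filters out the extra sorting and parent-directory links), and resolves each against the directory URL. The class and method names are hypothetical, and the pattern is a slightly loosened variant of the one above (case-insensitive, no trailing `>` required). It assumes the listing uses relative links in double-quoted href attributes, as typical auto-generated index pages do.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class IndexLinkExtractor {

    // Variant of the answer's regex: captures the text between href=" and the next quote.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+href=\"(.+?)\"", Pattern.CASE_INSENSITIVE);

    // Returns absolute URLs for every link in indexHtml whose target ends in ".txt".
    // base is the URL of the directory listing itself (it should end with "/").
    static List<URL> extractTextFileUrls(String indexHtml, URL base)
            throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        Matcher m = HREF.matcher(indexHtml);
        while (m.find()) {
            String href = m.group(1);
            // Filter out sort links, parent-directory links and anything not a text file.
            if (href.toLowerCase(Locale.ROOT).endsWith(".txt")) {
                urls.add(new URL(base, href)); // resolve relative link against the directory
            }
        }
        return urls;
    }
}
```

To use it, read the index page into a single string with the same `BufferedReader` loop from the question, pass that string and the directory URL to `extractTextFileUrls`, and then read each returned URL in turn.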


2011-05-29 03:32:44

Thanks! I think this is exactly what I need. – Crystal 2011-05-29 03:45:58

No problem. Happy to help. – 2011-05-29 03:59:25

Obligatory ["don't parse HTML with regex"](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) comment. Though in this case I'm sure it's fine, since it's just a single page :) – luke 2011-05-29 06:20:00
