2011-04-04 99 views

Let's say I have a user input "placeofjo.blogspot.com". How can I delete the strings in a text file that don't match the user's input string?

My code extracts the links from that website and puts them in a text file.

Right now the text file has content like this:

http://www.twitter.com/jozefinfin/ 
http://www.facebook.com/jozefinfin/ 
http://placeofjo.blogspot.com/2008_08_01_archive.html 
http://placeofjo.blogspot.com/2008_09_01_archive.html 
http://placeofjo.blogspot.com/2008_10_01_archive.html 
http://placeofjo.blogspot.com/2008_11_01_archive.html 
http://placeofjo.blogspot.com/2008_12_01_archive.html 
http://placeofjo.blogspot.com/2009_01_01_archive.html 
http://placeofjo.blogspot.com/2009_02_01_archive.html 
http://placeofjo.blogspot.com/2009_03_01_archive.html 
http://placeofjo.blogspot.com/2009_04_01_archive.html 
http://placeofjo.blogspot.com/2009_05_01_archive.html 
http://placeofjo.blogspot.com/2009_06_01_archive.html 
http://placeofjo.blogspot.com/2009_07_01_archive.html 
http://placeofjo.blogspot.com/2009_08_01_archive.html 
http://placeofjo.blogspot.com/2009_09_01_archive.html 
http://placeofjo.blogspot.com/2009_10_01_archive.html 
http://placeofjo.blogspot.com/2009_11_01_archive.html 
http://placeofjo.blogspot.com/2010_01_01_archive.html 
http://placeofjo.blogspot.com/2010_02_01_archive.html 
http://placeofjo.blogspot.com/2010_04_01_archive.html 
http://placeofjo.blogspot.com/2010_06_01_archive.html 
http://placeofjo.blogspot.com/2010_07_01_archive.html 
http://placeofjo.blogspot.com/2010_08_01_archive.html 
http://placeofjo.blogspot.com/2010_10_01_archive.html 
http://placeofjo.blogspot.com/2010_11_01_archive.html 
http://placeofjo.blogspot.com/2011_01_01_archive.html 
http://placeofjo.blogspot.com/2011_02_01_archive.html 
http://placeofjo.blogspot.com/2011_03_01_archive.html 
http://endlessdance.blogspot.com 
http://blogskins.com/me/aaaaaa 
http://weheartit.com 

I want to remove these lines:

http://www.twitter.com/jozefinfin/ 
http://www.facebook.com/jozefinfin/ 
http://endlessdance.blogspot.com 
http://blogskins.com/me/aaaaaa 
http://weheartit.com 

and keep only the lines that match the user's input string. How can I do this?

Desired content of the text file:

http://placeofjo.blogspot.com/2008_08_01_archive.html
http://placeofjo.blogspot.com/2008_09_01_archive.html
http://placeofjo.blogspot.com/2008_10_01_archive.html
...

Answers

  1. Read the file line by line.
  2. Check whether the line contains the user input.
  3. If it does, write the line to a new file.
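A minimal Java sketch of these steps, assuming UTF-8 text and the placeholder file names `links.txt` and `filtered-links.txt` (neither appears in the question). It uses a plain `contains` check, so any line that mentions the input anywhere is kept:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineFilter {

    // Streams 'in' line by line and copies only the lines containing userInput to 'out'.
    // Returns how many lines were kept. Only one line is held in memory at a time.
    static int copyMatchingLines(Reader in, Writer out, String userInput) {
        try {
            BufferedReader reader = new BufferedReader(in);
            BufferedWriter writer = new BufferedWriter(out);
            int kept = 0;
            String line;
            while ((line = reader.readLine()) != null) {   // 1. read the next line
                if (line.contains(userInput)) {             // 2. does it contain the user input?
                    writer.write(line);                     // 3. yes: write it to the new file
                    writer.newLine();
                    kept++;
                }
            }
            writer.flush();  // the caller owns (and closes) the underlying streams
            return kept;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names: adjust to wherever your link list is stored.
        Path source = Paths.get("links.txt");
        Path target = Paths.get("filtered-links.txt");
        try (Reader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            int kept = copyMatchingLines(in, out, "placeofjo.blogspot.com");
            System.out.println("Kept " + kept + " lines");
        }
    }
}
```

Writing to a separate file avoids truncating the file while it is still being read; if you want the result under the original name, you could move it into place afterwards with `Files.move(target, source, StandardCopyOption.REPLACE_EXISTING)`.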

Assuming you can keep the entire list of links in memory at once, you could probably ...

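  1. Read the file, split it on newlines, and build a list of links.
  2. Filter the list to remove any links that don't match.
  3. Write the filtered list back to the file, replacing its old contents.

For the matching in the filter, I'd use

string.indexOf(inputToMatch) >= 0 // it matches (>= 0, because a match at index 0 also counts)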
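A rough sketch of the read, filter, write-back approach, assuming Java 8+, UTF-8 and a placeholder path `links.txt`; `keepMatching` is a hypothetical helper name. `contains` performs the same test as `indexOf(...) >= 0`:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class InMemoryLinkFilter {

    // Step 2: keep only the links that contain the user's input.
    // contains() is equivalent to indexOf(userInput) >= 0.
    static List<String> keepMatching(List<String> links, String userInput) {
        return links.stream()
                .map(String::trim)  // drop the trailing spaces seen in the link file
                .filter(link -> link.contains(userInput))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("links.txt");  // placeholder path
        // Step 1: read the whole file; readAllLines already splits on line breaks.
        List<String> links = Files.readAllLines(file, StandardCharsets.UTF_8);
        // Step 2: filter.
        List<String> kept = keepMatching(links, "placeofjo.blogspot.com");
        // Step 3: overwrite the file with the filtered list (truncates the old contents).
        Files.write(file, kept, StandardCharsets.UTF_8);
    }
}
```

Because `readAllLines` loads every line at once, this only makes sense under the "whole list fits in memory" assumption stated above.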

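Instead of building a text file and then filtering it, apply the filter while you parse the web page: look only for links that meet your criterion, and write only the good links to the file.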
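One way this could look, assuming your parser hands you each link as a string; `isWanted` and `onLinkExtracted` are hypothetical names, and the hard-coded list in `main` stands in for the real parser. Instead of a plain substring test, this sketch swaps in a stricter check that parses each link with `java.net.URI` and compares its host with the user's input (an assumption about what "matching" should mean):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FilterWhileExtracting {

    // True if the link's host is exactly the requested site or a subdomain of it.
    static boolean isWanted(String link, String site) {
        try {
            String host = new URI(link.trim()).getHost();
            if (host == null) {
                return false;
            }
            String h = host.toLowerCase(Locale.ROOT);
            String s = site.toLowerCase(Locale.ROOT);
            return h.equals(s) || h.endsWith("." + s);
        } catch (URISyntaxException e) {
            return false;  // skip anything that is not a well-formed URI
        }
    }

    // Hypothetical callback: invoked for each link as the parser finds it.
    // Only wanted links ever reach the writer, so no intermediate file is needed.
    static void onLinkExtracted(String link, String site, BufferedWriter out) {
        if (!isWanted(link, site)) {
            return;
        }
        try {
            out.write(link.trim());
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the links your parser extracts from the page.
        List<String> extracted = Arrays.asList(
                "http://www.twitter.com/jozefinfin/",
                "http://placeofjo.blogspot.com/2008_08_01_archive.html",
                "http://endlessdance.blogspot.com");
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(System.out));
        for (String link : extracted) {
            onLinkExtracted(link, "placeofjo.blogspot.com", out);
        }
        out.flush();  // flush, but do not close System.out
    }
}
```

With this check `http://endlessdance.blogspot.com` is rejected even though it shares the `blogspot.com` suffix, while a subdomain such as `www.placeofjo.blogspot.com` would still be accepted.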


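Here is a regular-expression approach to this problem. However, you shouldn't use this solution on large files, because it reads the whole file into memory.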

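import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import org.apache.commons.io.FileUtils;

public class FileReplacer {

    public static void main(String[] args) {
        replaceFileContent();
    }

    public static void replaceFileContent() {
        try {
            // Read the whole file into one string (hence unsuitable for large files).
            String allStr = FileUtils.readFileToString(new File("c:/temp/data.txt"), StandardCharsets.UTF_8);
            // MULTILINE makes ^ and $ match at every line. The negative lookahead selects each
            // non-empty line that does NOT start with the wanted prefix, together with its line
            // break (\r\n on Windows, \n elsewhere), so replaceAll("") deletes the whole line.
            Pattern pattern = Pattern.compile("^(?!http://placeofjo\\.blogspot\\.com/.*$).+$(\\r?\\n)?", Pattern.MULTILINE);
            String newAllStr = pattern.matcher(allStr).replaceAll("");
            FileUtils.writeStringToFile(new File("c:/temp/newdata.txt"), newAllStr, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}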

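If the pattern were compiled once and then applied in a loop (line by line) instead of in MULTILINE mode, would performance still be that bad? That's what I would do. – ArtB 2011-04-04 15:22:10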


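@ArtB Well, in that case performance wouldn't suffer much, because only one line is examined at a time. But if your lines contain thousands of characters, it isn't a good choice either. – 2011-04-04 17:35:50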
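Along the lines ArtB suggests, here is a sketch that compiles the pattern once and applies it to one line at a time. `patternFor` and `keepMatching` are hypothetical names; accepting `https` as well as `http`, and a bare host without a trailing path, are assumptions beyond the original regex. `Pattern.quote` keeps the dots in the user's input literal:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PerLineRegexFilter {

    // Compiles the pattern once from the user's input. Pattern.quote treats the input
    // literally, so the dots in "placeofjo.blogspot.com" match only real dots.
    static Pattern patternFor(String site) {
        return Pattern.compile("https?://" + Pattern.quote(site) + "(/.*)?");
    }

    // Reuses the same compiled Pattern for every line; matches() requires the whole
    // (trimmed) line to fit the pattern, so no MULTILINE flag is needed.
    static List<String> keepMatching(List<String> lines, Pattern pattern) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();  // the link file has trailing spaces
            if (pattern.matcher(trimmed).matches()) {
                kept.add(trimmed);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Pattern pattern = patternFor("placeofjo.blogspot.com");  // compiled once
        List<String> lines = Arrays.asList(
                "http://www.twitter.com/jozefinfin/ ",
                "http://placeofjo.blogspot.com/2008_08_01_archive.html ",
                "http://endlessdance.blogspot.com ");
        for (String link : keepMatching(lines, pattern)) {
            System.out.println(link);
        }
    }
}
```

In a real program the lines would come from a `BufferedReader.readLine()` loop rather than an in-memory list, so memory use stays bounded by the longest line.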