Parsing a text file in Java to get a HashMap of fields

I am trying to parse multiple files and split each one into a set of fields stored in a HashMap. Here is a sample file.

COCONUT OIL CONTRACT TO CHANGE - DUTCH TRADERS 

    ROTTERDAM, March 18 - Contract terms for trade in coconut 
oil are to be changed from long tons to tonnes with effect from 
the Aug/Sep contract onwards, Dutch vegetable oil traders said. 
    Operators have already started to take account of the 
expected change and reported at least one trade in tonnes for 
Aug/Sept shipment yesterday.

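I need a program that parses this document into a custom Document class with keyed fields: file ID, file name, place, date, author, content, and category.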

This is what I have tried.

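import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public static Document parse(String filename) throws IOException {
    File f = new File(filename);

    if (f.isFile()) {
        // Fall back to the full name when the file has no extension
        String fileId = filename;
        if (filename.indexOf(".") > 0) {
            fileId = filename.substring(0, filename.lastIndexOf("."));
        }
        // The parent directory name is used as the category
        String category = f.getParent();

        // try-with-resources closes the stream even if reading fails
        try (InputStream in = new FileInputStream(f)) {
            byte[] buf = new byte[1024];
            int len = in.read(buf);
            while (len > 0) {
                ..........
            }
        }
    }

    return null;
}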

Source

2014-09-19 Umar Gul

I'm sorry, but what are you trying to accomplish here? :O – 2014-09-19 19:18:44

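Well, this is a start, but it will be hard to continue in the same way. If I were you, I would stop writing code now and first work out the high-level steps that need to be taken. Write those steps down on a piece of paper: '1. Read the file fully into a string. 2. Extract the file title...' and so on. Then you can start coding one step at a time, testing the result after each step. – biziclop 2014-09-19 19:20:17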
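To make that advice concrete, here is a minimal sketch of the first two steps only (read the file fully into a string, then extract the title). The class name `StepByStep`, the helper names, and the UTF-8 encoding are assumptions for illustration and do not come from the question; the title rule (first non-blank line) is likewise an assumption based on the sample file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; each step is a small method that can be tested alone.
class StepByStep {
    // Step 1: read the whole file into one string (UTF-8 is assumed here).
    static String readAll(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Step 2: treat the first non-blank line as the title.
    static String extractTitle(String text) {
        for (String line : text.split("\\R")) {
            if (!line.trim().isEmpty()) {
                return line.trim();
            }
        }
        return "";
    }

    public static void main(String[] args) throws IOException {
        // Write a shortened copy of the sample to a temp file so the sketch runs on its own.
        Path tmp = Files.createTempFile("sample", ".txt");
        String sample = "COCONUT OIL CONTRACT TO CHANGE - DUTCH TRADERS \n\n"
                + "    ROTTERDAM, March 18 - Contract terms for trade in coconut \n";
        Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));

        String text = readAll(tmp);
        System.out.println("Title: " + extractTitle(text));
        Files.delete(tmp);
    }
}
```

Once these two steps behave as expected, the remaining ones (place, date, content) can be added and checked in the same way.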

The following code may help you:

import java.io.BufferedReader;
import java.io.FileReader;

// try-with-resources closes the reader automatically
try (BufferedReader br = new BufferedReader(new FileReader("myFile.txt"))) {
    StringBuilder contentBuffer = new StringBuilder();
    String line;
    boolean foundTitle = false;
    boolean foundPlaceAndDate = false;
    String date = "";
    while ((line = br.readLine()) != null) {
        if (line.matches("^[a-zA-Z0-9].*") && !foundTitle) {
            // The first line that starts with a letter or digit is the title
            System.out.println("Title: " + line.trim());
            foundTitle = true;
        } else if (line.matches("^[ \\t].*") && !foundPlaceAndDate) {
            // The first line that starts with a space or tab opens the first paragraph,
            // which carries the place and date in the form "PLACE, date - text"
            String dateline = line.trim();
            System.out.println("Place: " + dateline.substring(0, dateline.indexOf(",")));
            date = dateline.substring(dateline.indexOf(",") + 1, dateline.indexOf("-")).trim();
            System.out.println("Date: " + date);
            foundPlaceAndDate = true;
        }
        // Join lines with a single space so words at line breaks do not run together
        contentBuffer.append(line.trim()).append(' ');
    }

    // The content is everything after the date and the " -" separator that follows it
    String joined = contentBuffer.toString();
    String content = joined.substring(joined.indexOf(date) + date.length() + 2).trim();
    System.out.println("Content: " + content);
} catch (Exception e) {
    System.err.println("Oh no! I got the following error: " + e.getMessage());
}

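The output will be:

Title: COCONUT OIL CONTRACT TO CHANGE - DUTCH TRADERS

Place: ROTTERDAM

Date: March 18

Content: Contract terms for trade in coconut oil are to be changed from long tons to tonnes with effect from the Aug/Sep contract onwards, Dutch vegetable oil traders said. Operators have already started to take account of the expected change and reported at least one trade in tonnes for Aug/Sept shipment yesterday.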

Source

2014-09-19 19:46:57 shimatai

This did get me started, but I need to parse the file into a Document class, which looks like this: 'public class Document { private HashMap<FieldNames, String[]> map; public Document() { map = new HashMap<FieldNames, String[]>(); } public void setField(FieldNames fn, String... o) { map.put(fn, o); } public String[] getField(FieldNames fn) { return map.get(fn); } }' – 2014-09-19 19:52:27

Now you only need to fill in the fields of the Document class. For example: 'Document document = new Document(); document.setField("title", title);' – shimatai 2014-09-22 18:10:59
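As a rough illustration of filling the Document class, the sketch below parses text laid out like the sample file and stores each value through `setField`. Several things here are assumptions rather than anything confirmed in the thread: the `FieldNames` type is never shown, so it is modelled as a hypothetical enum with four constants; an `EnumMap` is swapped in for the `HashMap`; and the `DocumentParser` class and its dateline rule ("PLACE, date - text" on the first paragraph line) are invented for the example.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical keys; the real FieldNames type is not shown in the thread.
enum FieldNames { TITLE, PLACE, DATE, CONTENT }

// Same shape as the Document class from the comment, backed by an EnumMap.
class Document {
    private final Map<FieldNames, String[]> map = new EnumMap<>(FieldNames.class);

    void setField(FieldNames fn, String... values) { map.put(fn, values); }

    String[] getField(FieldNames fn) { return map.get(fn); }
}

class DocumentParser {
    // Assumes a title line followed by a paragraph starting with "PLACE, date - ".
    static Document parse(String text) {
        Document doc = new Document();
        StringBuilder body = new StringBuilder();
        boolean foundTitle = false;
        boolean foundDateline = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank separator lines
            }
            if (!foundTitle) {
                doc.setField(FieldNames.TITLE, line);
                foundTitle = true;
                continue;
            }
            if (!foundDateline) {
                int comma = line.indexOf(',');
                int dash = line.indexOf(" - ", comma + 1);
                if (comma >= 0 && dash > comma) {
                    doc.setField(FieldNames.PLACE, line.substring(0, comma).trim());
                    doc.setField(FieldNames.DATE, line.substring(comma + 1, dash).trim());
                    line = line.substring(dash + 3); // keep only the text after " - "
                    foundDateline = true;
                }
            }
            // Join body lines with one space so words do not run together
            if (body.length() > 0) {
                body.append(' ');
            }
            body.append(line);
        }
        doc.setField(FieldNames.CONTENT, body.toString());
        return doc;
    }

    public static void main(String[] args) {
        String sample = "COCONUT OIL CONTRACT TO CHANGE - DUTCH TRADERS \n\n"
                + "    ROTTERDAM, March 18 - Contract terms for trade in coconut \n"
                + "oil are to be changed from long tons to tonnes.\n";
        Document doc = parse(sample);
        System.out.println("Title: " + doc.getField(FieldNames.TITLE)[0]);
        System.out.println("Place: " + doc.getField(FieldNames.PLACE)[0]);
        System.out.println("Date: " + doc.getField(FieldNames.DATE)[0]);
        System.out.println("Content: " + doc.getField(FieldNames.CONTENT)[0]);
    }
}
```

Because `setField` takes varargs, each field is stored as a `String[]`, which is why the usage above reads index `[0]`. Fields such as file ID, file name, author, and category would need their own enum constants and extraction rules, which the thread does not specify.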
