2017-02-23

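uniVocity does not parse the first column into beans

I'm trying to read CSV files from a GTFS.zip with the uniVocity parsers and have run into a problem I can't figure out. For some reason, the first column of some CSV files is not parsed correctly. For example, the "stops.txt" file looks like this: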

stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station 
"de:3811:30215:0:6","Freiburg Stübeweg","48.0248455941735","7.85563688037231","","Parent30215" 
"de:8311:30054:0:1","Freiburg Schutternstraße","48.0236251356332","7.72434519425597","","Parent30054" 
"de:8311:30054:0:2","Freiburg Schutternstraße","48.0235446600679","7.72438739944883","","Parent30054" 

The "stop_id" field is not parsed correctly; its value always comes out as "null".

This is the method I'm using to read the file:

public <T> List<T> readCSV(String path, String file, BeanListProcessor<T> processor) {
    List<T> content = null;
    try {
        // Get zip file
        ZipFile zip = new ZipFile(path);
        // Get CSV file
        ZipEntry entry = zip.getEntry(file);
        InputStream in = zip.getInputStream(entry);

        CsvParserSettings parserSettings = new CsvParserSettings();
        parserSettings.setProcessor(processor);
        parserSettings.setHeaderExtractionEnabled(true);

        CsvParser parser = new CsvParser(parserSettings);
        parser.parse(new InputStreamReader(in));
        content = processor.getBeans();

        zip.close();
        return content;

    } catch (Exception e) {
        e.printStackTrace();
    }
    return content;
}

This is what my Stop class looks like:

public class Stop { 
@Parsed 
private String stop_id; 
@Parsed 
private String stop_name; 
@Parsed 
private String stop_lat; 
@Parsed 
private String stop_lon; 
@Parsed 
private String location_type; 
@Parsed 
private String parent_station; 

public Stop() { 
} 

public Stop(String stop_id, String stop_name, String stop_lat, String stop_lon, String location_type, 
     String parent_station) { 
    this.stop_id = stop_id; 
    this.stop_name = stop_name; 
    this.stop_lat = stop_lat; 
    this.stop_lon = stop_lon; 
    this.location_type = location_type; 
    this.parent_station = parent_station; 
} 

// --------------------- Getter -------------------------------- 
public String getStop_id() { 
    return stop_id; 
} 

public String getStop_name() { 
    return stop_name; 
} 

public String getStop_lat() { 
    return stop_lat; 
} 

public String getStop_lon() { 
    return stop_lon; 
} 

public String getLocation_type() { 
    return location_type; 
} 

public String getParent_station() { 
    return parent_station; 
} 

// --------------------- Setter -------------------------------- 
public void setStop_id(String stop_id) { 
    this.stop_id = stop_id; 
} 

public void setStop_name(String stop_name) { 
    this.stop_name = stop_name; 
} 

public void setStop_lat(String stop_lat) { 
    this.stop_lat = stop_lat; 
} 

public void setStop_lon(String stop_lon) { 
    this.stop_lon = stop_lon; 
} 

public void setLocation_type(String location_type) { 
    this.location_type = location_type; 
} 

public void setParent_station(String parent_station) { 
    this.parent_station = parent_station; 
} 

@Override
public String toString() {
    return "Stop [stop_id=" + stop_id + ", stop_name=" + stop_name + ", stop_lat=" + stop_lat + ", stop_lon="
      + stop_lon + ", location_type=" + location_type + ", parent_station=" + parent_station + "]";
}
}

When I call the method, I get the following output, which is incorrect:

PartialReading pr = new PartialReading();
List<Stop> stops = pr.readCSV("VAGFR.zip", "stops.txt", new BeanListProcessor<Stop>(Stop.class));
for (int i = 0; i < 4; i++) {
    System.out.println(stops.get(i).toString());
}

Output:

Stop [stop_id=null, stop_name=Freiburg Stübeweg, stop_lat=48.0248455941735, stop_lon=7.85563688037231, location_type=null, parent_station=Parent30215] 
Stop [stop_id=null, stop_name=Freiburg Schutternstraße, stop_lat=48.0236251356332, stop_lon=7.72434519425597, location_type=null, parent_station=Parent30054] 
Stop [stop_id=null, stop_name=Freiburg Schutternstraße, stop_lat=48.0235446600679, stop_lon=7.72438739944883, location_type=null, parent_station=Parent30054] 
Stop [stop_id=null, stop_name=Freiburg Waltershofen Ochsen, stop_lat=48.0220902613143, stop_lon=7.7205756507492, location_type=null, parent_station=Parent30055] 

Does anyone know why this happens and how I can fix it? It also happens with the "routes.txt" and "trips.txt" files I tested. This is the GTFS file: http://stadtplan.freiburg.de/sld/VAGFR.zip

Answers


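If you print the headers, you'll notice that the first column doesn't look right. That's because you are parsing a file encoded as UTF-8 with a BOM (byte order mark).

Basically, the file starts with a few bytes that indicate its encoding. The parser doesn't handle those bytes internally, but you can skip them to get the correct output: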
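To make the symptom concrete, here is a minimal sketch that uses only the JDK (no uniVocity involved). It assumes the file starts with the UTF-8 BOM bytes EF BB BF; the class name `BomHeaderDemo` and the shortened header line are illustrative, not taken from the question. Decoded as UTF-8, those bytes become the invisible character U+FEFF, so the first header no longer equals "stop_id", which would explain why a field mapped by that name stays null.

```java
import java.nio.charset.StandardCharsets;

public class BomHeaderDemo {

    // Builds a hypothetical first line of stops.txt that starts with a UTF-8 BOM.
    public static byte[] headerLineWithBom() {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] text = "stop_id,stop_name".getBytes(StandardCharsets.UTF_8);
        byte[] all = new byte[bom.length + text.length];
        System.arraycopy(bom, 0, all, 0, bom.length);
        System.arraycopy(text, 0, all, bom.length, text.length);
        return all;
    }

    // Decodes the bytes as UTF-8 and returns the first header name,
    // as seen by any reader that does not strip the BOM.
    public static String firstHeader(byte[] line) {
        String decoded = new String(line, StandardCharsets.UTF_8);
        return decoded.split(",")[0];
    }

    public static void main(String[] args) {
        String header = firstHeader(headerLineWithBom());
        System.out.println(header.equals("stop_id")); // false
        System.out.println(header.length());          // 8, one more than "stop_id"
        System.out.println((int) header.charAt(0));   // 65279, i.e. U+FEFF
    }
}
```

Printing the header length or the code point of its first character, as above, is a quick way to spot a BOM that is invisible in most consoles.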

//... your code here
ZipEntry entry = zip.getEntry(file);
InputStream in = zip.getInputStream(entry);

// 239, 187, 191 are the UTF-8 BOM bytes (0xEF 0xBB 0xBF).
// Note: all three bytes are consumed even if the file has no BOM.
if(in.read() == 239 & in.read() == 187 & in.read() == 191){
    System.out.println("UTF-8 with BOM, bytes discarded");
}

CsvParserSettings parserSettings = new CsvParserSettings();
//...rest of your code here
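The check above reads three bytes whether or not they form a BOM, so a file without one would lose the start of its first header. One possible way around that, sketched here with the JDK's `PushbackInputStream`, is to put the bytes back when they are not a BOM. The class and method names (`BomSkipper`, `skipUtf8Bom`, `readAll`) are made up for this example, and only the UTF-8 BOM is handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomSkipper {

    // Returns a stream positioned after a leading UTF-8 BOM, if one is present.
    // Bytes that turn out not to be a BOM are pushed back, so no data is lost.
    public static InputStream skipUtf8Bom(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 or EOF
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            in.unread(head, 0, n);
        }
        return in;
    }

    // Small helper for the demo: decodes the whole stream as UTF-8.
    public static String readAll(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 's', 't', 'o', 'p', '_', 'i', 'd'};
        byte[] withoutBom = "stop_id".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(skipUtf8Bom(new ByteArrayInputStream(withBom))));    // stop_id
        System.out.println(readAll(skipUtf8Bom(new ByteArrayInputStream(withoutBom)))); // stop_id
    }
}
```

In the question's `readCSV` method, the result of `skipUtf8Bom(zip.getInputStream(entry))` could be wrapped in `new InputStreamReader(..., StandardCharsets.UTF_8)` before being passed to the parser; passing the charset explicitly avoids relying on the platform default.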

The hack above will work, but you can also use commons-io, which provides a convenient BOMInputStream that handles this sort of thing more cleanly.

Hope it helps.


Thanks, this solved my problem – Kazanagi
