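Removing duplicate values from key-value pairs in a list

Below is my code for detecting abbreviations and their long forms. The code loops over each line of the document, loops over each word of that line, and identifies abbreviation candidates. It then loops over every line of the document again to find suitable long forms for each candidate. My problem is that if an acronym occurs several times in the document, my output contains one entry per occurrence. I want to print each abbreviation only once, together with all of its possible long forms. Here is my code: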
    public static void main(String[] args) throws FileNotFoundException
    {
        BufferedReader in = new BufferedReader(new FileReader("D:\\Workspace\\resource\\SampleSentences.txt"));
        String str = null;
        ArrayList<String> lines = new ArrayList<String>();
        String matchingLongForm;
        List<String> matchingLongForms = new ArrayList<String>();
        List<String> shortForm = new ArrayList<String>();
        Map<String, List<String>> abbreviationPairs = new HashMap<String, List<String>>();
        try
        {
            while ((str = in.readLine()) != null) {
                lines.add(str);
            }
        }
        catch (IOException e)
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        String[] linesArray = lines.toArray(new String[lines.size()]);
        // document wide search for abbreviation long form and identifying several appropriate matches
        for (String line : linesArray) {
            for (String word : (Tokenizer.getTokenizer().tokenize(line))) {
                if (isValidShortForm(word)) {
                    for (int i = 0; i < linesArray.length; i++) {
                        matchingLongForm = extractBestLongForm(word, linesArray[i]);
                        //shortForm.add(word);
                        if (matchingLongForm != null && !(matchingLongForms.contains(matchingLongForm))) {
                            matchingLongForms.add(matchingLongForm);
                            //System.out.println(matchingLongForm);
                            abbreviationPairs.put(word, matchingLongForms);
                            //matchingLongForms.clear();
                        }
                    }
                    if (abbreviationPairs != null) {
                        //for(abbreviationPairs.)
                        System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs);
                        abbreviationPairs.clear();
                        matchingLongForms.clear();
                        //System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairsNew);
                    }
                    else
                        continue;
                }
            }
        }
    }
Here is the current output:
Abbreviation Pair: {GLBA=[Gramm Leach Bliley act]}
Abbreviation Pair: {NCUA=[National credit union administration]}
Abbreviation Pair: {FFIEC=[Federal Financial Institutions Examination Council]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {OFAC=[Office of Foreign Assets Control]}
Is `Map<String, Set<String>> abbreviationPairs` an option? – bradimus
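A minimal sketch of what that suggestion might look like: collect every short form into one `Map<String, Set<String>>` that lives for the whole run, and print only after the loops finish. The `Set` drops repeated long forms, and a short form that is already a key is not scanned again. The two helper methods here are crude hypothetical stand-ins for the question's `isValidShortForm` and `extractBestLongForm`, which are not shown in the post, and `split("\\W+")` stands in for the question's `Tokenizer`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AbbreviationCollector {

    // Hypothetical stand-in: treats any all-uppercase token of 2+ letters as a short form.
    static boolean isValidShortForm(String word) {
        return word.matches("[A-Z]{2,}");
    }

    // Hypothetical stand-in: if the line contains "(SHORT)", returns the words just
    // before it, one word per letter of the short form; otherwise returns null.
    static String extractBestLongForm(String shortForm, String line) {
        int idx = line.indexOf("(" + shortForm + ")");
        if (idx < 0) {
            return null;
        }
        String before = line.substring(0, idx).trim();
        if (before.isEmpty()) {
            return null;
        }
        String[] tokens = before.split("\\s+");
        int n = Math.min(shortForm.length(), tokens.length);
        return String.join(" ", Arrays.copyOfRange(tokens, tokens.length - n, tokens.length));
    }

    // Builds one entry per short form; the Set removes duplicate long forms.
    static Map<String, Set<String>> collect(List<String> lines) {
        Map<String, Set<String>> pairs = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\W+")) {
                // Skip non-candidates and short forms that were already resolved.
                if (!isValidShortForm(word) || pairs.containsKey(word)) {
                    continue;
                }
                Set<String> longForms = new LinkedHashSet<>();
                for (String candidate : lines) {
                    String longForm = extractBestLongForm(word, candidate);
                    if (longForm != null) {
                        longForms.add(longForm);
                    }
                }
                if (!longForms.isEmpty()) {
                    pairs.put(word, longForms);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "The Office of Foreign Assets Control (OFAC) publishes lists.",
                "OFAC updates them regularly.",
                "Ask OFAC for guidance.");
        // Printing happens once, after all lines have been processed.
        collect(lines).forEach((shortForm, longForms) ->
                System.out.println("Abbreviation Pair:\t" + shortForm + "=" + longForms));
    }
}
```

The `LinkedHashMap` and `LinkedHashSet` are a choice made here to keep first-seen order in the output; a plain `HashMap` and `HashSet` would deduplicate just as well.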
Note that [`Files.readAllLines`](https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#readAllLines(java.nio.file.Path,%20java.nio.charset.Charset)) exists. By reinventing the wheel, you are wasting your time... Besides, you can simply write `for(String line: lines) {...`, without copying the contents of the List into an array. – Holger
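For illustration, the reading part of the question's code could be reduced to something like the sketch below. The class name `ReadLinesSketch`, the fallback file name, and the UTF-8 charset are assumptions; the post does not say how the file is encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class ReadLinesSketch {

    // Reads the whole file into a List<String>, one element per line.
    // UTF-8 is an assumption about the input file's encoding.
    static List<String> readLines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real file as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "SampleSentences.txt");
        List<String> lines = readLines(path);
        // Iterate over the list directly; no copy into a String[] is needed.
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```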