2015-03-31 49 views
0

Duplicates in an ArrayList when I split it

I have an ArrayList<Dico> that I am trying to split into multiple ArrayLists, but the output contains duplicates.

This is the Dico class:

public class Dico implements Comparable { 
    private final String m_term; 
    private double m_weight; 
    private final int m_Id_doc; 

    public Dico(int Id_Doc, String Term, double tf_ief) { 
     this.m_Id_doc = Id_Doc; 
     this.m_term = Term; 
     this.m_weight = tf_ief; 
    } 

    public String getTerm() { 
     return this.m_term; 
    } 

    public double getWeight() { 
     return this.m_weight; 
    } 

    public void setWeight(double weight) { 
     this.m_weight = weight; 
    } 

    public int getDocId() { 
     return this.m_Id_doc; 
    } 

    @Override 
    public int compareTo(Object another) throws ClassCastException { 
     if (!(another instanceof Dico)) 
      throw new ClassCastException("A Dico object expected."); 
     int anotherDocid = ((Dico) another).getDocId(); 
     return this.getDocId() - anotherDocid; 
    } 

    @Override 
    public String toString() { 
     return "id" + getDocId() + "term" + getTerm() + "weight" + getWeight() + ""; 
    } 
} 

And this is the split_dico function I use to do the split:

public static void split_dico(List<Dico> list) { 
    int[] changes = new int[list.size() + 1]; // allow for max changes--> contain index of subList 
    Arrays.fill(changes, -1); // if an index is not used, will remain -1 
    changes[0] = 0; 
    int change = 1; 
    int id = list.get(0).getDocId(); 
    for (int i = 1; i < list.size(); i++) { 
     Dico dic_entry = list.get(i); 
     if (id != dic_entry.getDocId()) { 
      changes[change++] = i; 
      id = dic_entry.getDocId(); 
     } 
    } 
    changes[change] = list.size(); // end of last change segment 
    List<List<Dico>> sublists = new ArrayList<>(change); 
    for (int i = 0; i < change; i++) { 
     sublists.add(list.subList(changes[i], changes[i + 1])); 
     System.out.println(sublists); 
    } 
} 

Test:

List<Dico> list = Arrays.asList(new Dico(1, "foo", 1), 
    new Dico(7, "zoo", 5), 
    new Dico(2, "foo", 1), 
    new Dico(3, "foo", 1), 
    new Dico(1, "bar", 2), 
    new Dico(4, "zoo", 0.5), 
    new Dico(2, "bar", 2), 
    new Dico(3, "baz", 3)); 
Collections.sort(list); // sort by doc id so that equal ids are adjacent
split_dico(list);

Output:

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6]] 

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6]] 

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6], [doc id : 3 term : foo weight : 2.2]] 

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6], [doc id : 3 term : foo weight : 2.2], [doc id : 4 term : zoo weight : 0.15]] 

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6], [doc id : 3 term : foo weight : 2.2], [doc id : 4 term : zoo weight : 0.15], [doc id : 7 term : zoo weight : 1.5]] 

I don't understand what is wrong with this function.

+0

Don't use the raw type `Comparable`. Use `Comparable<Dico>` instead. – Tom 2015-04-01 22:12:08
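To illustrate the comment above, here is a minimal sketch of a typed `compareTo`. `TypedDico` and `TypedDicoDemo` are hypothetical names, and the class is a trimmed-down stand-in for `Dico` that keeps only the fields needed here. With `Comparable<TypedDico>` the `instanceof` check and the cast are no longer needed, and `Integer.compare` avoids the overflow that subtracting two ids can produce for extreme values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TypedDicoDemo {
    // Returns the doc ids of the input in sorted order, leaving the input untouched.
    static List<Integer> sortedIds(List<TypedDico> input) {
        List<TypedDico> copy = new ArrayList<>(input);
        Collections.sort(copy); // uses TypedDico.compareTo
        List<Integer> ids = new ArrayList<>();
        for (TypedDico d : copy) {
            ids.add(d.getDocId());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<TypedDico> list = Arrays.asList(
                new TypedDico(7, "zoo"),
                new TypedDico(1, "foo"),
                new TypedDico(2, "bar"));
        System.out.println(sortedIds(list)); // prints [1, 2, 7]
    }
}

// Hypothetical stand-in for the Dico class from the question (assumption: weight omitted).
class TypedDico implements Comparable<TypedDico> {
    private final int docId;
    private final String term;

    TypedDico(int docId, String term) {
        this.docId = docId;
        this.term = term;
    }

    int getDocId() {
        return docId;
    }

    String getTerm() {
        return term;
    }

    @Override
    public int compareTo(TypedDico other) {
        // The parameter is already a TypedDico, so no cast or instanceof check is required.
        return Integer.compare(this.docId, other.docId);
    }
}
```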

Answers

1

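In your print loop, you are printing the whole list of sublists every time you add a new sublist, so each earlier sublist shows up again on every later line.

Instead, given your requirement, you should print only once you are done populating the sublists.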
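The difference can be sketched as follows. This is a simplified illustration, not the asker's exact code: `SplitThenPrint` and `splitRuns` are hypothetical names, and plain `Integer` values stand in for the result of `getDocId()`. It assumes the input is already sorted, builds all sublists first, and only then prints each one on its own line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitThenPrint {
    // Splits an already sorted list of ids into runs of equal ids.
    static List<List<Integer>> splitRuns(List<Integer> sortedIds) {
        List<List<Integer>> sublists = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= sortedIds.size(); i++) {
            // Close the current run at the end of the list or when the id changes.
            if (i == sortedIds.size() || !sortedIds.get(i).equals(sortedIds.get(start))) {
                sublists.add(sortedIds.subList(start, i));
                start = i;
            }
        }
        return sublists;
    }

    public static void main(String[] args) {
        List<List<Integer>> sublists = splitRuns(Arrays.asList(1, 1, 2, 2, 3, 4, 7));
        // Printing happens after the split is complete, one sublist per line.
        for (List<Integer> sublist : sublists) {
            System.out.println(sublist);
        }
    }
}
```

Note that `subList` returns views backed by the original list, so no elements are copied; wrap each view in `new ArrayList<>(...)` if the sublists must stay valid after the original list changes structurally.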

+0

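If I do that, I will print all the sublists, but I want to use each one separately; each list must contain only the documents that have the same ID. I have a solution that splits the ArrayList, but its complexity is too high for 1 million terms. I am looking for the fastest solution. – tommy 2015-04-01 08:47:11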

+0

The solution is the following: for (List<Dico> sublist : sublists) { System.out.println(sublist); } Thanks @micklesh – tommy 2015-04-01 08:51:45
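Regarding the concern about a million entries raised in the comments: one possible alternative, not taken from the thread, is to group the entries by id in a single pass with a map instead of sorting first. The sketch below is hypothetical (`GroupByDocId` and `group` are made-up names, and parallel arrays of ids and terms stand in for `Dico` objects). With a `TreeMap` the groups come out ordered by id, and each insertion costs a lookup logarithmic in the number of distinct ids; a `HashMap` would drop the ordering in exchange for constant-time lookups on average.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByDocId {
    // Groups terms by doc id in one pass; docIds[i] is the id of terms[i].
    static Map<Integer, List<String>> group(int[] docIds, String[] terms) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (int i = 0; i < docIds.length; i++) {
            // Create the bucket for this id on first sight, then append the term.
            groups.computeIfAbsent(docIds[i], k -> new ArrayList<>()).add(terms[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        int[] ids = {1, 7, 2, 3, 1, 4, 2, 3};
        String[] terms = {"foo", "zoo", "foo", "foo", "bar", "zoo", "bar", "baz"};
        // Each map entry is one group that can be used on its own.
        for (Map.Entry<Integer, List<String>> e : group(ids, terms).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Unlike the `subList` approach, this copies references into new lists, so it trades some memory for not needing the input to be sorted.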

0

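Sorry for this silly question, it is so ridiculous; I was thinking more about finding a fast solution. Here is the finished version, with the printing done after the split: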

// Requires: import java.util.ArrayList; import java.util.Arrays; import java.util.List;
// Assumes the list is already sorted by doc id, so equal ids are adjacent.
public static void split_dico(List<Dico> list) {
    if (list.isEmpty()) {
        return; // nothing to split
    }
    int[] changes = new int[list.size() + 1]; // start index of each sublist, sized for the maximum number of changes
    Arrays.fill(changes, -1); // if an index is not used, it remains -1
    changes[0] = 0;
    int change = 1;
    int id = list.get(0).getDocId();
    for (int i = 1; i < list.size(); i++) {
        Dico dic_entry = list.get(i);
        if (id != dic_entry.getDocId()) {
            changes[change++] = i;
            id = dic_entry.getDocId();
        }
    }
    changes[change] = list.size(); // end of last change segment
    List<List<Dico>> sublists = new ArrayList<>(change);
    for (int i = 0; i < change; i++) {
        sublists.add(list.subList(changes[i], changes[i + 1]));
    }
    // Print each sublist once, after all sublists have been built.
    for (int i = 0; i < sublists.size(); i++) {
        List<Dico> sublist = sublists.get(i);
        System.out.println(sublist);
    }
}

OUTPUT:

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6], [doc id : 3 term : foo weight : 2.2], [doc id : 4 term : zoo weight : 0.15], [doc id : 7 term : zoo weight : 1.5]]