从csv生成树结构

我已经在这个问题上摸到了一段时间了。我基本上试图从一组CSV数据中生成一个树层次结构。 CSV数据不一定是有序的。这就像下面这样：从csv生成树结构

Header: Record1,Record2,Value1,Value2 
Row: A,XX,22,33 
Row: A,XX,777,888 
Row: A,YY,33,11 
Row: B,XX,12,0 
Row: A,YY,13,23 
Row: B,YY,44,98

我试图让分组的执行方式尽可能灵活。对于分组的最简单的将这样做对记录1和RECORD2与RECORD2下存储的值1和值2，使我们得到如下输出：

Record1 
    Record2 
     Value1 Value2

具体做法是：

我存储我目前在List中的组设置 - 我不知道这是否妨碍了我的想法。此列表中包含组例如层次：

Record1 (SchemaGroup) 
    .column = Record1 
    .columns = null 
    .childGroups = 
     Record2 (SchemaGroup) 
      .column = Record1 
      .columns = Value1 (CSVColumnInformation), Value2 (CSVColumnInformation) 
      .childGroups = null

该代码，这看起来像如下：

private class SchemaGroup { 
    private SchemaGroupType type = SchemaGroupType.StaticText; // default to text 
    private String text; 
    private CSVColumnInformation column = null; 
    private List<SchemaGroup> childGroups = new ArrayList<SchemaGroup>(); 
    private List<CSVColumnInformation> columns = new ArrayList<CSVColumnInformation>(); 
} 


private enum SchemaGroupType { 
    /** Allow fixed text groups to be added */ 
    StaticText, 
    /** Related to a column with common value */ 
    ColumnGroup 
}

我都有点吃力生产这种算法，冥思苦想的底层结构使用。目前我使用自己的包装类从上到下解析CSV：

CSVParser csv = new CSVParser(content); 
String[] line; 
while((line = csv.readLine()) != null) { 
    ... 
}

我只是试图启动我的编码大脑。

有什么想法？

来源

2011-10-05 Andez

什么是大局？如果行A，XX，12,34，你怎么知道12和34属于A或XX？ – palacsint

基于这个问题是怎么造成的，我会做到以下几点：

定义你的最终数据结构看起来像遏制树。
定义您的原始文本中的每一行的表示（可能是一个灵活的链表）
编写一个方法，它将表示的行插入到树数据结构中。对于每个不存在的分支，创建它;对于每个现有分支，在遍历“行”链接列表结构时遍历它。
从空树开始。
阅读文件的每一行到您的行项目结构，并呼吁在步骤3

是否帮助定义的方法？

来源

2011-10-05 20:31:09 lurker

的基本思想是并不困难：第一记录组，然后通过第二个记录，等等，直到你得到的东西是这样的：

(A,XX,22,33) 
(A,XX,777,888) 
------------------------- 
(A,YY,33,11) 
(A,YY,13,23) 
============= 
(B,XX,12,0) 
------------------------- 
(B,YY,44,98)

，然后向后努力建立的树木。

但是，有一个递归组件使得对这个问题进行推理或者逐步显示它有点困难，所以编写伪代码实际上更容易。

我假设您的csv中的每一行都表示为一个元组。每个元组都有“记录”和“值”，使用您在问题中使用的相同术语。“记录”是必须纳入分层结构的东西。 “价值观”将成为树的叶子。当我使用这些具有这些特定含义的术语时，我将使用引号。

我还假设所有“记录”都出现在所有“值”之前。

事不宜迟，代码：

// builds tree and returns a list of root nodes 
// list_of_tuples: a list of tuples read from your csv 
// curr_position: used to keep track of recursive calls 
// number_of_records: assuming each csv row has n records and then m values, number_of_records equals n 
function build_tree(list_of_tuples, curr_position, number_of_records) { 
    // check if we have already reached the "values" (which shouldn't get converted into trees) 
    if (curr_position == number_of_records) { 
     return list of nodes, each containing a "value" (i.e. everything from position number_of_records on) 
    } 

    grouped = group tuples in list_of_tuples that have the same value in position curr_position, and store these groups indexed by such common value 
    unique_values = get unique values in curr_position 

    list_of_nodes = empty list 

    // create the nodes and (recursively) their children 
    for each val in unique_values { 
     the_node = create tree node containing val 
     the_children = build_tree(grouped[val], curr_position+1, number_of_records) 
     the_node.set_children(the_children) 

     list_of_nodes.append(the_node) 
    } 

    return list_of_nodes 
} 

// in your example, this returns a node with "A" and a node with "B" 
// third parameter is 2 because you have 2 "records" 
build_tree(list_parsed_from_csv, 0, 2)

现在你不得不考虑一下具体的数据结构来使用，但希望这不应该是，如果你理解算法太困难（如您提及，我认为尽早决定数据结构可能会阻碍你的想法）。

来源

2011-10-05 20:35:57

感谢您的伪想法。 – Andez

@Andez对不起，没有真正的java代码，我只是觉得在这个阶段为了专注于算法，草图会更合适。请注意，这是目前处理任意数量的记录级别的唯一解决方案。如果您想将代码翻译成Java（或者任何人想根据我发布的内容发布Java答案），我很乐意提供帮助。 –

如果你知道你将有只有两个级别的Record S，当你读到新行我会使用类似

Map<string, Map<string, List<Values>>>

，你看看到外地图，以检查是否为Record1已经是值如果不存在，则为其创建新的空内部Map。

然后检查内部图是否存在该Record2的值。如果没有，则创建新的List。

然后读取值并将它们添加到列表中。

来源

2011-10-05 21:42:46 svick

就是这样，我会有更多的列依赖于我处理的csv文件。我最初考虑过地图。谢谢安德兹 – Andez

这里是junit形式的基本工作解决方案（虽然没有断言）通过使用google-guava collections简化。代码是不言自明的，而不是文件io，您可以使用csv库来读取csv。这应该给你基本的想法。

import java.io.File; 
import java.io.IOException; 
import java.util.Collection; 
import java.util.Collections; 
import java.util.List; 
import java.util.Set; 

import org.junit.Test; 

import com.google.common.base.Charsets; 
import com.google.common.base.Splitter; 
import com.google.common.collect.ArrayListMultimap; 
import com.google.common.collect.Iterables; 
import com.google.common.collect.Multimap; 
import com.google.common.collect.Sets; 
import com.google.common.io.Files; 

public class MyTest 
{ 
    @Test 
    public void test1() 
    { 
     List<String> rows = getAllDataRows(); 

     Multimap<Records, Values> table = indexData(rows); 

     printTree(table); 

    } 

    private void printTree(Multimap<Records, Values> table) 
    { 
     Set<String> alreadyPrintedRecord1s = Sets.newHashSet(); 

     for (Records r : table.keySet()) 
     { 
      if (!alreadyPrintedRecord1s.contains(r.r1)) 
      { 
       System.err.println(r.r1); 
       alreadyPrintedRecord1s.add(r.r1); 
      } 

      System.err.println("\t" + r.r2); 

      Collection<Values> allValues = table.get(r); 

      for (Values v : allValues) 
      { 
       System.err.println("\t\t" + v.v1 + " , " + v.v2); 
      } 
     } 
    } 

    private Multimap<Records, Values> indexData(List<String> lines) 
    { 
     Multimap<Records, Values> table = ArrayListMultimap.create(); 

     for (String row : lines) 
     { 
      Iterable<String> split = Splitter.on(",").split(row); 
      String[] data = Iterables.toArray(split, String.class); 

      table.put(new Records(data[0], data[1]), new Values(data[2], data[3])); 
     } 
     return table; 
    } 

    private List<String> getAllDataRows() 
    { 
     List<String> lines = Collections.emptyList(); 

     try 
     { 
      lines = Files.readLines(new File("C:/test.csv"), Charsets.US_ASCII); 
     } 
     catch (IOException e) 
     { 
      e.printStackTrace(); 
     } 

     lines.remove(0);// remove header 

     return lines; 
    } 
} 



public class Records 
{ 
    public final String r1, r2; 

    public Records(final String r1, final String r2) 
    { 
     this.r1 = r1; 
     this.r2 = r2; 
    } 

    @Override 
    public int hashCode() 
    { 
     final int prime = 31; 
     int result = 1; 
     result = prime * result + ((r1 == null) ? 0 : r1.hashCode()); 
     result = prime * result + ((r2 == null) ? 0 : r2.hashCode()); 
     return result; 
    } 

    @Override 
    public boolean equals(final Object obj) 
    { 
     if (this == obj) 
     { 
      return true; 
     } 
     if (obj == null) 
     { 
      return false; 
     } 
     if (!(obj instanceof Records)) 
     { 
      return false; 
     } 
     Records other = (Records) obj; 
     if (r1 == null) 
     { 
      if (other.r1 != null) 
      { 
       return false; 
      } 
     } 
     else if (!r1.equals(other.r1)) 
     { 
      return false; 
     } 
     if (r2 == null) 
     { 
      if (other.r2 != null) 
      { 
       return false; 
      } 
     } 
     else if (!r2.equals(other.r2)) 
     { 
      return false; 
     } 
     return true; 
    } 

    @Override 
    public String toString() 
    { 
     StringBuilder builder = new StringBuilder(); 
     builder.append("Records1and2 [r1=").append(r1).append(", r2=").append(r2).append("]"); 
     return builder.toString(); 
    } 

} 


public class Values 
{ 
    public final String v1, v2; 

    public Values(final String v1, final String v2) 
    { 
     this.v1 = v1; 
     this.v2 = v2; 
    } 

    @Override 
    public int hashCode() 
    { 
     final int prime = 31; 
     int result = 1; 
     result = prime * result + ((v1 == null) ? 0 : v1.hashCode()); 
     result = prime * result + ((v2 == null) ? 0 : v2.hashCode()); 
     return result; 
    } 

    @Override 
    public boolean equals(final Object obj) 
    { 
     if (this == obj) 
     { 
      return true; 
     } 
     if (obj == null) 
     { 
      return false; 
     } 
     if (!(obj instanceof Values)) 
     { 
      return false; 
     } 
     Values other = (Values) obj; 
     if (v1 == null) 
     { 
      if (other.v1 != null) 
      { 
       return false; 
      } 
     } 
     else if (!v1.equals(other.v1)) 
     { 
      return false; 
     } 
     if (v2 == null) 
     { 
      if (other.v2 != null) 
      { 
       return false; 
      } 
     } 
     else if (!v2.equals(other.v2)) 
     { 
      return false; 
     } 
     return true; 
    } 

    @Override 
    public String toString() 
    { 
     StringBuilder builder = new StringBuilder(); 
     builder.append("Values1and2 [v1=").append(v1).append(", v2=").append(v2).append("]"); 
     return builder.toString(); 
    } 

}

来源

2011-10-05 22:45:07

谢谢，看起来很有趣。将研究它。 – Andez

我最近有一个需要做的是几乎同样的事情，写tree-builder.com来完成任务。唯一的区别是，当你有CSV文件时，最后两个参数将是父母和孩子，而不是同龄人。另外，我的版本不接受标题行。

代码全部在JavaScript中;它使用jstree来构建树。您可以使用萤火虫或只查看网页上的来源，看看它是如何完成的。调整它可能很容易，以避免您的CSV中的逗号，以保持最后两个参数是一个孩子。

来源

2014-07-18 12:10:33 Witt

public static void main (String arg[]) throws Exception 
{ 
    ArrayList<String> arRows = new ArrayList<String>(); 
    arRows.add("A,XX,22,33"); 
    arRows.add("A,XX,777,888"); 
    arRows.add("A,YY,33,11"); 
    arRows.add("B,XX,12,0"); 
    arRows.add("A,YY,13,23"); 
    arRows.add("B,YY,44,98"); 
    for(String sTreeRow:createTree(arRows,",")) //or use //// or whatever applicable 
     System.out.println(sTreeRow); 
} 
    public static ArrayList<String> createTree (ArrayList<String> arRows, String sSeperator) throws Exception 
{ 
    ArrayList<String> arReturnNodes = new ArrayList<String>(); 
    Collections.sort(arRows); 
    String sLastPath = ""; 
    int iFolderLength = 0; 
    for(int iRow=0;iRow<arRows.size();iRow++) 
    { 
     String sRow = arRows.get(iRow); 
     String[] sFolders = sRow.split(sSeperator); 
     iFolderLength = sFolders.length; 
     String sTab = ""; 
     String[] sLastFolders = sLastPath.split(sSeperator); 
     for(int i=0;i<iFolderLength;i++) 
     { 
      if(i>0) 
       sTab = sTab+" "; 
      if(!sLastPath.equals(sRow)) 
      { 

       if(sLastFolders!=null && sLastFolders.length>i) 
       { 
        if(!sLastFolders[i].equals(sFolders[i])) 
        { 
         arReturnNodes.add(sTab+sFolders[i]+""); 
         sLastFolders = null; 
        } 
       } 
       else 
       { 
        arReturnNodes.add(sTab+sFolders[i]+""); 
       } 
      } 
     } 
     sLastPath = sRow; 
    } 
    return arReturnNodes; 
}

来源

2016-02-20 05:04:42 Sansice

从csv生成树结构

回答

相关问题