Parsing a book from CSV

I have a project in which I'm parsing a CSV file that will contain the chapters and subsections of a textbook, and it looks like this:

Chapter, Section, Lesson #this line shows how the book will be organized 
Ch1Name, Secion1Name, Lesson1Name 
Ch1Name, Secion2Name, Lesson1Name 
Ch1Name, Secion2Name, Lesson2Name

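I'm creating a Django model object for each section, and each section has a parent attribute, which is the parent section it belongs to. I can't come up with a way to walk through the csv file so that the parents are assigned correctly. Any ideas on how to get started would be great.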

Asked 2013-02-21 by taman

Hopefully the data in the CSV doesn't contain spaces, or things get interesting. In any case, [take a look at the csv module in Python.](http://docs.python.org/2/library/csv.html) – Makoto 2013-02-21 22:26:28

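First, hopefully you're already using the csv module rather than trying to parse the file by hand. Second, it isn't entirely clear from your question, but it sounds like you're trying to build a simple tree structure out of the data.

So, something like this?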

import collections
import csv

with open('book.csv') as book:
    # defaultdict needs a callable, so the inner defaultdict is wrapped in a lambda
    chapters = collections.defaultdict(lambda: collections.defaultdict(list))
    book.readline()  # to skip the headers
    for chapter_name, section_name, lesson_name in csv.reader(book):
        chapters[chapter_name][section_name].append(lesson_name)
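To see what that structure looks like for the rows in the question, here is a minimal self-contained sketch. It reads the sample from an in-memory string instead of book.csv, and it assumes you want the blank after each comma dropped (via `skipinitialspace=True`) and any trailing whitespace stripped. The `build_tree` helper and the `SAMPLE` constant are names made up for this illustration.

```python
import collections
import csv
import io

# Sample rows from the question, held in memory instead of book.csv.
SAMPLE = """Chapter, Section, Lesson
Ch1Name, Secion1Name, Lesson1Name
Ch1Name, Secion2Name, Lesson1Name
Ch1Name, Secion2Name, Lesson2Name
"""

def build_tree(lines):
    """Return {chapter: {section: [lessons]}} from an iterable of CSV lines."""
    tree = collections.defaultdict(lambda: collections.defaultdict(list))
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    for chapter_name, section_name, lesson_name in reader:
        tree[chapter_name][section_name].append(lesson_name.strip())
    return tree

tree = build_tree(io.StringIO(SAMPLE))
for chapter_name, sections in tree.items():
    for section_name, lesson_names in sections.items():
        print(chapter_name, section_name, lesson_names)
```

For the sample this prints one line per section, with both lessons of the second section grouped in a single list.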

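Of course this assumes you want an "associative tree", a dict of dicts. A more ordinary linear tree, like a list of lists, or an implicit tree in "parent pointer" form, is even simpler.

For example, suppose you have classes defined like this: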

class Chapter(object):
    def __init__(self, name):
        self.name = name

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name

class Lesson(object):
    def __init__(self, section, name):
        self.section = section
        self.name = name

And you want a dict for each, mapping names to objects. So:

import csv

with open('book.csv') as book:
    chapters, sections, lessons = {}, {}, {}
    book.readline()  # to skip the headers
    for chapter_name, section_name, lesson_name in csv.reader(book):
        # setdefault returns the existing object if the name was seen before
        chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
        section = sections.setdefault(section_name, Section(chapter, section_name))
        lesson = lessons.setdefault(lesson_name, Lesson(section, lesson_name))

Now you can pick a random lesson and print its chapter and section:

import random

# list() is needed on Python 3, where dict.values() returns a view
lesson = random.choice(list(lessons.values()))
print('Chapter {}, Section {}: Lesson {}'.format(lesson.section.chapter.name,
                                                 lesson.section.name, lesson.name))
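One thing to watch with name-keyed dicts: in the sample data, Lesson1Name appears under two different sections, so keying lessons by name alone would hand back the first Lesson object for both rows. A possible workaround, sketched below on the assumption that a name is only unique within its parent, is to key each dict by the full path. The `load_book` helper, the tuple keys, and the in-memory `SAMPLE` are illustrative choices rather than part of the answer above.

```python
import csv
import io

class Chapter(object):
    def __init__(self, name):
        self.name = name

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name

class Lesson(object):
    def __init__(self, section, name):
        self.section = section
        self.name = name

def load_book(lines):
    """Hypothetical helper: build parent-linked objects keyed by full path."""
    chapters, sections, lessons = {}, {}, {}
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    for chapter_name, section_name, lesson_name in reader:
        # Create each object only the first time its path is seen.
        if chapter_name not in chapters:
            chapters[chapter_name] = Chapter(chapter_name)
        chapter = chapters[chapter_name]
        section_key = (chapter_name, section_name)
        if section_key not in sections:
            sections[section_key] = Section(chapter, section_name)
        section = sections[section_key]
        lesson_key = section_key + (lesson_name,)
        if lesson_key not in lessons:
            lessons[lesson_key] = Lesson(section, lesson_name)
    return chapters, sections, lessons

SAMPLE = """Chapter, Section, Lesson
Ch1Name, Secion1Name, Lesson1Name
Ch1Name, Secion2Name, Lesson1Name
Ch1Name, Secion2Name, Lesson2Name
"""

chapters, sections, lessons = load_book(io.StringIO(SAMPLE))
for (chapter_name, section_name, lesson_name), lesson in lessons.items():
    # Every lesson reaches the single shared Chapter object through its section.
    print(chapter_name, section_name, lesson_name,
          lesson.section.chapter is chapters[chapter_name])
```

With this keying, both sections of Ch1Name point at the same Chapter object, and the two Lesson1Name rows become separate Lesson objects under their own sections.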

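One last thing to keep in mind: in this example, the parent references don't create any reference cycles, because the parents don't refer to their children. But what if you need that?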

class Chapter(object):
    def __init__(self, name):
        self.name = name
        self.sections = {}

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name
        self.lessons = {}

# ...

chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
section = sections.setdefault(section_name, Section(chapter, section_name))
# register the section with its parent chapter
chapter.sections[section_name] = section

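So far, so good... but what happens when you're done with all these objects? They have circular references, which can cause problems for garbage collection. The problems aren't insurmountable, but they do mean the objects won't be collected as quickly in most implementations. For example, in CPython, objects are generally collected as soon as the last reference goes out of scope, but with circular references that never happens, so nothing gets collected until the next pass of the cycle detector. The solution is to use a weakref for the parent pointer (or a weakref collection for the children).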
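As a rough sketch of the weakref option, assuming the chapter owns its sections and each section only needs a non-owning pointer back, the parent could be stored with `weakref.ref` and exposed through a property. The `_chapter` attribute and the self-registration inside `Section.__init__` are choices made for this illustration.

```python
import gc
import weakref

class Chapter(object):
    def __init__(self, name):
        self.name = name
        self.sections = {}  # strong references: a chapter owns its sections

class Section(object):
    def __init__(self, chapter, name):
        self._chapter = weakref.ref(chapter)  # weak back-pointer, no cycle
        self.name = name
        chapter.sections[name] = self

    @property
    def chapter(self):
        # Returns None once the chapter has been collected.
        return self._chapter()

chapter = Chapter('Ch1Name')
section = Section(chapter, 'Secion1Name')
print(section.chapter is chapter)

del chapter
gc.collect()  # not needed on CPython, where refcounting frees it at once
print(section.chapter)
```

The trade-off is that something else has to keep the chapters alive (the `chapters` dict in the loading code would do), and callers have to cope with `section.chapter` being None once the parent is gone.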

Answered 2013-02-21 22:28:46 by abarnert

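Thanks. I'm actually using the csv module already, but thanks for the suggestion. I think you're close to what I have in mind, but here's the problem: taking the csv example above, I need Section1 and Secion2 in Chapter1 to point to the same Chapter1 section object. I hope that makes sense. – taman 2013-02-21 23:01:29

OK, so you want to build some kind of 'Section' object? Let me edit the answer. – abarnert 2013-02-21 23:43:31

Thanks! I think this will help me solve the problem! – taman 2013-02-22 01:13:19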
