Python: searching a dataset

I've been given this as a homework question and don't know how I should go about it.

First, I was given a dataset containing a list of staff names, addresses, email addresses and so on, about 50 staff in total.

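You are asked to write an application that provides information about staff. Your program should prompt the user for a search criterion. Any member of staff who matches the search criterion should be printed on screen in the following format: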

Position Designation Room and Extension Name and Email Address
(columns are tab-separated)

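Matching information ...
You will have to modify the dataset for processing, and you may choose to save it in a separate file, but this is not required. Your program should satisfy certain constraints:

You should compare every column in the dataset against the search criterion.

The comparison should not be case-sensitive.

All output except the email addresses should be in title case (initial capitals).

If matches are found, the matching rows should be printed and the columns should all be aligned.

If there is no match, a message should be printed, without the header row.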
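A minimal sketch of the "title case except email" rule, assuming the email is the last column of each record (as in the dataset below); the address someone@example.com is a placeholder. It also shows a pitfall worth knowing: str.title() lower-cases every letter after the first of each word, so names like McGinnity and room codes like MS030/MF143 come out changed, which is why room codes are better passed through .upper().

```python
def format_for_output(record):
    """Title-case every field except the last one, which is assumed to be the email."""
    *fields, email = record  # split off the last column
    return [field.title() for field in fields] + [email]


row = ["mrs. brenda plummer", "secretary", "clerical staff", "75605", "someone@example.com"]
print(format_for_output(row))

# title() lower-cases everything after the first letter of each word:
print("McGinnity".title())    # Mcginnity
print("ms030/mf143".title())  # Ms030/Mf143
```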

You should save (1) your program and (2) a paragraph explaining how you processed the dataset.

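You should also run your application on these test cases:

Search for "Brenda"

Search for all clerical staff.

Search for "BredNa"

Search to find the location of Dr Carr

Which office is Neil located in?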

So, first of all, how should I read this dataset? Should I read it as a text file, or create a tuple, a dictionary, etc.?

staff = [['prof.liam maguire','head of school','academic','MS127','75605','[email protected]'], 
['prof. martin McGinnity','director of intelligent systems research centre','academic','MS112','75616','[email protected]'], 
['dr laxmidhar Behera','reader','academic','MS107','75276','[email protected]'], 
['dr girijesh Prasad','professor','academic','MS137','75645','[email protected]'], 
['dr kevin Curran','senior lecturer','academic','MS130','75565','[email protected]'], 
['mr aiden McCaughey','Senior Lecturer','academic','MG126','75131','[email protected]'], 
['dr tom Lunney','postgraduate courses co-ordinator (Senior Lecturer)','academic','MG121D','75388','[email protected]'], 
['dr heather Sayers','undergraduate courses co-ordinator (Senior Lecturer)','academic','MG121C','75148','[email protected]'], 
['dr liam Mc Daid','senior lecturer','academic','MS016','75452','[email protected]'], 
['mr derek Woods','senior lecturer','academic','MS134','75380','[email protected]'], 
['dr ammar Belatreche','lecturer','academic','MS104','75185','[email protected]'], 
['mr michael Callaghan','lecturer','academic','MS132','75771','[email protected]'], 
['dr sonya Coleman','lecturer','academic','MS133','75030','[email protected]'], 
['dr joan Condell','lecturer','academic','MS131','75024','[email protected]'], 
['dr damien Coyle','lecturer','academic','MS103','75170','[email protected]'], 
['mr martin Doherty','lecturer','academic','MG121A','75552','[email protected]'], 
['dr jim Harkin','lecturer','academic','MS108','75128','[email protected]'], 
['dr yuhua Li','lecturer','academic','MS106','75528','[email protected]'], 
['dr sandra Moffett','lecturer','academic','MS015','75381','[email protected]'], 
['mrs mairin Nicell','lecturer','academic','MG127','75007','[email protected]'], 
['mrs maeve Paris','lecturer','academic','MG040','75212','[email protected]'], 
['dr jose Santos','lecturer','academic','MG035','75034','[email protected]'], 
['dr nH. Siddique','lecturer','academic','MG037','75340','[email protected]'], 
['dr zumao Weng','lecturer','academic','MG050','75358','[email protected]'], 
['dr shane Wilson','lecturer','academic','MG038','75527','[email protected]'], 
['dr caitriona carr','technical services engineer','computing and Technical Support','MG121B','75003','[email protected]'], 
['mr neil McDonnell','technical Services Supervisor','computing and Technical Support','MS030/MF143','75360','[email protected]'], 
['mr paddy McDonough','technical Services Engineer','computing and Technical Support','MS034','75322','[email protected]'], 
['mr bernard McGarry','network Assistant','computing and Technical Support','MG132','75644','[email protected]'], 
['mr stephen Friel','secretary','clerical staff','MG048','75148','[email protected]'], 
['ms emma McLaughlin','secretary','clerical staff','MG048','75153','[email protected]'], 
['mrs. brenda Plummer','secretary','clerical staff','MS126','75605','[email protected]'], 
['miss paula Sheerin','secretary','clerical staff','MS111','75616','[email protected]'], 
['mrs michelle Stewart','secretary','clerical staff','MG048','75382','[email protected]']] 


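matches = []

criterion = input("please enter search criterion: ")
criterion = criterion.lower()  # lower-case once so the comparison ignores case

for person in staff:
    for characteristic in person:
        # lower-case the field too, otherwise "mcginnity" never matches "McGinnity"
        if criterion in characteristic.lower():
            matches.append(person)
            break  # one matching column is enough; don't add the same person twice

if len(matches) == 0:
    print("No Match")
else:
    print("POSITION ||| DESIGNATION ||| EXT & ROOM NO ||| NAME & EMAIL")
    for i in matches:
        # the email (i[5]) is printed as-is: only the other fields are title-cased
        print(i[1].title(), ':', i[2].title(), ':', i[3].upper(), i[4], ':', i[0].title(), i[5])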

This is what I've come up with so far. It seems to work; where would you make improvements?


2012-03-21 smorr87

What format is the dataset in? Can you provide a sample entry? Also, what have you tried so far? – Taymon 2012-03-21 19:26:18

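Thank you for being honest that this is a homework question. StackOverflow discourages giving direct answers to homework questions, but we can guide you toward the right answer.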

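Regarding "modify the dataset for processing": this means the data does not currently have a consistent format. The first thing you need to do is look at the data you've been given and decide on the best representation for it.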

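I recommend a tab-separated data file: this is easy to create in Microsoft Excel by putting the data into a spreadsheet and saving it as text. (Excel will complain that you'll lose everything that makes it a spreadsheet rather than a text file, but that's fine: you want a text file.) Save the updated file.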

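What Excel produces is called a tab-delimited text file: a two-dimensional grid of data (shaped like a spreadsheet) in which each row of data is written on its own line (the newline character separates the rows; a text editor interprets it as an instruction to start writing on a new line), and a tab character separates the fields within each row (written in Python as \t inside a string, but really a single character of its own). This is also called tab-separated values, or TSV. Closely related is comma-separated values, or CSV, which is another option in Excel. CSV can also stand for character-separated values, the general term for representing a grid of data by using some character (',' for comma-separated, '\t' for tab-separated) to separate the fields of each record.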
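To make that structure concrete, here is a minimal sketch that splits a two-row TSV string by hand, using two records from the dataset in the question (email column left out). In practice the csv module does this parsing for you; this only shows what the file contains.

```python
# Two records in TSV form: "\t" separates fields, "\n" separates rows.
tsv_text = (
    "prof. liam maguire\thead of school\tacademic\tMS127\t75605\n"
    "mrs. brenda plummer\tsecretary\tclerical staff\tMS126\t75605\n"
)

rows = []
for line in tsv_text.splitlines():  # one line of text is one record
    rows.append(line.split("\t"))   # each tab marks a column boundary

print(len(rows))   # 2
print(rows[1][0])  # mrs. brenda plummer
print(len("\t"))   # 1: the escape is two characters in source, one in the string
```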

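CSV is a very common file format, so Python is ready to help you here: Python has a library, csv, designed to read these files for you. If you are using Excel's text format, you need to tell it that the dialect is excel-tab, which stands for Excel's tab-delimited files.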

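You need to build a csv.reader to read your formatted data file. Use the order in which you placed the columns to make sense of the list you get each time you read one row of the file: the order of the columns and the order of the items in each row are the same, so use that to index correctly into the list to find each field.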
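A minimal sketch of reading such a file with csv.reader and the excel-tab dialect. It assumes a hypothetical column order (name, position, designation, room, extension, email) and uses io.StringIO in place of a real file (a hypothetical staff.tsv) so that it runs on its own; the email addresses are placeholders.

```python
import csv
import io

# Hypothetical column order chosen when the file was saved; adjust to match yours.
NAME, POSITION, DESIGNATION, ROOM, EXT, EMAIL = range(6)

# Stand-in for open("staff.tsv", newline="") so the example is self-contained.
data_file = io.StringIO(
    "mr neil mcdonnell\ttechnical services supervisor\tcomputing and technical support"
    "\tMS030/MF143\t75360\tneil@example.com\n"
    "dr caitriona carr\ttechnical services engineer\tcomputing and technical support"
    "\tMG121B\t75003\tcarr@example.com\n"
)

staff = []
for row in csv.reader(data_file, dialect="excel-tab"):
    staff.append(row)  # each row is a list of strings in column order

for row in staff:
    # index with the named column positions instead of bare numbers
    print(row[NAME].title(), "-", row[ROOM], "ext.", row[EXT])
```

With a real file, open it with newline="" as the csv module's documentation recommends.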

Once you've read a row, what do you want to do with it?

You have a choice of storage formats in your program:

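Save each record into a list (which then behaves like a list of lists, since each record behaves like a list). Once it's loaded, whenever you want to search it you iterate over the whole list of lists and test each record to find matches. This can be done with a list comprehension, which is almost certainly what your teacher is looking for.
Alternatively, create a dictionary for each column of the file and store every record in each dictionary: each dictionary uses that column's value as its key and maps it to the record. There's a catch! A dictionary can only store one record per key, but you will certainly have several people with the same "designation" (several professors, several clerical staff, etc.), and you can't be sure that no two people share the same name, either. The index must therefore store a list of records for each key, not just a single record.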
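A minimal sketch of the first approach. It uses a case-insensitive substring test rather than strict equality, since test cases such as "Brenda" have to match part of a full name; the three sample records are copied from the dataset, minus the email column.

```python
def search(records, criterion):
    """Return every record with a field containing `criterion`, ignoring case."""
    wanted = criterion.lower()
    return [record for record in records
            if any(wanted in field.lower() for field in record)]


# Sample records copied from the dataset (email column omitted).
staff = [
    ["Mrs. Brenda Plummer", "Secretary", "Clerical Staff", "MS126", "75605"],
    ["Mr Neil McDonnell", "Technical Services Supervisor",
     "Computing and Technical Support", "MS030/MF143", "75360"],
    ["Dr Caitriona Carr", "Technical Services Engineer",
     "Computing and Technical Support", "MG121B", "75003"],
]

for criterion in ["brenda", "BredNa", "neil"]:
    found = search(staff, criterion)
    if found:
        for record in found:
            print(criterion, "->", record[0], record[3])
    else:
        print(criterion, "-> no match")  # empty result: print the message, no header row
```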

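For repeated queries, the second approach is much faster, because you organised all the records for fast lookup at the start. However, the first is much easier to implement and more likely to be what your teacher expects. I suggest implementing the first, making sure you understand it, and then, if you have time, implementing the second.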
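A sketch of the second approach, assuming records are lists in a fixed column order and that DESIGNATION = 2 is the (hypothetical) position of the designation column. Each column gets its own dictionary whose values are lists, so several people can share a key. Note the trade-off: this index only answers whole-field lookups such as "clerical staff"; a partial query like "brenda" still needs the scan from the first approach.

```python
from collections import defaultdict

DESIGNATION = 2  # hypothetical column position


def build_indexes(records, n_columns):
    """One dict per column: lower-cased field value -> list of records with that value."""
    indexes = [defaultdict(list) for _ in range(n_columns)]
    for record in records:
        for col, value in enumerate(record):
            # a list per key, because several people share a designation
            indexes[col][value.lower()].append(record)
    return indexes


def lookup(indexes, col, value):
    """Case-insensitive exact lookup; .get avoids creating empty entries."""
    return indexes[col].get(value.lower(), [])


staff = [
    ["Mr Stephen Friel", "Secretary", "Clerical Staff", "MG048", "75148"],
    ["Mrs. Brenda Plummer", "Secretary", "Clerical Staff", "MS126", "75605"],
    ["Dr Caitriona Carr", "Technical Services Engineer",
     "Computing and Technical Support", "MG121B", "75003"],
]
idx = build_indexes(staff, 5)  # built once, then reused for every query

for record in lookup(idx, DESIGNATION, "clerical staff"):
    print(record[0], record[3])
```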

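The user interface for all this is of course up to you, but this should get you well on the way to implementing the core of the program. Good luck.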


2012-03-21 19:39:38

Posted my attempt below – smorr87 2012-03-22 19:33:01

I assume you have your dataset as a plain text file (or text copied from an email, etc.). Then you have several options:

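Create a text file in which each line stores the information about one employee in the specified format: "Name", "Position", etc. In this case, to perform a search you scan the file and print each matching line in turn.
Use Python data types to store the information in memory, e.g. a list of dictionaries with keys "Name", "Position", etc. Searching then becomes slightly more complicated (only a little, really), but you can format the output any way you like. First, though, you need to fill the list with data by reading the text file (or by hard-coding it by hand, if you're desperate).
You can combine these approaches somewhat by building dictionaries only from the matching lines of the file.
You could use a real database engine such as MySQL, but that would probably be serious overkill for this assignment.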
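A minimal sketch of the list-of-dictionaries option, assuming the data has already been saved as tab-separated text. csv.DictReader builds one dictionary per line using the given field names; the key names and the io.StringIO stand-in for a real file are assumptions.

```python
import csv
import io

# Hypothetical key names, following the "Name", "Position", ... idea above.
FIELDS = ["Name", "Position", "Designation", "Room", "Extension"]

# Stand-in for a tab-separated file; a real program would use open(path, newline="").
text = io.StringIO(
    "Mr Neil McDonnell\tTechnical Services Supervisor\tComputing and Technical Support"
    "\tMS030/MF143\t75360\n"
    "Dr Caitriona Carr\tTechnical Services Engineer\tComputing and Technical Support"
    "\tMG121B\t75003\n"
)

# DictReader turns each row into a dict keyed by FIELDS
staff = list(csv.DictReader(text, fieldnames=FIELDS, dialect="excel-tab"))


def search(people, criterion):
    """Case-insensitive substring match against every value of every dict."""
    wanted = criterion.lower()
    return [p for p in people if any(wanted in v.lower() for v in p.values())]


for person in search(staff, "neil"):
    # fields are looked up by name, so output order is independent of file order
    print(f"{person['Room']} ext {person['Extension']}: {person['Name']}")
```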


2012-03-21 19:35:27

This is how I would go about it:

staff_details = [["Prof. Liam Maguire","Head of School","Academic","MS127","75605","[email protected]"], 
       ["Prof. Martin McGinnity","Director of Intelligent Systems Research Centre","Academic","MS112","75616","[email protected]"], 
       ["Dr Laxmidhar Behera","Reader","Academic","MS107","75276", "[email protected]"], 
       ["Dr Girijesh Prasad","Professor","Academic","MS137","75645","[email protected]"], 
       ["Dr Kevin Curran","Senior Lecturer","Academic","MS130","75565","[email protected]"], 
       ["Mr Aiden McCaughey","Senior Lecturer","Academic","MG126","75131","[email protected]"], 
       ["Dr Tom Lunney","Postgraduate Courses’ Co-ordinator (Senior Lecturer) ","Academic","MG121D","75388","[email protected]"], 
       ["Dr Heather Sayers","Undergraduate Courses’ Co-ordinator (Senior Lecturer) ","Academic","MG121C","75148","[email protected]"], 
       ["Dr Liam Mc Daid","Senior Lecturer","Academic","MS016","75452","[email protected]"], 
       ["Mr Derek Woods","Senior Lecturer","Academic","MS134","75380","[email protected]"], 
       ["Dr Ammar Belatreche","Lecturer","Academic","MS104","75185","[email protected]"], 
       ["Mr Michael Callaghan","Lecturer","Academic","MS132","75771","[email protected]"], 
       ["Dr Sonya Coleman","Lecturer","Academic","MS133","75030","[email protected]"], 
       ["Dr Joan Condell","Lecturer","Academic","MS131","75024","[email protected]"], 
       ["Dr Damien Coyle","Lecturer","Academic","MS103","75170","[email protected]"], 
       ["Mr Martin Doherty","Lecturer","Academic","MG121A","75552","[email protected]"], 
       ["Dr Jim Harkin","Lecturer","Academic","MS108","75128","[email protected]"], 
       ["Dr Yuhua Li","Lecturer","Academic","MS106","75528","[email protected]"], 
       ["Dr Sandra Moffett","Lecturer","Academic","MS015","75381","[email protected]"], 
       ["Mrs Mairin Nicell","Lecturer","Academic","MG127","75007","[email protected]"], 
       ["Mrs Maeve Paris","Lecturer","Academic","MG040","75212","[email protected]"], 
       ["Dr Jose Santos","Lecturer","Academic","MG035","75034","[email protected]"], 
       ["Dr NH. Siddique","Lecturer","Academic","MG037","75340","[email protected]"], 
       ["Dr Zumao Weng","Lecturer","Academic","MG050 ","75358","[email protected]"], 
       ["Dr Shane Wilson","Lecturer","Academic","MG038","75527","[email protected]"], 
       ["Dr Caitriona Carr","Technical Services Engineer","Computing and Technical Support","MG121B","75003","[email protected]"], 
       ["Mr Neil McDonnell","Technical Services Supervisor","Computing and Technical Support","MS030/MF143","75360", "[email protected]"], 
       ["Mr Paddy McDonough","Technical Services Engineer","Computing and Technical Support","MS034","75322","[email protected]"], 
       ["Mr Bernard McGarry","Network Assistant","Computing and Technical Support","MG132","75644","[email protected]"], 
       ["Mr Stephen Friel","Secretary","Clerical Staff","MG048","75148","[email protected]"], 
       ["Ms Emma McLaughlin","Secretary","Clerical Staff","MG048","75153","[email protected]"], 
       ["Mrs. Brenda Plummer","Secretary","Clerical Staff","MS126","75605","[email protected]"], 
       ["Miss Paula Sheerin","Secretary","Clerical Staff","MS111","75616","[email protected]"], 
       ["Mrs Michelle Stewart","Secretary","Clerical Staff","MG048","75382","[email protected]"]] 

search_result = []

search_input = input("Please enter a search criterion: ")
search_input = search_input.lower()  # lower-case both sides so the search is truly case-insensitive

for person in staff_details:
    for characteristic in person:
        if search_input in characteristic.lower():
            search_result.append(person)
            break  # stop at the first matching field so each person is listed once

if len(search_result) == 0:
    print("No staff members match your search criterion of ->", search_input)
else:
    print("We have a match!")
    print("{0:<30} {1:<40} {2:<40} {3:<50}".format("Position:", "Designation:", "Room and Extension:", "Name and Email:"))
    print("-" * 160)
    for align in search_result:
        print("{0:<30} {1:<40} {2:<40} {3:<50}".format(align[1], align[2], align[3] + ", Ext:" + align[4], align[0] + " (" + align[5] + ")"))
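The fixed widths above work as long as every value fits its column; entries longer than 30 characters, such as Prof. McGinnity's "Director of Intelligent Systems Research Centre", push the later columns out of line. A sketch that sizes each column from the data instead, using a hypothetical helper format_table (email omitted from the sample rows):

```python
def format_table(header, rows, gap="  "):
    """Return the table as text lines, each column padded to its widest cell."""
    table = [header] + rows
    widths = [max(len(row[col]) for row in table) for col in range(len(header))]
    return [gap.join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
            for row in table]


header = ["Position", "Designation", "Room and Extension", "Name and Email"]
rows = [
    ["Director of Intelligent Systems Research Centre", "Academic",
     "MS112, Ext:75616", "Prof. Martin McGinnity"],
    ["Secretary", "Clerical Staff", "MS126, Ext:75605", "Mrs. Brenda Plummer"],
]
for line in format_table(header, rows):
    print(line)
```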

I hope this helps!


2013-04-15 09:26:37 Becs1990
