Python: read a CSV and store the values in a MySQL database

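I am trying to read values from a CSV file and put them into a database, and I am managing to do that without much trouble.

But I now need to write back to the CSV, so that the next time the script runs it only inserts the values below a marker in the CSV file into the database.

Note that the CSV file on the system is automatically flushed every 24 hours, so bear in mind that there may be no marker in the CSV. If no marker is found, basically all of the values should go into the database.

I plan to run this script every 30 minutes, so there could be 48 markers in the CSV file. Or could the marker be removed and moved further down the file each time?

I have been deleting the file and then recreating it in the script, so that every run starts with a fresh file, but this breaks the system to some extent, so it is not a good option.

Hope you can help.

Thanks

Python code: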

import csv
import MySQLdb

mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='******',
                       db='kestrel_keep')

cursor = mydb.cursor()

# One %s placeholder per CSV column (13 here), with no trailing comma
with open('data_csv.log') as f:
    for row in csv.reader(f):
        cursor.execute('INSERT INTO `heating` VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
                       row)

# Commit the inserts, then close the cursor and the connection to the database.
mydb.commit()
cursor.close()
mydb.close()

print("Done")

My CSV file format:

2013-02-21,21:42:00,-1.0,45.8,27.6,17.3,14.1,22.3,21.1,1,1,2,2 
2013-02-21,21:48:00,-1.0,45.8,27.5,17.3,13.9,22.3,20.9,1,1,2,2

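Source

2013-02-25 ZeroG

I think a better option than "marking" the CSV file is to keep a separate file in which you store the number of the last line you processed.

So if that file (the one storing the number of the last processed line) does not exist, you process the whole CSV file. If it does exist, you process only the records after that line.

Final code on the working system: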

#!/usr/bin/python
import csv
import MySQLdb
import os

mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='*******',
                       db='kestrel_keep')

cursor = mydb.cursor()

start_row = 0
saved_file_size = 0  # default when no size has been stored yet


def getSize(fileobject):
    fileobject.seek(0, 2)  # move the cursor to the end of the file
    size = fileobject.tell()
    return size


with open('data_csv.log', 'rb') as f:
    curr_file_size = getSize(f)

# Get the last file size
if os.path.exists("file_size"):
    with open("file_size") as f:
        saved_file_size = int(f.read())

# Get the last processed line
if os.path.exists("lastline"):
    with open("lastline") as f:
        start_row = int(f.read())

# A smaller file means the CSV was flushed, so start again from the top
if curr_file_size < saved_file_size:
    start_row = 0

cur_row = 0
with open('data_csv.log') as f:
    for row in csv.reader(f):
        if cur_row >= start_row:
            # The number of %s placeholders must match the number of columns in each row
            cursor.execute('INSERT INTO `heating` VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', row)

            # Other processing if necessary

        cur_row += 1  # incremented for every row, processed or skipped

mydb.commit()
cursor.close()

# Store the index of the next unprocessed line. cur_row already equals the
# number of rows read, so it is where the next run should start.
with open("lastline", 'w') as f:
    f.write(str(cur_row))

# Store the current file size to detect a file flush
with open("file_size", 'w') as f:
    f.write(str(curr_file_size))

# not necessary but good for debug
print(str(cur_row))

print("Done")

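Edit: the final code, submitted by ZeroG, is now working on the system! Thank you too, Xion345, for the help.

Source

2013-02-25 10:15:09 Xion345

I like this answer, but I can't get the line number we write to the lastline file to go above 0; even 'print(str(cur_row))' reveals 0 ... Also bear in mind that when the file is flushed at 00:01:00 the line number will not relate to the new CSV file, so I think we need to check the time somewhere – ZeroG 2013-02-25 16:48:26

Yes, you are right, the code was wrong: you need to move the 'cur_row += 1' statement to the end of the for loop. As for the 00:01 flush, you need to compare the current time with the write date of the lastline file. – Xion345 2013-02-25 17:14:34

@ZeroG: A better way to detect whether the file has been flushed is to store the size of the CSV file in the lastline file (in addition to the last processed line). If the file size decreases between two successive runs of the script, you know the CSV file has been flushed. – Xion345 2013-02-25 17:21:50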
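The suggestion in the last comment, keeping the row number and the file size together in the lastline file, could be sketched roughly as follows. This is only an illustration: the JSON layout and the helper names `load_start_row` and `save_state` are assumptions, not part of the code posted above.

```python
import json
import os


def load_start_row(csv_path, state_path):
    """Return (start_row, current_size) for this run.

    start_row is 0 when there is no usable state file or when the CSV
    has shrunk since the last run, which is taken to mean it was flushed.
    """
    current_size = os.path.getsize(csv_path)
    try:
        with open(state_path) as f:
            state = json.load(f)
        saved_row, saved_size = state["row"], state["size"]
    except (IOError, ValueError, KeyError):
        # Missing or unreadable state: process the whole CSV
        return 0, current_size
    if current_size < saved_size:
        return 0, current_size
    return saved_row, current_size


def save_state(state_path, next_row, size):
    """Record the next unprocessed row index and the CSV size seen this run."""
    with open(state_path, "w") as f:
        json.dump({"row": next_row, "size": size}, f)
```

A run would call `load_start_row('data_csv.log', 'lastline')` before the loop and `save_state('lastline', cur_row, size)` after the commit. The size check can miss a flush if the new file has already grown past the old size by the time the script runs again.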

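Each CSV row appears to contain a timestamp. If these are always increasing, you can query the database for the largest timestamp already recorded and, when reading the CSV, skip every row up to and including that time.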
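A minimal sketch of that idea, assuming the first two CSV columns are an ISO date and a zero-padded time as in the sample rows. The helper name `rows_after` and the column names `thedate` and `thetime` in the comment are hypothetical.

```python
import csv
import io


def rows_after(csv_file, last_seen):
    """Yield CSV rows whose (date, time) pair is later than last_seen.

    last_seen is a (date, time) tuple of strings, or None to keep every row.
    ISO dates and zero-padded times sort correctly as plain strings.
    """
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        if last_seen is None or (row[0], row[1]) > last_seen:
            yield row


# With MySQLdb, last_seen might come from a query along these lines
# (column names are assumptions; the result has to be converted to
# strings in the same format as the CSV before comparing):
#   SELECT thedate, thetime FROM heating
#   ORDER BY thedate DESC, thetime DESC LIMIT 1

sample = io.StringIO(
    "2013-02-21,21:42:00,-1.0,45.8\n"
    "2013-02-21,21:48:00,-1.0,45.8\n"
)
new_rows = list(rows_after(sample, ("2013-02-21", "21:42:00")))
```

Here `new_rows` holds only the 21:48:00 row, which is the one that would still need inserting.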

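Source

2013-02-25 10:18:55

It looks like the first field in your MySQL table is a unique timestamp. You can set up the MySQL table so that this field must be unique, and so that INSERTs violating that uniqueness are ignored. At the mysql> prompt, enter the command: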

ALTER IGNORE TABLE heating ADD UNIQUE heatingidx (thedate, thetime)

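(Change thedate and thetime to the names of the columns holding the date and the time. Note that the IGNORE modifier of ALTER TABLE was removed in MySQL 5.7.4; on newer servers, delete any duplicate rows first and run the statement without IGNORE.)

Once you have made this change to your database, you only need to change one line in your program to make MySQL ignore duplicate inserts: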

cursor.execute('INSERT IGNORE INTO `heating` VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', row)

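Yes, it is a little wasteful to run INSERT IGNORE ... on lines that have already been processed, but given the frequency of your data (every 6 minutes?), it will not matter much in terms of performance.

The advantage of doing it this way is that it is now impossible to insert duplicates into the table by accident. It also keeps the logic of your program simple and easy to read.

It also avoids having two programs write to the same CSV file at the same time. Even if your program usually succeeds without errors, every once in a while, maybe once in a blue moon, your program and the other program may try to write to the file simultaneously, which could cause an error or corrupt the data.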
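To see how a unique index over both the date and the time column behaves, here is a small self-contained sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a MySQL server: SQLite spells the statement `INSERT OR IGNORE` and uses `?` placeholders, where MySQL with MySQLdb uses `INSERT IGNORE` and `%s`. The three-column table and the `temp` column are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heating (thedate TEXT, thetime TEXT, temp REAL)")
# The pair (thedate, thetime) must be unique, not each column on its own
conn.execute("CREATE UNIQUE INDEX heatingidx ON heating (thedate, thetime)")

rows = [
    ("2013-02-21", "14:00:00", 21.1),
    ("2013-02-21", "14:00:00", 21.1),  # exact repeat: silently ignored
    ("2013-02-22", "14:00:00", 20.9),  # same time on another date: kept
]
conn.executemany("INSERT OR IGNORE INTO heating VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT thedate, thetime FROM heating ORDER BY thedate"
).fetchall()
```

After this runs, `stored` contains one row for each date: the repeated row was dropped, while the two 14:00:00 readings on different days were both kept because the index covers the pair of columns.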

You can also make your program a little faster by using cursor.executemany instead of cursor.execute:

rows = list(csv_data)
cursor.executemany('''INSERT IGNORE INTO `heating` VALUES
    (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)''', rows)

is equivalent to

for row in csv_data:
    cursor.execute('INSERT IGNORE INTO `heating` VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
                   row)

except that it packs all of the data into one command.

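Source

2013-02-25 10:23:26 unutbu

I like that one, but the date and the time are two separate fields??? – ZeroG 2013-02-25 16:49:00

@ZeroG: No problem. Just list all of the fields needed to define a unique row. I have edited the post above to show what I mean. – unutbu 2013-02-25 19:03:12

Does this take into account that it is the date and time together that need to differ, i.e. over 2 days there are two 14:00s, even though the dates will be different? – ZeroG 2013-02-25 20:35:11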
