Tensorflow: Using an input pipeline (.csv) as a dictionary for training

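I'm trying to train a model on a .csv dataset (5008 columns, 533 rows). I'm using a text reader (tf.TextLineReader) to parse the data into two tensors: one holding the data to train on [example] and one holding the correct labels [label]: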

import tensorflow as tf

def read_my_file_format(filename_queue):
    reader = tf.TextLineReader()
    key, record_string = reader.read(filename_queue)
    # One float default per column (5008 columns in total)
    record_defaults = [[0.5] for _ in range(5008)]

    # decode_csv returns a list with one tensor per column
    columns = tf.decode_csv(record_string, record_defaults=record_defaults)
    example = tf.stack(columns[:-1])  # the first 5007 columns are the features
    label = columns[-1]               # the last column is the label
    return example, label

def input_pipeline(filenames, batch_size, num_epochs=None): 
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=num_epochs, shuffle=True) 
    example, label = read_my_file_format(filename_queue) 
    min_after_dequeue = 10000 
    capacity = min_after_dequeue + 3 * batch_size 
    example_batch, label_batch = tf.train.shuffle_batch([example, label], batch_size=batch_size, capacity=capacity, min_after_dequeue=min_after_dequeue) 
    return example_batch, label_batch
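
To make the column split concrete without needing TensorFlow, here is a minimal plain-Python sketch of what the reader does for a single line: empty fields fall back to a default, and the last column is treated as the label. The helper name `split_example_label` and the three-column sample line are hypothetical and only for illustration; `tf.decode_csv` itself works on tensors, not on Python strings.

```python
import csv
import io

def split_example_label(line, num_columns, default=0.5):
    """Parse one CSV line into (features, label); hypothetical helper."""
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) != num_columns:
        raise ValueError("expected %d columns, got %d" % (num_columns, len(fields)))
    # Empty fields fall back to the default, similar in spirit to record_defaults
    values = [float(f) if f.strip() else default for f in fields]
    # Everything except the last column is a feature; the last column is the label
    return values[:-1], values[-1]

features, label = split_example_label("0.1,,1.0", num_columns=3)
print(features, label)
```

With the real dataset, `num_columns` would be 5008, giving 5007 features per row, which matches the `shape=(10, 5007)` reported for a batch of 10.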

This part of the pipeline works when I run something like:

with tf.Session() as sess: 
    ex_b, l_b = input_pipeline(["Tensorflow_vectors.csv"], 10, 1) 
    print("Test: ",ex_b)

My result is Test: Tensor("shuffle_batch:0", shape=(10, 5007), dtype=float32)

So far this seems fine to me. Next, I created a simple model with two hidden layers (512 and 256 nodes respectively).

batch_x, batch_y = input_pipeline(["Tensorflow_vectors.csv"], batch_size) 
_, cost = sess.run([optimizer, cost], feed_dict={x: batch_x.eval(), y: batch_y.eval()})

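I based this approach on this example that uses the MNIST database. This is where things go wrong, when I try to train the model: when I execute the code above, Tensorflow hangs, even with batch_size = 1. If I leave out the .eval() calls that should fetch the actual data from the tensors, I get the following response: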

TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, or numpy ndarrays.

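Now, this I can understand, but I don't understand why the program hangs when I do include the .eval() calls, and I don't know where I could find any information about this problem.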

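Edit: I've included the latest version of my whole script here. The program still hangs, even though I implemented (as far as I can tell) the solution provided by vijay m.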

Source

2017-07-01 Voidling

Could you add the whole code? –

The whole code can be found here: [link](https://github.com/Voidling0/TFCSV2/blob/master/script.py) – Voidling

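As the error says, you are trying to feed a tensor through feed_dict. You have defined an input_pipeline queue, and its output cannot be passed through feed_dict. The correct way to pass the data to the model and train is shown in the code below: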

# A queue which will return batches of inputs 
batch_x, batch_y = input_pipeline(["Tensorflow_vectors.csv"], batch_size) 

# Feed it to your neural network model: 
# Every time this is called, it will pull data from the queue. 
logits = neural_network(batch_x, batch_y, ...) 

# Define cost and optimizer 
cost = ... 
optimizer = ... 

# Evaluate the graph on a session: 
with tf.Session() as sess: 
    init_op = ... 
    sess.run(init_op) 

    # Start the queues 
    coord = tf.train.Coordinator() 
    threads = tf.train.start_queue_runners(sess=sess, coord=coord) 

    # Loop through data and train 
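    for step in range(num_steps):  # num_steps: however many training steps you want
        # Fetch into a new name so the cost tensor is not overwritten
        _, cost_value = sess.run([optimizer, cost])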

    coord.request_stop() 
    coord.join(threads)
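
A likely reason the `.eval()` version hangs is that the batch tensors are fed by a queue, and until `tf.train.start_queue_runners` starts the threads that fill it, a dequeue simply waits. The following is a plain-Python analogy built on the standard `queue` and `threading` modules, not TensorFlow's actual implementation. The names `fill_queue` and `batches` are made up for this sketch, and a short timeout stands in for the indefinite hang.

```python
import queue
import threading

batches = queue.Queue(maxsize=4)

def fill_queue(n):
    """Producer: plays the role of a queue runner thread."""
    for i in range(n):
        batches.put([i, i + 1])

# No producer is running yet: a blocking get() would wait forever,
# so a short timeout is used here to show the stall instead.
try:
    batches.get(timeout=0.1)
    stalled = False
except queue.Empty:
    stalled = True

# Start the producer; after that the consumer receives data.
producer = threading.Thread(target=fill_queue, args=(3,))
producer.start()
received = [batches.get(timeout=1.0) for _ in range(3)]
producer.join()

print(stalled, received)
```

In this analogy, the first `get` corresponds to evaluating a batch tensor before the queue runners exist, and starting the thread corresponds to the `start_queue_runners` call in the answer's code.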

Source

2017-07-01 16:13:04

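Thank you very much for your help! After one extra modification that was necessary because the dimensions of my vectors did not match (I solved this with a reshape, 'batch_y = tf.reshape(batch_y, [12, 1])'), I'm still at a loss, because the program hangs again. In case you are willing to take a look, here is a link to my whole code: [link](https://github.com/Voidling0/TFCSV2/blob/master/scriptv2.py). I think this could also be helpful for other people getting into Tensorflow, since it is sometimes hard to determine why a program hangs. – Voidling

Note: it hangs after line 118, to be precise. I'm running this on a 980Ti, by the way, so I don't expect the hardware to be the cause of this problem. – Voidling

Could you share the input 'csv'? –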
