2017-04-04 54 views

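Google Storage object to pandas DataFrame

I am reading a file from Google Cloud Storage with Google Datalab. That leaves me with a variable holding the data, but I need to convert it to a pandas DataFrame.

I read the file with: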

%%gcs read --object $objeto1 --variable prueba 

The variable prueba looks like this:

1/1/2016 08:35:56,1,4756798,"7501073831988",1.00,15.00,0.16,"S0394",4388,2,10.43\r\n1,1/1/2016 08:35:56,1,4756798,"850697002395",1.00,13.50,0.00,"S0394",4388,2,10.36\r\n1,1/1/2016 08:35:56,1,4756798,"850697002425",1.00,10.00,0.00,"S0394",4388,2,7.29\r\n1,1/1/2016 08:38:55,2,1013642,"8469760102003",1.00,200.00,0.16,"C0278",2595,1,161.20\r\n 

Any help?


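When I read from a BigQuery query, for example df = bq.Query("select * from tabla").to_dataframe(), that is enough to turn my object into a pandas DataFrame. But when I try something similar with the variable read from storage, I get: AttributeError: 'str' object has no attribute 'to_dataframe'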


Wrap your variable in StringIO, as shown here: https://stackoverflow.com/questions/37990467/how-can-i-load-my-csv-from-google-datalab-to-a-pandas-data-frame – Tautvydas
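A minimal sketch of the StringIO approach suggested in the comment above. It assumes the variable holds the raw CSV text (or bytes, depending on the Python version in the notebook) and that the file has no header row, as in the sample shown in the question. The two-row `prueba` string is only a stand-in for the value that `%%gcs read` stores, and the choice to keep column 4 as text is an assumption made so the barcode-like values are not converted to integers.

```python
import io

import pandas as pd

# Stand-in for the variable filled by `%%gcs read`; in the notebook this
# already exists, so this assignment is only here to make the sketch runnable.
prueba = (
    '1,1/1/2016 08:35:56,1,4756798,"850697002395",1.00,13.50,0.00,"S0394",4388,2,10.36\r\n'
    '1,1/1/2016 08:38:55,2,1013642,"8469760102003",1.00,200.00,0.16,"C0278",2595,1,161.20\r\n'
)

# read_csv expects a path or a file-like object, so wrap the in-memory data.
# Use BytesIO if the variable turns out to be bytes, StringIO if it is text.
if isinstance(prueba, bytes):
    buffer = io.BytesIO(prueba)
else:
    buffer = io.StringIO(prueba)

# header=None: the sample data has no header row (assumption).
# dtype={4: str}: keep the quoted code in column 4 as text (assumption).
df = pd.read_csv(buffer, header=None, dtype={4: str})

print(df.shape)
```

With `header=None` the columns are labelled 0 to 11; pass `names=[...]` to `read_csv` if you know the real column names.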

Answer


I suggest copying the GCS file to your Datalab machine and reading it from there:

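import pandas as pd

# The original snippet omitted the function name; read_gcs_csv is a placeholder.
def read_gcs_csv(gcs_path, csv_file_name):
    # Copy the file from GCS into the current directory, then load it with pandas.
    get_ipython().system(u'gsutil cp ' + gcs_path + csv_file_name + ' .')
    df = pd.read_csv(csv_file_name)
    return df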