2017-03-01
1

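Pandas automatically converts my string column to float

What can I do to prevent pandas from converting my string values to float? The columns 'Billing Doc.' and 'Sales Order' contain 10 to 11 digit numbers that are to be stored in a MySQL table column of data type CHAR(15). When I run the script below, I see .0 at the end of every number. I want them treated as strings/chars in the database. The 'Billing Doc.' field contains 3206790137, 3209056079, 3209763880, 3209763885, 3206790137, which are stored in the DB as 3206790137.0, 3209056079.0, 3209763880.0, 3209763885.0, 3206790137.0. The data type of the Billing Doc. column in the database is CHAR(15).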

import datetime as DT

import pandas as pd


def insert_billing(df):
    # Replace NaN/NaT with None so they can be turned into SQL NULL below
    df = df.where((pd.notnull(df)), None)
    for row in df.to_dict(orient="records"):
        bill_item = row['Bill.Item']
        bill_qty = row['Billed Qty']
        bill_doct_date = row['Billi.Doc.Date']
        bill_doc = row['Billing Doc.']
        bill_net_value = row['Billi.Net Value']
        sales_order = row['Sales Order']
        import_date = DT.datetime.now().strftime('%Y-%m-%d')

        # Every value is interpolated as text, so a float such as
        # 3206790137.0 ends up in the SQL literally as "3206790137.0"
        query = (
            "INSERT INTO sap_billing("
            "bill_item, "
            "bill_qty, "
            "bill_doc_date, "
            "bill_doc, "
            "bill_net_value, "
            "sales_order, "
            "import_date"
            ") VALUES ("
            "\"{}\", \"{}\", \"{}\", \"{}\", "
            "\"{}\", \"{}\", \"{}\""
            ") ON DUPLICATE KEY UPDATE "
            "bill_qty = VALUES(bill_qty), "
            "bill_doc_date = VALUES(bill_doc_date), "
            "bill_net_value = VALUES(bill_net_value), "
            "import_date = VALUES(import_date) "
        ).format(
            bill_item,
            bill_qty,
            bill_doct_date,
            bill_doc,
            bill_net_value,
            sales_order,
            import_date,
        )
        # Map the Python None/NaT placeholders to SQL NULL
        query = query.replace('"None"', 'NULL')
        query = query.replace('(None', '(NULL')
        query = query.replace('"NaT"', 'NULL')
        query = query.replace('(NaT', '(NULL')

        try:
            # gesdb_connection is created elsewhere (not shown in the question)
            q1 = gesdb_connection.execute(query)
        except Exception as e:
            print(bill_item, bill_doc, sales_order, e)


if __name__ == "__main__":
    engine_str = 'mysql+mysqlconnector://root:<password>@<host>/mydb'

    file_name = "tmp/dataload/so_tracking.XLSX"
    df = pd.read_excel(file_name)
    # compare_columns is a helper defined elsewhere (not shown in the question)
    if df.shape[1] == 35 and compare_columns(list(df.columns.values)) == 1:
        insert_billing(df)
    else:
        print("Incorrect column count, column order or column headers.\n")
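A minimal sketch of where the .0 comes from, using one value from the question in place of a full row. The float literal is an assumption standing in for what pandas hands back once the column has been parsed as float64; converting with int() is one possible fix, assuming the column only ever holds whole numbers.

```python
# A float64 value, as pandas produces for the Billing Doc. column
bill_doc = 3206790137.0

# str.format uses str(float), so the fractional part is kept in the SQL text
fragment = "\"{}\"".format(bill_doc)
print(fragment)  # "3206790137.0"

# Assuming the column holds whole numbers, int() drops the ".0"
fixed = "\"{}\"".format(int(bill_doc))
print(fixed)  # "3206790137"
```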

When I create a simple df and print it, the problem doesn't show up.

>>> import pandas as pd
>>> df = pd.DataFrame({'Sales Order': [1217252835, 1217988754, 1219068439],
...                    'Billing Doc.': [3222102723, 3209781889, 3214305818]})
>>> df
   Billing Doc.  Sales Order
0    3222102723   1217252835
1    3209781889   1217988754
2    3214305818   1219068439

But when I read it from Excel and print it, the column is read as float64.

file_name = "tmp/dataload/so_tracking.XLSX"
df = pd.read_excel(file_name)
print(df['Billing Doc.'])

680 3.252170e+09 
681 3.252170e+09 
682 3.252170e+09 
683 3.252170e+09 
684 3.252170e+09 
685 3.252170e+09 
686 3.252170e+09 
687 3.252170e+09 
688 3.252170e+09 
689 3.252170e+09 
690 3.252170e+09 
. 
. 
. 
694 3.251601e+09 
695 3.251631e+09 
696 3.252013e+09 
697    NaN 
698 3.252272e+09 
699 3.252360e+09 
700 3.252474e+09 
. 
. 
Name: Billing Doc., dtype: float64 
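The float64 dtype is most likely caused by the blank cells (row 697 shows NaN): NumPy integer dtypes have no way to represent a missing value, so pandas upcasts an integer column containing NaN to float64. The hand-built DataFrame above has no gaps, which would explain why it stays integer. A minimal sketch reproducing this without the spreadsheet, using a few Billing Doc. values from this thread:

```python
import numpy as np
import pandas as pd

# No missing values: pandas keeps an integer dtype
clean = pd.Series([3222102723, 3209781889, 3214305818], name="Billing Doc.")
print(clean.dtype)  # int64 on typical 64-bit platforms

# One missing value (like row 697): NumPy ints cannot hold NaN, so pandas upcasts
with_gap = pd.Series([3251631331, np.nan, 3252272451], name="Billing Doc.")
print(with_gap.dtype)  # float64

# Each element is now a float, which is where the trailing ".0" comes from
print(str(with_gap.iloc[0]))  # 3251631331.0
```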
+2

Can you distill this down to a reproducible example? Nobody else has access to your database or spreadsheet, so any attempt to help is just guesswork. –

+0

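Pandas purists may not like this quick fix, but I use 'pd.read_csv("FILE.CSV", dtype=object)' and it keeps pandas from converting numbers to floats. I'm fairly sure you can substitute other DataFrame creation functions for 'read_csv()'. – pshep123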
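A minimal sketch of that suggestion, using an in-memory CSV with hypothetical content (including one blank cell) instead of a real file so it runs on its own. For read_excel, recent pandas versions also accept a dtype argument; treat that as an assumption to check against your pandas version, not something tested against the original spreadsheet.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real spreadsheet; note the blank cell
csv_text = (
    "Billing Doc.,Sales Order\n"
    "3206790137,1217252835\n"
    ",1217988754\n"
    "3209763880,1219068439\n"
)

# Default parsing: the blank cell forces the column to float64
default = pd.read_csv(io.StringIO(csv_text))
print(default["Billing Doc."].dtype)  # float64

# dtype=object: values are kept as the original text, blanks become NaN
as_text = pd.read_csv(io.StringIO(csv_text), dtype=object)
print(as_text["Billing Doc."].tolist())  # ['3206790137', nan, '3209763880']
```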

+0

@PaulH我添加了一个示例。 – nomad

Answers

0

I found the solution myself and am posting it here to document it.

df = pd.read_excel(file_name, converters={'Billing Doc.': str})
print(df['Billing Doc.'])

695 3251631331 
696 3252012614 
697   NaN 
698 3252272451 
699 3252359504 
700 3252473894 
701   NaN 
702   NaN 
703   NaN 
704 3252652940 
705   NaN 
706   NaN 
707   NaN 
708   NaN 
Name: Billing Doc., dtype: object 
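The converters argument is shared with read_csv, so the effect can be sketched without the Excel file. This assumes an in-memory CSV holding three of the values above; it is an illustration of the same parameter, not a test of read_excel itself.

```python
import io

import pandas as pd

# Assumed in-memory CSV with three Billing Doc. values from the output above
csv_text = "Billing Doc.\n3251631331\n3252012614\n3252272451\n"

# converters={...: str} hands each raw cell text to str(), so no numeric parsing happens
df = pd.read_csv(io.StringIO(csv_text), converters={"Billing Doc.": str})
print(df["Billing Doc."].tolist())  # ['3251631331', '3252012614', '3252272451']

# Formatting the value into SQL text no longer adds ".0"
literal = "\"{}\"".format(df["Billing Doc."].iloc[0])
print(literal)  # "3251631331"
```

Once the values are strings, the CHAR(15) column receives exactly the digits that were in the sheet.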
-1

Try this:

df = df.astype(str) 

Note that this is very inefficient.
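One caveat worth checking: if the column has already been parsed as float64, astype(str) stringifies the floats as they are, so the trailing .0 survives. A minimal sketch with an assumed two-value Series (in pandas 1.x and 2.x the missing value additionally becomes the literal string 'nan'):

```python
import numpy as np
import pandas as pd

# Assumed sample: a Billing Doc. column already parsed as float64 (it has a gap)
s = pd.Series([3251631331.0, np.nan], name="Billing Doc.")

as_text = s.astype(str)
# The float is stringified as-is, so the ".0" is still there
print(as_text.iloc[0])  # 3251631331.0
```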

Or convert each value to int before inserting it into the query.
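A minimal sketch of the int-conversion route, assuming the IDs are always whole numbers. The helper name to_char_value is hypothetical; it returns None for missing values so that a NULL can still be emitted.

```python
import math


def to_char_value(value):
    """Hypothetical helper: turn a float-parsed ID back into a digit string.

    Returns None for missing values (None or NaN) so the caller can emit NULL.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    # int() drops the fractional ".0"; str() gives the CHAR-friendly digits
    return str(int(value))


print(to_char_value(3206790137.0))  # 3206790137
print(to_char_value(float("nan")))  # None
```

In the question's loop this could be applied as bill_doc = to_char_value(row['Billing Doc.']) (and likewise for Sales Order). Because missing values come back as None, the existing replacement of "None" with NULL keeps working. numpy.float64 is a subclass of float, so NaN values taken from a DataFrame are caught by the isinstance check.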