2011-09-28

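I have a Python file that creates and populates a table in MS SQL. The only problem is that the code breaks whenever the data contains non-ASCII characters or single apostrophes, and there are many of each. I could run a replace function to strip the apostrophes from the strings, but I would rather keep them intact. I have also tried converting the data to UTF-8, with no luck.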

Here are the error messages I get:

"'ascii' codec can't encode character u'\u2013' in position..." (for non-ASCII characters)

and, for single quotes:

<class 'pyodbc.ProgrammingError'>: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server] Incorrect syntax near 'S, 230 X 90M.; Eligibilty....

When I try to encode the string as UTF-8, I get the following error message instead:

<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)

The Python code is included below. I believe the code breaks at the following line: InsertValue = str(row.GetValue(CurrentField['Name'])).

# -*- coding: utf-8 -*- 

import pyodbc 
import sys 
import arcpy 
import arcgisscripting 

gp = arcgisscripting.create(9.3) 
SQL_KEYWORDS = ['PERCENT', 'SELECT', 'INSERT', 'DROP', 'TABLE'] 

#SourceFGDB = '###' 
#SourceTable = '###' 
SourceTable = sys.argv[1] 
TempInputName = sys.argv[2] 
SourceTable2 = sys.argv[3] 
#--------------------------------------------------------------------------------------------------------------------- 
# Target Database Settings 
#--------------------------------------------------------------------------------------------------------------------- 
TargetDatabaseDriver = "{SQL Server}" 
TargetDatabaseServer = "###" 
TargetDatabaseName = "###" 
TargetDatabaseUser = "###" 
TargetDatabasePassword = "###" 

# Get schema from FGDB table. 
# This should be an ordered list of dictionary elements [{'FGDB_Name', 'FGDB_Alias', 'FGDB_Type', FGDB_Width, FGDB_Precision, FGDB_Scale}, {}] 

if not gp.Exists(SourceTable): 
    print ('- The source does not exist.') 
    sys.exit(102) 
#### Should see if it is actually a table type. Could be a Feature Data Set or something... 
print('  - Processing Items From : ' + SourceTable) 
FieldList = [] 
Field_List = gp.ListFields(SourceTable) 
print('   - Getting number of rows.') 
result = gp.GetCount_management(SourceTable) 
Number_of_Features = gp.GetCount_management(SourceTable) 
print('    - Number of Rows: ' + str(Number_of_Features)) 
print('   - Getting fields.') 
Field_List1 = gp.ListFields(SourceTable, 'Layer') 
Field_List2 = gp.ListFields(SourceTable, 'Comments') 
Field_List3 = gp.ListFields(SourceTable, 'Category') 
Field_List4 = gp.ListFields(SourceTable, 'State') 
Field_List5 = gp.ListFields(SourceTable, 'Label') 
Field_List6 = gp.ListFields(SourceTable, 'DateUpdate') 
Field_List7 = gp.ListFields(SourceTable, 'OBJECTID') 
for Current_Field in Field_List1 + Field_List2 + Field_List3 + Field_List4 + Field_List5 + Field_List6 + Field_List7:
    print('   - Field Found: ' + Current_Field.Name)
    if Current_Field.AliasName in SQL_KEYWORDS:
        Target_Name = Current_Field.Name + '_'
    else:
        Target_Name = Current_Field.Name

    print('     - Alias : ' + Current_Field.AliasName)
    print('     - Type  : ' + Current_Field.Type)
    print('     - Length : ' + str(Current_Field.Length))
    print('     - Scale : ' + str(Current_Field.Scale))
    print('     - Precision: ' + str(Current_Field.Precision))
    FieldList.append({'Name': Current_Field.Name, 'AliasName': Current_Field.AliasName, 'Type': Current_Field.Type, 'Length': Current_Field.Length, 'Scale': Current_Field.Scale, 'Precision': Current_Field.Precision, 'Unique': 'UNIQUE', 'Target_Name': Target_Name})
# Create table in SQL Server based on FGDB table schema. 
cnxn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=###;DATABASE=###;UID=sql_webenvas;PWD=###')
cursor = cnxn.cursor()
#### DROP the table first?
try:
    DropTableSQL = 'DROP TABLE dbo.' + TempInputName + '_Test;'
    print(DropTableSQL)
    cursor.execute(DropTableSQL)
    cnxn.commit()  # was dbconnection.commit(), but no such name is defined in this script
except pyodbc.Error:
    print('WARNING: Can not drop table - may not exist: ' + TempInputName + '_Test')
CreateTableSQL = ('CREATE TABLE ' + TempInputName + '_Test ' 
' (Layer varchar(500), Comments varchar(5000), State int, Label varchar(500), DateUpdate DATETIME, Category varchar(50), OBJECTID int)') 
cursor.execute(CreateTableSQL) 
cnxn.commit() 
# Cursor through each row in the FGDB table, get values, and insert into the SQL Server Table. 
# We got Number_of_Features earlier, just use that. 
Number_Processed = 0 
print('  - Processing ' + str(Number_of_Features) + ' features.') 
rows = gp.SearchCursor(SourceTable) 
row = rows.Next() 
while row:
    if Number_Processed % 10000 == 0:
        print('   - Processed ' + str(Number_Processed) + ' of ' + str(Number_of_Features))
    InsertSQLFields = 'INSERT INTO ' + TempInputName + '_Test ('
    InsertSQLValues = 'VALUES ('
    for CurrentField in FieldList:
        InsertSQLFields = InsertSQLFields + CurrentField['Target_Name'] + ', '
        InsertValue = str(row.GetValue(CurrentField['Name']))
        if InsertValue in ['None']:
            InsertValue = 'NULL'
        # Use an escape quote for the SQL.
        InsertValue = InsertValue.replace("'","' '")
        if CurrentField['Type'].upper() in ['STRING', 'CHAR', 'TEXT']:
            if InsertValue == 'NULL':
                InsertSQLValues = InsertSQLValues + "NULL, "
            else:
                InsertSQLValues = InsertSQLValues + "'" + InsertValue + "', "
        elif CurrentField['Type'].upper() in ['GEOMETRY']:
            ## We're not handling geometry transfers at this time.
            if InsertValue == 'NULL':
                InsertSQLValues = InsertSQLValues + '0' + ', '
            else:
                InsertSQLValues = InsertSQLValues + '1' + ', '
        else:
            InsertSQLValues = InsertSQLValues + InsertValue + ', '
    InsertSQLFields = InsertSQLFields[:-2] + ')'
    InsertSQLValues = InsertSQLValues[:-2] + ')'
    InsertSQL = InsertSQLFields + ' ' + InsertSQLValues
    ## print InsertSQL
    cursor.execute(InsertSQL)
    cnxn.commit()
    Number_Processed = Number_Processed + 1
    row = rows.Next()
print('   - Processed all ' + str(Number_Processed)) 
del row 
del rows 

How does it break? Where? – Dave

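It usually breaks here: InsertValue = str(row.GetValue(CurrentField['Name'])). It populates the SQL table it creates until it reaches a non-ASCII character or a single apostrophe, and then it errors out at that point.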

And what exception do you get? Could you edit your question to add it? – Dave

Answers

1

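I'll use my psychic debugging skills and say that you are trying to str()-ify something and getting an error from the ASCII codec. What you should really do is use the UTF-8 codec instead, like this: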

insert_value_uni = unicode(row.GetValue(CurrentField['Name'])) 
InsertValue = insert_value_uni.encode('utf-8') 
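
To see where both error messages in the question can come from, here is a minimal sketch using the en dash (U+2013) named in the first traceback. It is written for Python 3, where `str` is already Unicode text; the sample value is made up for illustration. In Python 2, calling `.encode('utf-8')` on a byte `str` first decodes it implicitly as ASCII, which is one common cause of the `can't decode byte 0xe2` message.

```python
# Illustrative value containing an en dash (U+2013); not taken from the real data.
value = u"230 \u2013 90M"

# ASCII has no en dash, so encoding the text as ASCII fails.
try:
    value.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# UTF-8 can represent it; the en dash becomes the three bytes e2 80 93.
utf8_bytes = value.encode("utf-8")

# Decoding those bytes as ASCII fails at the first non-ASCII byte, 0xe2.
try:
    utf8_bytes.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(repr(utf8_bytes))
print(ascii_error)
print(decode_error)
```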

When I try encoding with utf-8 I get a different error: 'ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)

@JamesD, could you put the entire traceback in your question? Make sure to indent it so the formatting is preserved. – Dave

Will do. Thanks!

3

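James, I believe the real problem is that you are not using Unicode throughout. Try the following:

  • Make sure the input file you use to populate the database is UTF-8, and that you are reading it as UTF-8.
  • Make sure your database actually stores the data as Unicode.
  • Whenever you retrieve data from the file or the database, or want to manipulate strings (for example with the + operator), make sure all the pieces are proper Unicode. You cannot use str(). You need to use unicode(), as Dave pointed out. If you define string literals in your code, write u'my string' rather than 'my string' (otherwise it will not be treated as Unicode).

Also, please provide the full stack trace and the name of the exception.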
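
For the second bullet, here is a sketch of what "stored as Unicode" can look like on SQL Server. Traditionally, `varchar` columns hold non-Unicode text in the collation's code page, `nvarchar` columns hold Unicode, and a Unicode string literal is written with an `N` prefix. The helper `sql_unicode_literal` is hypothetical (it is not part of pyodbc or of the script in the question). It also doubles embedded single quotes, which is how T-SQL escapes them inside a literal. It is written for Python 3; in Python 2 the values would be `unicode` objects.

```python
def sql_unicode_literal(value):
    """Return value as a T-SQL Unicode literal, or NULL for None.

    Hypothetical helper, for illustration only.
    """
    if value is None:
        return u"NULL"
    # A single quote inside a literal is escaped by doubling it, with no space.
    return u"N'" + value.replace(u"'", u"''") + u"'"


print(sql_unicode_literal(u"O'Brien \u2013 230 X 90M"))
print(sql_unicode_literal(None))
```

The target columns would then need to be declared as `nvarchar` rather than `varchar`. Hand-built literals remain fragile; this only illustrates the quoting rule.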

0

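In general, you want to convert incoming data to Unicode on input and convert it to the required encoding on output.

If you do that, your problem will be easier to track down. It means changing every string to Unicode, for example 'INSERT INTO' to u'INSERT INTO' (note the "u" in front of the string). Then, when you send the string to be executed, convert it to the required encoding, "utf8":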

cursor.execute(InsertSQL.encode("utf8")) # Where InsertSQL is unicode 
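
An alternative to encoding a hand-built statement is to let the database driver bind the values, so that apostrophes and non-ASCII text never pass through string concatenation. This is a different technique from the one described above, and it is shown only as a sketch. It uses the standard-library `sqlite3` module as a stand-in so that it runs without a server. pyodbc accepts the same `?` placeholder style in `cursor.execute`, but the table and values here are invented for illustration.

```python
import sqlite3

# In-memory database as a stand-in for the SQL Server connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE demo_test (Layer TEXT, Comments TEXT, State INTEGER)")

# Made-up rows: an apostrophe, an en dash (U+2013), and a missing value.
rows = [
    (u"Roads", u"Owner's parcel, 230 X 90M", 1),
    (u"Rivers", u"Reach 4 \u2013 upper", 2),
    (u"Lakes", None, 3),
]

# The values travel separately from the SQL text, so no quoting or
# escaping is done by hand; None is stored as NULL.
cur.executemany(
    "INSERT INTO demo_test (Layer, Comments, State) VALUES (?, ?, ?)", rows
)
conn.commit()

stored = cur.execute(
    "SELECT Layer, Comments, State FROM demo_test ORDER BY State"
).fetchall()
print(stored)
conn.close()
```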

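Also, you should add an encoding declaration at the top of your source code. That means putting the encoding cookie on one of the first two lines of the file:

    #!/usr/bin/python
    # -*- coding: <encoding name> -*-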

If you build your strings from a file, you can load the data with codecs.open, which converts from a given encoding to Unicode automatically.
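
A minimal sketch of that, assuming the input is a UTF-8 text file. The file name and contents are invented for illustration, and the file is written first only so that the example is self-contained.

```python
import codecs
import os
import tempfile

# Create a small UTF-8 file to read back (illustrative content only).
path = os.path.join(tempfile.mkdtemp(), "comments.txt")
with codecs.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"Owner's parcel \u2013 230 X 90M\n")

# codecs.open decodes while reading, so each line is already Unicode text.
with codecs.open(path, "r", encoding="utf-8") as handle:
    lines = [line.rstrip(u"\n") for line in handle]

print(lines)
```

On Python 2.6 and later, `io.open(path, encoding='utf-8')` behaves similarly.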

0

The problem was solved when I changed my str() calls to unicode(). A simple answer in the end; thank you all for your help with this.