Python: searching for a Russian substring in Excel

2015-11-02 76 views 1 likes

I want to read an Excel file and extract some information about certain people.

Here is what I am doing:

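import os
import xlrd

dir = './schfiles'
files = os.listdir(dir)
f = files[0]
book = xlrd.open_workbook(dir + "/" + f)
sh = book.sheet_by_index(0)
# xlr2i and xlc2i are my own helpers (not shown here); they turn an Excel
# row number and column letter into the zero-based indices xlrd expects
t = sh.cell_value(rowx=xlr2i(35), colx=xlc2i('F'))
t.find(u"Усманов")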

The string stored in the variable t is u'\u0434\u043e\u0446. \u0423\u0441\u043c\u0430\u043d\u043e\u0432 \u0411.\u0428.', which displays as "доц. Усманов Б.Ш."

u"Усманов", however, is represented as u'\xd3\xf1\xec\xe0\xed\xee\xe2'

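I tried encoding both strings as 'utf8', decoding them, and using external libraries, but nothing helped.

Does anyone know how to find a specific substring here?

Source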

2015-11-02 Pheu Verg

Answers

Put # -*- coding: utf-8 -*- on the first line of your script to tell the interpreter which encoding the source file uses.

# -*- coding: utf-8 -*- 

import os 
import xlrd 

dir = './schfiles' 
files = os.listdir(dir) 
f = files[0] 

workbook_path = os.path.join(dir, files[0]) 
book = xlrd.open_workbook(workbook_path) 

sh = book.sheet_by_index(0) 
t = sh.cell_value(rowx=xlr2i(35),colx=xlc2i('F')) 
t.find(u"Усманов")
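As a rough sketch of the same search made independent of how the script file is saved, the needle can be written with \u escapes instead of Cyrillic characters. The helper name find_cells and the sample rows below are hypothetical stand-ins, not part of the original script; with xlrd the rows could presumably be built from sh.row_values(i) for each i in range(sh.nrows).

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# "Усманов" spelled with \u escapes, so the literal does not depend on
# the encoding in which this file is saved.
NEEDLE = u"\u0423\u0441\u043c\u0430\u043d\u043e\u0432"


def find_cells(rows, needle):
    """Return (row, col, offset) for every text cell that contains needle.

    Hypothetical helper: rows is any list of lists of cell values.
    """
    hits = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            # Numeric cells come back as floats; only search text cells.
            if isinstance(value, type(u"")):
                offset = value.find(needle)
                if offset != -1:
                    hits.append((r, c, offset))
    return hits


# Stand-in for sheet contents (assumed data, modelled on the question).
rows = [
    [1.0, u"\u0434\u043e\u0446. \u0423\u0441\u043c\u0430\u043d\u043e\u0432 \u0411.\u0428."],
    [2.0, u"no match here"],
]

hits = find_cells(rows, NEEDLE)
print(hits)  # [(0, 1, 5)]
```

Escapes are harder to read than the Cyrillic literal, so this is mainly useful as a check: if the escaped needle matches and the literal one does not, the file encoding is the likely culprit.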

Source

2015-11-02 18:37:59 dm295

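So how do I know which encoding I should use? –

Use 'utf-8' for non-ASCII characters; Python 2.x uses 'ASCII' as the default encoding. – dm295

@PheuVerg, to be clear, '# coding: utf8' declares the encoding of the *source file*. The argument in 't.find(u'Усманов')' should be a Unicode string, and you must make sure the source file is actually saved in the declared encoding. Python will then know how to build the Unicode string correctly. You can use any encoding for the source file that can represent your language, but the declared encoding and the encoding the file is really saved in must match. –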
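The mismatch described in this comment can be reproduced in a few lines. This is only an illustration under an assumption: that the bytes of the literal were Windows-1251 (cp1251) and were read as Latin-1. The thread does not confirm which encodings were involved, but this pairing yields exactly the escapes shown in the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# "Усманов" written with escapes so this demo does not depend on file encoding.
needle = u"\u0423\u0441\u043c\u0430\u043d\u043e\u0432"

# Assumption: the literal's bytes were cp1251 but were decoded as Latin-1.
raw = needle.encode("cp1251")
garbled = raw.decode("latin-1")
print(garbled == u"\xd3\xf1\xec\xe0\xed\xee\xe2")  # True

# The cell text as xlrd reports it in the question (proper Unicode).
cell = u"\u0434\u043e\u0446. \u0423\u0441\u043c\u0430\u043d\u043e\u0432 \u0411.\u0428."

print(cell.find(garbled))  # -1: the mis-decoded needle never matches

# Reversing the wrong decode recovers the intended needle.
repaired = garbled.encode("latin-1").decode("cp1251")
print(cell.find(repaired))  # 5
```

Repairing the string afterwards works here, but declaring the encoding the file is really saved in avoids the problem at its source.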
