2016-06-01 186 views

Answers

2

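The newer formats (such as .docx) are typically just zipped XML, so you can unzip them and parse the XML with standard libraries. Some code for retrieving a document's creator was previously posted as an answer on stackoverflow: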

import zipfile

import lxml.etree

# open the .docx file, which is an ordinary zip archive
with zipfile.ZipFile('my_doc.docx') as zf:
    # use lxml to parse the XML part we are interested in
    doc = lxml.etree.fromstring(zf.read('docProps/core.xml'))

# retrieve the creator from the Dublin Core dc:creator element
ns = {'dc': 'http://purl.org/dc/elements/1.1/'}
creator = doc.xpath('//dc:creator', namespaces=ns)[0].text
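
If you would rather not depend on lxml, the same lookup can probably be done with the standard library alone. The following is only a sketch: read_creator is a hypothetical helper name, and it assumes the file keeps its core properties in docProps/core.xml as in the snippet above.

```python
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for dc:creator in docProps/core.xml
DC_NS = {'dc': 'http://purl.org/dc/elements/1.1/'}


def read_creator(source):
    """Return the dc:creator text, or None if the element is absent.

    read_creator is a hypothetical helper; source may be a file path
    or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        # assumption: the core properties part lives at this path
        root = ET.fromstring(zf.read('docProps/core.xml'))
    element = root.find('.//dc:creator', DC_NS)
    return element.text if element is not None else None
```

Calling read_creator('my_doc.docx') should give the same value as the lxml version. If the archive has no docProps/core.xml part, zf.read raises KeyError, so wrap the call in a try/except if you expect such files.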

For the older formats, you may want to take a look at the hachoir-metadata library.