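After an hour or so of poking around inside the docx file, I realized, to my horror, that the answer lies in the document's styles.xml file. Here is a way to fix it for anyone with a similar problem:
Problem with text direction: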
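- If you have ever typed in Arabic or Persian, you have probably seen that right-aligning the text does not solve all your problems. If you don't also change the text direction, the cursor and punctuation stay at the far right of the screen (instead of following the last letter), and you get no right alignment if you need it. I couldn't change the text direction from python-docx, not even by changing the "textDirection" value in document.xml from 'lrTb' (left-to-right, top-to-bottom) to 'rlTb', so I had to create a document with LibreOffice and change its default paragraph style ('Normal') to what I wanted (rtl text direction and so on). This actually saves a lot of time too, because you don't need to do it in Python.
XML explanation of the font-change problem: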
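The document with the altered default style shows a few differences in its styles.xml file. In the Normal paragraph style, under "w:rPr", there is an additional "w:szCs" element, which determines the size of complex-script fonts (it cannot be changed through style.font.size), and the "cs" value of "w:rFonts" is now the Persian font I specified. The "bidi" value of "w:lang" is also now "fa-IR" (Persian). Here is the part of the XML I'm talking about: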
<w:rPr>
    <w:rFonts w:ascii="FreeMono" w:hAnsi="FreeMono" w:cs="FreeFarsi"/>
    <w:sz w:val="40"/>
    <w:rtl/>
    <w:cs/>
    <w:szCs w:val="40"/>
    <w:lang w:val="en-US" w:bidi="fa-IR"/>
</w:rPr>
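Now, changing style.font.size only changes the "sz" value (the Western font size) and does nothing to the "szCs" value (the complex-script font size). Similarly, style.font.name only changes the "ascii" and "hAnsi" values of "w:rFonts" and does nothing to the "cs" value. So to change these values I had to modify the style's XML element from Python.
Solution: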
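To make the sz/szCs and ascii/cs split concrete, here is a minimal sketch that reads the fragment above using only the standard library. The helper names `w` and `describe_rpr` are made up for this illustration, and the fragment is wrapped with the WordprocessingML namespace declaration so it can be parsed on its own.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML main namespace that the "w:" prefix stands for.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The rPr fragment from above, with the namespace declared so it parses alone.
RPR_XML = f"""
<w:rPr xmlns:w="{W_NS}">
  <w:rFonts w:ascii="FreeMono" w:hAnsi="FreeMono" w:cs="FreeFarsi"/>
  <w:sz w:val="40"/>
  <w:rtl/>
  <w:cs/>
  <w:szCs w:val="40"/>
  <w:lang w:val="en-US" w:bidi="fa-IR"/>
</w:rPr>
"""


def w(name):
    """Hypothetical helper: build the '{namespace}name' form of a w: name."""
    return f"{{{W_NS}}}{name}"


def describe_rpr(xml_text):
    """Hypothetical helper: pull the Western and complex-script settings apart."""
    rpr = ET.fromstring(xml_text)
    rfonts = rpr.find(w("rFonts"))
    sz = rpr.find(w("sz"))
    sz_cs = rpr.find(w("szCs"))
    lang = rpr.find(w("lang"))
    return {
        "western_font": rfonts.get(w("ascii")),
        "cs_font": rfonts.get(w("cs")),
        # sz and szCs are stored in half-points, so divide by 2 to get points.
        "western_size_pt": int(sz.get(w("val"))) / 2,
        "cs_size_pt": int(sz_cs.get(w("val"))) / 2,
        "bidi_lang": lang.get(w("bidi")),
        "rtl": rpr.find(w("rtl")) is not None,
    }


info = describe_rpr(RPR_XML)
print(info)
```

The Western values and the complex-script values come from separate attributes and elements, which is why setting one through python-docx's font API leaves the other untouched.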
from docx import Document
from docx.shared import Pt
#path to doc with altered style:
base_doc_location = 'base.docx'
doc = Document(base_doc_location)
my_style = doc.styles['Normal']
# define your desired fonts
user_cs_font_size = 16
user_cs_font_name = 'FreeFarsi'
user_en_font_size = 12
user_en_font_name = 'FreeMono'
# get <w:rPr> element of this style
rpr = my_style.element.rPr
#==================================================
'''This probably isn't necessary if you already
have a document with altered style, but just to be
safe I'm going to add this here'''
if rpr.rFonts is None:
    rpr._add_rFonts()
if rpr.sz is None:
    rpr._add_sz()
#==================================================
'''Get the nsmap string for rpr. This is that "w:"
at the start of elements and element values in xml.
Like these:
<w:rPr>
<w:rFonts>
w:val
The nsmap is like a url:
http://schemas.openxmlformats.org/...
Now w:rPr translates to:
{nsmap url string}rPr
So I made the w_nsmap string like this:'''
w_nsmap = '{'+rpr.nsmap['w']+'}'
#==================================================
'''Because I didn't find any better ways to get an
element based on its tag here's a not so great way
of getting it:
'''
szCs = None
lang = None
for element in rpr:
    if element.tag == w_nsmap + 'szCs':
        szCs = element
    elif element.tag == w_nsmap + 'lang':
        lang = element
'''if there is a szCs and lang element in your style
those variables will be assigned to it, and if not
we make those elements and add them to rpr'''
if szCs is None:
    szCs = rpr.makeelement(w_nsmap+'szCs',nsmap=rpr.nsmap)
    # append only newly created elements; existing ones stay where they are
    rpr.append(szCs)
if lang is None:
    lang = rpr.makeelement(w_nsmap+'lang',nsmap=rpr.nsmap)
    rpr.append(lang)
#==================================================
'''Now to set our desired values to these elements
we have to get attrib dictionary of these elements
and set the name of value as key and our value as
value for that dict'''
szCs_attrib = szCs.attrib
lang_attrib = lang.attrib
rFonts_atr = rpr.rFonts.attrib
'''sz and szCs values are string values and 2 times
the font size so if you want font size to be 11 you
have to set sz (for western fonts) or szCs (for CTL
fonts) to "22" '''
szCs_attrib[w_nsmap+'val'] =str(int(user_cs_font_size*2))
'''Now to change cs font and bidi lang values'''
rFonts_atr[w_nsmap+'cs'] = user_cs_font_name
lang_attrib[w_nsmap+'bidi'] = 'fa-IR' # For Persian
#==================================================
'''Because we changed default style we don't even
need to set style every time we add a new paragraph
And if you change font name or size the normal way
it won't change these cs values so you can have a
font for CTL language and a different font for
western language
'''
persian_p = doc.add_paragraph('نوشته')
en_font = my_style.font
en_font.name = user_en_font_name
en_font.size = Pt(user_en_font_size)
english_p = doc.add_paragraph('some text')
doc.save('ex.docx')
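The three ideas the script relies on (the "{namespace}tag" form of a name, finding or creating a child element, and storing sizes in half-points) can be tried without python-docx. The following is only a sketch using the standard library's ElementTree on a bare rPr element; `w`, `get_or_add` and `set_complex_script` are names invented for this illustration, not python-docx functions.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML main namespace that the "w:" prefix stands for.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keeps the "w:" prefix when serializing


def w(name):
    """Hypothetical helper: w('szCs') gives '{namespace}szCs'."""
    return f"{{{W_NS}}}{name}"


def get_or_add(parent, name):
    """Hypothetical helper: return the direct child, creating it if absent."""
    child = parent.find(w(name))
    if child is None:
        # Note: this simply appends; it does not enforce schema child order.
        child = ET.SubElement(parent, w(name))
    return child


def set_complex_script(rpr, font_name, size_pt, bidi_lang):
    """Set only the complex-script values; Western values are left alone."""
    get_or_add(rpr, "rFonts").set(w("cs"), font_name)
    # Sizes are strings holding half-points: 16 pt is stored as "32".
    get_or_add(rpr, "szCs").set(w("val"), str(int(size_pt * 2)))
    get_or_add(rpr, "lang").set(w("bidi"), bidi_lang)


# A style that so far has only Western settings.
rpr = ET.fromstring(
    f'<w:rPr xmlns:w="{W_NS}">'
    '<w:rFonts w:ascii="FreeMono" w:hAnsi="FreeMono"/>'
    '<w:sz w:val="24"/>'
    '</w:rPr>'
)
set_complex_script(rpr, "FreeFarsi", 16, "fa-IR")
print(ET.tostring(rpr, encoding="unicode"))
```

Calling `set_complex_script` a second time reuses the elements it created the first time, so it does not add duplicates.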
Edit (code improvements):
I commented out the lines that could use some improvement and put the better lines below them.
#rpr = my_style.element.rPr # If None it'll throw errors later
rpr = my_style.element.get_or_add_rPr() # this avoids potential errors
#if rpr.rFonts is None:
# rpr._add_rFonts()
rFonts = rpr.get_or_add_rFonts()
#if rpr.sz is None:
# rpr._add_sz()
rpr.get_or_add_sz()
#by importing these you can make elements and set values quicker
from docx.oxml.shared import OxmlElement, qn
#szCs = rpr.makeelement(w_nsmap+'szCs',nsmap=rpr.nsmap)
szCs = OxmlElement('w:szCs')
#lang = rpr.makeelement(w_nsmap+'lang',nsmap =rpr.nsmap)
lang = OxmlElement('w:lang')
#szCs_attrib = szCs.attrib
#lang_attrib = lang.attrib
#rFonts_atr = rpr.rFonts.attrib
#szCs_attrib[w_nsmap+'val'] =str(int(user_cs_font_size*2))
#rFonts_atr[w_nsmap+'cs'] = user_cs_font_name
#lang_attrib[w_nsmap+'bidi'] = 'fa-IR'
szCs.set(qn('w:val'),str(int(user_cs_font_size*2)))
lang.set(qn('w:bidi'),'fa-IR')
rFonts.set(qn('w:cs'),user_cs_font_name)
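To check that the changes actually ended up in the saved file, the style part can be read straight out of the .docx, which is a zip archive. This is a rough sketch using only the standard library; `read_cs_settings` is a name made up here, and it assumes the style's w:styleId is "Normal" and that the values are stored on the style itself rather than in the document defaults.

```python
import zipfile
import xml.etree.ElementTree as ET

# The WordprocessingML main namespace that the "w:" prefix stands for.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(name):
    """Hypothetical helper: build the '{namespace}name' form of a w: name."""
    return f"{{{W_NS}}}{name}"


def read_cs_settings(docx_path, style_id="Normal"):
    """Return the complex-script settings of one style, or None if not found.

    Assumption: the style is identified by w:styleId and carries its own rPr.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    for style in root.iter(w("style")):
        if style.get(w("styleId")) != style_id:
            continue
        rpr = style.find(w("rPr"))
        if rpr is None:
            return None
        rfonts = rpr.find(w("rFonts"))
        sz_cs = rpr.find(w("szCs"))
        lang = rpr.find(w("lang"))
        return {
            "cs_font": None if rfonts is None else rfonts.get(w("cs")),
            # szCs is stored in half-points, so divide by 2 to get points.
            "cs_size_pt": None if sz_cs is None else int(sz_cs.get(w("val"))) / 2,
            "bidi_lang": None if lang is None else lang.get(w("bidi")),
        }
    return None
```

After running the script above, calling `read_cs_settings('ex.docx')` should report the complex-script font, size and language that were set, if the assumptions about the style hold for your base document.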