我正在编写一个类RecurringInterval
,它基于dateutil.rrule对象 - 表示一个重复的时间间隔。我已经为它定义了一个自定义的,可读的__str__
方法,并且还想定义一个parse
方法(类似于rrulestr()函数)将字符串解析回对象。在Python中,如何解析表示一组关键字参数的字符串,以使命令无关紧要
这里是parse
方法和一些测试用例去用它:
import re
from dateutil.rrule import FREQNAMES
import pytest
class RecurringInterval(object):
freq_fmt = "{freq}"
start_fmt = "from {start}"
end_fmt = "till {end}"
byweekday_fmt = "by weekday {byweekday}"
bymonth_fmt = "by month {bymonth}"
@classmethod
def match_pattern(cls, string):
SPACES = r'\s*'
freq_names = [freq.lower() for freq in FREQNAMES] + [freq.title() for freq in FREQNAMES] # The frequencies may be either lowercase or start with a capital letter
FREQ_PATTERN = '(?P<freq>{})?'.format("|".join(freq_names))
# Start and end are required (their regular expressions match 1 repetition)
START_PATTERN = cls.start_fmt.format(start=SPACES + r'(?P<start>.+?)')
END_PATTERN = cls.end_fmt.format(end=SPACES + r'(?P<end>.+?)')
# The remaining tokens are optional (their regular expressions match 0 or 1 repetitions)
BYWEEKDAY_PATTERN = cls.optional(cls.byweekday_fmt.format(byweekday=SPACES + r'(?P<byweekday>.+?)'))
BYMONTH_PATTERN = cls.optional(cls.bymonth_fmt.format(bymonth=SPACES + r'(?P<bymonth>.+?)'))
PATTERN = SPACES + FREQ_PATTERN \
+ SPACES + START_PATTERN \
+ SPACES + END_PATTERN \
+ SPACES + BYWEEKDAY_PATTERN \
+ SPACES + BYMONTH_PATTERN \
+ SPACES + "$" # The character '$' is needed to make the non-greedy regular expressions parse till the end of the string
return re.match(PATTERN, string).groupdict()
@staticmethod
def optional(pattern):
'''Encloses the given regular expression in an optional group (i.e., one that matches 0 or 1 repetitions of the original regular expression).'''
return '({})?'.format(pattern)
'''Tests'''
def test_match_pattern_with_byweekday_and_bymonth():
string = "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 by weekday Monday, Tuesday by month January, February"
groups = RecurringInterval.match_pattern(string)
assert groups['freq'] == "Weekly"
assert groups['start'].strip() == "2017-11-03 15:00:00"
assert groups['end'].strip() == "2017-11-03 16:00:00"
assert groups['byweekday'].strip() == "Monday, Tuesday"
assert groups['bymonth'].strip() == "January, February"
def test_match_pattern_with_bymonth_and_byweekday():
string = "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 by month January, February by weekday Monday, Tuesday "
groups = RecurringInterval.match_pattern(string)
assert groups['freq'] == "Weekly"
assert groups['start'].strip() == "2017-11-03 15:00:00"
assert groups['end'].strip() == "2017-11-03 16:00:00"
assert groups['byweekday'].strip() == "Monday, Tuesday"
assert groups['bymonth'].strip() == "January, February"
if __name__ == "__main__":
# pytest.main([__file__])
pytest.main([__file__+"::test_match_pattern_with_byweekday_and_bymonth"]) # This passes
# pytest.main([__file__+"::test_match_pattern_with_bymonth_and_byweekday"]) # This fails
但如果你在“正确”的顺序指定参数解析器的作品,它是“死板”,因为它没有按不允许以任意顺序给出可选参数。这就是为什么第二次测试失败。
什么是使解析器以任何顺序解析“可选”字段的方法,这样两个测试都能通过? (我正在考虑使用正则表达式的所有排列组合来创建一个迭代器,并在每个排列上尝试re.match
,但这看起来不是一个优雅的解决方案)。
你可以减少你的代码吗?看起来像一个有趣的问题,但目前对我而言,这只是“代码墙”。 [mcve]会很好。 –
当然,原始代码片段几乎包含了所有的[dateutil.rrule](http://dateutil.readthedocs.io/en/stable/rrule.html)参数,但是我删除了那些未在测试中使用的参数,以减少行数。 –
我没有时间或见解来回答,但有一个upvote。 –