因此,我为学校创建了一个正则表达式解析器,该解析器创建负责匹配的对象的层次结构。我决定做到面向对象,因为我更容易想象这种语法的实现。所以,这些是我组成正则表达式的类。这些都是用Java编写的,但是我认为如果你精通任何面向对象的语言,你可以继续学习。我们需要实现的唯一操作符是Union(+),Kleene-Star(*),表达式的连接(ab或者可能是(a + b)c),当然括括号中的例子。这就是我现在实施的方式,而且我的工作方式就像在主体中有一点额外开销的魅力一样。使用多态对正则表达式解析器建模
父类,Regexp.java
public abstract class Regexp {
//Print out the regular expression it's holding
//Used for debugging purposes
abstract public void print();
//Checks if the string matches the expression it's holding
abstract public Boolean match(String text);
//Adds a regular expression to be operated upon by the operators
abstract public void add(Regexp regexp);
/*
*To help the main with the overhead to help it decide which regexp will
*hold the other
*/
abstract public Boolean isEmpty();
}
有最简单的正则表达式,Base.java,持有一个字符,如果字符串的字符匹配返回true。
public class Base extends Regexp{
char c;
public Base(char c){
this.c = c;
}
public Base(){
c = null;
}
@Override
public void print() {
System.out.println(c);
}
//If the string is the char, return true
@Override
public Boolean match(String text) {
if(text.length() > 1) return false;
return text.startsWith(""+c);
}
//Not utilized, since base is only contained and cannot contain
@Override
public void add(Regexp regexp) {
}
@Override
public Boolean isEmpty() {
return c == null;
}
}
一个圆括号Paren.java,在里面放置一个正则表达式。这里没有什么特别的想法,但说明了匹配的工作原理。
public class Paren extends Regexp{
//Member variables: What it's holding and if it's holding something
private Regexp regexp;
Boolean empty;
//Parenthesis starts out empty
public Paren(){
empty = true;
}
//Unless you create it with something to hold
public Paren(Regexp regexp){
this.regexp = regexp;
empty = false;
}
//Print out what it's holding
@Override
public void print() {
regexp.print();
}
//Real simple; either what you're holding matches the string or it doesn't
@Override
public Boolean match(String text) {
return regexp.match(text);
}
//Pass something for it to hold, then it's not empty
@Override
public void add(Regexp regexp) {
this.regexp = regexp;
empty = false;
}
//Return if it's holding something
@Override
public Boolean isEmpty() {
return empty;
}
}
Union.java,它是两个可匹配的正则表达式。如果其中一个匹配,整个联盟就是一场比赛。
public class Union extends Regexp{
//Members
Regexp lhs;
Regexp rhs;
//Indicating if there's room to push more stuff in
private Boolean lhsEmpty;
private Boolean rhsEmpty;
public Union(){
lhsEmpty = true;
rhsEmpty = true;
}
//Can start out with something on the left side
public Union(Regexp lhs){
this.lhs = lhs;
lhsEmpty = false;
rhsEmpty = true;
}
//Or with both members set
public Union(Regexp lhs, Regexp rhs) {
this.lhs = lhs;
this.rhs = rhs;
lhsEmpty = false;
rhsEmpty = false;
}
//Some stuff to help me see the unions format when I'm debugging
@Override
public void print() {
System.out.println("(");
lhs.print();
System.out.println("union");
rhs.print();
System.out.println(")");
}
//If the string matches the left side or right side, it's a match
@Override
public Boolean match(String text) {
if(lhs.match(text) || rhs.match(text)) return true;
return false;
}
/*
*If the left side is not set, add the member there first
*If not, and right side is empty, add the member there
*If they're both full, merge it with the right side
*(This is a consequence of left-to-right parsing)
*/
@Override
public void add(Regexp regexp) {
if(lhsEmpty){
lhs = regexp;
lhsEmpty = false;
}else if(rhsEmpty){
rhs = regexp;
rhsEmpty = false;
}else{
rhs.add(regexp);
}
}
//If it's not full, it's empty
@Override
public Boolean isEmpty() {
return (lhsEmpty || rhsEmpty);
}
}
甲级联,Concat.java,这基本上是链接在一起正则表达式的清单。这个很复杂。
public class Concat extends Regexp{
/*
*The list of regexps is called product and the
*regexps inside called factors
*/
List<Regexp> product;
public Concat(){
product = new ArrayList<Regexp>();
}
public Concat(Regexp regexp){
product = new ArrayList<Regexp>();
pushRegexp(regexp);
}
public Concat(List<Regexp> product) {
this.product = product;
}
//Adding a new regexp pushes it into the list
public void pushRegexp(Regexp regexp){
product.add(regexp);
}
//Loops over and prints them
@Override
public void print() {
for(Regexp factor: product){
factor.print();
}
}
/*
*Builds up a substring approaching the input string.
*When it matches, it builds another substring from where it
*stopped. If the entire string has been pushed, it checks if
*there's an equal amount of matches and factors.
*/
@Override
public Boolean match(String text) {
ArrayList<Boolean> bools = new ArrayList<Boolean>();
int start = 0;
ListIterator<Regexp> itr = product.listIterator();
Regexp factor = itr.next();
for(int i = 0; i <= text.length(); i++){
String test = text.substring(start, i);
if(factor.match(test)){
start = i;
bools.add(true);
if(itr.hasNext())
factor = itr.next();
}
}
return (allTrue(bools) && (start == text.length()));
}
private Boolean allTrue(List<Boolean> bools){
return product.size() == bools.size();
}
@Override
public void add(Regexp regexp) {
pushRegexp(regexp);
}
@Override
public Boolean isEmpty() {
return product.isEmpty();
}
}
再一次,我已经得到了这些工作让我满意我的开销,标记化和所有那些好东西。现在我想介绍一下Kleene-star的操作。它匹配文本中出现的任何数字,甚至是0。所以,ba *会匹配b,ba,baa,baaa等,而(ba)*会匹配ba,baba,bababa等。它甚至看起来可能扩展我的正则表达式到这个,或者你看到解决这个问题的另一种方式? PS:有一些getters,setter和其他所有我没有写出来的支持函数,但这主要是为了让你快速了解这些类如何工作。
读者:Russ Cox的散文很漂亮。 – 2015-02-08 18:44:28
谢谢!这非常有帮助! – pimmen 2015-02-08 19:30:45
我已经通过使用你的方法工作,但它非常缓慢并且随着每一层呈指数级增长(例如,a(b(a + b))有两层括号),并且我已将它交给并通过了它,但如果这是一个分级任务,我不会使用这个非常慢的算法。五分钟需要一分钟,七分钟需要半小时。我会给这篇文章一看,也许会实现一个更好的。 – pimmen 2015-02-10 13:34:04