段落我已这个文本,我已经从利用iText一个pdf提取并放置到字符串变量:正则表达式,从网页中提取以下
(1) A a, — al'-fah; of Hebrew origin; the first letter of the alphabet;
figurative only (from its use as a numeral) the first: — Alpha.
Often used (usually ajn an, before a vowel) also in composition
(as a contraction from (427) (a]neu,)) in the sense of privation;
so in many words beginning with this letter; occasionally in the
sense of union (as a contraction of (260) (a[ma)).
(2) ÆAarw>n, — ah-ar-ohn'; of Hebrew origin [Hebrew {175}
('Aharown)]; Aaron, the brother of Moses: — Aaron.
(3) ÆAbaddw>n, — ab-ad-dohn'; of Hebrew origin [Hebrew {11}
('abaddown)]; a destroying angel: — Abaddon.
(4) ajbarh>v, — ab-ar-ace'; from (1) (a) (as a negative particle) and (922)
(ba>rov); weightless, i.e. (figurative) not burdensome: — from
being burdensome.
(5) ÆAbba~, — ab-bah'; of Chaldee origin [Hebrew {2} ('ab (Chaldee))];
father (as a vocative): — Abba.
(6) &Abel, — ab'-el; of Hebrew origin [Hebrew {1893} (Hebel)]; Abel,
the son of Adam: — Abel.
(7) ÆAbia>, — ab-ee-ah'; of Hebrew origin [Hebrew {29} ('Abiyah)];
Abijah, the name of two Israelites: — Abia.
(8) ÆAbia>qar, — ab-ee-ath'-ar; of Hebrew origin [Hebrew {54}
('Ebyathar)]; Abiathar, an Israelite: — Abiathar.
(9) ÆAbilhnh>, — ab-ee-lay-nay'; of foreign origin [compare Hebrew {58}
('abel)]; Abilene, a region of Syria: — Abilene.
(10) ÆAbiou>d, — ab-ee-ood'; of Hebrew origin [Hebrew {31}
('Abiyhuwd)]; Abihud, an Israelite: — Abiud.
字符串中的各段与([0-9])
开始如(9)
或(5)
,我想用pagestring.split("regex")
提取以此字符序列开头的每个段落。可以帮助吗?
太棒了!有没有一个教程或指南,你可以推荐,因为正则表达式真的把我搞砸了? – Lema 2015-03-13 09:01:14
我以前学过正则表达式,所以我不能真正推荐一个教程。但http://regexcrossword.com/提供了一种有趣的学习方式。 – laune 2015-03-13 09:21:59