Date: 2014-05-20

A Small Detail When Using Regular Expressions in J2SE

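While doing some string processing today,

I ran into a small pitfall in J2SE when using a regular expression to match a literal period ( . ).

I'm writing it down here for the record.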

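import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDotDemo {
    public static void main(String[] args) {
        // At run time this is the path aa\abb.gg\ad.txt
        String test = "aa\\abb.gg\\ad.txt";
        // Capture the file name after the last backslash, without its extension
        String regEx = ".+\\\\(.+)\\..+$";
        Pattern p = Pattern.compile(regEx);
        Matcher m = p.matcher(test);
        boolean rs = m.find();    // check whether there is a match
        if (rs) {                 // group(i) may only be called after a successful match
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println(m.group(i));    // prints: ad
            }
        }
    }
}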

The key part is this line:

String regEx = ".+\\\\(.+)\\..+$";

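In regular expressions, as we all know, a special character has to be escaped with a backslash ( \ ) when you want to match it literally.

So I took it for granted that, as in other languages, " \. " would match a literal period ( . ).

In J2SE, however, this is rejected by the compiler with an "illegal escape character" error.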
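To see why the escape matters at all, here is a minimal sketch contrasting an unescaped dot, which the regex engine treats as "any single character", with the escaped form that compiles in Java. The class and method names are hypothetical, chosen only for illustration; `Pattern.matches` is the standard java.util.regex call.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: unescaped vs. escaped dot in a Java regex
final class DotMatchDemo {
    // "." is a regex metacharacter: it matches any single character
    static boolean anyChar(String s) {
        return Pattern.matches("a.b", s);
    }

    // "\\." in source becomes \. at run time, which matches only a literal period
    static boolean literalDot(String s) {
        return Pattern.matches("a\\.b", s);
    }

    public static void main(String[] args) {
        System.out.println(anyChar("axb"));    // true: '.' matched 'x'
        System.out.println(anyChar("a.b"));    // true
        System.out.println(literalDot("axb")); // false: only a period is accepted
        System.out.println(literalDot("a.b")); // true
        // Writing "a\.b" here instead would not compile: \. is not a valid Java string escape.
    }
}
```

Without the escape the pattern silently accepts more than intended, which is why simply dropping the backslash is not a fix.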

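After looking it up: in Java source code, the dot has to be written as " \\. " inside the string literal.

The reason is that the Java compiler treats "\." as an escape sequence for its own String literal, and \. is not a valid Java string escape. Writing "\\." makes the compiler produce the two characters \. at run time, which the regex engine then reads as an escaped, literal period.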
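The two rounds of escaping can be made visible by printing the strings the compiler actually produces. This is a sketch with a hypothetical class name; it reuses the article's pattern and shows that the regex engine only ever sees the run-time string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: what javac hands to the regex engine
final class TwoRoundEscaping {
    // Round 1 (javac): the literal "\\." becomes the two characters '\' and '.'
    static final String DOT_REGEX = "\\.";
    // The article's pattern as written in source (16 characters between the quotes)
    static final String FILE_REGEX = ".+\\\\(.+)\\..+$";

    public static void main(String[] args) {
        System.out.println(DOT_REGEX.length());  // 2
        System.out.println(DOT_REGEX);           // \.
        System.out.println(FILE_REGEX);          // .+\\(.+)\..+$
        // Round 2 (regex engine): \. means "a literal period", \\ means "one backslash"
        System.out.println(Pattern.matches(DOT_REGEX, "."));  // true
        System.out.println(Pattern.matches(DOT_REGEX, "x"));  // false
    }
}
```

So "\\\\" in the source is needed to match a single backslash in the input: four characters in source, two after compilation, one after the regex engine is done.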

The original explanation reads:

It's actually very simple. You need to end up with the escape sequence \. for the regexp to be valid, but it doesn't work because the Java compiler sees it as an escape for its String objects, while a full-stop here does not require escaping.
The work-around is to write "\\.". This way, the backslash is escaped on the first round (remember that a backslash must be escaped anyway), and on the second round (when involving regex), it is the full-stop which is escaped...
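Besides "\\.", the standard java.util.regex API offers other ways to match a literal period that avoid the backslash altogether: a character class [.] (inside brackets the dot loses its special meaning) and `Pattern.quote`, available since Java 5. The sketch below compares the three and applies [.] to the article's file-name pattern; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: three equivalent ways to match a literal period
final class LiteralDotAlternatives {
    static boolean viaEscape(String s)    { return Pattern.matches("a\\.b", s); }
    static boolean viaCharClass(String s) { return Pattern.matches("a[.]b", s); }
    static boolean viaQuote(String s)     { return Pattern.matches("a" + Pattern.quote(".") + "b", s); }

    // The article's pattern with [.] in place of \\. ; returns null when nothing matches
    static String baseName(String path) {
        Matcher m = Pattern.compile(".+\\\\(.+)[.].+$").matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a.b", "axb"}) {
            System.out.println(s + ": " + viaEscape(s) + " " + viaCharClass(s) + " " + viaQuote(s));
        }
        System.out.println(baseName("aa\\abb.gg\\ad.txt"));  // ad
    }
}
```

Which form to use is mostly a matter of readability; `Pattern.quote` is convenient when the literal text comes from a variable rather than being typed into the pattern.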

======================== The End ========================