Split a string using a regular expression

I have the following text file:

"Zanesville,OH"  +39.93830  -82.00830  84ZC PMNQ 
"Zaragoza,Spain"  +41.66670   -1.05000  GWC7 PXB0 
"Zurich,Switzerland"  +47.36670   +8.53330  HP9Z QVT0 
"Zwickau,Germany"  +50.70000  +12.50000  J17H RFH0

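Now I want to get the values in each line. The values are separated by varying numbers of spaces. I know a regular expression can be used to get the values, but I could not get it to work. The code I am using to read the file is this: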

File file = new File("C:\\Users\\user\\Desktop\\files\\cities.txt");
if (file.exists()) {
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
    String line = "";
    while ((line = br.readLine()) != null) {
        // Splits on every single space, so runs of spaces yield empty tokens.
        String token[] = line.split(" ");

    }
}

Can anyone tell me how to get the values?

Source

2014-09-22 Usman Riaz

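What is the expected output of whatever you want to do after processing the data from the text file? – Unihedron 2014-09-22 13:23:51

What do you mean by values? – 2014-09-22 13:24:01

I want to create a CSV file from the above file – 2014-09-22 13:25:33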

Just split the input on the following regex:

\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)

Code:

String s = "\"Zanesville,OH\"  +39.93830  -82.00830  84ZC PMNQ\n" + 
     "\"Zaragoza,Spain\"  +41.66670   -1.05000  GWC7 PXB0\n" + 
     "\"Zurich,Switzerland\"  +47.36670   +8.53330  HP9Z QVT0\n" + 
     "\"Zwickau,Germany, United States\"  +50.70000  +12.50000  J17H RFH0"; 
String[] tok = s.split("\\s+(?=(?:[^\"]*+\"[^\"]*+\")*+[^\"]*+$)"); 
System.out.println(Arrays.toString(tok));

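Output (wrapped here for readability; the array prints on a single line, because \\s+ also matches the line breaks):

["Zanesville,OH", +39.93830, -82.00830, 84ZC, PMNQ,
"Zaragoza,Spain", +41.66670, -1.05000, GWC7, PXB0,
"Zurich,Switzerland", +47.36670, +8.53330, HP9Z, QVT0,
"Zwickau,Germany, United States", +50.70000, +12.50000, J17H, RFH0]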

Source

2014-09-22 13:42:17

But what if something other than the first value is enclosed in ""? – Falco 2014-09-22 15:16:10

I don't know what you mean, can you provide an example? – 2014-09-22 15:20:48

This will not match '\"Zanesville,OH\" +39.93830 \"-82.00830\" 84ZC PMNQ' – Falco 2014-09-22 15:27:08

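You can use line.split("\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)") to produce the output you want. The lookahead only allows a split where an even number of double quotes follows, that is, on whitespace outside a quoted field.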
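As a rough sketch of how this split could be applied per line to build a CSV row, assuming each record sits on one line and quoted fields contain no escaped quotes. The class and method names (`QuotedSplitDemo`, `splitOutsideQuotes`, `toCsv`) are made up for illustration; the quoted field keeps its quotes, so the comma inside it stays protected in the CSV output.

```java
import java.util.Arrays;

class QuotedSplitDemo {
    // Whitespace that is followed by an even number of double quotes,
    // i.e. whitespace that lies outside any quoted field.
    static final String OUTSIDE_QUOTES = "\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: trim first so leading whitespace does not
    // produce an empty first token.
    static String[] splitOutsideQuotes(String line) {
        return line.trim().split(OUTSIDE_QUOTES);
    }

    // Hypothetical helper: join the tokens with commas to form one CSV row.
    static String toCsv(String line) {
        return String.join(",", splitOutsideQuotes(line));
    }

    public static void main(String[] args) {
        String line = "\"Zaragoza,Spain\"  +41.66670   -1.05000  GWC7 PXB0 ";
        System.out.println(Arrays.toString(splitOutsideQuotes(line)));
        System.out.println(toCsv(line));
    }
}
```

Inside the `while` loop from the question, `toCsv(line)` could then be written to the output file one row at a time.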

Source

2014-09-22 13:29:16 ashokramcse

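This will work for the sample input, but it will cause trouble if there are spaces in the city name. For example, "New York, New York" will break it. – BlairHippo 2014-09-22 13:30:23

@BlairHippo To fix that you could use '(?<=[\"A-Z\\d])\\s+' – hwnd 2014-09-22 13:46:38

That was a typo =) – hwnd 2014-09-22 13:50:27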
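To illustrate the lookbehind suggested in the comment above, here is a minimal sketch. It assumes every field ends in a double quote, an uppercase letter, or a digit, which holds for the sample data; the "New York,NY" row and the class name `LookbehindSplitDemo` are invented for the example.

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    // Whitespace preceded by a double quote, an uppercase letter, or a digit.
    static final String AFTER_FIELD_END = "(?<=[\"A-Z\\d])\\s+";

    static String[] split(String line) {
        return line.trim().split(AFTER_FIELD_END);
    }

    public static void main(String[] args) {
        // Hypothetical row with a space inside the quoted city name.
        String line = "\"New York,NY\"  +40.71280  -74.00600  ABCD EFGH";
        System.out.println(Arrays.toString(split(line)));
    }
}
```

The space inside "New York" follows a lowercase letter, so no split happens there. A name in which a space follows an uppercase letter or a digit (for example "NY City") would still be split, so this is a heuristic rather than a real quote-aware parse.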

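A more general solution for Excel-like CSV

This looks a lot like tab-separated text in which the tabs were replaced by multiple spaces. The double quotes suggest a CSV dialect similar to Excel's.

Since the text between double quotes may contain line breaks (multi-line text), I start with the whole text.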

String encoding = "Windows-1252"; // English, best would be "UTF-8". 
byte[] textAsBytes = Files.readAllBytes(file.toPath()); 
String text = new String(textAsBytes, encoding);

Excel uses (Windows) line endings "\r\n", and "\n" inside multi-line text.

String[] lines = text.split("\r\n");
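A small sketch of why splitting on "\r\n" rather than on any line break matters, assuming records end in "\r\n" and a quoted field only ever contains a bare "\n". The sample text and the class name `RecordSplitDemo` are made up for illustration.

```java
class RecordSplitDemo {
    // Split on Windows line endings only, so a bare "\n" inside a
    // quoted field stays part of its record.
    static String[] splitRecords(String text) {
        return text.split("\r\n");
    }

    public static void main(String[] args) {
        // Hypothetical input: the first record has a line break inside its quoted field.
        String text = "\"Zurich,\nSwitzerland\"  +47.36670\r\n"
                    + "\"Zwickau,Germany\"  +50.70000\r\n";
        String[] records = splitRecords(text);
        System.out.println(records.length);
        System.out.println(records[0].contains("\n"));
    }
}
```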

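Splitting on multiple spaces with .split(" +") could split inside a quoted field, so I use a pattern instead. The pattern matches either quoted content, in which any inner quote is escaped by doubling it, or a run of non-whitespace characters.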

// Either a quoted field (inner quotes doubled) or a run of non-whitespace.
Pattern pattern = Pattern.compile("\"([^\"]|\"\")*\"|\\S+");
for (String line : lines) {
    List<String> fields = new ArrayList<>();
    Matcher m = pattern.matcher(line);
    while (m.find()) {
        String field = m.group();
        if (field.startsWith("\"") && field.endsWith("\"") && field.length() >= 2) {
            field = field.substring(1, field.length() - 1); // Strip quotes.
            field = field.replace("\"\"", "\""); // Unescape inner quotes.
        }
        fields.add(field);
    }
    ...
}
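The loop above can be sketched as a self-contained program that also writes each record back out as a CSV row, since that is what the question asks for. The output quoting rule (quote a field only if it contains a comma, a quote, or a line break, and double any inner quotes) is an assumption in the style of common CSV conventions, and the names `SpaceSeparatedToCsv`, `parseFields`, and `toCsvLine` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceSeparatedToCsv {
    // Either a quoted field (inner quotes doubled) or a run of non-whitespace.
    private static final Pattern FIELD = Pattern.compile("\"(?:[^\"]|\"\")*\"|\\S+");

    // Extract the fields of one record, stripping and unescaping quotes.
    static List<String> parseFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String field = m.group();
            if (field.startsWith("\"") && field.endsWith("\"") && field.length() >= 2) {
                field = field.substring(1, field.length() - 1).replace("\"\"", "\"");
            }
            fields.add(field);
        }
        return fields;
    }

    // Assumed output convention: quote only where needed, doubling inner quotes.
    static String toCsvLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String f = fields.get(i);
            boolean needsQuotes = f.contains(",") || f.contains("\"")
                    || f.contains("\n") || f.contains("\r");
            sb.append(needsQuotes ? "\"" + f.replace("\"", "\"\"") + "\"" : f);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "\"Zwickau,Germany, United States\"  +50.70000  +12.50000  J17H RFH0";
        System.out.println(toCsvLine(parseFields(line)));
    }
}
```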

Source

2014-09-22 13:57:28
