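Regex - find and replace to convert to CSV format

I have a file containing phone numbers and want to create a CSV file from it.

The problem I am facing is that the format is not fixed and is not easy to parse.

Each line contains one, two or three phone records.
A phone number may or may not start with a (+xxx) prefix, and the second number may or may not be preceded by '&'.

I tried to build a regex that splits each line into 3 groups, and then use find/replace to inject the expected format, but without success.

Can anyone come up with a regex that identifies each group on each line?

Input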

(+999) 11 762 52 61 (+999) 11 762 41 11 
(+999) 44 695 01 76 & 44 695 01 89 
(+999) 21 510 02 14 (+999) 21 511 97 98 
(+999) 01 05 00 18 67 
(+999) 21 552 42 12 
(+999) 21 557 86 60 (+999) 21 557 86 72 
(+999) 11 873 93 13 & 11 825 59 92 
(+999) 15 307 57 15 & 15 307 57 16 & (+999) 11 974 19 57 
(+999) 21 551 91 51 (+999) 21 551 91 68 
(+999) 21 551 71 71 & 21 551 72 32 
(+999) 21 527 30 00 (+999) 21 551 54 89 
(+999) 11 621 15 00 (+999) 11 626 20 75 
(+999) 21 555 21 60 (+999) 21 555 21 71 (+999) 12 804 76 30 
(+999) 11 234 18 96 (+999) 11 234 54 48 
(+999) 11 828 35 37 (+999) 11 828 63 76 (+999) 41 363 27 23 
(+999) 11 690 03 00 (+999) 11 315 65 38 
(+999) 08 32 60 34 65 
(+999) 08 32 60 34 65 & (+999) 11 784 46 70 & (+999) 11 784 61 79

Expected result:

(+999) 11 762 52 61, (+999) 11 762 41 11, 
(+999) 44 695 01 76, 44 695 01 89, 
(+999) 21 510 02 14, (+999) 21 511 97 98, 
(+999) 01 05 00 18 67,, 
(+999) 21 552 42 12,, 
(+999) 21 557 86 60, (+999) 21 557 86 72, 
(+999) 11 873 93 13, 11 825 59 92, 
(+999) 15 307 57 15, 15 307 57 16, (+999) 11 974 19 57 
(+999) 21 551 91 51, (+999) 21 551 91 68, 
(+999) 21 551 71 71, 21 551 72 32, 
(+999) 21 527 30 00, (+999) 21 551 54 89, 
(+999) 11 621 15 00, (+999) 11 626 20 75, 
(+999) 21 555 21 60, (+999) 21 555 21 71, (+999) 12 804 76 30 
(+999) 11 234 18 96, (+999) 11 234 54 48, 
(+999) 11 828 35 37, (+999) 11 828 63 76, (+999) 41 363 27 23 
(+999) 11 690 03 00, (+999) 11 315 65 38, 
(+999) 08 32 60 34 65,, 
(+999) 08 32 60 34 65, (+999) 11 784 46 70, (+999) 11 784 61 79

Source

2017-04-09 Alg_D

Generate Python code -> why not count how many digits there are in a line? – Dieter

Split/explode with '(\(|?!.[&(] +)' maybe – chris85
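The split idea from the comment above can be sketched in Python with `re.split`. The separator pattern here is an assumption made for illustration, not the commenter's pattern (which did not survive intact): it splits on an '&', or on the whitespace right before a "(+" prefix. The name `split_phones` is hypothetical.

```python
import re

# Assumed separator (not taken from the thread): an '&' with optional
# surrounding spaces, or the whitespace directly before a "(+" prefix.
SEPARATOR = re.compile(r"\s*&\s*|\s+(?=\(\+)")

def split_phones(line):
    """Return the phone numbers found on one input line."""
    return [part for part in SEPARATOR.split(line.strip()) if part]

print(split_phones("(+999) 15 307 57 15 & 15 307 57 16 & (+999) 11 974 19 57"))
print(split_phones("(+999) 21 555 21 60 (+999) 21 555 21 71 (+999) 12 804 76 30 "))
```

Because the lookahead `(?=\(\+)` does not consume the prefix, each piece keeps its own "(+999)".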

# "phones.txt" is a placeholder name for the input file
with open("phones.txt") as file:
    for l in file:
        # drop the trailing newline and spaces so appended commas stay on the same line
        l = l.strip()

        nr_of_prefixes = l.count('(+')  # number of (+xxx) prefixes
        prefixes = nr_of_prefixes * 3  # digits contributed by the prefixes
        numbers = sum(c.isdigit() for c in l)  # total number of digits in the line
        numbers -= prefixes  # ignore the prefix digits
        # each number in the sample has 9 or 10 digits, so this gives the number of phones
        telephone_numbers = numbers // 8

        l = l.replace(' (+', ', (+')  # put a comma before each further (+ prefix
        l = l.replace(' &', ',')  # replace an & with a comma
        l = l.replace(',,', ',')  # collapse a double ,, into a single ,

        # if there were only 2 phone numbers, add a trailing comma
        if telephone_numbers < 3:
            l += ","

        # if there was only 1 phone number, add an extra comma
        if telephone_numbers < 2:
            l += ","

        # print, or add to a list
        print(l)

Source

2017-04-09 21:31:21 Dieter

Yes, that rocks, thanks

Use the following regex: ((\(\+999\)[\d ]+)|& ([\d ]+))

Here is an example using your file contents:

https://regex101.com/r/Q8grqd/1

If you are using Python, this is the code generated by regex101:

import re 

regex = r"((\(\+999\)[\d ]+)|& ([\d ]+))" 

test_str = ("(+999) 11 762 52 61 (+999) 11 762 41 11\n" 
    "(+999) 44 695 01 76 & 44 695 01 89\n" 
    "(+999) 21 510 02 14 (+999) 21 511 97 98\n" 
    "(+999) 01 05 00 18 67\n" 
    "(+999) 21 552 42 12\n" 
    "(+999) 21 557 86 60 (+999) 21 557 86 72\n" 
    "(+999) 11 873 93 13 & 11 825 59 92\n" 
    "(+999) 15 307 57 15 & 15 307 57 16 & (+999) 11 974 19 57\n" 
    "(+999) 21 551 91 51 (+999) 21 551 91 68\n" 
    "(+999) 21 551 71 71 & 21 551 72 32\n" 
    "(+999) 21 527 30 00 (+999) 21 551 54 89\n" 
    "(+999) 11 621 15 00 (+999) 11 626 20 75\n" 
    "(+999) 21 555 21 60 (+999) 21 555 21 71 (+999) 12 804 76 30\n" 
    "(+999) 11 234 18 96 (+999) 11 234 54 48\n" 
    "(+999) 11 828 35 37 (+999) 11 828 63 76 (+999) 41 363 27 23\n" 
    "(+999) 11 690 03 00 (+999) 11 315 65 38\n" 
    "(+999) 08 32 60 34 65\n" 
    "(+999) 08 32 60 34 65 & (+999) 11 784 46 70 & (+999) 11 784 61 79") 

matches = re.finditer(regex, test_str, re.MULTILINE) 

# report every match, then each of its capture groups (numbered from 1)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

Source

2017-04-09 21:37:58

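This is already something, but it is not really correct: there are duplicate matches and it is far from the expected output. Thanks anyway :)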
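The duplicates mentioned here appear to come from the outer parentheses in the pattern: group 1 wraps the whole alternation, so it always repeats the full match, and `[\d ]+` also picks up the trailing space. A small sketch of the difference follows; the `TIGHTER` pattern is an assumed variant written for this illustration, not something posted in the thread.

```python
import re

# The pattern from the answer: group 1 wraps everything, so it mirrors group 0.
ORIGINAL = re.compile(r"((\(\+999\)[\d ]+)|& ([\d ]+))")

# Assumed variant: no outer group, and digit blocks matched one at a time
# so that trailing spaces are left out.
TIGHTER = re.compile(r"(\(\+999\)(?: \d+)+)|& (\d+(?: \d+)*)")

line = "(+999) 44 695 01 76 & 44 695 01 89"

# Show the duplicated group 1 for each match of the original pattern.
for m in ORIGINAL.finditer(line):
    print(repr(m.group(0)), m.groups())

# With the variant, exactly one of the two groups is set per match.
phones = [m.group(1) or m.group(2) for m in TIGHTER.finditer(line)]
print(phones)
```

Like the original, the variant hard-codes the 999 prefix; a different prefix would need `\d+` in its place.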

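The regex I provided extracts the phone numbers on each line; isn't that what you wanted? Sorry, I know regex, not Python. The code was generated by regex101, and I thought you would know how to use it ;)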

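The regex does not seem to provide exactly what I was looking for. For example, if you use an editor such as Notepad++ or PyCharm and its regex find/replace with 3 groups (replacing with \1, \2, \3), it does not work for all cases.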
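A single find/replace with \1, \2, \3 is awkward here because the number of records per line varies. One alternative, sketched below under stated assumptions, is to extract every number on the line and then pad the row to three fields. The `PHONE` pattern and the `to_csv_row` helper are hypothetical names introduced for this example, and the output uses a bare comma as separator, so its spacing differs slightly from the expected result in the question.

```python
import re

# Assumed pattern (not from the thread): an optional "(+digits) " prefix
# followed by space-separated blocks of digits.
PHONE = re.compile(r"(?:\(\+\d+\) )?\d+(?: \d+)*")

def to_csv_row(line, columns=3):
    """Extract the phone numbers on a line and pad the row to a fixed width."""
    phones = PHONE.findall(line)
    phones += [""] * (columns - len(phones))  # empty fields for missing numbers
    return ",".join(phones)

sample = [
    "(+999) 21 552 42 12 ",
    "(+999) 44 695 01 76 & 44 695 01 89 ",
    "(+999) 15 307 57 15 & 15 307 57 16 & (+999) 11 974 19 57 ",
]
for line in sample:
    print(to_csv_row(line))
```

Padding after extraction keeps the column count constant without needing a separate regex for each of the one-, two- and three-number cases.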
