Extracting multiple substrings from a string

I have an xlsx file containing lists of barcodes, three or four to a cell, and I need to separate them so that only the barcodes remain.

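Each barcode is always a string of 6 digits, but it may be preceded by a few different letters, and the cell may or may not contain commas, ampersands, and other words. The data looks like this: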

 
COL 1 | COL 2 | COL 3 | COL 4 | COL 5 
Info | Identifier | Info | Info | L123456 , PC 654321 , M 123654 & 546123 Vacant | 
Info | Identifier | Info | Info | PC 123456 , M 456789 Occupied 
Info | Identifier | Info | Info | L 987654

So far I have tried using regular expressions to strip out all the noise and leave only the barcodes, but this has returned a jumbled mess.

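I also need a way to keep track of which row each barcode came from, because one of the earlier columns holds an identifier that has to be linked to these barcodes. I can access that identifier easily.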

I am using the Excel ComObject to manipulate this worksheet. Below is the code I used for the regex attempt. How can I extract the barcodes?

$xl = new-object -ComObject excel.application 
$xl.visible = $true 
$xl.displayalerts = $false 
$xl.workbooks.open("file.xls") 
$sheet = $xl.activeworkbook.activesheet 
$x = 3 
3..8|%{ 
    $uc = $sheet.Range("B"+$x).Text 
    $equip = $sheet.Range("I"+$x).Text 
    $loc = $sheet.Range("D"+$x).Text + '-NHQ' 
    $uidcc = $uc.replace("/",",") 
    $tagnums = $equip -replace " [A-Z]+ ","" 
    $tagnums = $tagnums -replace " & ","" 
    $tagnums = $tagnums -replace "[A-C][1-9]+","" 
    $tagnums = $tagnums -split ',' 
    foreach($i in $tagnums){ 
     $asset += $i+","+$loc+","+$uidcc+"`n" 
    } 
    $x++ 
} 
$asset | Format-Table 
$xl.quit() 
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($xl)


2017-04-14 Cameron

If I understand you correctly, then what you need is something like this:

$tagnums = @([regex]::matches($equip,'\D*(\d{6})')|%{$_.groups[1].value})

For example, the input 'L123456 , PC 654321 , M 123654 & 546123 Vacant' yields 123456, 654321, 123654 and 546123, and 'L 987654' yields 987654.
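
To keep track of which row each barcode came from, the matches can be paired with that row's identifier. This is only a minimal sketch, assuming the cell values have already been read into objects. The `Identifier` and `Equip` property names and the 'ID-1' style values are hypothetical stand-ins for the worksheet cells, not something from the original workbook:

```powershell
# Hypothetical sample rows standing in for the identifier and barcode cells.
$rows = @(
    [pscustomobject]@{ Identifier = 'ID-1'; Equip = 'L123456 , PC 654321 , M 123654 & 546123 Vacant' }
    [pscustomobject]@{ Identifier = 'ID-2'; Equip = 'PC 123456 , M 456789 Occupied' }
    [pscustomobject]@{ Identifier = 'ID-3'; Equip = 'L 987654' }
)

# Emit one object per barcode, carrying the identifier of its source row.
$assets = foreach ($row in $rows) {
    foreach ($m in [regex]::Matches($row.Equip, '\D*(\d{6})')) {
        [pscustomobject]@{
            Identifier = $row.Identifier
            Barcode    = $m.Groups[1].Value
        }
    }
}

$assets | Format-Table
```

Building objects instead of concatenating a string means the result can be piped straight to `Format-Table` or `Export-Csv`. In the real script, the two properties would presumably be filled from `$sheet.Range(...).Text` inside the existing loop.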


2017-04-14 19:19:22

This works perfectly, thank you! – Cameron

The regex can be simplified to '\d{6}' –
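
A sketch of the simplified form suggested in the comment above: since every match is itself a barcode, the capture group can be dropped and the match value read directly. The stricter lookaround variant is an added assumption, not something from the thread. It would only matter if a cell could contain runs of more than six digits:

```powershell
$equip = 'L123456 , PC 654321 , M 123654 & 546123 Vacant'

# Each match is exactly six digits, so no capture group is needed.
$tagnums = @([regex]::Matches($equip, '\d{6}') | ForEach-Object { $_.Value })

# Assumed stricter variant: skip six-digit runs that are part of a longer number.
$strict = @([regex]::Matches($equip, '(?<!\d)\d{6}(?!\d)') | ForEach-Object { $_.Value })

$tagnums -join ','
```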
