2016-01-21 121 views

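I have a messy CSV file. I'm trying to use regular expressions to extract the first and last names from the values in one column of the CSV file, and the first and last names should each go into their own column. The difficulty is splitting the column values, because they use several different delimiters, and sometimes duplicated ones.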

CSV file (with different combinations of delimiters):

ID,Description,Number 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops ,SomeValue 
JDo,John Doe - Temp - Client Client Ops ,SomeValue 
JDo,John Doe-Temp-Client Client Ops,SomeValue 
JDo,John Doe - Temp-Client Client Ops,SomeValue 
JDo,John Doe - Temp-Client Client Ops,SomeValue 
JDo,John Doe-Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe - Temp - Client Client Ops,SomeValue 
JDo,John Doe-Temp - Client Client Ops ,SomeValue 
JDo,John Doe-Temp-Client Client Ops ,SomeValue 
JDo,John.Doe - Temp - Client Client Ops,SomeValue 
JDo,John .Doe - Temp - Client Client Ops,SomeValue 
JDo,John. Doe - Temp - Client Client Ops,SomeValue 
JDo,John . Doe - Temp - Client Client Ops,SomeValue 
JDo,John.Doe - Temp - Client Client Ops ,SomeValue 
JDo,John .Doe - Temp - Client Client Ops ,SomeValue 
JDo,John. Doe - Temp - Client Client Ops ,SomeValue 
JDo,John . Doe - Temp - Client Client Ops ,SomeValue 
JDo,John.Doe-Temp-Client Client Ops,SomeValue 
JDo,John .Doe-Temp-Client Client Ops,SomeValue 
JDo,John. Doe-Temp-Client Client Ops,SomeValue 
JDo,John . Doe-Temp-Client Client Ops,SomeValue 
JDo,John.Doe - Temp - Client Client Ops,SomeValue 
JDo,John .Doe - Temp - Client Client Ops,SomeValue 
JDo,John. Doe - Temp - Client Client Ops,SomeValue 
JDo,John . Doe - Temp - Client Client Ops,SomeValue 
JDo,John?Doe - Temp - Client Client Ops,SomeValue 
JDo,John ?Doe - Temp - Client Client Ops,SomeValue 
JDo,John? Doe - Temp - Client Client Ops,SomeValue 
JDo,John ? Doe - Temp - Client Client Ops,SomeValue 
JDo,John?Doe - Temp - Client Client Ops ,SomeValue 
JDo,John ?Doe - Temp - Client Client Ops ,SomeValue 
JDo,John? Doe - Temp - Client Client Ops ,SomeValue 
JDo,John ? Doe - Temp - Client Client Ops ,SomeValue 
JDo,John?Doe-Temp-Client Client Ops,SomeValue 
JDo,John ?Doe-Temp-Client Client Ops,SomeValue 
JDo,John? Doe-Temp-Client Client Ops,SomeValue 
JDo,John ? Doe-Temp-Client Client Ops,SomeValue 
JDo,John?Doe - Temp - Client Client Ops,SomeValue 
JDo,John ?Doe - Temp - Client Client Ops,SomeValue 
JDo,John? Doe - Temp - Client Client Ops,SomeValue 
JDo,John ? Doe - Temp - Client Client Ops,SomeValue 
JDo,"John,Doe - Temp - Client Client Ops",SomeValue 
JDo,"John ,Doe - Temp - Client Client Ops",SomeValue 
JDo,"John, Doe - Temp - Client Client Ops",SomeValue 
JDo,"John , Doe - Temp - Client Client Ops",SomeValue 
JDo," John,Doe - Temp - Client Client Ops ",SomeValue 
JDo," John ,Doe - Temp - Client Client Ops ",SomeValue 
JDo," John, Doe - Temp - Client Client Ops ",SomeValue 
JDo," John , Doe - Temp - Client Client Ops ",SomeValue 
JDo,"John,Doe-Temp-Client Client Ops",SomeValue 
JDo,"John ,Doe-Temp-Client Client Ops",SomeValue 
JDo,"John, Doe-Temp-Client Client Ops",SomeValue 
JDo,"John , Doe-Temp-Client Client Ops",SomeValue 
JDo,"John,Doe - Temp - Client Client Ops",SomeValue 
JDo,"John ,Doe - Temp - Client Client Ops",SomeValue 
JDo,"John, Doe - Temp - Client Client Ops",SomeValue 
JDo,"John , Doe - Temp - Client Client Ops",SomeValue 
JDo,John-Doe - Temp - Client Client Ops,SomeValue 
JDo,John -Doe - Temp - Client Client Ops,SomeValue 
JDo,John- Doe - Temp - Client Client Ops,SomeValue 
JDo,John - Doe - Temp - Client Client Ops,SomeValue 
JDo,John-Doe - Temp - Client Client Ops ,SomeValue 
JDo,John -Doe - Temp - Client Client Ops ,SomeValue 
JDo,John- Doe - Temp - Client Client Ops ,SomeValue 
JDo,John - Doe - Temp - Client Client Ops ,SomeValue 
JDo,John-Doe-Temp-Client Client Ops,SomeValue 
JDo,John -Doe-Temp-Client Client Ops,SomeValue 
JDo,John- Doe-Temp-Client Client Ops,SomeValue 
JDo,John - Doe-Temp-Client Client Ops,SomeValue 
JDo,John-Doe - Temp - Client Client Ops,SomeValue 
JDo,John -Doe - Temp - Client Client Ops,SomeValue 
JDo,John- Doe - Temp - Client Client Ops,SomeValue 
JDo,John - Doe - Temp - Client Client Ops,SomeValue

To add the first and last name columns, I use the following code:

Function FixRxClaimReportAddFirstLastNameColumn { 
    Param ($csvFile) 

    Write-Host "Adding columns 'First Name' and 'Last Name' to $csvFile" 
    Import-Csv $csvFile | 
    Select-Object *, @{n='First Name'; e={if ($_.Description) { 
     $columnFirstNameValue = $($_.Description -replace '\s+', ' ').split(" ")[0] 
     if ($columnFirstNameValue -notlike "*,*" -and $columnFirstNameValue -notmatch '\?' -and $columnFirstNameValue -notlike "*.*" -and $columnFirstNameValue -notlike "*-*") { 
      $columnFirstNameValue.Trim() 
     } else { 
      $columnFirstNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "} 
      $columnFirstNameValue2[0].Trim() 
     } 
     }}}, @{n='Last Name'; e={if ($_.Description) { 
     $columnLastNameValue = $($_.Description -replace '\s+', ' ').split(" ")[1] 
     if ($columnLastNameValue -notlike "*,*" -and $columnLastNameValue -notmatch '\?' -and $columnLastNameValue -notlike "*.*" -and $columnLastNameValue -notlike "*-*") { 
      $columnLastNameValue.Trim() 
     } else { 
      $columnLastNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "} 
      $columnLastNameValue2[1].Trim() 
     } 
     }}} | Export-Csv "$csvFile-Results.csv" -NoTypeInformation -Force 
    Write-Host "Complete." 
    Write-Host "" 
} 

FixRxClaimReportAddFirstLastNameColumn 'C:\Scripts\Tests\Test1.csv' 

When I run this code, every first-name value should be John and every last-name value should be Doe. However, the values come out very different.

Answer


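You're overthinking this. Remove the additional information from the end of the Description field to get the name, then trim the name and split it into first and last name, and finally add those as new properties to the input object.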

Try this:

Import-Csv 'C:\path\to\input.csv' | ForEach-Object { 
    $rawname = $_.Description -replace '-[^-]*-[^-]*$' 
    $firstname, $lastname = $rawname.Trim() -split ' *[ \?\.,-] *' 
    $_ | Add-Member -Type NoteProperty -Name FirstName -Value $firstname 
    $_ | Add-Member -Type NoteProperty -Name LastName -Value $lastname 
    $_ 
} | Export-Csv 'C:\path\to\output.csv' -NoType 
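To check the two patterns without touching any files, the same steps can be run on a few of the sample rows held in a here-string. This is a minimal sketch: `$sample` is an illustrative subset of the CSV in the question, and `ConvertFrom-Csv` stands in for `Import-Csv`; the output properties are collected in new objects only so they are easy to inspect.

```powershell
# Illustrative subset of the sample data from the question (no file is read)
$sample = @'
ID,Description,Number
JDo,John Doe - Temp - Client Client Ops,SomeValue
JDo,John.Doe-Temp-Client Client Ops ,SomeValue
JDo,John ? Doe - Temp - Client Client Ops,SomeValue
JDo," John , Doe - Temp - Client Client Ops ",SomeValue
JDo,John- Doe-Temp-Client Client Ops,SomeValue
'@

$results = $sample | ConvertFrom-Csv | ForEach-Object {
    # Drop the last two hyphen-separated segments ("- Temp - Client Client Ops");
    # [^-]* cannot cross a hyphen, so the match always starts at the second-to-last one
    $rawname = $_.Description -replace '-[^-]*-[^-]*$'
    # Split on a single separator character, absorbing any spaces around it
    $firstname, $lastname = $rawname.Trim() -split ' *[ \?\.,-] *'
    [pscustomobject]@{ FirstName = $firstname; LastName = $lastname }
}

$results | Format-Table
```

Every row in this subset comes back as John / Doe, including the one with a hyphen between first and last name, because the replacement only ever removes the last two hyphen-delimited segments.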

Thanks, Ansgar. You're always a big help :)


You could drop `$rawname` entirely: `$FirstName, $LastName, $null = $_ -split '[\s\?\.,-]' | ? {$_}` – xXhRQ8sD2L7Z
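Inside a `ForEach-Object` over CSV rows, `$_` is the whole row object, so the string being split would have to be its Description. A minimal sketch of the one-liner under that assumption (`$_.Description` in place of `$_`), run on illustrative sample rows:

```powershell
# Illustrative sample rows (no file is read)
$sample = @'
ID,Description,Number
JDo,John Doe - Temp - Client Client Ops,SomeValue
JDo," John , Doe - Temp - Client Client Ops ",SomeValue
JDo,John?Doe-Temp-Client Client Ops,SomeValue
'@

$results = $sample | ConvertFrom-Csv | ForEach-Object {
    # Split on every whitespace or separator character and drop the empty pieces;
    # the first two tokens become the name, everything else is discarded into $null
    $FirstName, $LastName, $null = $_.Description -split '[\s\?\.,-]' | Where-Object { $_ }
    [pscustomobject]@{ FirstName = $FirstName; LastName = $LastName }
}

$results | Format-Table
```

Because every separator character is treated as a split point, the trailing "Temp - Client Client Ops" text never needs to be removed first; it simply ends up in `$null`.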


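@ST8Z6FR57ABE6A8RE9UF It's easier to read and understand if you don't cram it all into one line. Also, the downside of splitting this way is that it can't handle multiple first names or names containing hyphens.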
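The limitation is easy to reproduce. The two descriptions below are hypothetical (neither appears in the question's data); they are fed through the same single-character split used in the one-liner:

```powershell
# Hypothetical descriptions: a hyphenated first name and a two-part first name
$descriptions = 'Mary-Jane Smith - Temp - Client Client Ops',
                'Anne Marie Smith - Temp - Client Client Ops'

$results = foreach ($d in $descriptions) {
    # Same split as the one-liner: every space or separator character is a boundary
    $FirstName, $LastName, $null = $d -split '[\s\?\.,-]' | Where-Object { $_ }
    [pscustomobject]@{ Description = $d; FirstName = $FirstName; LastName = $LastName }
}

$results | Format-Table
```

Both rows come back with the wrong last name (`Jane` and `Marie`), because the split has no way to tell a separator between first and last name from one inside a name.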