2017-11-18 175 views

I am trying to get the ISO country codes from a website in CSV or JSON format. My code is as follows:

# ############################ 
$logFile = "$env:USERPROFILE\desktop\ISOCountry.log" 
Start-Transcript -Path $logFile -Append 
######################################### 

$WebResponse = Invoke-WebRequest "http://kirste.userpage.fu-berlin.de/diverse/doc/ISO_3166.html" 
#$WebResponse = Invoke-WebRequest "https://en.wikipedia.org/wiki/ISO_3166-1" 
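# ParsedHtml exists only in Windows PowerShell (it relies on the Internet Explorer engine)
$PRETAG = $WebResponse.ParsedHtml.getElementsByTagName("PRE") | Select-Object -ExpandProperty innerText
$PRETAG
# $PRETAG is plain text, so ConvertTo-Csv only outputs each string's Length property, not the table
$CsvText = $PRETAG | ConvertTo-Csv
$CsvText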
# end logging 
########################### 
Stop-Transcript 
########################### 

The data is inside the PRE tag, probably all in tab-delimited format. I need help. I am using this website because it is free.
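A minimal sketch of turning that PRE text into objects, which can then be piped to `ConvertTo-Csv` or `ConvertTo-Json`. It assumes each data line holds four columns (country name, alpha-2, alpha-3, numeric code) separated by a tab or by two or more spaces; the function name `ConvertFrom-IsoPreText` and the column names are hypothetical, and the sample string stands in for the page content.

```powershell
# Sketch: parse the text of a <PRE> block into objects.
# ASSUMPTION: data lines have four columns (name, alpha-2, alpha-3, numeric)
# separated by a tab or by a run of two or more spaces.
# ConvertFrom-IsoPreText and the property names are hypothetical.
function ConvertFrom-IsoPreText {
    param([Parameter(Mandatory)][string]$Text)

    foreach ($line in $Text -split '\r?\n') {
        # Split on a tab or 2+ spaces; single spaces stay inside country names
        $fields = $line.Trim() -split '\t|\s{2,}'
        # Keep only lines that look like data rows: 4 fields ending in a 3-digit code
        if ($fields.Count -eq 4 -and $fields[3] -match '^\d{3}$') {
            [pscustomobject]@{
                Country = $fields[0]
                Alpha2  = $fields[1]
                Alpha3  = $fields[2]
                Numeric = $fields[3]
            }
        }
    }
}

# Usage with a hand-written sample (with the question's code this would be
# ConvertFrom-IsoPreText -Text ($PRETAG -join "`n"))
$sample = "AFGHANISTAN`tAF`tAFG`t004`nALBANIA      AL      ALB     008"
$rows = ConvertFrom-IsoPreText -Text $sample
$rows | ConvertTo-Csv -NoTypeInformation
$rows | ConvertTo-Json
```

Lines that do not match the four-column shape (titles, blank lines, notes) are simply skipped, so the real page may need the filter adjusted.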

I tried to get the same data from Wikipedia with the code below, but could not:

$URI = "https://en.wikipedia.org/wiki/ISO_3166-1"
$HTML = Invoke-WebRequest -Uri $URI
($HTML.ParsedHtml.getElementsByTagName('table') | Where-Object { $_.className -eq 'wikitable sortable' }).innerText

I am still facing the same problem. Help needed.
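Reading `.innerText` of the whole table returns one flattened string, which is why it cannot be converted directly. A sketch that walks the table row by row instead, assuming the element exposes `rows`, each row exposes `cells`, and each cell exposes `innerText` (as the IE DOM behind `ParsedHtml` in Windows PowerShell does), and that the first row holds the headers. `ConvertFrom-HtmlTableRows` is a hypothetical name and this is untested against the live page.

```powershell
# Sketch: convert a table element into one object per data row.
# ASSUMPTIONS: $Table has a 'rows' collection, each row has 'cells', each cell
# has 'innerText'; the first row holds the column headers.
function ConvertFrom-HtmlTableRows {
    param([Parameter(Mandatory)]$Table)

    $rows = @($Table.rows)
    if ($rows.Count -eq 0) { return }

    # Header names; an empty header cell gets a positional fallback name
    $headers = @()
    $headerCells = @($rows[0].cells)
    for ($i = 0; $i -lt $headerCells.Count; $i++) {
        $text = "$($headerCells[$i].innerText)".Trim()
        if (-not $text) { $text = "Column$i" }
        $headers += $text
    }

    foreach ($row in ($rows | Select-Object -Skip 1)) {
        $cells = @($row.cells)
        $record = [ordered]@{}
        for ($i = 0; $i -lt $headers.Count; $i++) {
            # Missing cells (short rows) become empty strings
            $record[$headers[$i]] = if ($i -lt $cells.Count) { "$($cells[$i].innerText)".Trim() } else { '' }
        }
        [pscustomobject]$record
    }
}

# Possible use in Windows PowerShell (the class test is loosened because the
# table may carry more classes than 'wikitable sortable'):
# $table = $HTML.ParsedHtml.getElementsByTagName('table') |
#     Where-Object { $_.className -like '*wikitable*' } | Select-Object -First 1
# ConvertFrom-HtmlTableRows -Table $table | Export-Csv C:\temp\iso.csv -NoTypeInformation
```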

Answers


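This is how I would do it, with HtmlAgilityPack. You can download the package from http://html-agility-pack.net/. It is a well-known and respected framework for scraping websites in combination with XPath.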

cls
# Load the HtmlAgilityPack assembly straight from the extracted package folder (no installation needed)
Add-Type -Path "C:\temp\HtmlAgilityPack\lib\Net20\HtmlAgilityPack.dll"
$web = New-Object HtmlAgilityPack.HtmlWeb
[HtmlAgilityPack.HtmlDocument]$doc = $web.Load("https://en.wikipedia.org/wiki/ISO_3166-1")

## FILTER NEEDED CONTENT THROUGH X-PATH
# td[1] = country name, td[5] = ISO 3166-2 cell of each row in the selected table
[HtmlAgilityPack.HtmlNodeCollection]$country = $doc.DocumentNode.SelectNodes("//table[2]//tr//td[1]")
[HtmlAgilityPack.HtmlNodeCollection]$iso = $doc.DocumentNode.SelectNodes("//table[2]//tr//td[5]")

# go through both collections in step and emit one object per row;
# the loop stops at Count - 1 (-lt, not -le) so it never reads past the last node
$rowCount = [Math]::Min($country.Count, $iso.Count)
$output = for ($i = 0; $i -lt $rowCount; $i++) {
    [pscustomobject]@{
        country = $country[$i].InnerText.Trim()
        iso     = $iso[$i].InnerText.Trim()
    }
}
# export csv
$output | ConvertTo-Csv -Delimiter ";" -NoTypeInformation | Out-File C:\temp\iso.csv -Force

This will give you output like:

"country";"iso" 
"Afghanistan";"ISO 3166-2:AF" 
"Aland Islands !Åland Islands";"ISO 3166-2:AX" 
"Albania";"ISO 3166-2:AL" 
"Algeria";"ISO 3166-2:DZ" 
"American Samoa";"ISO 3166-2:AS" 
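The output above still carries the `ISO 3166-2:` prefix and, in rows such as "Aland Islands !Åland Islands", what looks like a hidden sort key in front of the visible name. A minimal post-processing sketch, assuming the sort key always ends in ` !` and the `iso` column always starts with the literal `ISO 3166-2:`; the function name `ConvertTo-CleanIsoRow` is hypothetical.

```powershell
# Sketch: tidy rows shaped like the answer's output (properties 'country' and 'iso').
# ASSUMPTIONS: a sort key, when present, ends in ' !'; 'iso' starts with 'ISO 3166-2:'.
function ConvertTo-CleanIsoRow {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $Row
    )
    process {
        $name = [string]$Row.country
        $bang = $name.IndexOf(' !')
        if ($bang -ge 0) {
            # Keep only the visible name that follows the sort key
            $name = $name.Substring($bang + 2)
        }
        [pscustomobject]@{
            country = $name.Trim()
            alpha2  = ([string]$Row.iso) -replace '^ISO 3166-2:', ''
        }
    }
}

# Usage with the answer's $output:
# $output | ConvertTo-CleanIsoRow | Export-Csv C:\temp\iso.csv -Delimiter ';' -NoTypeInformation
```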

Edit: found a way with better performance
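Unrelated to that edit, the XPath `//table[2]` in this answer is worth checking: it selects every table that is the second `<table>` child of its own parent, not the second table in the document, which `(//table)[2]` would select. The sketch below shows the difference with `System.Xml` on a small hand-written XHTML fragment (an assumption for illustration only; real Wikipedia HTML is not well-formed XML, which is why HtmlAgilityPack is used).

```powershell
# Sketch: '//table[2]' versus '(//table)[2]' on a tiny hand-written fragment.
$xml = [xml]@'
<html><body>
  <div><table><tr><td>A1</td></tr></table></div>
  <div>
    <table><tr><td>B1</td></tr></table>
    <table><tr><td>B2-name</td><td>B2-code</td></tr></table>
  </div>
</body></html>
'@

# '//table[2]': any table that is the 2nd <table> child of its parent
$perParent = $xml.SelectNodes('//table[2]//tr//td[1]') | ForEach-Object { $_.InnerText }

# '(//table)[2]': the 2nd table in document order
$inDocument = $xml.SelectNodes('(//table)[2]//tr//td[1]') | ForEach-Object { $_.InnerText }

"//table[2]   -> $perParent"    # //table[2]   -> B2-name
"(//table)[2] -> $inDocument"   # (//table)[2] -> B1
```

If the page layout changes, the per-parent form can silently pick up a different table, so it may be safer to target the table by its class.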


Thanks for the HtmlAgility suggestion. I don't have admin rights to install modules and the like, so this is how I did it:

<################################################CODE HEADER############################################ 
SCRIPT NAME  : ISO Country Code.ps1 
DESCRIPTION  : 
RUNTIME PARAMETERS: 
INPUT PARAMETERS : 
OUTPUT PARAMETERS : 
Date              Developer         Description 
-------------------------------------- ---------------------------------------------- --------------------------------- 

################################################CODE HEADER############################################> 

<################################################SAMPLE DATA############################################ 
name   : Zimbabwe 
topLevelDomain : {.zw} 
alpha2Code  : ZW 
alpha3Code  : ZWE 
callingCodes : {263} 
capital  : Harare 
altSpellings : {ZW, Republic of Zimbabwe} 
region   : Africa 
subregion  : Eastern Africa 
population  : 14240168 
latlng   : {-20.0, 30.0} 
demonym  : Zimbabwean 
area   : 390757.0 
gini   : 
timezones  : {UTC+02:00} 
borders  : {BWA, MOZ, ZAF, ZMB} 
nativeName  : Zimbabwe 
numericCode : 716 
currencies  : {@{code=BWP; name=Botswana pula; symbol=P}, @{code=GBP; name=British pound; symbol=£}, @{code=CNY; name=Chinese yuan; symbol=¥}, @{code=EUR; name=Euro; 
       symbol=€}...} 
languages  : {@{iso639_1=en; iso639_2=eng; name=English; nativeName=English}, @{iso639_1=sn; iso639_2=sna; name=Shona; nativeName=chiShona}, @{iso639_1=nd; iso639_2=nde; 
       name=Northern Ndebele; nativeName=isiNdebele}} 
translations : @{de=Simbabwe; es=Zimbabue; fr=Zimbabwe; ja=ジンバブエ; it=Zimbabwe; br=Zimbabwe; pt=Zimbabué; nl=Zimbabwe; hr=Zimbabve; fa=زیمباوه} 
flag   : https://restcountries.eu/data/zwe.svg 
regionalBlocs : {@{acronym=AU; name=African Union; otherAcronyms=System.Object[]; otherNames=System.Object[]}} 
cioc   : ZIM 
###############################################################################################################################> 

# ############################ 
$logFile = 'D:\03_PowerShell\PowerShell Scripts\ISO Country Code\ISOCountryCodes.log' 
Start-Transcript -Path $logFile -Append 

# The quoted 'h' is a literal; an unquoted h would insert the 12-hour clock hour
$isodate = Get-Date -Format "ddMMMyyyy_HH'h'mmss"

#BUILDING FILE PATH 
$FilePath = "D:\03_PowerShell\PowerShell Scripts\ISO Country Code\" 

#BUILDING JSON FILE PATH 
$JsonFileName = "ISOCountryCode_$isodate" 
$JsonFileExtn = ".json" 
$JsonFileOutputPath = $FilePath+$JsonFileName+$JsonFileExtn 

#BUILDING CSV FILE PATH 
$CsvFileName = "ISOCountryCode_$isodate" 
$CsvFileExtn = ".csv" 

#CSV File Path 
$CsvFileOutputPath = $FilePath+$CsvFileName+$CsvFileExtn 

"The Log File is placed at: - $logFile" 
"The ISO Date is: - $isodate " 
"The Full File Path for the JSON File is: - $JsonFileOutputPath" 
"The Full File Path for the CSV File is: - $CsvFileOutputPath" 

#CLEAR RESULT SCREEN 
cls 

#INVOKE REST METHOD TO RETRIEVE DATA
$ISOCountryCode = Invoke-RestMethod "https://restcountries.eu/rest/v2/all" 
$ISOCountryCodeFormatted = $ISOCountryCode | Select-Object @{Name="Country Name";Expression={$_."name"}} ` 
                  ,@{Name = "Internet Domain"; Expression={$_."topLevelDomain"}} ` 
                  ,@{Name = "Alpha 2 Code"; Expression={$_."alpha2Code"}} ` 
                  ,@{Name = "Alpha 3 Code"; Expression={$_."alpha3Code"}} ` 
                  ,@{Name = "Capital"; Expression={$_."capital"}} ` 
                  ,@{Name = "Continent"; Expression={$_."region"}} ` 
                  ,@{Name = "Area (LandMass)"; Expression={$_."area"}} ` 
                  ,@{Name = "Numeric Code"; Expression={$_."numericCode"}}

<#BELOW COLUMNS AND VALUES TO BE USED AS REQUIRED 
#,@{Name = "population"; Expression={$_."population"}} ` 
#,@{Name = "Languages Used"; Expression={$_."languages"}} `
#,@{Name = "International Dialling Code"; Expression={$_."callingCodes"}} `
#,@{Name = "Latitude/Longitude"; Expression={$_."latlng"}} `
#,@{Name = "Translations"; Expression={$_."translations"}} `
#,@{Name = "Country Direction/Location"; Expression={$_."subregion"}} `
#,@{Name = "Timezones"; Expression={$_."timezones"}} `
#,@{Name = "Borders"; Expression={$_."borders"}}#>

#TEST THE CODE TO CONVERT RESULTSET TO JSON 
#$JsonText = $ISOCountryCodeFormatted | ConvertTo-Json
#$JsonText

$ISOCountryCodeFormatted | Format-Table

#BUILD JSON FILE WITH HEADER > DATA > FOOTER, I.E. { "ISOCountryCodes": [ ... ] }
# Wrapping the rows in a hashtable produces that structure in one step; -Depth 5 keeps array
# values such as "Internet Domain" from being converted to strings (the default depth is 2)
@{ ISOCountryCodes = $ISOCountryCodeFormatted } | ConvertTo-Json -Depth 5 | Set-Content $JsonFileOutputPath -Encoding UTF8

#GENERATE CSV FILE 
$ISOCountryCodeFormatted |Export-Csv $CsvFileOutputPath -encoding "unicode" -NoTypeInformation 

#END TRANSCRIPT FOR LOGGING DATA FLOW 
Stop-Transcript 
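One thing to watch in the CSV export: properties that hold arrays (for example `topLevelDomain` or `borders` in the sample data) are written by `Export-Csv` as `System.Object[]`. A minimal sketch that joins them inside the calculated properties, run against a trimmed hand-written JSON sample in the shape of the SAMPLE DATA comment rather than a live API response; the `;` separator is an arbitrary choice.

```powershell
# Sketch: reshape country objects and flatten array properties for CSV.
# The JSON is a hand-written sample, not a response from the REST service.
$json = @'
[
  { "name": "Zimbabwe", "topLevelDomain": [".zw"], "alpha2Code": "ZW",
    "alpha3Code": "ZWE", "capital": "Harare", "region": "Africa",
    "area": 390757.0, "numericCode": "716", "borders": ["BWA","MOZ","ZAF","ZMB"] }
]
'@
$countries = $json | ConvertFrom-Json

# -join turns each array into one delimited string so the CSV cell is readable
$rows = $countries | Select-Object @{Name = 'Country Name';    Expression = { $_.name }},
                                   @{Name = 'Internet Domain'; Expression = { $_.topLevelDomain -join ';' }},
                                   @{Name = 'Alpha 2 Code';    Expression = { $_.alpha2Code }},
                                   @{Name = 'Alpha 3 Code';    Expression = { $_.alpha3Code }},
                                   @{Name = 'Numeric Code';    Expression = { $_.numericCode }},
                                   @{Name = 'Borders';         Expression = { $_.borders -join ';' }}

$rows | ConvertTo-Csv -NoTypeInformation
```

For the JSON file the arrays can stay as they are, since `ConvertTo-Json` keeps them as JSON arrays when `-Depth` is large enough.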

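Sorry to say it, but you could simply have downloaded HtmlAgilityPack, unzipped it and put it on your desktop. No installation is needed; that is the beauty of this package. Anyway, you have written your own script. – Snak3d0c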
