2010-06-17 145 views
16

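I am converting a project with 71 .jar files in its global lib/ directory to use Maven. Of course, these jars were pulled from the web by many developers over the past ten years, and they were not always added to VCS with all the necessary version information. The problem is finding the right version of the right JAR in a Maven repository.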

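Is there an easy, automated way to go from that set of .jar files to the corresponding <dependency/> elements for my pom.xml files? I am hoping for a web page where I can submit the checksum of a jar file and get back an XML snippet. The Google hits for 'maven repository search' basically only turn up name-based searches, and as far as I can tell http://repo1.maven.org/ has no search at all.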

Update: GrepCode looks like it can find a project given an MD5 checksum, but it does not provide the specific details Maven needs (groupId, artifactId).
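For anyone who wants to compute the checksum without `md5sum` or `sha1sum`, here is a minimal sketch using Python's standard `hashlib`. The helper name `jar_digest` is my own and not part of any tool mentioned in this thread.

```python
import hashlib


def jar_digest(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of the file at `path` (hypothetical helper)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so large jars are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling `jar_digest("some.jar", "md5")` gives the value a digest-based search would expect; `"sha1"` is the default.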

Here is the script I came up with based on the accepted answer:

#!/bin/bash

for f in *.jar; do
    # MD5 of the jar, used for Jarvana's digest search
    s=$(md5sum "$f" | cut -d ' ' -f 1)
    # Path of the "inspect-pom" link in the search result page
    p=$(wget -q -O - "http://www.jarvana.com/jarvana/search?search_type=content&content=${s}&filterContent=digest" | grep inspect-pom | cut -d \" -f 4)
    pj="http://www.jarvana.com${p}"
    rm -f tmp
    wget -q -O tmp "$pj"

    # Take the first groupId/artifactId/version found in the POM page
    g=$(grep groupId tmp | head -n 1 | cut -d \> -f 3 | cut -d \< -f 1)
    a=$(grep artifactId tmp | head -n 1 | cut -d \> -f 3 | cut -d \< -f 1)
    v=$(grep version tmp | head -n 1 | cut -d \> -f 3 | cut -d \< -f 1)
    rm -f tmp

    echo "<dependency> <!-- $f $s $pj -->"
    echo " <groupId>$g</groupId>"
    echo " <artifactId>$a</artifactId>"
    echo " <version>$v</version>"
    echo "</dependency>"
    echo
done
+0

Wow, found it! Looks like a new business opportunity. – 2010-06-17 15:51:07

Answers

0

Hi, you can use mvnrepository to search for artifacts, or you can use Eclipse: its Add Dependency dialog has a search that uses the Maven Central index.

2

I was in the same situation as the OP, but as mentioned in later answers, Jarvana is no longer up.

I used the checksum feature of Maven Central Search and its search API to achieve the same result.
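As a rough illustration of the checksum query, the sketch below builds the search URL with the standard library. The endpoint and the `1:"<sha1>"` query form are taken from the scripts in this thread; the function name `checksum_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as used by the scripts in this thread.
SEARCH_URL = "http://search.maven.org/solrsearch/select"


def checksum_query_url(sha1, rows=20):
    """Build a search URL for a SHA-1 checksum (hypothetical helper)."""
    # urlencode percent-encodes the colon and the quotes around the checksum.
    params = {"q": '1:"%s"' % sha1, "rows": rows, "wt": "json"}
    return SEARCH_URL + "?" + urlencode(params)
```

Letting `urlencode` do the escaping avoids hand-writing `%22` around the checksum.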

First create a file with the sha1sums:

sha1sum *.jar > jar-sha1sums.txt 

Then use the following Python script to look up each jar in that file:

import json
from urllib.request import urlopen  # Python 3; the original used Python 2's urllib2

with open('./jar-sha1sums.txt') as f, open('./pom.xml', 'w') as pom:
    for line in f:
        if not line.strip():
            continue
        # sha1sum prints "<hash>  <file>", so split on whitespace, not on a single space
        sha, jar = line.strip().split(None, 1)
        jar = jar.lstrip('*')  # drop sha1sum's binary-mode marker, if any
        print("Looking up " + jar)
        searchurl = 'http://search.maven.org/solrsearch/select?q=1:%22' + sha + '%22&rows=20&wt=json'
        data = json.loads(urlopen(searchurl).read().decode('utf-8'))
        if data["response"] and data["response"]["numFound"] == 1:
            print("Found info for " + jar)
            jarinfo = data["response"]["docs"][0]
            pom.write('<dependency>\n')
            pom.write('\t<groupId>' + jarinfo["g"] + '</groupId>\n')
            pom.write('\t<artifactId>' + jarinfo["a"] + '</artifactId>\n')
            pom.write('\t<version>' + jarinfo["v"] + '</version>\n')
            pom.write('</dependency>\n')
        else:
            # Zero or several matches: emit a placeholder to fill in by hand
            print("No info found for " + jar)
            pom.write('<!-- TODO Find information on this jar file -->\n')
            pom.write('<dependency>\n')
            pom.write('\t<groupId></groupId>\n')
            pom.write('\t<artifactId>' + jar.replace(".jar", "") + '</artifactId>\n')
            pom.write('\t<version></version>\n')
            pom.write('</dependency>\n')

YMMV as to whether any info is found for your jars.
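If you prefer to keep the formatting separate from the lookup, something like the following sketch turns one decoded search response into a `<dependency>` fragment. It assumes the response has the shape used above (`response.docs` entries with `g`, `a` and `v` keys); the function name `dependency_xml` and the sample coordinates are made up for illustration.

```python
from xml.sax.saxutils import escape


def dependency_xml(data):
    """Return a <dependency> fragment, or None unless exactly one doc matched.

    `data` is the decoded JSON response; hypothetical helper.
    """
    docs = data.get("response", {}).get("docs", [])
    if len(docs) != 1:
        return None
    doc = docs[0]
    # Escape the values so unusual characters cannot break the XML.
    return (
        "<dependency>\n"
        "\t<groupId>%s</groupId>\n"
        "\t<artifactId>%s</artifactId>\n"
        "\t<version>%s</version>\n"
        "</dependency>\n"
    ) % (escape(doc["g"]), escape(doc["a"]), escape(doc["v"]))


# Made-up coordinates, not a real artifact.
sample = {"response": {"numFound": 1,
                       "docs": [{"g": "org.example", "a": "demo", "v": "1.0"}]}}
print(dependency_xml(sample))
```

Returning `None` for zero or multiple matches leaves it to the caller to decide whether to write a TODO placeholder.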

2

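I borrowed code and ideas from @Karl Tryggvason but could not get the Python script to work. Being a Windows monkey, I did something similar in PowerShell (requires v3). It is not as sophisticated (it does not generate a pom, it just dumps the results), but I thought it might save someone a few minutes.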

$log = 'c:\temp\jarfind.log' 

Get-Date | Tee-Object -FilePath $log 

$jars = gci d:\source\myProject\lib -Filter *.jar 

foreach ($jar in $jars) 
{ 
    $sha = Get-FileHash -Algorithm SHA1 -Path $jar.FullName | select -ExpandProperty hash 
    $name = $jar.Name 
    $json = Invoke-RestMethod "http://search.maven.org/solrsearch/select?q=1:%22$($sha)%22&rows=20&wt=json" 
    "Found $($json.response.numfound) jars with sha1 matching that of $($name)..." | Tee-Object -FilePath $log -Append 
    $jarinfo = $json.response.docs 
    $jarinfo | Tee-Object -FilePath $log -Append 
} 
0

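In case you want to fall back to the artifactId and version read from the jar name, you can use the following code. It is an improvised version of Karl's: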

import os
import sys
from subprocess import check_output

import requests

SEARCH_URL = 'http://search.maven.org/solrsearch/select'


class bcolors:
    # ANSI escape codes; the original snippet used bcolors without defining it
    FAIL = '\033[91m'
    ENDC = '\033[0m'


def searchByShaChecksum(sha):
    # Look the jar up by its SHA-1 checksum
    params = {'q': '1:"' + sha + '"', 'rows': 20, 'wt': 'json'}
    return requests.get(SEARCH_URL, params=params).json()


def searchAsArtifact(artifact, version):
    # Look the jar up by artifactId and version guessed from the file name
    query = 'a:"' + artifact + '" AND v:"' + version.strip() + '"'
    params = {'q': query, 'rows': 20, 'wt': 'json'}
    return requests.get(SEARCH_URL, params=params).json()


def processAsArtifact(file: str):
    # Try every split of "some-name-1.2.jar" into artifactId and version
    data = {'response': {'start': 0, 'docs': [], 'numFound': 0}}
    jar = file.replace(".jar", "")
    splits = jar.split("-")
    if len(splits) < 2:
        return data
    for i in range(1, len(splits)):
        artifact = "-".join(splits[0:i])
        version = "-".join(splits[i:])
        data = searchAsArtifact(artifact, version)
        if data["response"] and data["response"]["numFound"] == 1:
            return data
    return data


def writeToPom(pom, grp: str = None, art: str = None, ver: str = None):
    # Incomplete coordinates get a TODO marker for manual follow-up
    if grp is None or ver is None:
        pom.write('<!-- TODO Find information on this jar file -->\n')
    pom.write('<dependency>\n')
    pom.write('\t<groupId>' + (grp or "") + '</groupId>\n')
    pom.write('\t<artifactId>' + (art or "") + '</artifactId>\n')
    pom.write('\t<version>' + (ver or "") + '</version>\n')
    pom.write('</dependency>\n')


def main(argv):
    if len(argv) == 0:
        print(bcolors.FAIL + 'Syntax : findPomJars.py <lib_dir_path>' + bcolors.ENDC)
        return
    lib_home = str(argv[0])
    if not os.path.exists(lib_home):
        print(bcolors.FAIL + lib_home + " directory doesn't exist" + bcolors.ENDC)
        return
    os.chdir(lib_home)

    successList = []
    failedList = []
    jarCount = 0
    with open('./auto_gen_pom_list.xml', 'w') as pom:
        # We are already inside lib_home, so list the current directory
        for lib in sorted(os.listdir('.')):
            if not lib.endswith(".jar"):
                continue
            jarCount += 1
            sys.stdout.write("\rProcessed Jar Count: %d" % jarCount)
            sys.stdout.flush()
            # sha1sum prints "<hash>  <file>"; only the hash is needed
            sha = check_output(["sha1sum", lib]).decode().split()[0]
            data = searchByShaChecksum(sha)
            if data["response"] and data["response"]["numFound"] == 0:
                data = processAsArtifact(lib)

            if data["response"] and data["response"]["numFound"] == 1:
                successList.append("Found info for " + lib)
                jarinfo = data["response"]["docs"][0]
                writeToPom(pom, jarinfo["g"], jarinfo["a"], jarinfo["v"])
            else:
                failedList.append("No info found for " + lib)
                writeToPom(pom, art=lib.replace(".jar", ""))

    print("\n")
    print("Success : %d" % len(successList))
    print("Failed : %d" % len(failedList))

    for entry in successList:
        print(entry)
    for entry in failedList:
        print(entry)


if __name__ == "__main__":
    main(sys.argv[1:])

The code is also available on GitHub.
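To make the file-name fallback easier to follow, here is a small offline sketch of the splitting step alone: it lists every (artifactId, version) guess that would be tried for a jar name, without querying anything. The function name `coordinate_candidates` is hypothetical.

```python
def coordinate_candidates(jar_name):
    """List (artifactId, version) guesses from a jar file name (hypothetical helper)."""
    stem = jar_name[:-4] if jar_name.endswith(".jar") else jar_name
    parts = stem.split("-")
    # Split at every hyphen: the left side is the artifactId guess, the right the version.
    return [("-".join(parts[:i]), "-".join(parts[i:]))
            for i in range(1, len(parts))]


print(coordinate_candidates("commons-lang-2.6.jar"))
```

A name without a hyphen yields no candidates, which matches the early return in `processAsArtifact`.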