2012-04-17 89 views
24

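Quickly get a list of all svn:externals for a remote svn repository

We have an svn repository with lots of directories and files, and our build system needs to be able to find all of the svn:externals properties in the repository, recursively, before checking it out. Currently we use: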

svn propget svn:externals -R http://url.of.repo/Branch 

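This has proven to be extremely time-consuming and a real bandwidth hog. It appears that the client is receiving all of the props for the entire repo and doing the filtering locally (though I haven't confirmed this with Wireshark). Is there a faster way to do this? Preferably some way of getting the server to return only the desired data.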

Answers

0

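It is slow because of the -R switch: every directory under the repository path is searched recursively for the property, which is a lot of work.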

+0

Yes, I figured that was why it behaves this way, but I was just wondering whether anyone had a faster way. It isn't looking promising, though. – NeoSkye 2012-04-21 01:26:41

+0

I think it's quite simple: if you need to do it recursively, it will be slow. – Argeman 2012-04-23 07:57:38

2

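I finally came up with a solution. I decided to break the request up into many small svn requests and run each of them as a task on a thread pool. This does hammer the svn server, but in our case the server is on the LAN and the query only runs during full builds, so it doesn't seem to be a problem.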

import os
import threading
import ThreadPool  # third-party thread pool module, not part of the standard library

thread_pool = ThreadPool.ThreadPool(8)
externs_dict = {}
externs_lock = threading.Lock()

def getExternRev(path, url):
    cmd = 'svn info "%s"' % url
    pipe = os.popen(cmd, 'r')
    data = pipe.read().splitlines()
    pipe.close()

    # Harvest last changed rev
    for line in data:
        if "Last Changed Rev" in line:
            revision = line.split(":")[1].strip()
            with externs_lock:
                externs_dict[path] = (url, revision)

def getExterns(url, base_dir):
    cmd = 'svn propget svn:externals "%s"' % url
    pipe = os.popen(cmd, 'r')
    data = pipe.read().splitlines()
    pipe.close()

    # Each non-empty line is expected to look like "local_path url"
    for line in data:
        if line:
            line = line.split()
            path = base_dir + line[0]
            url = line[1]
            thread_pool.add_task(getExternRev, path, url)

def processDir(url, base_dir):
    # Queue the externals lookup for this directory
    thread_pool.add_task(getExterns, url, base_dir)

    cmd = 'svn list "%s"' % url
    pipe = os.popen(cmd, 'r')
    listing = pipe.read().splitlines()
    pipe.close()

    # Queue every subdirectory (entries ending in '/') for the same treatment
    for node in listing:
        if node.endswith('/'):
            thread_pool.add_task(processDir, url + node, base_dir + node)

def analyzePath(url, base_dir=''):
    # Make sure the URL ends with '/' so subdirectory names can be appended to it
    if not url.endswith('/'):
        url += '/'
    thread_pool.add_task(processDir, url, base_dir)
    thread_pool.wait_completion()


analyzePath("http://url/to/repository")
print(externs_dict)
+0

The ThreadPool dependency can be found here: https://gist.github.com/metal3d/5075460 – ceilfors 2015-04-22 10:57:27

+0

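I ran into problems with 'os.popen()' when it is executed in threads: it just dies silently. I gave up on running it in threads and removed all the threading parts from the script. I am sacrificing speed, but the script is still more reliable than 'propget -R', which just dies silently when the repository is too large. – ceilfors 2015-04-22 10:58:47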

+1

A version that does not use ThreadPool: https://gist.github.com/ceilfors/741d8152106a310dd454 – ceilfors 2015-05-16 09:48:50
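
For anyone hitting the os.popen problem described in the comments above, here is a minimal sketch of the same idea using only the standard library (subprocess and concurrent.futures, Python 3.7+). It is not the original script and not the gist linked above: the names run_svn, parse_externals and collect_externals are made up for illustration, it assumes the same "local_path url" externals layout as the script above, and it leaves out the svn info revision lookup. It walks the tree one level at a time, so no worker ever blocks waiting on another task.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_svn(*args):
    """Run one svn subcommand and return its stdout as a list of lines."""
    result = subprocess.run(["svn", *args], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def parse_externals(lines, base_dir=""):
    """Parse 'local_path url' definitions (an assumed layout); other lines are skipped."""
    externals = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            externals[base_dir + fields[0]] = fields[1]
    return externals


def collect_externals(root_url, run=run_svn, workers=8):
    """Walk the tree level by level, querying each level's directories in parallel."""
    externals = {}
    level = [(root_url.rstrip("/") + "/", "")]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while level:
            # Both maps submit all of their tasks up front; results come back in order.
            props = pool.map(lambda d: run("propget", "svn:externals", d[0]), level)
            lists = pool.map(lambda d: run("list", d[0]), level)
            next_level = []
            for (url, base), prop_lines, listing in zip(level, props, lists):
                externals.update(parse_externals(prop_lines, base))
                # 'svn list' marks directories with a trailing '/'
                next_level += [(url + n, base + n) for n in listing if n.endswith("/")]
            level = next_level
    return externals
```

Calling collect_externals("http://url/to/repository") would return a dict mapping local paths to external URLs. The run parameter is there only so the walk can be exercised without a server; like the original, this still issues two svn requests per directory.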

25

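As you mentioned, it consumes network bandwidth. However, if you have access to the server hosting those repositories, you can run the command over the file:// protocol. In my experience this is faster and uses no network bandwidth.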

svn propget svn:externals -R file:///path/to/repo/Branch

Alternatively, if you already have the entire working copy (WC) checked out, you can run it against the working copy.

svn propget svn:externals -R /path/to/WC 

Hope this helps you get your results faster!
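
Whichever target you run it against, the result may be easier to consume as XML than as plain text. The sketch below assumes your svn client accepts --xml for propget (worth confirming with svn help propget on an older client) and that the output uses target and property elements. parse_propget_xml and externals_of are hypothetical helper names, not part of svn.

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_propget_xml(xml_data):
    """Map each target path to the non-empty lines of its svn:externals value."""
    result = {}
    for target in ET.fromstring(xml_data).iter("target"):
        for prop in target.iter("property"):
            if prop.get("name") == "svn:externals":
                lines = [line.strip() for line in (prop.text or "").splitlines()]
                result[target.get("path")] = [line for line in lines if line]
    return result


def externals_of(target):
    """Run one recursive propget on a URL or working-copy path and parse the XML."""
    out = subprocess.run(
        ["svn", "propget", "svn:externals", "-R", "--xml", target],
        capture_output=True, check=True,
    ).stdout  # bytes, so the parser honors the encoding in the XML declaration
    return parse_propget_xml(out)
```

For example, externals_of("file:///path/to/repo/Branch") would return a dict keyed by the directories that carry the property.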

+1

Great solution! Saved a lot of time! – donV 2013-09-02 16:25:35

+1

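Unfortunately, our build system doesn't have direct access to the SVN server, so this won't get us past the problem. I upvoted your answer anyway, because it really is a good solution if you do have that access. – NeoSkye 2014-02-18 23:56:05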

+0

I'm using SVN Server 1.4.6, and using 'file://' gives me *Aborted* after a few seconds. Similar to http://svn.haxx.se/users/archive-2007-04/0500.shtml – ceilfors 2015-04-16 16:03:07

0

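Not an ideal solution (it may have side effects), and not an answer to your question, but:

You can rewrite all of the externals definitions and add them (rewritten) in one common, known place. After that change you no longer need recursion in the propget.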
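
As a rough illustration of that rewrite, the sketch below merges per-directory definitions into a single svn:externals value for one common directory. It is only a sketch under stated assumptions: centralize_externals is a hypothetical helper, every definition is assumed to use the "[-r REV] URL local-path" layout without quoting, and URLs written relative to their own directory (such as ../foo) are not adjusted, so those would have to be rewritten by hand.

```python
def centralize_externals(externals_by_dir):
    """Build one svn:externals value for a common directory.

    externals_by_dir maps a directory path relative to the common directory
    (for example "scripts/utils", or "" for the directory itself) to that
    directory's current svn:externals text.
    """
    merged = []
    for rel_dir, text in sorted(externals_by_dir.items()):
        for line in text.splitlines():
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue  # skip blank lines and comments
            # Assumed layout: the last field is the local path, so prefix it
            # with the directory the definition used to live on.
            fields[-1] = "/".join(p for p in (rel_dir.strip("/"), fields[-1]) if p)
            merged.append(" ".join(fields))
    return "\n".join(merged)
```

The returned text could then be set on the common directory with svn propset, after which a single non-recursive svn propget svn:externals on that one URL would list everything.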

0

If you don't mind using Python and the pysvn library, here is a complete command-line program I use for working with SVN externals:

""" 
@file 
@brief SVN externals utilities. 
@author Lukasz Matecki 
""" 
import sys 
import os 
import pysvn 
import argparse 

class External(object):

    def __init__(self, parent, remote_loc, local_loc, revision):
        self.parent = parent
        self.remote_loc = remote_loc
        self.local_loc = local_loc
        self.revision = revision

    def __str__(self):
        if self.revision.kind == pysvn.opt_revision_kind.number:
            return """\
Parent:  {0}
Source:  {1}@{2}
Local name: {3}""".format(self.parent, self.remote_loc, self.revision.number, self.local_loc)
        else:
            return """\
Parent:  {0}
Source:  {1}
Local name: {2}""".format(self.parent, self.remote_loc, self.local_loc)


def find_externals(client, repo_path, external_path=None):
    """
    @brief Find SVN externals.
    @param client (pysvn.Client) The client to use.
    @param repo_path (str) The repository path to analyze.
    @param external_path (str) The URL of the external to find; if omitted, all externals will be searched.
    @returns A generator of External descriptors (empty if none are found).
    """
    repo_root = client.root_url_from_path(repo_path)

    def parse(ext_prop):
        for parent in ext_prop:
            external = ext_prop[parent]
            for line in external.splitlines():
                fields = line.split()
                # Skip blank lines and definitions that are not in the
                # two-field "URL[@REV] local-name" form
                if len(fields) != 2:
                    continue
                path, name = fields
                path = path.replace("^", repo_root)
                parts = path.split("@")
                if len(parts) > 1:
                    url = parts[0]
                    rev = pysvn.Revision(pysvn.opt_revision_kind.number, int(parts[1]))
                else:
                    url = parts[0]
                    rev = pysvn.Revision(pysvn.opt_revision_kind.head)
                if external_path and external_path != url:
                    continue
                yield External(parent, url, name, rev)

    for entry in client.ls(repo_path, recurse=True):
        # Only directories that carry properties can define externals
        if entry["kind"] == pysvn.node_kind.dir and entry["has_props"]:
            externals = client.propget("svn:externals", entry["name"])
            if externals:
                for e in parse(externals):
                    yield e


def check_externals(client, externals_list):
    for i, e in enumerate(externals_list):
        details = "\n".join([" {0}".format(line) for line in str(e).splitlines()])
        try:
            # info2 raises pysvn.ClientError if the external's target cannot be found
            client.info2(e.remote_loc, revision=e.revision, recurse=False)
            print("[{0}] Existing:\n{1}".format(i + 1, details))
        except pysvn.ClientError:
            print("[{0}] Not found:\n{1}".format(i + 1, details))

def main(cmdargs):
    parser = argparse.ArgumentParser(description="SVN externals processing.",
                                     formatter_class=argparse.RawDescriptionHelpFormatter,
                                     prefix_chars='-+')

    SUPPORTED_COMMANDS = ("check", "references")

    parser.add_argument(
        "action",
        type=str,
        default="check",
        choices=SUPPORTED_COMMANDS,
        help="""\
the operation to execute:
    'check' to validate all externals in a given location;
    'references' to print all references to a given location""")

    parser.add_argument(
        "url",
        type=str,
        help="the URL to operate on")

    parser.add_argument(
        "--repo", "-r",
        dest="repo",
        type=str,
        default=None,
        help="the repository (or path within) to perform the operation on, if omitted is inferred from url parameter")

    # Parse the arguments that were passed in, skipping the program name
    args = parser.parse_args(cmdargs[1:])

    client = pysvn.Client()

    if args.action == "check":
        externals = find_externals(client, args.url)
        check_externals(client, externals)
    elif args.action == "references":
        if args.repo:
            repo_root = args.repo
        else:
            repo_root = client.root_url_from_path(args.url)
        for i, e in enumerate(find_externals(client, repo_root, args.url)):
            print("[{0}] Reference:\n{1}".format(i + 1, "\n".join([" {0}".format(line) for line in str(e).splitlines()])))
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv))

This should work in both Python 2 and Python 3. You can use it like this (actual addresses removed):

python svn_externals.py references https://~~~~~~~~~~~~~~/cmd_utils.py 
[1] Reference: 
    Parent:  https://~~~~~~~~~~~~~~/BEFORE_MK2/scripts/utils 
    Source:  https://~~~~~~~~~~~~~~/tools/python/cmd_utils.py 
    Local name: cmd_utils.py 
[2] Reference: 
    Parent:  https://~~~~~~~~~~~~~~/VTB-1425_PCU/scripts/utils 
    Source:  https://~~~~~~~~~~~~~~/tools/python/cmd_utils.py 
    Local name: cmd_utils.py 
[3] Reference: 
    Parent:  https://~~~~~~~~~~~~~~/scripts/utils 
    Source:  https://~~~~~~~~~~~~~~/tools/python/cmd_utils.py 
    Local name: cmd_utils.py 

As for performance, this works quite fast (although my repository is rather small). You will have to check it for yourself.