2012-03-07 108 views
1

Uploading a large SQL database to Azure Table storage

I have a single database table with 10 fields and over 30 million rows. It is ideal for Table storage because I only ever need to search on one column and return the remaining columns.

I have written a program that pulls rows from the database and uploads them to Table storage, but at this rate it will take at least 9 or 10 days to complete.

Is there a faster way to upload the complete table to Azure Table storage?

Answers

0

Cloud Storage Studio, a commercial package from Cerebrata, has this functionality built in. I believe they multi-thread the upload, although I haven't specifically checked. It will still take a while over the internet.

Probably the fastest approach is to upload the raw data to blob storage, then write a WorkerRole that runs in the same data center, reads the blob, and writes the rows to Table storage. With plenty of threads and a good partitioning strategy you could get it done very quickly. However, implementing that may cost you more time than it saves over the "slow" way.
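As a minimal sketch of the multi-threaded part of that WorkerRole idea: the loop below fans batches out to a pool of threads, with the actual Table storage call left as a hypothetical placeholder (`upload_batch` is not a real SDK function, just a stand-in you would replace with your storage client's batch insert).

```python
import queue
import threading

def upload_batch(batch):
    # Hypothetical placeholder: in a real WorkerRole this would call the
    # Table storage API (e.g. an entity-group batch insert).
    pass

def drain(work_queue, uploader):
    # Each worker thread pulls batches until the queue is empty.
    while True:
        try:
            batch = work_queue.get_nowait()
        except queue.Empty:
            return
        uploader(batch)

def parallel_upload(batches, uploader=upload_batch, workers=8):
    # Enqueue every batch, then let a fixed pool of threads drain the queue.
    work_queue = queue.Queue()
    for batch in batches:
        work_queue.put(batch)
    threads = [threading.Thread(target=drain, args=(work_queue, uploader))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The thread count is something you would tune against Table storage throttling rather than fix at 8.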

1

You will get the best performance from inside the data center. If you really have that much data, it is worth compressing it all into a blob, uploading the blob to storage, and then having a role running in the same data center download, decompress, and insert the contents. That will be orders of magnitude faster than attempting it remotely.
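The compress-then-ship step can be sketched in a few lines; this assumes a plain CSV serialization, which is one option among many, and leaves the actual blob upload/download to whatever storage client you use:

```python
import csv
import gzip
import io

def rows_to_gzip_blob(rows):
    # Serialize the rows as CSV and gzip the result; the compressed bytes
    # are what you would upload as a single block blob.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))

def gzip_blob_to_rows(blob_bytes):
    # The role in the data center downloads the blob, decompresses it, and
    # parses the rows back out before inserting them into Table storage.
    text = gzip.decompress(blob_bytes).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```

For 30 million rows you would likely split the export across several blobs so multiple role instances can work in parallel.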

If you can also sort the data by partition key, you can batch-insert the entries (100 entities or 4 MB at a time). You can parallelize the batches as well.
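The grouping logic behind that suggestion looks roughly like this: given entities already sorted by partition key, cut them into batches of at most 100 where every batch stays within a single partition (the requirement for an entity-group batch insert). The dict shape and `PartitionKey` field mirror Table storage conventions; this is a sketch, not SDK code.

```python
from itertools import groupby

MAX_BATCH = 100  # Table storage batch transactions cap at 100 entities

def partition_batches(entities, partition_key=lambda e: e["PartitionKey"]):
    # Entities must already be sorted by partition key; each yielded batch
    # holds at most MAX_BATCH entities, all from the same partition.
    for _, group in groupby(entities, key=partition_key):
        group = list(group)
        for i in range(0, len(group), MAX_BATCH):
            yield group[i:i + MAX_BATCH]
```

Each yielded batch can then be handed to a separate thread, which is where the parallelism mentioned above comes in.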

I am not aware of anything that does this out of the box, so for now you will probably have to write it yourself.

0

Here is a stored procedure I wrote that lets you load very large datasets. You have to do one table at a time, and there are a few caveats, but with it I uploaded 7 GB, about 10 million rows, in 10 minutes.

More information is in the CodeProject article I wrote here: http://www.codeproject.com/Articles/773469/Automating-Upload-of-Large-Datasets-to-SQL-Azure

/*--------------------------------------------------------------------------------------------------------------------------------------------------------- 

    Document Title: usp_BulkAzureInsert.sql 
    Script Purpose: Dynamically insert large datasets into Azure 

    Script Notes: 1) This assumes the current user has write access to the C drive for file copy 
         If the current windows user does not have access override by hardcoding in an export folder location 
         Leave ExportFolder as 'NULL' for C:\User\CurrentUser\AzureExports 
        2) User must have permission to create permanent tables (dropped at the end of the script but used for staging) 

    Parameters: @DB_Schema_Table = DatabaseName.Schema.TableName (3 part no server) 
       @AzureTableName = DatabaseName.Schema.TableName (3 part no server) 
       @AzureServer  = Azure Server location ending in .net (no HTTP) 
       @AzureClusteredIDX = Azure requires each table to have a clustered index. Comma delimited index definition. 
       @AzureUserName  = Azure User Name 
       @AzurePassword  = Azure Password 
       @ExportFolder  = 'NULL' defaults to C:\User\CurrentUser\AzureExports - Use this to override 
       @CleanupDatFiles = Set to 1 to delete the directory and files created in the upload process 
       @ViewOutput  = Set to 1 to view insert information during upload    

     --Sample Execution 
     EXECUTE usp_BulkAzureInsert 
       @DB_Schema_Table = 'MyDatabase.dbo.Customers', 
       @AzureTableName = 'AZ001.dbo.Customers', 
       @AzureServer  = 'abcdef123.database.windows.net', 
       @AzureClusteredIDX = 'CustomerID, FirstName, LastName', 
       @AzureUserName  = 'AzureUserName', 
       @AzurePassword  = 'MyPassword123', 
       @ExportFolder  = 'NULL', 
       @CleanupDatFiles = 1, 
       @ViewOutput  = 1 

-----------------------------------------------------------------------------------------------------------------------------------------------------------*/ 

    IF OBJECT_ID (N'usp_BulkAzureInsert', N'P') IS NOT NULL 
    DROP PROCEDURE usp_BulkAzureInsert; 
    GO 

    CREATE PROCEDURE usp_BulkAzureInsert     
      @DB_Schema_Table NVARCHAR(100), 
      @AzureTableName  NVARCHAR(100), 
      @AzureClusteredIDX NVARCHAR(100), 
      @AzureServer  NVARCHAR(100), 
      @AzureUserName  NVARCHAR(100), 
      @AzurePassword  NVARCHAR(100), 
      @ExportFolder  NVARCHAR(100), 
      @CleanupDatFiles BIT, 
      @ViewOutput   BIT 
    AS 

/*--------------------------------------------------------------------------------------------------------------------------------------------------------- 
    Start with Error Checks 
-----------------------------------------------------------------------------------------------------------------------------------------------------------*/ 

    IF (SELECT CONVERT(INT, ISNULL(value, value_in_use)) FROM sys.configurations WHERE name = N'xp_cmdshell') = 0 
     BEGIN 
      RAISERROR ('ERROR: xp_cmdshell is not enabled on this server/database',16,1) 
      RETURN 
     END 

/*--------------------------------------------------------------------------------------------------------------------------------------------------------- 
    Declare and Set Script Variables 
-----------------------------------------------------------------------------------------------------------------------------------------------------------*/ 

    IF (@ViewOutput = 1) 
     BEGIN 
      SET NOCOUNT ON; 
     END 

    DECLARE @CMD NVARCHAR(1000), @SQL NVARCHAR(MAX), @i TINYINT = 1, @NTILE VARCHAR(10), @NTILE_Value TINYINT, @TempTableName VARCHAR(1000), 
      @ColumnNames NVARCHAR(MAX), @TableName VARCHAR(100), @Server NVARCHAR(100) 

    --Set the export folder to the default location if the override was not used 
    IF @ExportFolder = 'NULL' 
     BEGIN 
      SET @ExportFolder = N'C:\Users\' + 
           CAST(REVERSE(LEFT(REVERSE(SYSTEM_USER), CHARINDEX('\', REVERSE(SYSTEM_USER))-1)) AS VARCHAR(100)) + 
           N'\AzureExports'; 
     END; 

     --Set a permanent staging object name based on the source database and table name 
     SET @TempTableName = (LEFT(@DB_Schema_Table, CHARINDEX('.',@DB_Schema_Table)-1) + 
           '.dbo.TempAzure' + 
           CAST(REVERSE(LEFT(REVERSE(@DB_Schema_Table), CHARINDEX('.', REVERSE(@DB_Schema_Table))-1)) AS VARCHAR(100))) 

/*--------------------------------------------------------------------------------------------------------------------------------------------------------- 
    Calculate the number of files to split the dataset into (no more than 250,000 rows per file) 
-----------------------------------------------------------------------------------------------------------------------------------------------------------*/ 

     SET @SQL = ' SELECT @NTILE = CEILING((CAST(COUNT(*) AS FLOAT)/250000)) FROM ' + @DB_Schema_Table +'; ';  

     EXECUTE sp_executesql @SQL, N'@NTILE VARCHAR(100) OUTPUT', @NTILE = @NTILE OUTPUT;  
     SET @NTILE_Value = CAST(@NTILE AS TINYINT); 

     SET @TableName = CAST(REVERSE(LEFT(REVERSE(@DB_Schema_Table), CHARINDEX('.', REVERSE(@DB_Schema_Table))-1)) AS VARCHAR(100)); 
     SET @Server = (SELECT @@SERVERNAME); 

/*--------------------------------------------------------------------------------------------------------------------------------------------------------- 
    Create a folder to stage the DAT files in 
-----------------------------------------------------------------------------------------------------------------------------------------------------------*/ 

    --Remove the directory if it already exists and was not previously deleted 
    SET @CMD = N'rmDir /Q /S ' + @ExportFolder; 
    EXECUTE master.dbo.xp_cmdshell @CMD, NO_OUTPUT; 

    --Create a folder to hold the export files 
    SET @CMD = N' mkDir ' + @ExportFolder; 
    EXECUTE master.dbo.xp_cmdshell @CMD, NO_OUTPUT; 

/*--------------------------------------------------------------------------------------------------------------------------------------------------------- 
    Create a staging table that breaks the file into sections based on the NTILE_Value 
-----------------------------------------------------------------------------------------------------------------------------------------------------------*/ 

     --Find the names of the columns in the table 
     IF OBJECT_ID('tempdb.dbo.#ColumnNames') IS NOT NULL 
     DROP TABLE #ColumnNames 
     CREATE TABLE #ColumnNames 
     (
      ColumnOrder INTEGER IDENTITY(1,1) NOT NULL, 
      ColumnName NVARCHAR(100) NOT NULL 
     ); 
      INSERT INTO #ColumnNames 
      SELECT COLUMN_NAME 
      FROM information_schema.columns 
      WHERE table_name = @TableName 
      ORDER BY ordinal_position 

     --Create a list of the column names 
     SELECT @ColumnNames = COALESCE(@ColumnNames + ', ', '') + CAST(ColumnName AS VARCHAR(MAX)) 
           FROM #ColumnNames; 

     --Split the results by the NTILE_Value 
     DECLARE @Column1 NVARCHAR(100) = (SELECT ColumnName FROM #ColumnNames WHERE ColumnOrder = 1); 

     SET @SQL = ' IF OBJECT_ID(''' + @TempTableName + ''') IS NOT NULL 
        DROP TABLE ' + @TempTableName + ' 

        SELECT ' + @ColumnNames + ', ' + ' 
          NTILE(' + @NTILE + ') OVER(ORDER BY ' + @Column1 + ') AS NTILE_Value 
        INTO ' + @TempTableName + ' 
        FROM ' + @DB_Schema_Table 
     EXECUTE (@SQL); 

     --Now split the dataset into equal sizes creating a DAT file for each batch 
     WHILE @i <= @NTILE_Value 
      BEGIN 

       SET @SQL = 'IF OBJECT_ID(''' + @TempTableName + 'DatFile'') IS NOT NULL 
          DROP TABLE ' + @TempTableName + 'DatFile 

          SELECT ' + @ColumnNames + ' 
          INTO ' + @TempTableName + 'DatFile 
          FROM ' + @TempTableName + ' 
          WHERE NTILE_Value = ' + CAST(@i AS VARCHAR(2)) + ' 

          CREATE CLUSTERED INDEX IDX_TempAzureData ON ' + @TempTableName + 'DatFile (' + @AzureClusteredIDX + ')'; 
       EXECUTE (@SQL); 

       SET @CMD = N'bcp ' + @TempTableName + 'DatFile out ' + 
        @ExportFolder + N'\' + @TableName + 'DatFile' + 
        CAST(@i AS NVARCHAR(3)) + '.dat -S ' + @Server + ' -T -n -q'; 

       IF (@ViewOutput = 1) 
       BEGIN 
        EXECUTE master.dbo.xp_cmdshell @CMD; 
       END 
       ELSE 
        EXECUTE master.dbo.xp_cmdshell @CMD, NO_OUTPUT; 

       SET @i += 1; 
      END 

     --Clean up the temp tables 
     SET @SQL = ' DROP TABLE ' + @TempTableName; 
     EXECUTE (@SQL); 

     SET @SQL = ' DROP TABLE ' + @TempTableName + 'DatFile' ; 
     EXECUTE (@SQL); 

/*--------------------------------------------------------------------------------------------------------------------------------------------------------- 
    Insert the data into the AzureDB 
-----------------------------------------------------------------------------------------------------------------------------------------------------------*/ 

     --Reset the Variable 
     SET @i = 1; 

     --Move each batch file into the DB 
     WHILE @i <= @NTILE_Value 
      BEGIN    
       SET @CMD = N'Bcp ' + @AzureTableName + ' in ' + 
          @ExportFolder + N'\' + @TableName + 'DatFile' + CAST(@i AS NVARCHAR(2)) + '.dat -n -U ' + 
          @AzureUserName + '@' + LEFT(@AzureServer, CHARINDEX('.',@AzureServer)-1) + 
          N' -S tcp:' + @AzureServer + 
          N' -P ' + @AzurePassword;  

       IF (@ViewOutput = 1) 
       BEGIN 
        EXECUTE master.dbo.xp_cmdshell @CMD; 
       END 
       ELSE 
        EXECUTE master.dbo.xp_cmdshell @CMD, NO_OUTPUT; 

       SET @i += 1; 
      END 

/*--------------------------------------------------------------------------------------------------------------------------------------------------------- 
    Cleanup the finished tables 
-----------------------------------------------------------------------------------------------------------------------------------------------------------*/ 

    IF (@CleanupDatFiles = 1) 
     BEGIN 
      SET @CMD = N'rmDir /Q /S ' + @ExportFolder; 
      EXECUTE master.dbo.xp_cmdshell @CMD, NO_OUTPUT; 
     END 

/*--------------------------------------------------------------------------------------------------------------------------------------------------------- 
    Script End 
-----------------------------------------------------------------------------------------------------------------------------------------------------------*/ 

    IF (@ViewOutput = 1) 
     BEGIN 
      SET NOCOUNT OFF; 
     END 
+0

A big wall of T-SQL, but it doesn't correctly answer the question: it handles uploading to SQL Azure, while the question asks how to upload to Azure Table storage (NoSQL). A nice procedure for uploading to SQL Azure though, or am I missing something? – 2016-11-24 06:13:59
