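Running queries in parallel in Hive

I have been using Hive for a while, but I never thought about this before. I am trying to run the queries in a file passed to hive -f sql-file in parallel. Does anyone know how to do this? Thanks.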

2013-02-11 user1653240

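Hive converts HiveQL queries into MapReduce jobs, and MapReduce jobs can run in parallel depending on the size of the cluster and the type of scheduler configured. So Hive queries will automatically run in parallel on the Hadoop cluster.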

2013-02-12 06:22:55

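I want to run independent queries from one file in parallel. For example: 1) SELECT COUNT(1) FROM t1; 2) SELECT COUNT(1) FROM t2; I want 1) and 2) to run in parallel so that 1) does not block 2). – user1653240 2013-02-12 19:57:23

Open two separate Hive shells and execute the HiveQL queries in them. – 2013-02-13 07:11:41

That is definitely one way. Is there a way to make the queries in a SQL file non-blocking? For example, by specifying some Hive flag... – user1653240 2013-02-14 07:05:13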

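Any query in Hive is compiled into MapReduce jobs and run on Hadoop. MapReduce is a parallel processing framework, so each of your Hive queries will run and process its data in parallel.

I asked the same question, though in a somewhat different way. For more details, see here.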


2013-02-12 09:35:45

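@user1653240 To run queries independently at the same time, this is what I do:

Put the queries into different files, for example select count(1) from t1 -> file1.sql and select count(1) from t2 -> file2.sql.
Use nohup and &. Taking file1.sql and file2.sql as an example, run: nohup hive -f file1.sql & nohup hive -f file2.sql. The first command goes to the background and the second starts immediately, so the two queries run in parallel.
If you want the whole thing to run in the background, just add an & at the end. For example: (nohup hive -f file1.sql & nohup hive -f file2.sql) &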
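
The steps above can be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name run_sql_files_in_parallel and the HIVE_CMD variable are made up for illustration, each file name is assumed to end in .sql, and the output of each query is written to a .log file next to it.

```shell
# Command used to start Hive; overridable, defaults to the hive CLI.
HIVE_CMD="${HIVE_CMD:-hive}"

# Hypothetical helper: start one hive process per SQL file, then wait for all.
# Returns the number of hive processes that exited with a non-zero status.
run_sql_files_in_parallel() {
    pids=""
    for f in "$@"; do
        # Each file gets its own background process and its own log file.
        "$HIVE_CMD" -f "$f" > "${f%.sql}.log" 2>&1 &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}
```

It would be called as run_sql_files_in_parallel file1.sql file2.sql. Unlike the plain nohup one-liner, it blocks until every query has finished, which makes it easier to chain further steps after it.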


2016-01-18 20:54:47 legbird

Hive's query planner should be able to parallelise things in certain cases. You need to set a configuration option for that, though.

Taken from https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties:

hive.exec.parallel

Default Value: false
Added In: Hive 0.5.0

Whether to execute jobs in parallel. Applies to MapReduce jobs that can run in parallel, for example jobs processing different source tables before a join. As of Hive 0.14, also applies to move tasks that can run in parallel, for example moving files to insert targets during multi-insert.

If you want to run completely independent queries in parallel, the best option is probably to run them as separate jobs from separate files, as others have suggested.
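
As a rough sketch, hive.exec.parallel can be switched on for a single run with the Hive CLI's --hiveconf option. The function name run_with_parallel_stages and the HIVE_CMD variable are assumptions for illustration. Note that this setting only lets independent stages of one query run concurrently; separate statements in the file are still executed one after another.

```shell
# Command used to start Hive; overridable, defaults to the hive CLI.
HIVE_CMD="${HIVE_CMD:-hive}"

# Hypothetical helper: run one SQL file with hive.exec.parallel enabled
# for this invocation only.
run_with_parallel_stages() {
    "$HIVE_CMD" --hiveconf hive.exec.parallel=true -f "$1"
}
```

The same effect can be had by putting SET hive.exec.parallel=true; at the top of the SQL file.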


2016-01-19 15:44:01 LiMuBei

This is what I ended up doing, since I could not find a way to do it from within Hive itself. Just replace the file name and database with your own.

#!/usr/bin/env bash
# This file should contain all the queries, separated by semicolons ';'
queries=$(cat queries_file.sql)
# Disable globbing so a '*' in a query is not expanded by the unquoted echo below
set -f
count=0
while true; do
    ((count++))
    # Pick the count-th query; the unquoted echo joins all lines into one
    query=$(echo ${queries} | cut -d';' -f${count})
    if [ -z "${query}" ]; then
        # Let the background hive processes finish before exiting
        wait
        echo "Completed executing $((count - 1)) queries."
        exit
    fi
    echo "${query}"
    hive --database "your_db" -e "${query};" &

    # This is optional. If you want to leave a gap, say after every 5
    # concurrent queries, use this. Otherwise remove the next 4 lines.
    mod=$((count % 5))
    if [ ${mod} -eq 0 ]; then
        sleep 30
    fi
done
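
A possible variation on the script above, sketched here as an assumption rather than as part of the original answer: instead of sleeping a fixed 30 seconds, wait for each batch of queries to finish before starting the next one. The function name run_queries_in_batches and the HIVE_CMD variable are hypothetical, and the queries are passed as arguments instead of being read from a file.

```shell
# Command used to start Hive; overridable, defaults to the hive CLI.
HIVE_CMD="${HIVE_CMD:-hive}"

# Hypothetical helper: first argument is the batch size, the remaining
# arguments are queries. At most batch_size queries run at the same time.
run_queries_in_batches() {
    batch_size=$1
    shift
    running=0
    for query in "$@"; do
        "$HIVE_CMD" -e "$query;" &
        running=$((running + 1))
        if [ "$running" -ge "$batch_size" ]; then
            wait        # block until the whole batch has finished
            running=0
        fi
    done
    wait                # wait for the last, possibly partial, batch
}
```

For example, run_queries_in_batches 5 "SELECT COUNT(1) FROM t1" "SELECT COUNT(1) FROM t2" would start both queries at once, since they fit in one batch of 5.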


2016-07-28 13:24:26 PratPor
