6.4 异步提交导入导出任务

1. 异步提交

异步提交的语法:

submit job insert overwrite into xxx select ...

该命令会返回一个task_id,例如:

mysql> submit job insert overwrite into test select * from test_external_table;
+---------------------------------------+
| job_id                               |
+---------------------------------------+
| 2017112122202917203100908203303000715 |
+---------------------------------------+
1 row in set (1.77 sec)

2. 查询状态

语法:

show job status where job=‘xxx’

例如:

mysql> show job status where job='2017112122202917203100908203303000715';
+---------------------------------------+-------------+---------+----------+-----------------------+-----------------------+--------------------------------------+
| job_id                              | schema_name | status  | fail_msg | create_time           | update_time           | definition                           |
+---------------------------------------+-------------+---------+----------+-----------------------+-----------------------+--------------------------------------+
| 2017112122202917203100908203303000715 | test    | RUNNING | NULL     | 2017-11-21 22:20:31.0 | 2017-11-21 22:20:40.0 |  insert into test select * from test |
+---------------------------------------+-------------+---------+----------+-----------------------+-----------------------+--------------------------------------+
1 row in set (0.35 sec)

任务状态:

INIT -> SUBMITTED -> RUNNING -> FINISH | FAILED

入队列 -> 被调度起来提交 -> 后台开始执行 -> 成功或失败

3. 任务终止

语法:

cancel job 'xxx'

例如:

mysql> cancel job '2017112122202917203100908203303000715';
Query OK, 1 row affected (0.02 sec)

说明:

  • 未调度起来的任务会被移除队列
  • 正在跑的任务会被终止,已导入数据会被回滚
  • 已完成(失败或成功)的任务也会被移除

4. 如何在D2 / DataWorks定时调度

注意:

  • 本章假设您已经对DataWorks的基本概念有一定了解,查看DataWorks的文档
  • 本章假设您对shell的语法有一定基本了解,能看到下面的shell
  • 本节介绍如何在DataWorks上如何配置shell节点做数据同步

(1). 原理介绍

很简单,就是在shell里面把上面提到的异步提交查询状态自动化起来了。shell里面直接使用mysql客户端执行语句,所以前提是shell运行的节点必须能够访问HybridDB。

目前只能支持HybridDB是经典网络公有网络(外网)

(2). 配置白名单

  • 将DataWorks的IP网段加入HybridDB白名单里面,目的是让DataWorks可以访问HybridDB。DataWorks的IP网段列表参见

  • 配置DataWorks沙箱白名单,进入DataWorks控制台,点击项目配置,在弹出框中找到高级设置点击更多设置,会进入项目配置页面,如下图:

    image-20181121095105424

  • 点击添加,请填写你要再shell里面访问的HybridDB的域名和端口,请注意,这里的域名只能支持经典网络公有网络(外网)

(3). 创建shell任务

  • 在您的业务流程里面创建shell任务,如下图

    image-20181121100249574

  • 将下面脚本粘贴到shell框中,粘贴之前请仔细阅读该脚本,里面的逻辑就是把上面提到的异步提交查询状态自动化串起来了

#!/usr/bin/env bash

host="xxx.petadata.rds.aliyuncs.com" ## your instance dns
port="3303"  ## your instance vport
user="cstore"
password="cstore"
database="testdb"
source_table="odps_test_external_table"
target_table="cstore_test"
timeout=86400   #24个小时,超时


sql="submit job insert overwrite into ${target_table} select * from ${source_table}"
echo "exec SQL:$sql"

mysql -h"${host}" -P"${port}" -u"${user}" -p"${password}" "${database}" -e "${sql}" | egrep "201[0-9]+" > import_${target_table}_jobid.txt

echo 执行返回的jobid: `cat import_${target_table}_jobid.txt`

jobid=`cat import_${target_table}_jobid.txt`

check_finish_sql="show job status where job = '${jobid}'"
echo "Check Finish SQL:$check_finish_sql"

mysql -h"${host}" -P"${port}" -u"${user}" -p"${password}" "${database}" -e "${check_finish_sql}" | egrep "${jobid}" | awk '{print $3}' > import_${target_table}_result.txt

result=`cat import_${target_table}_result.txt`
begin_time=$(date "+%s")
echo "begin_time = "$begin_time

while true
do
    if [[ "$result" == "FINISH" ]]; then
        break;
    elif [[ "$result" == "FAILED" ]]; then
        echo "$jobid is failed, so exit"
        exit -1
    else
        echo "$jobid current status is $result"
    fi

    end_time=$(date "+%s")
    cost=$(($end_time - $begin_time))

    if [[ "$cost" -gt "$timeout" ]]
    then
        echo "$jobid has cost $cost second >= $timeout , so exit"
        exit -1
    else
        echo "$jobid is running using $cost seconds"
        sleep 30
    fi

    mysql -h"${host}" -P"${port}" -u"${user}" -p"${password}" "${database}" -e "${check_finish_sql}" | egrep "${jobid}" | awk '{print $3}' > import_${target_table}_result.txt
    result=`cat import_${target_table}_result.txt`

done

end_time=$(date "+%s")
cost=$(($end_time - $begin_time))
echo "$jobid has finished, using $cost seconds"
  • 点击运行,查看结果

    image-20181121102812160

  • 如果想把这个shell任务配置成周期任务,点击右边侧栏调度配置,自行配置调度周期,如下图:

    image-20181121102944495

results matching ""

    No results matching ""