5.6 多值子列语法

多值子列是HybridDB计算规格的独特特性,用于表示一个列(某个cell)有多个不确定的值,列中列的概念。一个表中可以定义一个或者多个多值列,多值列可以做筛选、分组、参与连接过滤等。举个例子:

1)建表

CREATE TABLE test1 (
    user_id BIGINT,
    city VARCHAR,
    tags MULTIVALUE delimiter_tokenizer ',' value_type 'varchar',
    brands MULTIVALUE delimiter_tokenizer ': ,' value_type 'varchar int'
)
DISTRIBUTE BY HASH (user_id)
ENGINE = 'CSTORE'
INDEX_ALL='Y';

> 说明:
  - tags是只有一个多值子列(通常所说的多值列),业务意义:每个人可以打多个标签
  - brands有两个多值子列,第一个子列类型为varchar,业务意义表示每个用户偏好的品牌,
  第二个子列类型为int,业务意义表示每个用户偏好品牌的偏好度(分数)。
  然后写入时的采用两种分隔符,首先用`:`然后是`,`来切分每一个cell数据

2) 写入数据

insert into test1 values(1, 'HZ', 'A,B,C', 'X:80,Y:70');
insert into test1 values(2, 'BJ', 'B,D', 'X:50,Z:90,W:60');
insert into test1 values(3, 'SH', 'A,C,D,F', 'X:90');


mysql> select * from test1 order by user_id;
+---------+------+---------+----------------+
| user_id | city | tags    | brands         |
+---------+------+---------+----------------+
|       1 | HZ   | A,B,C   | X:80,Y:70      |
|       2 | BJ   | B,D     | X:50,Z:90,W:60 |
|       3 | SH   | A,C,D,F | X:90           |
+---------+------+---------+----------------+
3 rows in set (0.01 sec)

3) 多值子列查询

过滤查询

查询包含A、B标签的用户数目

mysql> select count(*) from test1 where ref(tags,0) in ('A', 'B');
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (0.10 sec)


查询同时包含A和B标签的用户数目

mysql> select * from test1 where ref(tags,0) in ('A') and ref(tags,0) in ('B');
+---------+------+-------+-----------+
| user_id | city | tags  | brands    |
+---------+------+-------+-----------+
|       1 | HZ   | A,B,C | X:80,Y:70 |
+---------+------+-------+-----------+
1 row in set (0.01 sec)


查询偏好品牌包含X和偏好度在80以上的用户

mysql> select * from test1 where ref(brands, 0) in ('X') and ref(brands, 1) > 80;
+---------+------+---------+----------------+
| user_id | city | tags    | brands         |
+---------+------+---------+----------------+
|       2 | BJ   | B,D     | X:50,Z:90,W:60 |
|       3 | SH   | A,C,D,F | X:90           |
+---------+------+---------+----------------+
2 rows in set (0.01 sec)

分组查询


不在杭州且标签为A或者B的用户在A、B标签下的人数

mysql> select ref(tags,0), count(*) from test1
where ref(tags,0) in ('A', 'B') and city != 'HZ' group by ref(tags,0);
+--------------+----------+
| ref(tags, 0) | COUNT(*) |
+--------------+----------+
| A            |        1 |
| B            |        1 |
+--------------+----------+
2 rows in set (0.05 sec)

看A或者B标签和其对应城市的联合分组

mysql> select ref(tags,0), city, count(*) from test1
where ref(tags,0) in ('A', 'B') group by ref(tags,0), city;
+--------------+----------+
| ref(tags, 0) | COUNT(*) |
+--------------+----------+
| B            | BJ       |
| B            | HZ       |
| A            | SH       |
| A            | HZ       |
+--------------+----------+
4 rows in set (0.02 sec)


多个多值子列的联合分组

mysql> select ref(tags,0), ref(brands,0), count(*) from test1
where ref(tags,0) in ('A', 'B') and ref(brands,0) in ('X')  
group by ref(tags,0), ref(brands,0) ;
+--------------+----------------+----------+
| ref(tags, 0) | ref(brands, 0) | COUNT(*) |
+--------------+----------------+----------+
| A            | X              |        2 |
| B            | X              |        2 |
+--------------+----------------+----------+
2 rows in set (0.02 sec)

连接查询

mysql> select count(*) from test1 join test
on test1.user_id = test.id where ref(tags,0) in ('A', 'B') ;
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (0.16 sec)


mysql> select sum(test.int_test) from test1 join test
on test1.user_id = test.id where ref(tags,0) in ('A', 'B') ;
+--------------------+
| SUM(test.int_test) |
+--------------------+
|               -207 |
+--------------------+
1 row in set (0.03 sec)


mysql> select test.id, test1.* from test1 join test
on test1.user_id = test.id
where ref(tags,0) in ('A', 'B') and ref(brands, 0) in ('Y');
+----+---------+------+-------+-----------+
| id | user_id | city | tags  | brands    |
+----+---------+------+-------+-----------+
|  1 |       1 | HZ   | A,B,C | X:80,Y:70 |
+----+---------+------+-------+-----------+
1 row in set (0.02 sec)

4)注意事项

  • 多值列的group by一定要带where条件,否则引擎底层要穷举所有的value(默认只穷举1024个value)
  • 多个子列之间的组合筛选是cell内的or关系。

results matching ""

    No results matching ""