七仔的博客

七仔的博客GithubPages分博

0%

Phoenix查询优化(千万数据0.1秒查询)

Phoenix查询优化(千万数据0.1秒查询)Phoenix: 5.0,HBase:2.0,Zookeeper:3.4.12

Phoenix查询优化(千万数据0.1秒查询)

版本信息

Phoenix: 5.0
HBase:2.0
Zookeeper:3.4.12

优化过程

二级索引是关键,首先是创建索引,创建索引之前需要配置一些东西

在每一个RegionServer的hbase-site.xml中加入如下的属性:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
<name>hbase.region.server.rpc.scheduler.factory.class</name>
<value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
<description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
</property>
<property>
<name>hbase.rpc.controllerfactory.class</name>
<value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
<description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
</property>
<property>
<name>hbase.coprocessor.regionserver.classes</name>
<value>org.apache.hadoop.hbase.regionserver.LocalIndexMerger</value>
</property>

在每一个Master的hbase-site.xml中加入如下的属性:

1
2
3
4
5
6
7
8
<property>
<name>hbase.master.loadbalancer.class</name>
<value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
</property>

接下来看这个查询语句

1
2
3
4
SELECT MAX(create_time) AS "createTime",student_id AS "studentID"
FROM bracelet_info
WHERE student_id = '111' and create_time >= 1575388800000
group by student_id;

这条查询语句需要两个筛选,创建一个对应的组合二级索引:

1
create index idx_bracelet_info_student_id_time on bracelet_info(student_id,create_time desc);

接下来查看语句运行效率,这里增加了一个Hint,略微增加速度

1
2
3
4
5
6
7
8
9
10
11
12
13
0: jdbc:phoenix:************:****> SELECT /*+ INDEX(bracelet_info idx_bracelet_info_student_id_time) */
. . . . . . . . . . . . . . . . .> MAX(create_time) AS "createTime"
. . . . . . . . . . . . . . . . .> ,student_id AS "studentID"
. . . . . . . . . . . . . . . . .> FROM bracelet_info
. . . . . . . . . . . . . . . . .> WHERE student_id = '111'
. . . . . . . . . . . . . . . . .> and create_time >= 1575388800000
. . . . . . . . . . . . . . . . .> group by student_id;
+----------------+------------+
| createTime | studentID |
+----------------+------------+
| 1575428753262 | 111 |
+----------------+------------+
1 row selected (0.098 seconds)

千万级数据下查询效果很不错

这里附加一条删除二级索引的语句

1
drop index idx_bracelet_info_student_id_time ON bracelet_info;

此为博主副博客,留言请去主博客,转载请注明出处:https://www.baby7blog.com/myBlog/53.html

欢迎关注我的其它发布渠道