Hive
用户指南 1
v1.0
目录
1.
HIVE结构 .................................................................................................................................. 5 1.1 HIVE架构 .................................................................................................................... 5 1.2 Hive 和 Hadoop 关系 ................................................................................................ 6 1.3 Hive 和普通关系数据库的异同 ................................................................................. 7 1.4 HIVE元数据库 .............................................................................................................. 8
1.4.1 DERBY ................................................................................................................ 8 1.4.2 Mysql ................................................................................................................. 9 1.5 HIVE的数据存储 ...................................................................................................... 10 1.6 其它HIVE操作 ......................................................................................................... 10 HIVE 基本操作 ....................................................................................................................... 11 2.1 create table .................................................................................................................. 11 2.1.1 总述 ................................................................................................................. 11
2.1.2 语法 ................................................................................................................. 11 2.1.3 基本例子 ......................................................................................................... 13 2.1.4 创建分区 ......................................................................................................... 14 2.1.5 其它例子 ......................................................................................................... 15 2.2 Alter Table .................................................................................................................. 16
2.2.1 Add Partitions ............................................................................................. 16 2.2.2 Drop Partitions ........................................................................................... 16 2.2.3 Rename Table ................................................................................................. 17 2.2.4 Change Column ............................................................................................... 17 2.2.5 Add/Replace Columns ................................................................................... 17 2.3 Create View................................................................................................................. 18 2.4 Show ........................................................................................................................... 18 2.5 Load ............................................................................................................................ 18 2.6 Insert ........................................................................................................................... 20
2.6.1 Inserting data into Hive Tables from queries ................................. 20 2.6.2 Writing data into filesystem from queries ....................................... 21 2.7 Cli ................................................................................................................................ 22
2.7.1 Hive Command line Options ....................................................................... 22 2.7.2 Hive interactive Shell Command ............................................................. 23 2.7.3 Hive Resources ............................................................................................. 24 2.7.4 调用python、shell等语言 ......................................................................... 25 2.8 DROP ............................................................................................................................. 26 2.9 其它 ............................................................................................................................. 26
2.9.1 Limit ............................................................................................................... 26 2.9.2 Top k ............................................................................................................... 26 2.9.3 REGEX Column Specification ..................................................................... 27 Hive Select ............................................................................................................................... 27 3.1 Group By ..................................................................................................................... 27 3.2 Order /Sort By ......................................................................................................... 28 Hive Join .................................................................................................................................. 28
2
2.
3.
4.
5. 6.
HIVE参数设置 ........................................................................................................................ 31 HIVE UDF ................................................................................................................................. 33 6.1 基本函数 ..................................................................................................................... 33
6.1.1 关系操作符 ..................................................................................................... 33 6.1.2 代数操作符 ..................................................................................................... 34 6.1.3 逻辑操作符 ..................................................................................................... 35 6.1.4 复杂类型操作符 ............................................................................................. 35 6.1.5 内建函数 ......................................................................................................... 36 6.1.6 数学函数 ......................................................................................................... 36 6.1.7 集合函数 ......................................................................................................... 36 6.1.8 类型转换 ......................................................................................................... 36 6.1.9 日期函数 ......................................................................................................... 36 6.1.10 条件函数 ......................................................................................................... 37 6.1.11 字符串函数 ..................................................................................................... 37 6.2 UDTF .......................................................................................................................... 43
6.2.1 Explode .......................................................................................................... 44 HIVE 的MAP/REDUCE ............................................................................................................ 45 7.1 JOIN ............................................................................................................................. 45 7.2 GROUP BY .................................................................................................................... 46 7.3 DISTINCT ...................................................................................................................... 46 使用HIVE注意点 ................................................................................................................... 47 8.1 字符集 ......................................................................................................................... 47 8.2 压缩 ............................................................................................................................. 47 8.3 count(distinct) ....................................................................................................... 47 8.4 JOIN ............................................................................................................................. 47 8.5 DML操作 ...................................................................................................................... 48 8.6 HAVING ......................................................................................................................... 48 8.7 子查询 ......................................................................................................................... 48 8.8 Join中处理null值的语义区别 .............................................................................. 48 优化与技巧 ............................................................................................................................. 51 9.1 全排序 ......................................................................................................................... 53
9.1.1 例1 .................................................................................................................. 53 9.1.2 例2 .................................................................................................................. 56 9.2 怎样做笛卡尔积 ......................................................................................................... 59 9.3 怎样写exist/in子句 ............................................................................................... 60 9.4 怎样决定reducer个数 ............................................................................................. 60 9.5 合并MapReduce操作 ................................................................................................. 61 9.6 Bucket 与 sampling ................................................................................................... 62 9.7 Partition ....................................................................................................................... 62 9.8 JOIN ............................................................................................................................. 63
9.8.1 JOIN原则 ........................................................................................................ 63 9.8.2 Map Join .......................................................................................................... 64 9.8.3 大表Join的数据偏斜 .................................................................................... 66 9.9 合并小文件 ................................................................................................................. 67
3
7.
8.
9.
9.10 10.
Group By ...................................................................................................................... 67 HIVE FAQ: ..................................................................................................................... 68
4
1. HIVE结构
Hive 是建立在 Hadoop 上的数据仓库基础构架。它提供了一系列的工具,可以用来进行数据提取转化加载(ETL),这是一种可以存储、查询和分析存储在 Hadoop 中的大规模数据的机制。Hive 定义了简单的类 SQL 查询语言,称为 QL,它允许熟悉 SQL 的用户查询数据。同时,这个语言也允许熟悉 MapReduce 开发者的开发自定义的 mapper 和 reducer 来处理内建的 mapper 和 reducer 无法完成的复杂的分析工作。
1.1 HIVE架构
Hive 的结构可以分为以下几部分:
? 用户接口:包括 CLI, Client, WUI
? 元数据存储。通常是存储在关系数据库如 mysql, derby 中
5