先电 (XianDian) Big Data Platform Operation Manual - iandian-bigdata-v2.1

Shared by 天下 on 2025/3/23 20:19:07

# su hive
$ hive

$ logout    # leave the hive user

6.1.2 hive command-line options

usage: hive

-d, --define      Variable substitution to apply to hive commands, e.g. -d A=B or --define A=B
--database        Specify the database to use
-e                SQL from command line
-f                SQL from files
-H, --help        Print help information
--hiveconf        Use value for given property
--hivevar         Variable substitution to apply to hive commands, e.g. --hivevar A=B
-i                Initialization SQL file
-S, --silent      Silent mode in interactive shell
-v, --verbose     Verbose mode (echo executed SQL to the console)

1. hive interactive mode

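hive> show tables;                            # list all table names
hive> show tables 'ad*';                      # list tables whose names start with 'ad'

The set command sets and displays variables:
hive> set -v;                                 # show all variables
hive> set <name>;                             # show one variable
hive> set <name>=<value>;                     # set a variable

hive> dfs -ls;                                # list files on HDFS (with no path, the current user's HDFS home directory)
hive> dfs -ls /user/hive/warehouse/;          # list all files in the Hive warehouse
hive> dfs -ls /user/hive/warehouse/ptest;     # list the files of table ptest
hive> source file;                            # run a Hive script file from within the client
hive> quit;                                   # leave the interactive shell
hive> exit;                                   # leave the interactive shell
hive> reset;                                  # reset the configuration to its default values
hive> !ls;                                    # run a shell command from the Hive shell

2. Operations and functions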

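List all functions:
hive> show functions;

List functions whose names match a regular expression:
hive> show functions 'xpath.*';

Show the description of a specific function (either form works):
hive> describe function xpath;
hive> desc function xpath;

3. Field types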

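Hive supports primitive and complex data types. The primitive types are mainly the numeric types (INT, FLOAT, DOUBLE), boolean, and string. There are three complex types: ARRAY, MAP, and STRUCT.

4. Primitive data types

TINYINT: 1 byte
SMALLINT: 2 bytes
INT: 4 bytes
BIGINT: 8 bytes
BOOLEAN: TRUE/FALSE
FLOAT: 4 bytes, single-precision floating point
DOUBLE: 8 bytes, double-precision floating point
STRING: character string

5. Complex data types

ARRAY: an ordered collection of fields
MAP: an unordered collection of key-value pairs
STRUCT: a group of named fields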
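As a rough illustration only (Python, not HiveQL), the three complex types behave much like the Python structures below. The variable names, field names, and sample values are hypothetical and are not taken from the manual.

```python
from typing import NamedTuple

# ARRAY<STRING>: ordered elements, accessed by position.
hobbies = ["reading", "chess"]

# MAP<STRING, INT>: key-value pairs, accessed by key.
scores = {"math": 90, "english": 85}

# STRUCT<city:STRING, zip:STRING>: a fixed group of named fields.
class Address(NamedTuple):
    city: str
    zip: str

address = Address(city="Beijing", zip="100000")

print(hobbies[0])      # Hive equivalent: hobbies[0]
print(scores["math"])  # Hive equivalent: scores['math']
print(address.city)    # Hive equivalent: address.city
```

In Hive itself, array elements are read by index, map values by key, and struct fields with dot notation, as the comments indicate.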

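6.1.3 Table types

Hive tables fall roughly into three kinds: ordinary (managed) tables, external tables, and partitioned tables.

1. Ordinary tables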

Create a table:
hive> create table tb_person(id int, name string);

Create a table with a partition column ds:
hive> create table tb_stu(id int, name string) partitioned by(ds string);

Show partitions:
hive> show partitions tb_stu;

Show all tables:
hive> show tables;

Show tables whose names match a pattern:
hive> show tables 'tb_*';

Add a column to a table:
hive> alter table tb_person add columns (new_col int);

Add a column with a column comment:
hive> alter table tb_stu add columns (new_col2 int comment 'a comment');

Rename a table:
hive> alter table tb_stu rename to <new_table_name>;

Drop a table (Hive can only drop partitions; it cannot delete individual records or columns):
hive> drop table tb_stu;

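For a managed table, drop deletes both the metadata and the data files. For an external table, it deletes only the metadata. To delete only the data in a table and keep the table itself, delete the data files on HDFS:

hive> dfs -rmr /user/hive/warehouse/mutill1/*;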

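Load the data in the local file /home/hadoop/ziliao/ into a table. The file contains:

1 zhangsan
2 lisi
3 wangwu

Load the data in the file into the table:
hive> load data local inpath '/home/hadoop/ziliao/' overwrite into table tb_person;

Load local data and specify the partition at the same time:
hive> load data local inpath '/home/hadoop/ziliao/' overwrite into table tb_stu partition (ds='2008-08-15');

Note: if the data to import is already on HDFS, the local keyword is not needed. The data files imported into a managed table can be seen under the warehouse directory "/user/hive/warehouse/".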
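The following Python sketch shows how the pieces of the load statement fit together, in particular when the local keyword and the partition clause appear. The helper build_load_statement is hypothetical and is not part of Hive. It only assembles the statement text and does not execute anything.

```python
from typing import Optional

def build_load_statement(path: str, table: str, local: bool = True,
                         overwrite: bool = True,
                         partition: Optional[dict] = None) -> str:
    """Assemble a LOAD DATA statement (hypothetical helper, not a Hive API)."""
    parts = ["load data"]
    if local:
        parts.append("local")  # omit when the source file is already on HDFS
    parts.append(f"inpath '{path}'")
    if overwrite:
        parts.append("overwrite")  # replace existing data instead of appending
    parts.append(f"into table {table}")
    if partition:
        spec = ", ".join(f"{key}='{value}'" for key, value in partition.items())
        parts.append(f"partition ({spec})")
    return " ".join(parts) + ";"

print(build_load_statement("/home/hadoop/ziliao/", "tb_person"))
print(build_load_statement("/home/hadoop/ziliao/", "tb_stu",
                           partition={"ds": "2008-08-15"}))
```

Calling the helper with local=False yields the form used for data that already resides on HDFS.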

View the data:
hive> dfs -ls /user/hive/warehouse/tb_stu;
hive> dfs -ls /user/hive/warehouse/tb_person;

2. External tables

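The external keyword creates an external table: the table definition also specifies a path (location) that points to the actual data. For an internal (managed) table, Hive moves the data into the path used by the data warehouse. For an external table, Hive only records where the data is and does not move it. When a table is dropped, an internal table's metadata and data are deleted together, whereas an external table loses only its metadata and the data is kept.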
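The difference in drop behavior can be pictured with a toy Python model. This is only a sketch of the rule described here, not of how Hive is implemented. The Table class and drop_table function are hypothetical, and the table names and paths are borrowed from the examples in this section.

```python
from dataclasses import dataclass

@dataclass
class Table:
    name: str
    location: str
    external: bool = False

def drop_table(table: Table, metastore: set, hdfs_paths: set) -> None:
    """Toy model of DROP TABLE: metadata always goes, data only if managed."""
    metastore.discard(table.name)
    if not table.external:
        hdfs_paths.discard(table.location)

metastore = {"tb_person", "tb_record"}
hdfs_paths = {"/user/hive/warehouse/tb_person", "/user/hadoop/input"}

drop_table(Table("tb_person", "/user/hive/warehouse/tb_person"),
           metastore, hdfs_paths)
drop_table(Table("tb_record", "/user/hadoop/input", external=True),
           metastore, hdfs_paths)

print(metastore)   # both table definitions are gone
print(hdfs_paths)  # only the external table's data path remains
```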

For example, create an external table:

hive> create external table tb_record(col1 string, col2 string) row format delimited fields terminated by '\t' location '/user/hadoop/input';

The data of table tb_record is then the data under /user/hadoop/input.

3. Partitioned tables

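A partition is defined by a subset of a table's columns. Partitions can be created for frequently used data, so that a lookup within a partition does not need to scan the whole table, which helps query efficiency considerably.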
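A toy Python model makes the saving concrete. It assumes each partition value has its own list of rows, mirroring one directory per partition. The partition names and rows are hypothetical sample data, and the scan function is a simplification, not Hive's query planner.

```python
# Toy model: each partition value maps to its own list of rows,
# mirroring one directory per partition on HDFS.
partitions = {
    "xiapi": [(1, "line a"), (2, "line b")],
    "other": [(3, "line c"), (4, "line d"), (5, "line e")],
}

def scan(partitions, name=None):
    """Return (rows, rows_read). With a partition filter only that partition is read."""
    if name is not None:
        rows = list(partitions.get(name, []))
    else:
        rows = [row for part in partitions.values() for row in part]
    return rows, len(rows)

_, read_with_filter = scan(partitions, name="xiapi")
_, read_full = scan(partitions)
print(read_with_filter, read_full)  # 2 5
```

With the filter on the partition column, only the two rows of that partition are read, instead of all five.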

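Create a partitioned table:
hive> create table log(ts bigint, line string) partitioned by(name string);

Insert into a partition:
hive> insert overwrite table log partition(name='xiapi') select id, name from userinfo where name='xiapi';

Show partitions:
hive> show partitions log;

Drop a partition:
hive> alter table log drop partition (name='xiapi');

Note: normally a partition must be created before it can be used. Also, the value of a partition column is turned into a directory name in the storage path, so if the value contains special characters such as '%', ':', '/', or '#', each one is escaped as % followed by its two-character ASCII code in hexadecimal.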
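The escaping rule in the note can be sketched in Python as follows. This is a minimal illustration that handles only the four characters named in the note. Hive itself escapes a larger set of characters, and the function names here are hypothetical.

```python
# Only the characters named in the note; Hive escapes a larger set.
SPECIAL_CHARS = {"%", ":", "/", "#"}

def escape_partition_value(value: str) -> str:
    """Replace each special character with % plus its two-digit hex ASCII code."""
    return "".join(
        f"%{ord(ch):02X}" if ch in SPECIAL_CHARS else ch
        for ch in value
    )

def partition_dir(table_dir: str, column: str, value: str) -> str:
    """Build the directory path for one partition value."""
    return f"{table_dir}/{column}={escape_partition_value(value)}"

print(partition_dir("/user/hive/warehouse/tb_stu", "ds", "2008-08-15"))
print(escape_partition_value("10:30"))  # 10%3A30
print(escape_partition_value("a/b"))    # a%2Fb
```

A value without special characters, such as 2008-08-15, is used unchanged as part of the directory name.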

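6.1.4 SQL operations and buckets

1. Create tables

First create three test tables:

The userinfo table has two columns, separated by a tab, storing the user's id and name.

The classinfo table has two columns, separated by a tab, storing the course teacher (teacher) and the course name (classname).

The choice table has two columns, separated by a tab, storing the user's userid and the name of the chosen course, classname (it acts like a junction table).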
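To make the tab-separated layout concrete, the Python sketch below splits lines the way a two-column table such as userinfo would read them. The parse_row function and the sample contents are hypothetical. Real parsing is done by Hive according to the table's row format.

```python
def parse_row(line: str, delimiter: str = "\t"):
    """Split one text line into (id, name) as the userinfo table would read it."""
    fields = line.rstrip("\n").split(delimiter)
    user_id = int(fields[0])  # first column: id int
    name = fields[1]          # second column: name string
    return user_id, name

sample = "1\tzhangsan\n2\tlisi\n3\twangwu\n"  # hypothetical file contents
rows = [parse_row(line) for line in sample.splitlines()]
print(rows)  # [(1, 'zhangsan'), (2, 'lisi'), (3, 'wangwu')]
```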

Create the test tables:

hive> create table userinfo(id int, name string) row format delimited fields terminated by '\t';

hive> create table classinfo(teacher string, classname string) row format delimited fields terminated by '\t';