Operations on Hive internal and external tables

Table of Contents

Table creation statement

Data types

Table classification

Classification

The difference between internal tables and external tables

Example

Basic operations of internal tables

Knowledge points

Example

Basic operations of external tables

Example

View/modify table

Knowledge points

Example

Default delimiter

Example

Quick mapping table

Knowledge points

Example

Data import and export

File data loading and import

Upload files directly

Web page upload

Linux local put upload

load: load file

load: move HDFS file

load: upload Linux file

insert: insert data

insert overwrite data

File data export

Download files directly

Web page download

get command to download file

insert: export data

insert: export to HDFS

insert: export to Linux

hive shell commands

hsql statement export

hsql script export


Table creation statement

create [external] table [if not exists] table name (field name field type, field name field type, ...)

[partitioned by (partition field name partition field type)] -- fixed format for partitioned tables

[clustered by (bucket field name) into number of buckets buckets] -- fixed format for bucketed tables; note: sorting is possible with [sorted by (sort field name asc|desc)]

[row format delimited fields terminated by 'field delimiter'] -- fixed format for a custom field delimiter

[stored as textfile] -- default text format

[location 'hdfs://domain name:8020/user/hive/warehouse/library name.db/table name'] -- the default is usually sufficient

Note: 1. Be sure to add a semicolon at the end.

2. The keywords must appear in the order above (top to bottom, left to right), otherwise an error will be reported.

Default library storage path in HDFS: /user/hive/warehouse
Default storage path of custom libraries in HDFS: /user/hive/warehouse/library name.db
Default storage path of custom tables in HDFS: /user/hive/warehouse/library name.db/table name
Default storage path of business data files in HDFS: /user/hive/warehouse/library name.db/table name/data file
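Putting the clauses together, here is a minimal sketch of a complete create statement; the database name (demo), table name (orders), and fields are illustrative placeholders, not part of the examples that follow:

-- Minimal sketch combining the optional clauses, in the required keyword order
create external table if not exists demo.orders (
    id int,
    name string
)
partitioned by (dt string)                       -- partitioned table
clustered by (id) sorted by (id asc) into 4 buckets  -- bucketed and sorted
row format delimited fields terminated by ','    -- custom field delimiter
stored as textfile;                              -- default text format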

Data types

Basic data types:

Integer: int; Decimal: float, double; String: string, varchar(length); Date: date, timestamp

Complex data types:

Collection: array; Mapping: map; Structure: struct; Union: union
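For illustration only (the table and column names here are hypothetical, not from the examples below), a table mixing basic and complex types could be declared like this; the extra terminated by clauses tell Hive how to split collection items and map entries, so a row such as 1,a|b|c,k1:v1|k2:v2,Beijing|Main St would load correctly:

-- Hypothetical table demonstrating array, map, and struct columns
create table complex_demo (
    id int,
    tags array<string>,
    props map<string,string>,
    addr struct<city:string, street:string>
) row format delimited
    fields terminated by ','
    collection items terminated by '|'
    map keys terminated by ':';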

Table classification

Classification

There are several types of tables that can be created in Hive, namely:

Internal table: also called management table or managed table

Partition table (partition bucket table)

Bucket table

External table: also called an unmanaged or non-managed table

Partition table (partition bucket table)

Bucket table

View table formatting information: desc formatted table name;
-- Internal table type: MANAGED_TABLE
-- External table type: EXTERNAL_TABLE

The difference between internal tables and external tables

Internal table: a table created without the external keyword is an ordinary table. Internal tables are also called management tables or managed tables.
Deleting an internal table: deletes both the metadata and the stored data itself.
External table: a table created with the external keyword is an external table, that is, an associated table. It can also be called an unmanaged or non-managed table.
Deleting an external table: deletes only the metadata, not the stored data itself.

Example

--Create internal tables (also called management tables, managed tables)
create table inner_user1(
    id int,
    name string
);
-- Create external tables (also called unmanaged tables, unmanaged tables)
create external table outer_user1(
    id int,
    name string
);
-- View table structure
desc inner_user1;
desc outer_user1;
-- View table information
desc formatted inner_user1; -- type of internal table: MANAGED_TABLE
desc formatted outer_user1; -- type of external table: EXTERNAL_TABLE
-- Demonstration of deleting internal/external tables
--Delete internal tables
-- drop: deleting an internal table deletes all data related to the table
insert into inner_user1 values (1,'Zhang San');
drop table inner_user1;
-- drop: deleting an external table only deletes the metadata; the business data in HDFS is retained
insert into outer_user1 values (1,'Zhang San');
drop table outer_user1;
-- After recreating the table, you can use location to associate the original business data in HDFS
create external table outer_user2(
    id int,
    name string
) location '/user/hive/warehouse/hive2.db/outer_user1';
--verify the data
select * from outer_user2 limit 5; -- 1 Zhang San

Basic operations of internal tables

Knowledge points

Create internal table: create table [if not exists] internal table name (field name field type, field name field type, ...) [row format delimited fields terminated by 'field delimiter'];

Copy internal table:

Method 1 (like): create table table1 like table2 -- copies only the table structure

Method 2 (as): create table table1 as select statement -- copies the table structure and the data

Delete internal table: drop table internal table name;
Note: deleting an internal table removes the table's metadata in MySQL, and the business data itself stored in HDFS is also deleted.

View table formatting information: desc formatted table name;

Note: you can use truncate to clear internal table data: truncate table internal table name;

Example

-- 1. Creation and deletion of internal tables
-- 1. Demonstrate the creation of internal tables
-- Table creation method 1
create table inner_user1(
    id int,
    name string
);
--Insert data
insert into inner_user1 values (1,'Zhang San');
-- Table creation method 2: Quickly copy the table structure
create table inner_user2 like inner_user1;
--Insert data
insert into inner_user2 values (1,'Zhang San');
-- Table creation method 3: Quickly copy tables and table data
create table inner_user3 as select * from inner_user1;

-- 2. Demonstrate viewing internal table structure details
-- Internal table type: MANAGED_TABLE
desc formatted inner_user1;
desc formatted inner_user2;
desc formatted inner_user3;

-- 3. Demonstrate deletion of internal tables
-- Delete internal table
drop table inner_user3; -- Metadata and HDFS business data are deleted
--Clear internal table data
truncate table inner_user2;
--Neither delete nor update can be used
delete from inner_user2; -- error report
update inner_user2 set name='Li Si'; -- error report

Basic operations of external tables

Create external table: create external table [if not exists] external table name (field name field type, field name field type, ...) [row format delimited fields terminated by 'field delimiter'];

Copy table: only the like method, which copies the table structure, is supported. Note: the as method cannot be used for external tables.

Delete external table: drop table external table name;
Note: deleting an external table removes the metadata in MySQL, but the business data itself stored in HDFS is retained.

View table formatting information: desc formatted table name; -- external table type: EXTERNAL_TABLE

Note: external tables cannot use truncate to clear their data.

Example

-- 1. Creation and deletion of external tables
-- 1. Demonstrate the creation of external tables
-- Table creation method 1
create external table outer_user1(
    id int,
    name string
);
--Insert data
insert into outer_user1 values (1,'Zhang San');
-- Table creation method 2: Quickly copy the table structure
create table outer_user2 like outer_user1;
--Insert data
insert into outer_user2 values (1,'Zhang San');
-- Table creation method 3: Quickly copy tables and table data
create external table outer_user3 as select * from outer_user1; -- error

-- 2. Demonstrate viewing the detailed information of the external table structure
-- External table type: EXTERNAL_TABLE
desc formatted outer_user1;
desc formatted outer_user2;

-- 3. Demonstrate deletion of external tables
-- Delete external table
drop table outer_user2; -- metadata is deleted and HDFS business data is retained
--You cannot use truncate to clear external table data
truncate table outer_user1; -- error report
--Neither delete nor update can be used
delete from outer_user1; -- error
update outer_user1 set name='Li Si'; -- error report

View/Modify Table

Knowledge points

View all tables: show tables;
View the table creation statement: show create table table name;
View table structure information: desc table name;
View table formatting information: desc formatted table name; Note: formatted displays the detailed information

Modify table name: alter table old table name rename to new table name;
Add fields: alter table table name add columns (field name field type);
Replace fields: alter table table name replace columns (field name field type, ...);
Modify a field name and its type at the same time: alter table table name change old field name new field name new field type;
Note: a string type cannot be directly changed to a numeric type.

Modify table path: alter table table name set location 'storage path in HDFS'; Note: it is recommended to use the default path
location: if not written when creating the table, the default path /user/hive/warehouse/library name.db/table name is used; of course, a path can also be specified directly at table creation.

Modify table properties: alter table table name set tblproperties ('property name'='property value'); Note: often used to switch between internal and external tables
Internal and external table type switching: external table attribute: 'EXTERNAL'='TRUE'; internal table attribute: 'EXTERNAL'='FALSE'

Example

-- table view/modify operations
-- Verify that the internal and external tables from earlier exist and contain data; if you have not created them, create them first
select * from inner_user1 limit 1;
select * from outer_user1 limit 1;
-- 1. Table viewing operation
-- View all tables
show tables;
-- View table creation statements
show create table inner_user1;
show create table outer_user1;
-- View table basic structure
desc inner_user1;
desc outer_user1;
-- View table formatting details
desc formatted inner_user1;
desc formatted outer_user1;
-- 2. Table modification operation
-- Modify table name
-- Note: renaming an external table only changes the table name in the metadata; the table directory name in HDFS does not change
alter table inner_user1 rename to inner_user;
alter table outer_user1 rename to out_user;
-- Modify fields in the table
-- Add fields
alter table inner_user add columns (age int);
alter table out_user add columns (age int);
-- Replace fields
alter table inner_user replace columns (id int,name string);
alter table out_user replace columns (id int,name string);
-- Modify fields
-- a string type cannot be modified to an integer type
alter table inner_user change name u_name varchar(100);
alter table out_user change name u_name varchar(100);
alter table out_user change u_name u_name int; -- error report

-- Modify the table path (actually not recommended)
-- Note: After modifying the path, if the path does not exist, it will not be created immediately. The directory will be automatically generated when data is inserted in the future.
alter table inner_user set location '/inner_user';
alter table out_user set location '/out_user';
-- Modify table properties
-- Check the type first
desc formatted inner_user; -- MANAGED_TABLE
desc formatted out_user; -- EXTERNAL_TABLE
-- Change internal table to external table
alter table inner_user set tblproperties ('EXTERNAL'='TRUE');
--Change external table to internal table
alter table out_user set tblproperties ('EXTERNAL'='FALSE');
--Finally check the type
desc formatted inner_user; -- EXTERNAL_TABLE
desc formatted out_user;-- MANAGED_TABLE

Default delimiter

When creating a table, if you do not specify a delimiter, the table will only recognize the default delimiter, which is difficult to type on a keyboard. It is generally displayed as: \001, SOH, ^A, or □.

Example

create table stu(
    id int,
    name string
);
-- Insert data with insert; the fields are automatically joined using the default delimiter
-- 2. Insert data
insert into stu values(1,'zhangsan');
-- 3. Verify data
select * from stu limit 1;
-- You can also view the file in HDFS; the default delimiter \001 is displayed as SOH, ^A, or □ in other tools

Quick mapping table

Knowledge points

Specify the delimiter when creating the table: create [external] table table name (field name field type) row format delimited fields terminated by 'delimiter';

Load data: load data [local] inpath 'structured data file' into table table name;

Example

-- Create table
create table pdt(
    dt string,
    id string,
    key string,
    url string
) row format delimited fields terminated by '\t';
-- Load data
load data inpath '/source/search_log.txt' into table pdt;
-- Verify the data
select * from hive2.pdt limit 2; -- there is data

Data Import and Export

File data loading and import

Upload files directly

Web page upload

Requirement: the emp1.txt file is on a Windows/macOS system. Store this file in HDFS and use HiveSQL to create a table and associate the data.

-- 1. First create a table in Hive based on the data, then upload the file from Windows/macOS to the table's HDFS path
create table emp1(
    id int,
    name string,
    sal double,
    dept string
)row format delimited
fields terminated by ',';

-- Windows: upload the file via the HDFS web page
-- Visit node1:9870 and upload emp1.txt to the path /user/hive/warehouse/hive02.db/emp1
-- Query data
select * from emp1;

Linux local put upload

Requirement: the emp2.txt file is on a Linux system. Store this file in HDFS and use HiveSQL to create a table and associate the data.

-- 2. First create a table in Hive based on the data, then upload the file from Linux to the table's HDFS path
create table emp2(
    id int,
    name string,
    sal double,
    dept string
)row format delimited
fields terminated by ',';

-- Linux uses hdfs command to upload files
-- [root@node1 ~]# hdfs dfs -put emp2.txt /user/hive/warehouse/hive02.db/emp2

-- View data
select * from emp2;

load: load file

Move a file from an HDFS path into the table's storage path: load data inpath 'HDFS file path' [overwrite] into table table name;

Upload a local Linux file into the table's storage path: load data local inpath 'Linux file path' [overwrite] into table table name;

load: move HDFS file

-- Requirement 1: load loads files in hdfs into the table path
-- 1. Create a table based on the search_log.txt data in HDFS
create table search_log(
    dt string,
    id string,
    key string,
    url string
) row format delimited fields terminated by '\t';
-- 2. Upload the search_log.txt file in windows to other paths in hdfs, for example: /source
-- 3. Use load to move the file from /source in HDFS to the HDFS storage path of the search_log table
load data inpath '/source/search_log.txt' overwrite into table search_log;
-- 4. Query data
select * from search_log;

load: upload Linux file

-- Requirement 2: Directly upload the latest search_log.txt file from Linux to the HDFS path of the search_log table
-- First transfer the search_log.txt file to Linux, for example to /output
-- Use the load command to upload the file
load data local inpath '/output/search_log.txt' overwrite into table search_log;
-- Query data
select * from search_log;

insert: insert data

Query data from other tables and 'append' it into the current table: insert into [table] table name select statement;

Query data from other tables and 'overwrite' it into the current table: insert overwrite table table name select statement;

-- Requirement 1: Create a search_log_copy table, and then query data from search_log and insert it into the new table
create table search_log_copy(
    dt string,
    uid string,
    word string,
    url string
)row format delimited
fields terminated by '\t';

-- Query all data from the search_log table and insert it directly into the search_log_copy table
insert into table search_log_copy select * from search_log;
-- View data
select * from search_log_copy;

insert overwrite data

-- Requirement 2: Assuming that the data in the search_log table has changed, it is required to update the latest data to the search_log_copy table
insert overwrite table search_log_copy select * from search_log;
-- View data
select * from search_log_copy;

File data export

Download files directly

Web page download

Requirement: the search_log.txt file is in the /user/hive/warehouse/hive02.db/search_log path of HDFS; download it to a Windows system (for example, via the HDFS web page at node1:9870).

get command to download file

Requirement: the search_log.txt file is in the /user/hive/warehouse/hive02.db/search_log path of HDFS; download it to the Linux system.

[root@node1 binzi]# hdfs dfs -get /user/hive/warehouse/hive02.db/search_log/search_log.txt /binzi

insert: export data

Export query data to another HDFS path: insert overwrite directory 'HDFS path to store the data' select statement;

Export query data to the local Linux filesystem: insert overwrite local directory 'Linux path to store the data' select statement;

Note: 1. overwrite overwrites by default, so try to specify an empty directory as the data path.

2. When exporting data, if you do not specify a delimiter, the default delimiter (\001, displayed as SOH, ^A, □, ...) is used.

To add a specified delimiter for exported data: row format delimited fields terminated by 'delimiter'

insert: export to HDFS

-- Demonstrate insert overwrite to export data to a file
-- Syntax: insert overwrite [local] directory file storage path [specify delimiter] select statement;
-- Export data to HDFS
-- Note: if exporting to the root directory /, a -ext-10000 directory is automatically created to store the generated 000000_0 file
-- For other directories, all existing contents are automatically cleared and a 000000_0 file is generated, so the export directory should be a new, empty directory whenever possible
-- Default delimiter
insert overwrite directory '/source' select * from search_log;
-- Specify delimiter
insert overwrite directory '/output'
    row format delimited fields terminated by ','
select * from search_log;

insert: export to Linux

-- 2.2 Export to Linux
-- [root@node1 ~]# mkdir /output
-- Export to the /output directory of Linux; the 000000_0 file storing the query results is generated automatically
-- Default delimiter
insert overwrite local directory '/output' select * from search_log;
-- Specify delimiter
insert overwrite local directory '/output'
    row format delimited fields terminated by ','
select * from search_log;

hive shell commands

Execute a SQL statement with the hive command: hive -e "sql statement" > file path to store the result data

Execute a SQL script with the hive command: hive -f sql script file > file path to store the result data

hsql statement export

# The following commands are executed from the Linux shell command line
# 3.1 Use hive -e sql statement to export data
[root@node1 ~]# hive -e 'select * from hive02.search_log;' > /home/hs1.txt
[root@node1 ~]# cat /home/hs1.txt

hsql script export

# 3.2 Use hive -f script file to export data
[root@node1 ~]# echo 'select * from hive02.search_log;' > /home/export.sql
[root@node1 ~]# hive -f export.sql > /home/hs2.txt
[root@node1 ~]# cat /home/hs2.txt