How to Create a Table in Hive

Introduction

Apache Hive is a data warehousing tool for analyzing structured data in Apache Hadoop. It uses an SQL-like language called HiveQL to simplify querying data.

Creating a table in Hive is a foundational step when structuring data. Mastering table creation in Hive is essential when working with databases in the Hadoop environment.

This guide shows how to create a table in Hive, load external data, and display data.

How to create a table in Hive.

Prerequisites

Step 1: Start Hadoop and Hive

Before starting, ensure that Hive is running on the system. To launch Hive locally:

1. Initialize the HDFS services:

start-dfs.sh
start-dfs.sh script terminal output

The script starts the NameNode, DataNode, and Secondary NameNode components.

2. Start the YARN ReosurceManager and NodeManager:

start-yarn.sh
start-yarn.sh terminal output

The services are essential for allocating resources and job scheduling.

Note: The services and nodes in the previous two steps are essential components in the Hadoop architecture. Learn more about Apache Hadoop architecture.

3. Start the Hive CLI with:

hive
Hive CLI terminal output

The command switches to the Hive shell.

Step 2: Create a Database

Start by creating a database. Open the Hive CLI and follow the steps below:

1. Use the CREATE DATABASE statement to create a new database:

CREATE DATABASE [database_name];
hive create database mydatabase output

The output confirms the creation and shows the time it took.

2. Verify the database is present:

SHOW DATABASES;
hive show databases output

The command lists all available databases, including the newly created one.

3. Switch to the new database:

USE [database_name];
hive use mydatabase output

The command sets the currently active database.

Step 3: Create a Table in Hive

A database does not contain any tables after creation. To create a table, use the following reference:

CREATE TABLE [table_name] (
    [column_1] [type],
    [column_2] [type],
    ...
    [column_n] [type]
);
create table hive output

Adjust the table properties to create a desired schema. Replace the table and column names and provide data types for each column. If importing data from an external file, ensure the data matches the external file's properties and add appropriate delimiters.

The table is further customizable through table format, partitioning, clustering, and other properties as required.

Step 4: Add Data

There are two ways to add data to a table in Hive:

  • Insert data directly into the table using a query.
  • Load data from an external file.

The sections below outline both methods.

Option 1: Insert Data

Use the INSERT INTO statement to add data to a table. The syntax is:

INSERT INTO [table_name] VALUES ([value_1], [value_2],...,[value_n]);
hive INSERT INTO statement job output

Replace the table name and values with the appropriate data.

Option 2: Load Data From a File

Another way to add data to a table is from an external file. Run the LOAD DATA INPATH command:

LOAD DATA INPATH '[file_path]' OVERWRITE INTO TABLE [table_name];
hive LOAD DATA INPATH query output

Replace the file path and table name. Ensure the file is accessible in the Hive environment and the file data matches the table schema.

Step 4: List Hive Tables and Data

To show all tables in a selected database, use the following statement:

SHOW TABLES;
hive SHOW TABLES query output

The command lists all table names that are in the currently selected database.

To show table column names and data types, run:

DESC [table_name];
hive DESC query output

The command shows column names and types.

To display table data, use a SELECT statement. For example, to select everything in a table, run:

SELECT * FROM [table_name];
SELECT statement hive query output

Alternatively, filter by column names:

SELECT [column_1, column_2] FROM [table_name];
SELECT statement columns hive query output

Conclusion

This guide showed how to create a table in Hive and load data. There are also various ways to query and view the data from Hive.

An alternate method is to create an external table in Hive. External tables are not managed with Hive, which enables data imports from an external file into the metastore.

Working in Hive and Hadoop is beneficial for manipulating big data. Next, learn more about Hadoop architecture.

Was this article helpful?
YesNo
Milica Dancuk
Milica Dancuk is a technical writer at phoenixNAP with a passion for programming. With a background in Electrical Engineering and Computing, coupled with her teaching experience, she excels at simplifying complex technical concepts in her writing.
Next you should read
What is HDFS? Hadoop Distributed File System Guide
August 27, 2020

Want to learn more about HDFS? This guide explains what the Hadoop Distributed File System is, how it works...
Read more
Hadoop vs Spark - Detailed Comparison
June 4, 2020

This article explains how Hadoop and Spark are different in multiple categories. The tools are both free, but...
Read more
What is Hadoop Mapreduce and How Does it Work
June 2, 2020

MapReduce is a powerful framework that handles big blocks of data to produce a summarized output...
Read more
Apache Hadoop Architecture Explained
May 25, 2020

Apache Hadoop is the go-to framework for storing and processing big data. This article provides clear-cut...
Read more