How to Create a Table in Hive

October 28, 2020

Introduction

Apache Hive is a data warehousing tool used to perform queries and analyze structured data in Apache Hadoop. It uses a SQL-like language called HiveQL.

In this article, learn how to create a table in Hive and load data. We will also show you crucial HiveQL commands to display data.


Prerequisites

  • A system running Linux
  • A user account with sudo or root privileges
  • Access to a terminal window/command line
  • Working Hadoop installation
  • Working Hive installation

Note: Follow our instructions for installing Hadoop and installing Hive on Ubuntu if you have not installed them yet.

Create and Load Table in Hive

A table in Hive is a set of data organized into rows and columns by a schema.

The general syntax for creating a table in Hive is:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(col_name data_type [COMMENT 'col_comment'], ...)
[COMMENT 'table_comment']
[ROW FORMAT DELIMITED [FIELDS TERMINATED BY char]]
[STORED AS file_format];

Follow the steps below to create a table in Hive.

Step 1: Create a Database

1. Create a database named “company” by running the create command:

create database company;

The terminal prints a confirmation message and the time needed to perform the action.
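
If you run the create command again for a database that already exists, Hive returns an error. To make the statement safe to re-run, add the IF NOT EXISTS clause:

create database if not exists company;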

2. Next, verify that the database was created by running the show command:

show databases;

3. Find the “company” database in the list the command returns.

4. Open the “company” database by using the following command:

use company;
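
Every table you create from now on goes into the “company” database. To confirm which database is active, you can query the built-in current_database() function:

select current_database();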

Step 2: Create a Table in Hive

The “company” database does not contain any tables after initial creation. Let’s create a table whose columns match the structure of the .txt file you want to load data from.

1. Create an “employees.txt” file in the /hdoop directory. The file should contain data about employees.

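The exact contents are up to you. A hypothetical sample, with one record per line and fields separated by '-' (the names and values are invented for illustration), might look like this:

1-John-USA-IT-75000
2-Maria-Spain-Sales-55000
3-Ahmed-Egypt-HR-48000
4-Chen-China-Finance-62000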


2. Arrange the data from the “employees.txt” file in columns. The column names in our example are:

  • ID
  • Name
  • Country
  • Department
  • Salary

3. Use these column names when creating the table. Create the table by running the following command:

create table employees (id int, name string, country string, department string, salary int)

4. Create a logical schema that maps the data from the .txt file to the corresponding columns. In the “employees.txt” file, fields are separated by a '-' character. To create the logical schema, type:

row format delimited fields terminated by '-';

The terminal prints out a confirmation message once the table is created.
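
Note that steps 3 and 4 form a single HiveQL statement; Hive does not execute it until it reads the terminating semicolon. Written out in full, the command is:

create table employees (id int, name string, country string, department string, salary int) row format delimited fields terminated by '-';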

5. Verify that the table was created by running the show command:

show tables;

Step 3: Load Data From a File

You have created a table, but it is empty because you have not yet loaded the data from the “employees.txt” file located in the /hdoop directory.

1. Load data by running the load command:

load data inpath '/hdoop/employees.txt' overwrite into table employees;
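
Note: The load command above expects the file to be in HDFS, and Hive moves (rather than copies) it into the table's directory. If “employees.txt” resides on the local filesystem instead, add the LOCAL keyword:

load data local inpath '/hdoop/employees.txt' overwrite into table employees;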

2. Verify that the data was loaded by running the select command:

select * from employees;

The terminal prints out the data imported from the employees.txt file.
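
To check only the number of rows that loaded, you can run a count query instead:

select count(*) from employees;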

Display Hive Data

Hive offers several options for displaying data from a table. The following commands help you work with large amounts of data more efficiently.

Display Columns

Display columns of a table by running the desc command:

desc employees;

The output displays the names and data types of the columns.
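
For more detail, including the table type, location, and storage format, Hive also supports the formatted variant of the command:

desc formatted employees;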

Display Selected Data

Let’s assume that you want to display employees and their countries of origin. Select and display data by running the select command:

select name, country from employees;

The output contains the list of employees and their countries.
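
To narrow down the results, add a WHERE clause. For example, to list only the employees from one country (the value 'USA' is just an illustration; use a value that exists in your data):

select name, country from employees where country = 'USA';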

Conclusion

After reading this article, you should have learned how to create a table in Hive and load data into it.

There is also a method of creating an external table in Hive. In Hive terminology, external tables are tables whose data Hive does not manage: Hive stores only the table metadata, while the data stays at its original location. Dropping an external table removes the metadata but leaves the underlying data intact.
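
A minimal sketch of an external table definition, assuming the data files already sit in a hypothetical HDFS directory /user/external/employees:

create external table employees_ext (id int, name string, country string, department string, salary int)
row format delimited fields terminated by '-'
location '/user/external/employees';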

Working in Hive and Hadoop is beneficial for manipulating big data. Next, we recommend our article about Hadoop architecture to learn more about how Hadoop functions.
