gawk Linux Command With Examples

September 15, 2022

Introduction

The gawk command is the GNU version of awk. Gawk is a powerful text-processing and data-manipulating tool with many features and practical uses.

This guide will teach you how to use the Linux gawk command with examples.

Prerequisites

  • A system running Linux.
  • Access to the terminal.
  • A text file. This tutorial uses the file people as an example.

gawk Linux Command Syntax

The basic gawk syntax looks like this:

gawk [options] [actions/filters] input_file

The command cannot be run without any arguments. The options are not mandatory, but for gawk to produce output, the program text must contain at least one filter (pattern) or action. A filter selects which input lines gawk processes, and an action, written inside curly braces, states what gawk does with each selected line.

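Note: Enclose the program text (filters and actions) in single quotes so that the shell does not interpret characters such as $ and {}. Options go before the quoted program and are not quoted with it.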

gawk Options

The gawk command is a versatile tool thanks to its numerous arguments. With gawk being the GNU implementation of awk, long, GNU-style options are available. Each long option has a corresponding short one.

Common options are presented below:

Option | Description
-f program-file, --file program-file | Reads the program from a file, which serves as a script, instead of from the first argument in the terminal.
-F fs, --field-separator fs | Uses fs as the input field separator (the value of the predefined variable FS).
-v var=val, --assign var=val | Assigns a value to the variable before executing a script.
-b, --characters-as-bytes | Treats all data as single-byte characters.
-c, --traditional | Executes gawk in compatibility mode.
-C, --copyright | Displays the GNU Copyright message.
-d[file], --dump-variables[=file] | Shows a list of variables, their types, and values.
-e program-text, --source program-text | Takes the program source from program-text, which allows the mixing of library functions (read with -f) and source code typed in the terminal.
-E file, --exec file | Works like -f, but is the last option processed and turns off terminal variable assignments.
-L[value], --lint[=value] | Prints warning messages about code that is dubious or not portable to other AWK implementations.
-S, --sandbox | Runs gawk in sandbox mode.
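
As a minimal sketch of how options precede the quoted program, the snippet below combines -F and -v. The colon-separated sample records and the variable name prefix are assumptions made for illustration and do not come from the people file.

```shell
# Hypothetical sample data (not from the article): colon-separated "user:shell" records.
sample=$(mktemp)
printf '%s\n' 'root:/bin/bash' 'daemon:/usr/sbin/nologin' > "$sample"

# -F: splits fields on colons; -v sets the variable prefix before the program starts.
result=$(gawk -F: -v prefix='user=' '{print prefix $1}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

With this sample, the command prints user=root and user=daemon on separate lines.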

gawk Built-in Variables

The gawk command offers several built-in variables. Some of them, such as FS and OFS, control how gawk reads and writes data and keep their default values unless a user assigns new ones. Others, such as NR and NF, are set by gawk automatically as it reads the input. Some important gawk built-in variables are:

Variable | Description
ARGC | The number of terminal arguments.
ARGIND | The index in ARGV of the file currently being processed.
ARGV | An array of terminal arguments.
ERRNO | A string describing a system error.
FIELDWIDTHS | A whitespace-separated list of field widths.
FILENAME | The name of the current input file.
FNR | The record number in the current input file.
FS | The input field separator.
IGNORECASE | Turns case-sensitive matching on or off.
NF | The number of fields in the current record.
NR | The total number of input records read so far.
OFS | The output field separator.
ORS | The output record separator.
RS | The input record separator.
RSTART | The index of the first character matched by the match() function.
RLENGTH | The length of the string matched by the match() function.
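
The following sketch shows a few of these variables working together. The two sample rows are hypothetical stand-ins that follow the four-column layout this tutorial describes for the people file.

```shell
# Hypothetical sample rows: number, first name, last name, year of birth.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah Baker 2003' > "$sample"

# OFS is assigned by the user; NR, NF, and $NF (the last field) are set by gawk per record.
result=$(gawk 'BEGIN {OFS=";"} {print NR, NF, $NF}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Each output line holds the record number, the field count, and the last field, joined by the semicolon stored in OFS.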

gawk Examples

The gawk pattern-matching and text-processing functions have a wide range of uses. This article provides practical examples that show how to use the gawk utility.

Important: Pattern matching in gawk is case-sensitive by default. Set the IGNORECASE variable to a nonzero value to ignore case.
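
A short sketch of the difference, using hypothetical sample rows rather than the actual people file:

```shell
# Hypothetical sample rows: number, first name, last name, year of birth.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah Baker 2003' '3 Emma Clark 1998' > "$sample"

# Default behavior: only a capital O matches.
sensitive=$(gawk '/O/ {print $2}' "$sample")

# IGNORECASE=1 makes the same pattern match a lowercase o as well.
insensitive=$(gawk -v IGNORECASE=1 '/O/ {print $2}' "$sample")

printf 'case-sensitive: %s\n' "$sensitive"
printf 'ignoring case: %s\n' "$insensitive"

rm -f "$sample"
```

IGNORECASE is specific to gawk, so other AWK implementations may ignore the assignment.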

Print Files

By default, gawk with a print action displays every line from the specified file, just as the cat command does. For instance, the following command prints the whole people text file:

gawk '{print}' people

Print a Column

By default, gawk treats runs of spaces and tabs as the delimiters between columns (fields). The people file consists of four columns:

  1. Ordinal numbers.
  2. First names.
  3. Last names.
  4. Year of birth.

Use gawk to show only a specific column in the terminal. For instance:

gawk '{print $2}' people

The command prints only the second column. To print multiple columns, like column one (ordinal numbers) and column two (first names), run:

gawk '{print $1, $2}' people

The gawk command also works without the comma between $1 and $2. However, gawk then joins the two values directly, so there are no spaces between columns in the output:

gawk '{print $1 $2}' people
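
The three variants can be compared side by side in a small sketch. The input line is a hypothetical row in the same four-column layout, piped in through standard input instead of read from a file.

```shell
# Hypothetical input row: number, first name, last name, year of birth.
line='1 Olivia Adams 1995'

# A comma inserts the output field separator (a space by default).
with_comma=$(printf '%s\n' "$line" | gawk '{print $1, $2}')

# Without a comma, the values are concatenated.
concatenated=$(printf '%s\n' "$line" | gawk '{print $1 $2}')

# A string literal between the fields acts as a custom separator.
custom=$(printf '%s\n' "$line" | gawk '{print $1 ": " $2}')

printf '%s\n' "$with_comma" "$concatenated" "$custom"
```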

Filter Columns

The gawk command offers additional filtering options. For instance, print lines containing the capital letter O with:

gawk '/O/ {print}' people

To show only lines containing the letter O or A, use the regular expression alternation operator (|):

gawk '/O|A/ {print}' people

The command prints any line that includes a word with capital O or A. On the other hand, use logical AND (&&) to show lines including both O and the year 1995:

gawk '/O/ && /1995/' people

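The filters work with numbers as well. For example, show only people born in the 1990s with:

gawk '/199/ {print}' people

The output shows only lines that contain the string 199, which in the people file appears in the fourth column. Do not write the pattern as /199*/: in a regular expression, the asterisk applies only to the preceding 9, so that pattern matches every line containing 19.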
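
To make the effect of the asterisk concrete, here is a sketch that compares a /199*/ pattern with a match restricted to the fourth column. The sample rows, including the 1985 entry, are hypothetical.

```shell
# Hypothetical sample rows: number, first name, last name, year of birth.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah Baker 2003' '3 Emma Clark 1985' > "$sample"

# /199*/ means "19 followed by zero or more 9s", so 1985 matches too.
loose=$(gawk '/199*/ {print $4}' "$sample")

# $4 ~ /^199/ tests only column four and requires it to start with 199.
strict=$(gawk '$4 ~ /^199/ {print $4}' "$sample")

printf 'loose: %s\n' "$loose"
printf 'strict: %s\n' "$strict"

rm -f "$sample"
```

Anchoring the pattern to a field also avoids accidental matches in other columns.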

Customize the output even more by combining previously mentioned options. For example, print only the first and last names of people born in 1995 or 2003 with:

gawk '/1995|2003/ {print $2, $3}' people

The command prints columns two and three as stated in the {print $2, $3} part. The output only shows lines containing the number 1995 or 2003, even though the column containing those numbers is hidden.
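
A regular expression such as /1995|2003/ matches those digits anywhere on the line. As an alternative sketch, the same selection can be written as a comparison on column four only. The sample rows are hypothetical.

```shell
# Hypothetical sample rows: number, first name, last name, year of birth.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah Baker 2003' '3 Emma Clark 1998' > "$sample"

# Compare the fourth field numerically instead of searching the whole line.
result=$(gawk '$4 == 1995 || $4 == 2003 {print $2, $3}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```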

The gawk command also lets users print everything except for the lines containing the specified string with the logical NOT (!) operator. For instance, omit lines containing the string 19 from the output:

gawk '!/19/' people

Add Line Numbers

The people file includes line numbers in the first column. In case users are working on a file without line numbers, gawk presents options to add them.

For instance, the humans file doesn't include any ordinal numbers.

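To add line numbers, print the FNR variable before each line. The next statement in the command below is optional: it tells gawk to skip any remaining rules and move on to the following line, which changes nothing when the program has a single rule: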

gawk '{ print FNR, $0; next}' humans

The command adds a line number before each line. The same result is achieved with the NR variable:

gawk '{print NR, $0}' humans

The two variables differ only when gawk reads more than one file: NR keeps counting across all files, while FNR restarts at 1 for each new file.
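
The sketch below prints NR and FNR next to each other while reading two files. The file names first and second and their contents are made up for the example.

```shell
# Two hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'Olivia' 'Noah' > "$dir/first"
printf '%s\n' 'Emma' > "$dir/second"

# NR counts records across both files; FNR starts over for the second file.
result=$(gawk '{print NR, FNR, $0}' "$dir/first" "$dir/second")
printf '%s\n' "$result"

rm -rf "$dir"
```

In the output, the line from the second file carries NR 3 but FNR 1.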

Find Line Count 

To count the total number of lines in the file, use the END statement and the NR variable with gawk:

gawk 'END { print NR }' people

The command reads each line. Once gawk reaches END, it prints the value of NR, which at that point contains the total number of lines. Without the END statement, the print action runs for every line, so gawk prints the running line number on each line instead of only the total.
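
The same END technique can count only the lines that match a filter. This is a sketch with hypothetical sample rows, and the variable name count is an arbitrary choice.

```shell
# Hypothetical sample rows: number, first name, last name, year of birth.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah Baker 2003' \
              '3 Emma Clark 1998' '4 Liam Owens 2001' > "$sample"

# Total number of lines.
total=$(gawk 'END {print NR}' "$sample")

# Number of lines containing a capital O; count + 0 prints 0 when nothing matches.
matching=$(gawk '/O/ {count++} END {print count + 0}' "$sample")

printf 'total: %s, with O: %s\n' "$total" "$matching"

rm -f "$sample"
```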

Filter Lines Based on Length

Use the following command to print only lines longer than 20 characters:

gawk 'length>20' people

It also works with multiple conditions. For instance, show lines longer than 17 but shorter than 20 characters:

gawk 'length<20 && length>17' people

To display lines that are exactly 20 characters long, run:

gawk 'length==20' people
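
Used on its own, length refers to the whole line ($0). As a small sketch with hypothetical sample rows, the function also accepts a single field as its argument:

```shell
# Hypothetical sample rows: number, first name, last name, year of birth.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah Baker 2003' > "$sample"

# Print the length of the whole line, then the length of the first name.
result=$(gawk '{print length($0), length($2)}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```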

Print Info Based on Conditions

The gawk command supports if-else statements. For instance, one way to show only people born after 1999 is a simple if statement:

gawk '{ if ($4>1999) print }' people

The if statement sets the condition that entries in column four have to be larger than 1999. The output shows only entries that satisfy the condition. Expand the command into an if-else statement to also print the lines that do not satisfy the condition and to label every line:

gawk '{if ($4>1999) print $0, "==>00s"; else print $0, "==>90s"}' people

The command includes:

  • If statement. If the condition is satisfied, gawk adds the string "==>00s" to the output line.
  • Else statement. In case the line doesn't satisfy the condition, gawk still prints that line in the output, adding the "==>90s" string to the output.
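
For a two-way choice like this one, the conditional (ternary) operator is a more compact alternative. The sketch below uses it in place of if-else and relies on hypothetical sample rows.

```shell
# Hypothetical sample rows: number, first name, last name, year of birth.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah Baker 2003' > "$sample"

# condition ? value_if_true : value_if_false picks the label for each line.
result=$(gawk '{print $0, ($4 > 1999 ? "==>00s" : "==>90s")}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

The parentheses around the conditional expression keep it separate from the other print arguments.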

Add a Header

In the same way in which the END statement allows users to modify the output after the last line of input, the BEGIN statement allows users to act before gawk reads any input.

The BEGIN sections are always executed first. After that, gawk applies the remaining rules to each input line. One way to use the BEGIN statement is to add a header to the output.

Execute the following command to add a header line above the gawk output:

gawk 'BEGIN {print "No/First&Last Name/Year of Birth"} {print $0}' people
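
BEGIN and END can be combined in one program to frame the output with a header and a footer. This sketch uses hypothetical sample rows and made-up header and footer text.

```shell
# Hypothetical sample rows: number, first name, last name, year of birth.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah Baker 2003' > "$sample"

# BEGIN runs before the first line, the middle rule runs per line, END runs after the last line.
result=$(gawk 'BEGIN {print "No Name Year"} {print $1, $2, $4} END {print "Total:", NR}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```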

Find the Longest Line Length

Combine the length function with the if and END statements to find the length of the longest line in the people file. The max variable starts out unset, which gawk treats as 0 in a numeric comparison:

gawk '{ if (length($0) > max) max = length($0) } END { print max }' people
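
To see which line is the longest and not only its length, the program can remember the line as well. This is a sketch with hypothetical sample rows, and the variable name longest is an arbitrary choice.

```shell
# Hypothetical sample rows: number, first name, last name, year of birth.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah Baker 2003' > "$sample"

# Whenever a longer line appears, store both its length and its text.
result=$(gawk 'length($0) > max {max = length($0); longest = $0} END {print max ": " longest}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

If several lines share the maximum length, this version keeps the first one.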

Find the Number of Fields

The gawk command also allows users to display the number of fields with the NF variable. The simplest way to display the number of fields prints a difficult-to-read output:

gawk '{print NF}' people

The command outputs the number of fields per line without any additional info. To customize the output and make it more human-readable, adjust the initial command:

gawk '{print NR, "-->", NF}' people

The command now includes:

  • The NR variable that adds line numbers to each output line.
  • The --> string that separates line numbers from the field numbers.

Another way to show line and field numbers in the people file is to print each whole line ($0) followed by NF. The people file already includes ordinal numbers in column one, so the NR variable is omitted:

gawk '{print $0, "-->", NF}' people

Finally, to print the total number of fields, execute:

gawk '{num_fields = num_fields + NF} END {print num_fields}' people

The people file has ten lines with four fields each, so the command prints 40.
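
NF is also handy as a filter for spotting lines that do not have the expected number of columns. The sketch below assumes a four-column layout and uses hypothetical sample rows, one of which is missing a last name.

```shell
# Hypothetical sample rows; the second row has only three fields.
sample=$(mktemp)
printf '%s\n' '1 Olivia Adams 1995' '2 Noah 2003' '3 Emma Clark 1998' > "$sample"

# Report every line whose field count differs from four.
result=$(gawk 'NF != 4 {print "line " NR " has " NF " fields"}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```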

Conclusion

After going through this tutorial, you know how to use gawk for advanced text processing and data manipulation.

Also consider using grep, a powerful Linux tool for searching for strings, words, and patterns.

Sara Zivanov
Sara Zivanov is a technical writer at phoenixNAP who is passionate about making high-tech concepts accessible to everyone. Her experience as a content writer and her background in Engineering and Project Management allows her to streamline complex processes and make them user-friendly through her content.