AWK – Overview
AWK is an interpreted programming language. It is very powerful and specifically designed for text processing. Its name is derived from the family names of its authors − Alfred Aho, Peter Weinberger, and Brian Kernighan.
The version of AWK that GNU/Linux distributes is written and maintained by the Free Software Foundation (FSF); it is often referred to as GNU AWK.
Types of AWK
The following are the variants of AWK −
- AWK − Original AWK from AT&T Laboratory.
- NAWK − Newer and improved version of AWK from AT&T Laboratory.
- GAWK − It is GNU AWK. All major GNU/Linux distributions ship GAWK. It is fully compatible with AWK and NAWK.
Typical Uses of AWK
A myriad of tasks can be done with AWK. Listed below are just a few of them −
- Text processing,
- Producing formatted text reports,
- Performing arithmetic operations,
- Performing string operations, and many more.
AWK – Environment
This chapter describes how to set up the AWK environment on your GNU/Linux system.
Installation Using Package Manager
Generally, AWK is available by default on most GNU/Linux distributions. You can use the which command to check whether it is present on your system. If AWK is not installed, install it on Debian-based GNU/Linux using the Advanced Package Tool (APT) package manager as follows −
[jerry]$ sudo apt-get update
[jerry]$ sudo apt-get install gawk
Similarly, to install AWK on RPM-based GNU/Linux, use the Yellowdog Updater, Modified (yum) package manager as follows −
[root]# yum install gawk
After installation, ensure that AWK is accessible via the command line.
[jerry]$ which awk
On executing the above code, you get the following result −
/usr/bin/awk
Installation from Source Code
As GNU AWK is a part of the GNU project, its source code is available for free download. We have already seen how to install AWK using package manager. Let us now understand how to install AWK from its source code.
The following procedure applies to most GNU/Linux software, and to most other freely available programs as well. Here are the installation steps −
Step 1 − Download the source code from a trusted source. The command-line utility wget serves this purpose.
[jerry]$ wget http://ftp.gnu.org/gnu/gawk/gawk-4.1.1.tar.xz
Step 2 − Decompress and extract the downloaded source code.
[jerry]$ tar xvf gawk-4.1.1.tar.xz
Step 3 − Change into the extracted directory and run configure.
[jerry]$ cd gawk-4.1.1
[jerry]$ ./configure
Step 4 − Upon successful completion, configure generates a Makefile. To compile the source code, issue the make command.
[jerry]$ make
Step 5 − You can run the test suite to ensure the build is clean. This is an optional step.
[jerry]$ make check
Step 6 − Finally, install AWK. Make sure you have super-user privileges.
[jerry]$ sudo make install
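That is it! You have successfully compiled and installed AWK. Verify the installation by executing the which command as follows −
[jerry]$ which awk
On executing this code, you get a result similar to the following (a source build installs under the /usr/local prefix by default, so the path may be /usr/local/bin/awk instead) −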
/usr/bin/awk
AWK – Workflow
To become an expert AWK programmer, you need to know its internals. AWK follows a simple workflow − Read, Execute, and Repeat. The following diagram depicts the workflow of AWK −
Read
AWK reads a line from the input stream (file, pipe, or stdin) and stores it in memory.
Execute
All AWK commands are applied sequentially on the input. By default, AWK executes commands on every line. We can restrict this by providing patterns.
Repeat
This process repeats until the end of the input is reached.
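The read, execute, and repeat cycle can be observed by printing each record together with the built-in variable NR (the current record number, covered in a later chapter). The following is a minimal sketch, assuming three made-up input lines sent through a pipe; the shell variable out is only for illustration:

```shell
# AWK reads one line, runs the body block on it, then repeats
# until the input is exhausted.
out=$(printf 'alpha\nbeta\ngamma\n' | awk '{ print "record", NR ":", $0 }')
echo "$out"
```

Each input line produces exactly one output line, starting with record 1: alpha.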
Program Structure
Let us now understand the program structure of AWK.
BEGIN block
The syntax of the BEGIN block is as follows −
Syntax
BEGIN {awk-commands}
The BEGIN block gets executed at program start-up, before any input is read. It executes only once. This is a good place to initialize variables. BEGIN is an AWK keyword and hence it must be in upper-case. Please note that this block is optional.
Body Block
The syntax of the body block is as follows −
Syntax
/pattern/ {awk-commands}
The body block applies AWK commands on every input line. By default, AWK executes commands on every line. We can restrict this by providing patterns. Note that there are no keywords for the Body block.
END Block
The syntax of the END block is as follows −
Syntax
END {awk-commands}
The END block executes once, after all input has been read. END is an AWK keyword and hence it must be in upper-case. Please note that this block is optional.
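The three blocks can be seen running in order with a minimal sketch like the one below, assuming a few made-up numbers on standard input. The pattern $1 > 8 restricts the body block to records whose first field is greater than 8:

```shell
out=$(printf '%s\n' 10 25 7 | awk '
    BEGIN  { print "start" }        # runs once, before any input is read
    $1 > 8 { print "big:", $1 }     # runs for each matching record
    END    { print "end" }          # runs once, after the last record
')
echo "$out"
```

The record 7 fails the pattern, so only 10 and 25 appear between start and end.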
Let us create a file marks.txt which contains the serial number, name of the student, subject name, and number of marks obtained.
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Let us now display the file contents with a header by using an AWK script.
Example
[jerry]$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print}' marks.txt
When this code is executed, it produces the following result −
Output
Sr No	Name	Sub	Marks
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
At the start, AWK prints the header from the BEGIN block. Then, in the body block, it reads a line from the file and executes AWK's print command, which simply prints the contents on the standard output stream. This process repeats until the end of the file is reached.
AWK – Basic Syntax
AWK is simple to use. We can provide AWK commands either directly from the command line or in the form of a text file containing AWK commands.
AWK Command Line
We can specify an AWK program within single quotes at the command line as shown −
awk [options] 'program' file ...
Example
Consider a text file marks.txt with the following content −
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Let us display the complete content of the file using AWK as follows −
Example
[jerry]$ awk '{print}' marks.txt
On executing this code, you get the following result −
Output
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
AWK Program File
We can provide AWK commands in a script file as shown −
awk [options] -f progfile file ...
First, create a text file command.awk containing the AWK command as shown below −
{print}
Now we can instruct AWK to read commands from the text file and perform the action. Here, we achieve the same result as in the above example.
Example
[jerry]$ awk -f command.awk marks.txt
On executing this code, you get the following result −
Output
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
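A program file can hold several blocks, which keeps longer programs readable. The following is a minimal sketch, assuming a hypothetical file named report.awk that prints a header, every record, and the total of the fourth (marks) column:

```shell
# Recreate the sample marks.txt used in this tutorial.
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' \
    '4) Kedar English 85' '5) Hari History 89' > marks.txt

# report.awk is a hypothetical name; any file name works with -f.
cat > report.awk <<'EOF'
BEGIN { print "Sr No Name Sub Marks" }
      { print; total += $4 }
END   { print "Total marks:", total }
EOF

out=$(awk -f report.awk marks.txt)
echo "$out"
```

The last line of the output is Total marks: 431, the sum of the five marks.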
AWK Standard Options
AWK supports the following standard options which can be provided from the command line.
The -v option
This option assigns a value to a variable. It allows assignment before the program execution. The following example describes the usage of the -v option.
Example
[jerry]$ awk -v name=Jerry 'BEGIN{printf "Name = %s\n", name}'
On executing this code, you get the following result −
Output
Name = Jerry
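A variable set with -v is also visible in the body block, which makes it handy for parameters such as a cut-off value. A minimal sketch, assuming the marks.txt file from this tutorial and a hypothetical variable named threshold:

```shell
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' \
    '4) Kedar English 85' '5) Hari History 89' > marks.txt

# threshold is assigned before the program starts, so the pattern can
# compare each student's marks (field 4) against it.
out=$(awk -v threshold=85 '$4 > threshold { print $2 }' marks.txt)
echo "$out"
```

Only Rahul, Shyam, and Hari scored more than 85.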
The --dump-variables[=file] option
It prints a sorted list of global variables and their final values to file. The default file is awkvars.out.
Example
[jerry]$ awk --dump-variables ''
[jerry]$ cat awkvars.out
On executing the above code, you get the following result −
Output
ARGC: 1
ARGIND: 0
ARGV: array, 1 elements
BINMODE: 0
CONVFMT: "%.6g"
ERRNO: ""
FIELDWIDTHS: ""
FILENAME: ""
FNR: 0
FPAT: "[^[:space:]]+"
FS: " "
IGNORECASE: 0
LINT: 0
NF: 0
NR: 0
OFMT: "%.6g"
OFS: " "
ORS: "\n"
RLENGTH: 0
RS: "\n"
RSTART: 0
RT: ""
SUBSEP: "\034"
TEXTDOMAIN: "messages"
The --help option
This option prints the help message on standard output.
Example
[jerry]$ awk --help
On executing this code, you get the following result −
Output
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options :          GNU long options: (standard)
   -f progfile              --file=progfile
   -F fs                    --field-separator=fs
   -v var=val               --assign=var=val
Short options :          GNU long options: (extensions)
   -b                       --characters-as-bytes
   -c                       --traditional
   -C                       --copyright
   -d[file]                 --dump-variables[=file]
   -e 'program-text'        --source='program-text'
   -E file                  --exec=file
   -g                       --gen-pot
   -h                       --help
   -L [fatal]               --lint[=fatal]
   -n                       --non-decimal-data
   -N                       --use-lc-numeric
   -O                       --optimize
   -p[file]                 --profile[=file]
   -P                       --posix
   -r                       --re-interval
   -S                       --sandbox
   -t                       --lint-old
   -V                       --version
The --lint[=fatal] option
This option enables checking of non-portable or dubious constructs. When the argument fatal is provided, warning messages are treated as fatal errors. The following example demonstrates this −
Example
[jerry]$ awk --lint '' /bin/ls
On executing this code, you get the following result −
Output
awk: cmd. line:1: warning: empty program text on command line
awk: cmd. line:1: warning: source file does not end in newline
awk: warning: no program text at all!
The --posix option
This option turns on strict POSIX compatibility, in which all common and gawk-specific extensions are disabled.
The --profile[=file] option
This option writes a pretty-printed version of the program to file. The default file is awkprof.out. The following simple example illustrates this −
Example
[jerry]$ awk --profile 'BEGIN{printf"---|Header|--\n"} {print} END{printf"---|Footer|---\n"}' marks.txt > /dev/null
[jerry]$ cat awkprof.out
On executing this code, you get the following result −
Output
# gawk profile, created Sun Oct 26 19:50:48 2014

# BEGIN block(s)

BEGIN {
   printf "---|Header|--\n"
}

# Rule(s)

{
   print $0
}

# END block(s)

END {
   printf "---|Footer|---\n"
}
The --traditional option
This option disables all gawk-specific extensions.
The --version option
This option displays the version information of the AWK program.
Example
[jerry]$ awk --version
When this code is executed, it produces the following result −
Output
GNU Awk 4.0.1
Copyright (C) 1989, 1991-2012 Free Software Foundation.
AWK – Basic Examples
This chapter describes several useful AWK commands and their appropriate examples. Consider a text file marks.txt to be processed with the following content −
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Printing Column or Field
You can instruct AWK to print only certain columns (fields) of each input record. The following example demonstrates this −
Example
[jerry]$ awk '{print $3 "\t" $4}' marks.txt
On executing this code, you get the following result −
Output
Physics	80
Maths	90
Biology	87
English	85
History	89
In the file marks.txt, the third column contains the subject name and the fourth column contains the marks obtained in a particular subject. Let us print these two columns using the AWK print command. In the above example, $3 and $4 represent the third and the fourth fields respectively from the input record.
Printing All Lines
By default, AWK prints every line that matches the pattern.
Example
[jerry]$ awk '/a/ {print $0}' marks.txt
On executing this code, you get the following result −
Output
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
In the above example, we are searching for the pattern a. When a pattern match succeeds, AWK executes the commands in the body block. In the absence of a body block, the default action is taken, which is to print the record. Hence, the following command produces the same result −
Example
[jerry]$ awk '/a/' marks.txt
Printing Columns by Pattern
When a pattern match succeeds, AWK prints the entire record by default. But you can instruct AWK to print only certain fields. For instance, the following example prints the third and fourth field when a pattern match succeeds.
Example
[jerry]$ awk '/a/ {print $3 "\t" $4}' marks.txt
On executing this code, you get the following result −
Output
Maths	90
Biology	87
English	85
History	89
Printing Column in Any Order
You can print columns in any order. For instance, the following example prints the fourth column followed by the third column.
Example
[jerry]$ awk '/a/ {print $4 "\t" $3}' marks.txt
On executing the above code, you get the following result −
Output
90	Maths
87	Biology
85	English
89	History
Counting and Printing Matched Pattern
Let us see an example where you can count and print the number of lines for which a pattern match succeeded.
Example
[jerry]$ awk '/a/{++cnt} END {print "Count = ", cnt}' marks.txt
On executing this code, you get the following result −
Output
Count = 4
In this example, we increment the value of counter when a pattern match succeeds and we print this value in the END block. Note that unlike other programming languages, there is no need to declare a variable before using it.
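Because a variable that was never assigned behaves as an empty string in string context and as 0 in numeric context, the END block above prints an empty count when no line matches. A minimal sketch of one common fix, assuming the same marks.txt and the pattern z (which matches no line), is to add 0 to force a numeric value:

```shell
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' \
    '4) Kedar English 85' '5) Hari History 89' > marks.txt

# cnt is never incremented here; "cnt + 0" turns the empty value into 0.
out=$(awk '/z/ { ++cnt } END { print "Count =", cnt + 0 }' marks.txt)
echo "$out"
```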
Printing Lines with More than 18 Characters
Let us print only those lines that contain more than 18 characters.
Example
[jerry]$ awk 'length($0) > 18' marks.txt
On executing this code, you get the following result −
Output
3) Shyam Biology 87
4) Kedar English 85
AWK provides a built-in length function that returns the length of a string. The $0 variable stores the entire line, and in the absence of a body block the default action, i.e., print, is taken. Hence, if a line has more than 18 characters, the comparison evaluates to true and the line gets printed.
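The length function works on any string, not only on $0. A minimal sketch, assuming the same marks.txt, that prints the names (field 2) longer than four characters:

```shell
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' \
    '4) Kedar English 85' '5) Hari History 89' > marks.txt

# length($2) is the number of characters in the second field.
out=$(awk 'length($2) > 4 { print $2 }' marks.txt)
echo "$out"
```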
AWK – Built-in Variables
AWK provides several built-in variables. They play an important role while writing AWK scripts. This chapter demonstrates the usage of built-in variables.
Standard AWK variables
The standard AWK variables are discussed below.
ARGC
It represents the number of arguments provided at the command line.
Example
[jerry]$ awk 'BEGIN {print "Arguments =", ARGC}' One Two Three Four
On executing this code, you get the following result −
Output
Arguments = 5
But why does AWK show 5 when you passed only 4 arguments? ARGC also counts ARGV[0], which holds the name of the awk command itself; the following example makes this clear.
ARGV
It is an array that stores the command-line arguments. The array's valid indices range from 0 to ARGC-1.
Example
[jerry]$ awk 'BEGIN { for (i = 0; i < ARGC; ++i) { printf "ARGV[%d] = %s\n", i, ARGV[i] } }' one two three four
On executing this code, you get the following result −
Output
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
CONVFMT
It represents the conversion format for numbers. Its default value is %.6g.
Example
[jerry]$ awk 'BEGIN { print "Conversion Format =", CONVFMT }'
On executing this code, you get the following result −
Output
Conversion Format = %.6g
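CONVFMT is used when AWK converts a non-integer number to a string, for example when a number is concatenated with an empty string. A minimal sketch, assuming a made-up value stored in a variable named pi:

```shell
out=$(awk 'BEGIN {
    pi = 3.14159265
    s1 = pi ""          # converted with the default CONVFMT, %.6g
    CONVFMT = "%.2f"
    s2 = pi ""          # converted with the new CONVFMT
    print s1, s2
}')
echo "$out"
```

The program prints 3.14159 3.14. Integer values are always converted as integers, regardless of CONVFMT.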
ENVIRON
It is an associative array of environment variables.
Example
[jerry]$ awk 'BEGIN { print ENVIRON["USER"] }'
On executing this code, you get the following result −
Output
jerry
To find the names of other environment variables, use the env command.
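Values placed in the environment of the awk command are visible through ENVIRON. A minimal sketch, assuming a hypothetical variable named GREETING that is set only for this one command:

```shell
# GREETING=hello applies to the awk process only.
out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$out"
```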
FILENAME
It represents the current file name.
Example
[jerry]$ awk 'END {print FILENAME}' marks.txt
On executing this code, you get the following result −
Output
marks.txt
Please note that FILENAME is undefined in the BEGIN block.
FS
It represents the (input) field separator. Its default value is a single space, which makes AWK split fields on runs of spaces and tabs. You can also change it by using the -F command-line option.
Example
[jerry]$ awk 'BEGIN {print "FS = " FS}' | cat -vte
On executing this code, you get the following result −
Output
FS =  $
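The separator can be changed either with -F or by assigning FS in the BEGIN block. A minimal sketch, assuming made-up comma- and colon-separated records; the last command also shows that with the default FS, leading and trailing blanks are ignored and runs of spaces and tabs count as a single separator:

```shell
# -F, sets FS to a comma from the command line.
a=$(echo 'Amit,Physics,80' | awk -F, '{ print $2 }')
# Assigning FS in BEGIN, before the first record is read, has the same effect.
b=$(echo 'Amit:Physics:80' | awk 'BEGIN { FS = ":" } { print $3 }')
# Default FS: blanks around and between fields are collapsed.
c=$(printf '  One \t Two  \n' | awk '{ print NF }')
echo "$a $b $c"
```

This prints Physics 80 2.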
NF
It represents the number of fields in the current record. For instance, the following example prints only those lines that contain more than two fields.
Example
[jerry]$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NF > 2'
On executing this code, you get the following result −
Output
One Two Three
One Two Three Four
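Because NF holds a number, $NF refers to the last field of the record. A minimal sketch, assuming a single made-up input line:

```shell
out=$(echo "One Two Three" | awk '{ print NF, $NF }')
echo "$out"
```

The output is 3 Three.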
NR
It represents the number of the current record. For instance, the following example prints the record if the current record number is less than three.
Example
[jerry]$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NR < 3'
On executing this code, you get the following result −
Output
One Two
One Two Three
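NR is also handy for numbering lines, and in the END block it still holds the number of the last record read, which equals the line count. A minimal sketch, assuming the marks.txt file from this tutorial:

```shell
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' \
    '4) Kedar English 85' '5) Hari History 89' > marks.txt

# Prefix every record with its record number.
numbered=$(awk '{ print NR ": " $0 }' marks.txt)
# NR keeps its last value in the END block: the total number of records.
count=$(awk 'END { print NR }' marks.txt)
echo "$numbered"
echo "$count"
```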
FNR
It is similar to NR, but relative to the current file. It is useful when AWK is operating on multiple files. The value of FNR is reset at the start of each new input file.
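The difference between NR and FNR shows up as soon as more than one file is read. A minimal sketch, assuming two hypothetical files, first.txt and second.txt, created on the spot:

```shell
printf 'a1\na2\n' > first.txt
printf 'b1\n' > second.txt
# NR keeps counting across files, while FNR restarts at 1 for each file.
out=$(awk '{ print FILENAME, NR, FNR, $0 }' first.txt second.txt)
echo "$out"
```

For the single record of second.txt, NR is 3 but FNR is 1.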
OFMT
It represents the output format for numbers, and its default value is %.6g.
Example
[jerry]$ awk 'BEGIN {print "OFMT = " OFMT}'
On executing this code, you get the following result −
Output
OFMT = %.6g
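OFMT controls how print outputs non-integer numbers. A minimal sketch, assuming a made-up value stored in x:

```shell
out=$(awk 'BEGIN {
    x = 3.14159265
    print x            # default OFMT %.6g
    OFMT = "%.2f"
    print x            # now formatted with %.2f
    print 42           # integer values are printed as integers
}')
echo "$out"
```

The three lines printed are 3.14159, 3.14, and 42.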
OFS
It represents the output field separator and its default value is space.
Example
[jerry]$ awk 'BEGIN {print "OFS = " OFS}' | cat -vte
On executing this code, you get the following result −
Output
OFS =  $
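OFS is inserted wherever print arguments are separated by a comma, and also when AWK rebuilds $0 after a field is assigned. A minimal sketch, assuming a single made-up input line:

```shell
# The comma between $1 and $3 becomes OFS in the output.
a=$(echo "One Two Three" | awk 'BEGIN { OFS = "-" } { print $1, $3 }')
# Assigning $1 = $1 forces AWK to rebuild $0 using OFS.
b=$(echo "One Two Three" | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')
echo "$a"
echo "$b"
```

This prints One-Three and One-Two-Three.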
ORS
It represents the output record separator and its default value is newline.
Example
[jerry]$ awk 'BEGIN {print "ORS = " ORS}' | cat -vte
On executing the above code, you get the following result −
Output
ORS = $
$
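Since print appends ORS at the end of its output, changing ORS changes how output records are joined. A minimal sketch, assuming three made-up input lines:

```shell
# Each print now ends with ";" instead of a newline.
out=$(printf 'One\nTwo\nThree\n' | awk 'BEGIN { ORS = ";" } { print }')
echo "$out"
```

The result is One;Two;Three; on a single line.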
RLENGTH
It represents the length of the string matched by the match function. AWK's match function searches the input string for a match of a given regular expression.
Example
[jerry]$ awk 'BEGIN { if (match("One Two Three", "re")) { print RLENGTH } }'
On executing this code, you get the following result −
Output
2
RS
It represents the (input) record separator and its default value is newline.
Example
[jerry]$ awk 'BEGIN {print "RS = " RS}' | cat -vte
On executing this code, you get the following result −
Output
RS = $
$
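Changing RS changes what AWK treats as one record. A minimal sketch, assuming a made-up comma-separated string with no trailing newline, so that each comma ends a record:

```shell
# With RS set to a comma, each comma-delimited piece is a separate record.
out=$(printf 'One,Two,Three' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')
echo "$out"
```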
RSTART
It represents the starting position of the string matched by the match function.
Example
[jerry]$ awk 'BEGIN { if (match("One Two Three", "Thre")) { print RSTART } }'
On executing this code, you get the following result −
Output
9
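RSTART and RLENGTH are typically used together with substr to extract the text that match found. A minimal sketch, assuming the same sample string and the regular expression /T[a-z]+/ (a capital T followed by lowercase letters):

```shell
out=$(awk 'BEGIN {
    s = "One Two Three"
    # match() sets RSTART and RLENGTH; substr() extracts the matched text.
    if (match(s, /T[a-z]+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}')
echo "$out"
```

The leftmost match is Two, so this prints 5 3 Two. When there is no match, match returns 0, RSTART is set to 0, and RLENGTH to -1.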
SUBSEP
It represents the separator character for array subscripts and its default value is