AWK – Overview

AWK is an interpreted programming language designed for text processing. Its name is derived from the family names of its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. The version of AWK that most GNU/Linux systems distribute is written and maintained by the Free Software Foundation (FSF); it is often referred to as GNU AWK.

Types of AWK

Following are the variants of AWK:

AWK - Original AWK from AT&T Laboratory.
NAWK - Newer and improved version of AWK from AT&T Laboratory.
GAWK - GNU AWK. It is available on most GNU/Linux distributions, although some install a different implementation (such as mawk) as the default awk. It is compatible with AWK and NAWK.

Typical Uses of AWK

AWK can handle a wide range of tasks. Listed below are just a few of them:

Text processing,
Producing formatted text reports,
Performing arithmetic operations,
Performing string operations, and many more.

AWK – Arrays

AWK has associative arrays. One of their best features is that the indexes need not be a continuous set of numbers; you can use either a string or a number as an array index. Also, there is no need to declare the size of an array in advance; arrays can grow and shrink at runtime.

The syntax is as follows:

Syntax

    array_name[index] = value

Where array_name is the name of the array, index is the array index, and value is the value assigned to that element.

Creating Array

To gain more insight into arrays, let us create an array and access its elements.

Example

    [jerry]$ awk 'BEGIN {
       fruits["mango"] = "yellow";
       fruits["orange"] = "orange"
       print fruits["orange"] "\n" fruits["mango"]
    }'

On executing this code, you get the following result:

Output

    orange
    yellow

In the above example, we declare an array named fruits whose index is the fruit name and whose value is the color of the fruit. To access array elements, we use the array_name[index] format.

Deleting Array Elements

For insertion, we used the assignment operator. Similarly, we can use the delete statement to remove an element from the array. The syntax of the delete statement is as follows:

Syntax

    delete array_name[index]

The following example deletes the element orange. Because the element no longer exists, the print statement outputs only an empty line.

Example

    [jerry]$ awk 'BEGIN {
       fruits["mango"] = "yellow";
       fruits["orange"] = "orange";
       delete fruits["orange"];
       print fruits["orange"]
    }'

Multi-Dimensional Arrays

AWK only supports one-dimensional arrays, but you can easily simulate a multi-dimensional array using a one-dimensional array. For instance, given below is a 3×3 two-dimensional array:

    100   200   300
    400   500   600
    700   800   900

In the above example, array[0][0] stores 100, array[0][1] stores 200, and so on. To store 100 at array location [0][0], we can use the following syntax:

Syntax

    array["0,0"] = 100

Though we gave 0,0 as the index, these are not two indexes. In reality, it is just one index with the string 0,0.
The following example simulates a 2-D array:

Example

    [jerry]$ awk 'BEGIN {
       array["0,0"] = 100;
       array["0,1"] = 200;
       array["0,2"] = 300;
       array["1,0"] = 400;
       array["1,1"] = 500;
       array["1,2"] = 600;

       # print array elements
       print "array[0,0] = " array["0,0"];
       print "array[0,1] = " array["0,1"];
       print "array[0,2] = " array["0,2"];
       print "array[1,0] = " array["1,0"];
       print "array[1,1] = " array["1,1"];
       print "array[1,2] = " array["1,2"];
    }'

On executing this code, you get the following result:

Output

    array[0,0] = 100
    array[0,1] = 200
    array[0,2] = 300
    array[1,0] = 400
    array[1,1] = 500
    array[1,2] = 600

You can also perform a variety of operations on an array, such as sorting its elements or indexes. For that purpose, GNU AWK provides the asort and asorti functions.
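
As a minimal sketch building on the array examples above: AWK also accepts comma-separated subscripts such as grid[0, 1], which it joins into a single string key using the built-in SUBSEP variable, and the in operator tests whether a key exists without creating it. The file name arrays_demo.awk and the array name grid are made up for illustration.

```shell
# arrays_demo.awk is a hypothetical file name used only for this sketch.
cat > arrays_demo.awk <<'EOF'
BEGIN {
    fruits["mango"] = "yellow"
    fruits["orange"] = "orange"
    delete fruits["orange"]

    # "in" checks for a key without creating it as a side effect.
    if ("orange" in fruits)
        print "orange is present"
    else
        print "orange is absent"

    # grid[0, 1] is stored under the single string key "0" SUBSEP "1".
    grid[0, 1] = 200
    key = 0 SUBSEP 1
    if (key in grid)
        print "grid[0,1] = " grid[key]

    # split() recovers the individual subscripts from a combined key.
    for (k in grid) {
        split(k, parts, SUBSEP)
        print "row " parts[1] ", column " parts[2]
    }
}
EOF

awk -f arrays_demo.awk
```

When run, this should report that orange is absent, print grid[0,1] = 200, and then print row 0, column 1.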

AWK – Workflow

To become an expert AWK programmer, you need to know its internals. AWK follows a simple workflow: Read, Execute, and Repeat.

Read

AWK reads a line from the input stream (file, pipe, or stdin) and stores it in memory.

Execute

All AWK commands are applied sequentially to the input. By default, AWK executes commands on every line. We can restrict this by providing patterns.

Repeat

This process repeats until the input reaches its end.

Program Structure

Let us now understand the program structure of AWK.

BEGIN Block

The syntax of the BEGIN block is as follows:

Syntax

    BEGIN {awk-commands}

The BEGIN block is executed at program start-up. It executes only once, which makes it a good place to initialize variables. BEGIN is an AWK keyword and hence it must be in upper-case. Please note that this block is optional.

Body Block

The syntax of the body block is as follows:

Syntax

    /pattern/ {awk-commands}

The body block applies AWK commands to every input line. By default, AWK executes commands on every line. We can restrict this by providing patterns. Note that there are no keywords for the body block.

END Block

The syntax of the END block is as follows:

Syntax

    END {awk-commands}

The END block executes at the end of the program. END is an AWK keyword and hence it must be in upper-case. Please note that this block is optional.

Let us create a file marks.txt which contains the serial number, name of the student, subject name, and number of marks obtained.

    1) Amit Physics 80
    2) Rahul Maths 90
    3) Shyam Biology 87
    4) Kedar English 85
    5) Hari History 89

Let us now display the file contents with a header by using an AWK script.
Example

    [jerry]$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print}' marks.txt

When this code is executed, it produces the following result:

Output

    Sr No   Name    Sub     Marks
    1) Amit Physics 80
    2) Rahul Maths 90
    3) Shyam Biology 87
    4) Kedar English 85
    5) Hari History 89

At the start, AWK prints the header from the BEGIN block. Then, in the body block, it reads a line from the file and executes AWK's print command, which prints the line on the standard output stream. This process repeats until the file reaches its end.
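
The three blocks can be combined in one program. The following is a small sketch, not part of the original tutorial: the file name workflow_demo.awk and the counter name matched are invented for illustration, and marks.txt is recreated with single spaces between fields.

```shell
# Recreate the chapter's sample file (fields separated by single spaces).
cat > marks.txt <<'EOF'
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
EOF

# workflow_demo.awk is a hypothetical file name used only for this sketch.
cat > workflow_demo.awk <<'EOF'
# Runs once, before any input is read.
BEGIN { print "Start" }

# Runs only for lines that match the pattern.
/Physics|Maths/ { print "Matched: " $2; matched++ }

# Runs once, after the last line; NR holds the number of lines read.
END { print "Lines read: " NR ", matched: " matched }
EOF

awk -f workflow_demo.awk marks.txt
```

With the sample file above, this should print Start, then Matched: Amit and Matched: Rahul, and finally Lines read: 5, matched: 2.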

AWK – Basic Examples

This chapter describes several useful AWK commands with examples. Consider a text file marks.txt to be processed, with the following content:

    1) Amit Physics 80
    2) Rahul Maths 90
    3) Shyam Biology 87
    4) Kedar English 85
    5) Hari History 89

Printing Column or Field

You can instruct AWK to print only certain columns from the input. The following example demonstrates this:

Example

    [jerry]$ awk '{print $3 "\t" $4}' marks.txt

On executing this code, you get the following result:

Output

    Physics   80
    Maths     90
    Biology   87
    English   85
    History   89

In the file marks.txt, the third column contains the subject name and the fourth column contains the marks obtained in that subject. In the above example, $3 and $4 represent the third and the fourth fields of the input record, and the print command outputs them separated by a tab.

Printing All Lines

By default, AWK prints all the lines that match a pattern.

Example

    [jerry]$ awk '/a/ {print $0}' marks.txt

On executing this code, you get the following result:

Output

    2) Rahul Maths 90
    3) Shyam Biology 87
    4) Kedar English 85
    5) Hari History 89

In the above example, we are searching for the pattern a. When a pattern match succeeds, AWK executes the commands in the body block. If no action is given, the default action is taken, which is to print the record. Hence, the following command produces the same result:

Example

    [jerry]$ awk '/a/' marks.txt

Printing Columns by Pattern

When a pattern match succeeds, AWK prints the entire record by default. But you can instruct AWK to print only certain fields. For instance, the following example prints the third and fourth fields when a pattern match succeeds.

Example

    [jerry]$ awk '/a/ {print $3 "\t" $4}' marks.txt

On executing this code, you get the following result:

Output

    Maths     90
    Biology   87
    English   85
    History   89

Printing Column in Any Order

You can print columns in any order.
For instance, the following example prints the fourth column followed by the third column.

Example

    [jerry]$ awk '/a/ {print $4 "\t" $3}' marks.txt

On executing the above code, you get the following result:

Output

    90   Maths
    87   Biology
    85   English
    89   History

Counting and Printing Matched Pattern

Let us see an example where you count and print the number of lines for which a pattern match succeeded.

Example

    [jerry]$ awk '/a/{++cnt} END {print "Count =", cnt}' marks.txt

On executing this code, you get the following result:

Output

    Count = 4

In this example, we increment the counter when a pattern match succeeds and print its value in the END block. Note that, unlike in many other programming languages, there is no need to declare a variable before using it.

Printing Lines with More than 18 Characters

Let us print only those lines that contain more than 18 characters.

Example

    [jerry]$ awk 'length($0) > 18' marks.txt

On executing this code, you get the following result:

Output

    3) Shyam Biology 87
    4) Kedar English 85

AWK provides a built-in length function that returns the length of a string. The $0 variable stores the entire line, and since no action is given, the default action (print) is taken. Hence, if a line has more than 18 characters, the comparison is true and the line is printed.
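
Patterns are not limited to regular expressions; a comparison on a field works the same way. The sketch below is an illustration rather than part of the original chapter: the threshold 85 and the variable name total are arbitrary choices, and marks.txt is recreated with single spaces between fields.

```shell
# Recreate the sample file so the sketch is self-contained.
printf '%s\n' \
  '1) Amit Physics 80' \
  '2) Rahul Maths 90' \
  '3) Shyam Biology 87' \
  '4) Kedar English 85' \
  '5) Hari History 89' > marks.txt

# First rule: print name and marks when the fourth field exceeds 85.
# Second rule: no pattern, so it adds up the marks of every line.
# END: print the average once all lines have been read.
awk '$4 > 85 { print $2, $4 }
     { total += $4 }
     END { printf "Average = %.1f\n", total / NR }' marks.txt
```

For the sample data this should list Rahul 90, Shyam 87, and Hari 89, followed by Average = 86.2.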

AWK – Environment

This chapter describes how to set up the AWK environment on your GNU/Linux system.

Installation Using Package Manager

Generally, AWK is available by default on most GNU/Linux distributions. You can use the which command to check whether it is present on your system. If you don't have AWK, install it on Debian-based GNU/Linux using the Advanced Package Tool (APT) package manager as follows:

    [jerry]$ sudo apt-get update
    [jerry]$ sudo apt-get install gawk

Similarly, to install AWK on RPM-based GNU/Linux, use the Yellowdog Updater, Modified (yum) package manager as follows:

    [root]# yum install gawk

After installation, ensure that AWK is accessible via the command line.

    [jerry]$ which awk

On executing the above code, you get the following result:

    /usr/bin/awk

Installation from Source Code

As GNU AWK is a part of the GNU project, its source code is available for free download. We have already seen how to install AWK using a package manager. Let us now understand how to install AWK from its source code.

The following procedure applies to most GNU software, and to many other freely available programs as well. Here are the installation steps:

Step 1 - Download the source code from an authentic place. The command-line utility wget serves this purpose.

    [jerry]$ wget http://ftp.gnu.org/gnu/gawk/gawk-4.1.1.tar.xz

Step 2 - Decompress and extract the downloaded source code.

    [jerry]$ tar xvf gawk-4.1.1.tar.xz

Step 3 - Change into the extracted directory and run configure.

    [jerry]$ cd gawk-4.1.1
    [jerry]$ ./configure

Step 4 - Upon successful completion, configure generates a Makefile. To compile the source code, issue a make command.

    [jerry]$ make

Step 5 - You can run the test suite to ensure the build is clean. This is an optional step.

    [jerry]$ make check

Step 6 - Finally, install AWK. Make sure you have super-user privileges.

    [jerry]$ sudo make install

That is it! You have successfully compiled and installed AWK.
Verify it by executing the which command as follows:

    [jerry]$ which awk

On executing this code, you get a result similar to the following (a source install may instead place awk under /usr/local/bin):

    /usr/bin/awk

AWK – Basic Syntax

AWK is simple to use. We can provide AWK commands either directly on the command line or in the form of a text file containing AWK commands.

AWK Command Line

We can specify an AWK command within single quotes at the command line as shown:

    awk [options] 'program' file ...

Example

Consider a text file marks.txt with the following content:

    1) Amit Physics 80
    2) Rahul Maths 90
    3) Shyam Biology 87
    4) Kedar English 85
    5) Hari History 89

Let us display the complete content of the file using AWK as follows:

Example

    [jerry]$ awk '{print}' marks.txt

On executing this code, you get the following result:

Output

    1) Amit Physics 80
    2) Rahul Maths 90
    3) Shyam Biology 87
    4) Kedar English 85
    5) Hari History 89

AWK Program File

We can provide AWK commands in a script file as shown:

    awk [options] -f file ...

First, create a text file command.awk containing the AWK command as shown below:

    {print}

Now we can instruct AWK to read commands from the text file and perform the action. Here, we achieve the same result as shown in the above example.

Example

    [jerry]$ awk -f command.awk marks.txt

On executing this code, you get the following result:

Output

    1) Amit Physics 80
    2) Rahul Maths 90
    3) Shyam Biology 87
    4) Kedar English 85
    5) Hari History 89

AWK Standard Options

AWK supports the following standard options, which can be provided on the command line.

The -v option

This option assigns a value to a variable before program execution begins. The following example describes the usage of the -v option.

Example

    [jerry]$ awk -v name=Jerry 'BEGIN{printf "Name = %s\n", name}'

On executing this code, you get the following result:

Output

    Name = Jerry

The --dump-variables[=file] option

It prints a sorted list of global variables and their final values to file. The default file is awkvars.out.
Example

    [jerry]$ awk --dump-variables ''
    [jerry]$ cat awkvars.out

On executing the above code, you get the following result:

Output

    ARGC: 1
    ARGIND: 0
    ARGV: array, 1 elements
    BINMODE: 0
    CONVFMT: "%.6g"
    ERRNO: ""
    FIELDWIDTHS: ""
    FILENAME: ""
    FNR: 0
    FPAT: "[^[:space:]]+"
    FS: " "
    IGNORECASE: 0
    LINT: 0
    NF: 0
    NR: 0
    OFMT: "%.6g"
    OFS: " "
    ORS: "\n"
    RLENGTH: 0
    RS: "\n"
    RSTART: 0
    RT: ""
    SUBSEP: "\034"
    TEXTDOMAIN: "messages"

The --help option

This option prints the help message on standard output.

Example

    [jerry]$ awk --help

On executing this code, you get the following result:

Output

    Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
    Usage: awk [POSIX or GNU style options] [--] 'program' file ...
    POSIX options:          GNU long options: (standard)
       -f progfile             --file=progfile
       -F fs                   --field-separator=fs
       -v var=val              --assign=var=val
    Short options:          GNU long options: (extensions)
       -b                      --characters-as-bytes
       -c                      --traditional
       -C                      --copyright
       -d[file]                --dump-variables[=file]
       -e 'program-text'       --source='program-text'
       -E file                 --exec=file
       -g                      --gen-pot
       -h                      --help
       -L [fatal]              --lint[=fatal]
       -n                      --non-decimal-data
       -N                      --use-lc-numeric
       -O                      --optimize
       -p[file]                --profile[=file]
       -P                      --posix
       -r                      --re-interval
       -S                      --sandbox
       -t                      --lint-old
       -V                      --version

The --lint[=fatal] option

This option enables checking of non-portable or dubious constructs. When the argument fatal is provided, it treats warning messages as errors. The following example demonstrates this:

Example

    [jerry]$ awk --lint '' /bin/ls

On executing this code, you get the following result:

Output

    awk: cmd. line:1: warning: empty program text on command line
    awk: cmd. line:1: warning: source file does not end in newline
    awk: warning: no program text at all!

The --posix option

This option turns on strict POSIX compatibility, in which all common and gawk-specific extensions are disabled.

The --profile[=file] option

This option generates a pretty-printed version of the program in file. The default file is awkprof.out.
The following simple example illustrates this:

Example

    [jerry]$ awk --profile 'BEGIN{printf"---|Header|--\n"} {print} END{printf"---|Footer|---\n"}' marks.txt > /dev/null
    [jerry]$ cat awkprof.out

On executing this code, you get the following result:

Output

    # gawk profile, created Sun Oct 26 19:50:48 2014

    # BEGIN block(s)

    BEGIN {
       printf "---|Header|--\n"
    }

    # Rule(s)

    {
       print $0
    }

    # END block(s)

    END {
       printf "---|Footer|---\n"
    }

The --traditional option

This option disables all gawk-specific extensions.

The --version option

This option displays the version information of the AWK program.

Example

    [jerry]$ awk --version

When this code is executed, it produces the following result:

Output

    GNU Awk 4.0.1
    Copyright (C) 1989, 1991-2012 Free Software Foundation.
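
The -v and -f options can be used together, so that a script file stays generic and the value it works with is supplied on the command line. This is only a sketch: filter.awk, marks_sample.txt, and the variable name min are names invented for the illustration.

```shell
# filter.awk is a hypothetical script; "min" is expected to be set with -v.
cat > filter.awk <<'EOF'
# Print the student name when the marks in field 4 are at least min.
$4 >= min { print $2 }
EOF

# A shortened copy of the sample data, with single spaces between fields.
printf '%s\n' \
  '1) Amit Physics 80' \
  '2) Rahul Maths 90' \
  '3) Shyam Biology 87' > marks_sample.txt

awk -v min=85 -f filter.awk marks_sample.txt
```

With min=85 this should print Rahul and Shyam; changing the value passed to -v changes the filter without editing the script.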

AWK – User Defined Functions

Functions are basic building blocks of a program. AWK allows us to define our own functions. A large program can be divided into functions, and each function can be written and tested independently. This makes code reusable.

Given below is the general format of a user-defined function:

Syntax

    function function_name(argument1, argument2, ...) {
       function body
    }

In this syntax, function_name is the name of the user-defined function. A function name should begin with a letter, and the rest of the characters can be any combination of numbers, alphabetic characters, or underscores. AWK's reserved words cannot be used as function names.

Functions can accept multiple arguments separated by commas. Arguments are not mandatory; you can also create a user-defined function without any argument.

The function body consists of one or more AWK statements.

Let us write two functions that calculate the minimum and the maximum number, and call these functions from another function called main. The functions.awk file contains:

Example

    # Returns minimum number
    function find_min(num1, num2){
       if (num1 < num2)
          return num1
       return num2
    }

    # Returns maximum number
    function find_max(num1, num2){
       if (num1 > num2)
          return num1
       return num2
    }

    # Main function
    function main(num1, num2){
       # Find minimum number
       result = find_min(num1, num2)
       print "Minimum =", result

       # Find maximum number
       result = find_max(num1, num2)
       print "Maximum =", result
    }

    # Script execution starts here
    BEGIN {
       main(10, 20)
    }

Run the script with the -f option:

    [jerry]$ awk -f functions.awk

On executing this code, you get the following result:

Output

    Minimum = 10
    Maximum = 20
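
One detail worth illustrating: a variable assigned inside an AWK function is global unless it appears in the function's parameter list. A common convention is to list extra, unused parameters to serve as local variables. The sketch below assumes a made-up function sum_digits and file name sum_digits.awk.

```shell
# sum_digits.awk is a hypothetical file name used only for this sketch.
cat > sum_digits.awk <<'EOF'
# total and digit are extra parameters, used here as local variables.
function sum_digits(n,    total, digit) {
    total = 0
    while (n > 0) {
        digit = n % 10       # last decimal digit
        total += digit
        n = int(n / 10)      # drop the last digit
    }
    return total
}

BEGIN {
    total = "untouched"      # global variable with the same name as the local
    print sum_digits(1234)
    print total              # the global is not modified by the function
}
EOF

awk -f sum_digits.awk
```

This should print 10 followed by untouched, showing that the function's total did not overwrite the global one.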

AWK – Control Flow

Like other programming languages, AWK provides conditional statements to control the flow of a program. This chapter explains AWK's control statements with suitable examples.

If Statement

It tests a condition and performs an action if the condition is true. Given below is the syntax of the if statement:

Syntax

    if (condition)
       action

We can also use a pair of curly braces, as given below, to execute multiple actions:

Syntax

    if (condition) {
       action-1
       action-2
       .
       .
       action-n
    }

For instance, the following example checks whether a number is even or not:

Example

    [jerry]$ awk 'BEGIN {num = 10; if (num % 2 == 0) printf "%d is even number.\n", num }'

On executing the above code, you get the following result:

Output

    10 is even number.

If Else Statement

In the if-else syntax, we can provide a list of actions to be performed when the condition is false. The syntax of the if-else statement is as follows:

Syntax

    if (condition)
       action-1
    else
       action-2

In the above syntax, action-1 is performed when the condition evaluates to true and action-2 is performed when the condition evaluates to false. For instance, the following example checks whether a number is even or odd:

Example

    [jerry]$ awk 'BEGIN {
       num = 11;
       if (num % 2 == 0) printf "%d is even number.\n", num;
       else printf "%d is odd number.\n", num
    }'

On executing this code, you get the following result:

Output

    11 is odd number.

If-Else-If Ladder

We can easily create an if-else-if ladder by using multiple if-else statements. The following example demonstrates this:

Example

    [jerry]$ awk 'BEGIN {
       a = 30;

       if (a == 10)
          print "a = 10";
       else if (a == 20)
          print "a = 20";
       else if (a == 30)
          print "a = 30";
    }'

On executing this code, you get the following result:

Output

    a = 30
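
The same ladder can be applied to each input line instead of a fixed value in a BEGIN block. The following is a sketch with invented grade boundaries (90 and 80) and an invented variable name grade; printf is used to feed three sample numbers on standard input.

```shell
# Classify each input number with an if-else-if ladder.
# The thresholds 90 and 80 are arbitrary values chosen for illustration.
printf '%s\n' 80 90 55 | awk '{
    if ($1 >= 90)
        grade = "A"
    else if ($1 >= 80)
        grade = "B"
    else
        grade = "C"
    print $1, grade
}'
```

For the three sample values this should print 80 B, 90 A, and 55 C, one per line. Only the first branch whose condition is true runs, so the order of the tests matters.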

AWK – Regular Expressions

AWK handles regular expressions efficiently, and many complex tasks can be solved with simple regular expressions. This chapter covers standard regular expressions with suitable examples.

Dot

It matches any single character except the end-of-line character. For instance, the following example matches fin, fun, fan, etc.

Example

    [jerry]$ echo -e "cat\nbat\nfun\nfin\nfan" | awk '/f.n/'

On executing the above code, you get the following result:

Output

    fun
    fin
    fan

Start of line

It matches the start of a line. For instance, the following example prints all the lines that start with the pattern The.

Example

    [jerry]$ echo -e "This\nThat\nThere\nTheir\nthese" | awk '/^The/'

On executing this code, you get the following result:

Output

    There
    Their

End of line

It matches the end of a line. For instance, the following example prints the lines that end with the letter n.

Example

    [jerry]$ echo -e "knife\nknow\nfun\nfin\nfan\nnine" | awk '/n$/'

On executing this code, you get the following result:

Output

    fun
    fin
    fan

Match character set

It is used to match only one out of several characters. For instance, the following example matches the patterns Call and Tall but not Ball.

Example

    [jerry]$ echo -e "Call\nTall\nBall" | awk '/[CT]all/'

On executing this code, you get the following result:

Output

    Call
    Tall

Exclusive set

In an exclusive set, the caret negates the set of characters in the square brackets. For instance, the following example prints only Ball.

Example

    [jerry]$ echo -e "Call\nTall\nBall" | awk '/[^CT]all/'

On executing this code, you get the following result:

Output

    Ball

Alternation

A vertical bar allows regular expressions to be logically ORed. For instance, the following example prints Ball and Call.
Example

    [jerry]$ echo -e "Call\nTall\nBall\nSmall\nShall" | awk '/Call|Ball/'

On executing this code, you get the following result:

Output

    Call
    Ball

Zero or One Occurrence

It matches zero or one occurrence of the preceding character. For instance, the following example matches Colour as well as Color. We have made u an optional character by using ?.

Example

    [jerry]$ echo -e "Colour\nColor" | awk '/Colou?r/'

On executing this code, you get the following result:

Output

    Colour
    Color

Zero or More Occurrence

It matches zero or more occurrences of the preceding character. For instance, the following example matches ca, cat, catt, and so on.

Example

    [jerry]$ echo -e "ca\ncat\ncatt" | awk '/cat*/'

On executing this code, you get the following result:

Output

    ca
    cat
    catt

One or More Occurrence

It matches one or more occurrences of the preceding character. For instance, the following example matches lines containing one or more occurrences of 2.

Example

    [jerry]$ echo -e "111\n22\n123\n234\n456\n222" | awk '/2+/'

On executing the above code, you get the following result:

Output

    22
    123
    234
    222

Grouping

Parentheses () are used for grouping and the character | is used for alternatives. For instance, the following regular expression matches the lines containing either Apple Juice or Apple Cake.

Example

    [jerry]$ echo -e "Apple Juice\nApple Pie\nApple Tart\nApple Cake" | awk '/Apple (Juice|Cake)/'

On executing this code, you get the following result:

Output

    Apple Juice
    Apple Cake
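
The constructs above can be combined in a single expression. As a sketch with made-up input lines, the following pattern uses the start and end anchors, grouping with alternation, an optional character, a character set, and the one-or-more operator to accept only lines of the form "name: number". It uses printf rather than echo -e to produce the input, since printf behaves more consistently across shells.

```shell
# ^ and $ anchor the whole line; (Colou?r|Shade) allows three spellings;
# [0-9]+ requires one or more digits after the colon and space.
printf '%s\n' 'Colour: 12' 'Color: 7' 'Shade: 300' 'Colr: 5' 'Color: x1' |
awk '/^(Colou?r|Shade): [0-9]+$/'
```

This should print Colour: 12, Color: 7, and Shade: 300. The line Colr: 5 is rejected because the name does not match any alternative, and Color: x1 is rejected because the value is not made up of digits only.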