Pumping Lemma For Regular Grammars Theorem Let L be a regular language. Then there exists a constant ‘c’ such that for every string w in L − |w| ≥ c We can break w into three strings, w = xyz, such that − |y| > 0 |xy| ≤ c For all k ≥ 0, the string xykz is also in L. Applications of Pumping Lemma Pumping Lemma is to be applied to show that certain languages are not regular. It should never be used to show a language is regular. If L is regular, it satisfies Pumping Lemma. If L does not satisfy Pumping Lemma, it is non-regular. Method to prove that a language L is not regular At first, we have to assume that L is regular. So, the pumping lemma should hold for L. Use the pumping lemma to obtain a contradiction − Select w such that |w| ≥ c Select y such that |y| ≥ 1 Select x such that |xy| ≤ c Assign the remaining string to z. Select k such that the resulting string is not in L. Hence L is not regular. Problem Prove that L = {aibi | i ≥ 0} is not regular. Solution − At first, we assume that L is regular and n is the number of states. Let w = anbn. Thus |w| = 2n ≥ n. By pumping lemma, let w = xyz, where |xy| ≤ n. Let x = ap, y = aq, and z = arbn, where p + q + r = n, p ≠ 0, q ≠ 0, r ≠ 0. Thus |y| ≠ 0. Let k = 2. Then xy2z = apa2qarbn. Number of as = (p + 2q + r) = (p + q + r) + q = n + q Hence, xy2z = an+q bn. Since q ≠ 0, xy2z is not of the form anbn. Thus, xy2z is not in L. Hence L is not regular. Learning working make money
Category: automata Theory
Regular Sets Any set that represents the value of the Regular Expression is called a Regular Set. Properties of Regular Sets Property 1. The union of two regular set is regular. Proof − Let us take two regular expressions RE1 = a(aa)* and RE2 = (aa)* So, L1 = {a, aaa, aaaaa,…..} (Strings of odd length excluding Null) and L2 ={ ε, aa, aaaa, aaaaaa,…….} (Strings of even length including Null) L1 ∪ L2 = { ε, a, aa, aaa, aaaa, aaaaa, aaaaaa,…….} (Strings of all possible lengths including Null) RE (L1 ∪ L2) = a* (which is a regular expression itself) Hence, proved. Property 2. The intersection of two regular set is regular. Proof − Let us take two regular expressions RE1 = a(a*) and RE2 = (aa)* So, L1 = { a,aa, aaa, aaaa, ….} (Strings of all possible lengths excluding Null) L2 = { ε, aa, aaaa, aaaaaa,…….} (Strings of even length including Null) L1 ∩ L2 = { aa, aaaa, aaaaaa,…….} (Strings of even length excluding Null) RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself. Hence, proved. Property 3. The complement of a regular set is regular. Proof − Let us take a regular expression − RE = (aa)* So, L = {ε, aa, aaaa, aaaaaa, …….} (Strings of even length including Null) Complement of L is all the strings that is not in L. So, L’ = {a, aaa, aaaaa, …..} (Strings of odd length excluding Null) RE (L’) = a(aa)* which is a regular expression itself. Hence, proved. Property 4. The difference of two regular set is regular. Proof − Let us take two regular expressions − RE1 = a (a*) and RE2 = (aa)* So, L1 = {a, aa, aaa, aaaa, ….} (Strings of all possible lengths excluding Null) L2 = { ε, aa, aaaa, aaaaaa,…….} (Strings of even length including Null) L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ….} (Strings of all odd lengths excluding Null) RE (L1 – L2) = a (aa)* which is a regular expression. Hence, proved. Property 5. The reversal of a regular set is regular. Proof − We have to prove LR is also regular if L is a regular set. Let, L = {01, 10, 11, 10} RE (L) = 01 + 10 + 11 + 10 LR = {10, 01, 11, 01} RE (LR) = 01 + 10 + 11 + 10 which is regular Hence, proved. Property 6. The closure of a regular set is regular. Proof − If L = {a, aaa, aaaaa, …….} (Strings of odd length excluding Null) i.e., RE (L) = a (aa)* L* = {a, aa, aaa, aaaa , aaaaa,……………} (Strings of all lengths excluding Null) RE (L*) = a (a)* Hence, proved. Property 7. The concatenation of two regular sets is regular. Proof − Let RE1 = (0+1)*0 and RE2 = 01(0+1)* Here, L1 = {0, 00, 10, 000, 010, ……} (Set of strings ending in 0) and L2 = {01, 010,011,…..} (Set of strings beginning with 01) Then, L1 L2 = {001,0010,0011,0001,00010,00011,1001,10010,………….} Set of strings containing 001 as a substring which can be represented by an RE − (0 + 1)*001(0 + 1)* Hence, proved. Identities Related to Regular Expressions Given R, P, L, Q as regular expressions, the following identities hold − ∅* = ε ε* = ε RR* = R*R R*R* = R* (R*)* = R* RR* = R*R (PQ)*P =P(QP)* (a+b)* = (a*b*)* = (a*+b*)* = (a+b*)* = a*(ba*)* R + ∅ = ∅ + R = R (The identity for union) R ε = ε R = R (The identity for concatenation) ∅ L = L ∅ = ∅ (The annihilator for concatenation) R + R = R (Idempotent law) L (M + N) = LM + LN (Left distributive law) (M + N) L = ML + NL (Right distributive law) ε + RR* = ε + R*R = R* Learning working make money
Semi-Infinite Tape Turing Machine A Turing Machine with a semi-infinite tape has a left end but no right end. The left end is limited with an end marker. It is a two-track tape − Upper track − It represents the cells to the right of the initial head position. Lower track − It represents the cells to the left of the initial head position in reverse order. The infinite length input string is initially written on the tape in contiguous tape cells. The machine starts from the initial state q0 and the head scans from the left end marker ‘End’. In each step, it reads the symbol on the tape under its head. It writes a new symbol on that tape cell and then it moves the head either into left or right one tape cell. A transition function determines the actions to be taken. It has two special states called accept state and reject state. If at any point of time it enters into the accepted state, the input is accepted and if it enters into the reject state, the input is rejected by the TM. In some cases, it continues to run infinitely without being accepted or rejected for some certain input symbols. Note − Turing machines with semi-infinite tape are equivalent to standard Turing machines. Learning working make money
Construction of an FA from an RE We can use Thompson”s Construction to find out a Finite Automaton from a Regular Expression. We will reduce the regular expression into smallest regular expressions and converting these to NFA and finally to DFA. Some basic RA expressions are the following − Case 1 − For a regular expression ‘a’, we can construct the following FA − Case 2 − For a regular expression ‘ab’, we can construct the following FA − Case 3 − For a regular expression (a+b), we can construct the following FA − Case 4 − For a regular expression (a+b)*, we can construct the following FA − Method Step 1 Construct an NFA with Null moves from the given regular expression. Step 2 Remove Null transition from the NFA and convert it into its equivalent DFA. Problem Convert the following RA into its equivalent DFA − 1 (0 + 1)* 0 Solution We will concatenate three expressions “1”, “(0 + 1)*” and “0” Now we will remove the ε transitions. After we remove the ε transitions from the NDFA, we get the following − It is an NDFA corresponding to the RE − 1 (0 + 1)* 0. If you want to convert it into a DFA, simply apply the method of converting NDFA to DFA discussed in Chapter 1. Finite Automata with Null Moves (NFA-ε) A Finite Automaton with null moves (FA-ε) does transit not only after giving input from the alphabet set but also without any input symbol. This transition without input is called a null move. An NFA-ε is represented formally by a 5-tuple (Q, ∑, δ, q0, F), consisting of Q − a finite set of states ∑ − a finite set of input symbols δ − a transition function δ : Q × (∑ ∪ {ε}) → 2Q q0 − an initial state q0 ∈ Q F − a set of final state/states of Q (F⊆Q). The above (FA-ε) accepts a string set − {0, 1, 01} Removal of Null Moves from Finite Automata If in an NDFA, there is ϵ-move between vertex X to vertex Y, we can remove it using the following steps − Find all the outgoing edges from Y. Copy all these edges starting from X without changing the edge labels. If X is an initial state, make Y also an initial state. If Y is a final state, make X also a final state. Problem Convert the following NFA-ε to NFA without Null move. Solution Step 1 − Here the ε transition is between q1 and q2, so let q1 is X and qf is Y. Here the outgoing edges from qf is to qf for inputs 0 and 1. Step 2 − Now we will Copy all these edges from q1 without changing the edges from qf and get the following FA − Step 3 − Here q1 is an initial state, so we make qf also an initial state. So the FA becomes − Step 4 − Here qf is a final state, so we make q1 also a final state. So the FA becomes − Learning working make money
Chomsky Normal Form A CFG is in Chomsky Normal Form if the Productions are in the following forms − A → a A → BC S → ε where A, B, and C are non-terminals and a is terminal. Algorithm to Convert into Chomsky Normal Form − Step 1 − If the start symbol S occurs on some right side, create a new start symbol S’ and a new production S’→ S. Step 2 − Remove Null productions. (Using the Null production removal algorithm discussed earlier) Step 3 − Remove unit productions. (Using the Unit production removal algorithm discussed earlier) Step 4 − Replace each production A → B1…Bn where n > 2 with A → B1C where C → B2 …Bn. Repeat this step for all productions having two or more symbols in the right side. Step 5 − If the right side of any production is in the form A → aB where a is a terminal and A, B are non-terminal, then the production is replaced by A → XB and X → a. Repeat this step for every production which is in the form A → aB. Problem Convert the following CFG into CNF S → ASA | aB, A → B | S, B → b | ε Solution (1) Since S appears in R.H.S, we add a new state S0 and S0→S is added to the production set and it becomes − S0→S, S→ ASA | aB, A → B | S, B → b | ∈ (2) Now we will remove the null productions − B → ∈ and A → ∈ After removing B → ε, the production set becomes − S0→S, S→ ASA | aB | a, A → B | S | ∈, B → b After removing A → ∈, the production set becomes − S0→S, S→ ASA | aB | a | AS | SA | S, A → B | S, B → b (3) Now we will remove the unit productions. After removing S → S, the production set becomes − S0→S, S→ ASA | aB | a | AS | SA, A → B | S, B → b After removing S0→ S, the production set becomes − S0→ ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA A → B | S, B → b After removing A→ B, the production set becomes − S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA A → S | b B → b After removing A→ S, the production set becomes − S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA A → b |ASA | aB | a | AS | SA, B → b (4) Now we will find out more than two variables in the R.H.S Here, S0→ ASA, S → ASA, A→ ASA violates two Non-terminals in R.H.S. Hence we will apply step 4 and step 5 to get the following final production set which is in CNF − S0→ AX | aB | a | AS | SA S→ AX | aB | a | AS | SA A → b |AX | aB | a | AS | SA B → b X → SA (5) We have to change the productions S0→ aB, S→ aB, A→ aB And the final production set becomes − S0→ AX | YB | a | AS | SA S→ AX | YB | a | AS | SA A → b A → b |AX | YB | a | AS | SA B → b X → SA Y → a Learning working make money
Multi-tape Turing Machine Multi-tape Turing Machines have multiple tapes where each tape is accessed with a separate head. Each head can move independently of the other heads. Initially the input is on tape 1 and others are blank. At first, the first tape is occupied by the input and the other tapes are kept blank. Next, the machine reads consecutive symbols under its heads and the TM prints a symbol on each tape and moves its heads. A Multi-tape Turing machine can be formally described as a 6-tuple (Q, X, B, δ, q0, F) where − Q is a finite set of states X is the tape alphabet B is the blank symbol δ is a relation on states and symbols where δ: Q × Xk → Q × (X × {Left_shift, Right_shift, No_shift })k where there is k number of tapes q0 is the initial state F is the set of final states Note − Every Multi-tape Turing machine has an equivalent single-tape Turing machine. Learning working make money
Ambiguity in Context-Free Grammars If a context free grammar G has more than one derivation tree for some string w ∈ L(G), it is called an ambiguous grammar. There exist multiple right-most or left-most derivations for some string generated from that grammar. Problem Check whether the grammar G with production rules − X → X+X | X*X |X| a is ambiguous or not. Solution Let’s find out the derivation tree for the string “a+a*a”. It has two leftmost derivations. Derivation 1 − X → X+X → a +X → a+ X*X → a+a*X → a+a*a Parse tree 1 − Derivation 2 − X → X*X → X+X*X → a+ X*X → a+a*X → a+a*a Parse tree 2 − Since there are two parse trees for a single string “a+a*a”, the grammar G is ambiguous. Learning working make money
Language Generated by a Grammar The set of all strings that can be derived from a grammar is said to be the language generated from that grammar. A language generated by a grammar G is a subset formally defined by L(G)={W|W ∈ ∑*, S ⇒G W} If L(G1) = L(G2), the Grammar G1 is equivalent to the Grammar G2. Example If there is a grammar G: N = {S, A, B} T = {a, b} P = {S → AB, A → a, B → b} Here S produces AB, and we can replace A by a, and B by b. Here, the only accepted string is ab, i.e., L(G) = {ab} Example Suppose we have the following grammar − G: N = {S, A, B} T = {a, b} P = {S → AB, A → aA|a, B → bB|b} The language generated by this grammar − L(G) = {ab, a2b, ab2, a2b2, ………} = {am bn | m ≥ 1 and n ≥ 1} Construction of a Grammar Generating a Language We’ll consider some languages and convert it into a grammar G which produces those languages. Example Problem − Suppose, L (G) = {am bn | m ≥ 0 and n > 0}. We have to find out the grammar G which produces L(G). Solution Since L(G) = {am bn | m ≥ 0 and n > 0} the set of strings accepted can be rewritten as − L(G) = {b, ab,bb, aab, abb, …….} Here, the start symbol has to take at least one ‘b’ preceded by any number of ‘a’ including null. To accept the string set {b, ab, bb, aab, abb, …….}, we have taken the productions − S → aS , S → B, B → b and B → bB S → B → b (Accepted) S → B → bB → bb (Accepted) S → aS → aB → ab (Accepted) S → aS → aaS → aaB → aab(Accepted) S → aS → aB → abB → abb (Accepted) Thus, we can prove every single string in L(G) is accepted by the language generated by the production set. Hence the grammar − G: ({S, A, B}, {a, b}, S, { S → aS | B , B → b | bB }) Example Problem − Suppose, L (G) = {am bn | m > 0 and n ≥ 0}. We have to find out the grammar G which produces L(G). Solution − Since L(G) = {am bn | m > 0 and n ≥ 0}, the set of strings accepted can be rewritten as − L(G) = {a, aa, ab, aaa, aab ,abb, …….} Here, the start symbol has to take at least one ‘a’ followed by any number of ‘b’ including null. To accept the string set {a, aa, ab, aaa, aab, abb, …….}, we have taken the productions − S → aA, A → aA , A → B, B → bB ,B → λ S → aA → aB → aλ → a (Accepted) S → aA → aaA → aaB → aaλ → aa (Accepted) S → aA → aB → abB → abλ → ab (Accepted) S → aA → aaA → aaaA → aaaB → aaaλ → aaa (Accepted) S → aA → aaA → aaB → aabB → aabλ → aab (Accepted) S → aA → aB → abB → abbB → abbλ → abb (Accepted) Thus, we can prove every single string in L(G) is accepted by the language generated by the production set. Hence the grammar − G: ({S, A, B}, {a, b}, S, {S → aA, A → aA | B, B → λ | bB }) Learning working make money
Context-Free Grammar Introduction Definition − A context-free grammar (CFG) consisting of a finite set of grammar rules is a quadruple (N, T, P, S) where N is a set of non-terminal symbols. T is a set of terminals where N ∩ T = NULL. P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of the production rule P does have any right context or left context. S is the start symbol. Example The grammar ({A}, {a, b, c}, P, A), P : A → aA, A → abc. The grammar ({S, a, b}, {a, b}, P, S), P: S → aSa, S → bSb, S → ε The grammar ({S, F}, {0, 1}, P, S), P: S → 00S | 11F, F → 00F | ε Generation of Derivation Tree A derivation tree or parse tree is an ordered rooted tree that graphically represents the semantic information a string derived from a context-free grammar. Representation Technique Root vertex − Must be labeled by the start symbol. Vertex − Labeled by a non-terminal symbol. Leaves − Labeled by a terminal symbol or ε. If S → x1x2 …… xn is a production rule in a CFG, then the parse tree / derivation tree will be as follows − There are two different approaches to draw a derivation tree − Top-down Approach − Starts with the starting symbol S Goes down to tree leaves using productions Bottom-up Approach − Starts from tree leaves Proceeds upward to the root which is the starting symbol S Derivation or Yield of a Tree The derivation or the yield of a parse tree is the final string obtained by concatenating the labels of the leaves of the tree from left to right, ignoring the Nulls. However, if all the leaves are Null, derivation is Null. Example Let a CFG {N,T,P,S} be N = {S}, T = {a, b}, Starting symbol = S, P = S → SS | aSb | ε One derivation from the above CFG is “abaabb” S → SS → aSbS → abS → abaSb → abaaSbb → abaabb Sentential Form and Partial Derivation Tree A partial derivation tree is a sub-tree of a derivation tree/parse tree such that either all of its children are in the sub-tree or none of them are in the sub-tree. Example If in any CFG the productions are − S → AB, A → aaA | ε, B → Bb| ε the partial derivation tree can be the following − If a partial derivation tree contains the root S, it is called a sentential form. The above sub-tree is also in sentential form. Leftmost and Rightmost Derivation of a String Leftmost derivation − A leftmost derivation is obtained by applying production to the leftmost variable in each step. Rightmost derivation − A rightmost derivation is obtained by applying production to the rightmost variable in each step. Example Let any set of production rules in a CFG be X → X+X | X*X |X| a over an alphabet {a}. The leftmost derivation for the string “a+a*a” may be − X → X+X → a+X → a + X*X → a+a*X → a+a*a The stepwise derivation of the above string is shown as below − The rightmost derivation for the above string “a+a*a” may be − X → X*X → X*a → X+X*a → X+a*a → a+a*a The stepwise derivation of the above string is shown as below − Left and Right Recursive Grammars In a context-free grammar G, if there is a production in the form X → Xa where X is a non-terminal and ‘a’ is a string of terminals, it is called a left recursive production. The grammar having a left recursive production is called a left recursive grammar. And if in a context-free grammar G, if there is a production is in the form X → aX where X is a non-terminal and ‘a’ is a string of terminals, it is called a right recursive production. The grammar having a right recursive production is called a right recursive grammar. Learning working make money
Arden”s Theorem In order to find out a regular expression of a Finite Automaton, we use Arden’s Theorem along with the properties of regular expressions. Statement − Let P and Q be two regular expressions. If P does not contain null string, then R = Q + RP has a unique solution that is R = QP* Proof − R = Q + (Q + RP)P [After putting the value R = Q + RP] = Q + QP + RPP When we put the value of R recursively again and again, we get the following equation − R = Q + QP + QP2 + QP3….. R = Q (ε + P + P2 + P3 + …. ) R = QP* [As P* represents (ε + P + P2 + P3 + ….) ] Hence, proved. Assumptions for Applying Arden’s Theorem The transition diagram must not have NULL transitions It must have only one initial state Method Step 1 − Create equations as the following form for all the states of the DFA having n states with initial state q1. q1 = q1R11 + q2R21 + … + qnRn1 + ε q2 = q1R12 + q2R22 + … + qnRn2 ………………………… ………………………… ………………………… ………………………… qn = q1R1n + q2R2n + … + qnRnn Rij represents the set of labels of edges from qi to qj, if no such edge exists, then Rij = ∅ Step 2 − Solve these equations to get the equation for the final state in terms of Rij Problem Construct a regular expression corresponding to the automata given below − Solution − Here the initial state and final state is q1. The equations for the three states q1, q2, and q3 are as follows − q1 = q1a + q3a + ε (ε move is because q1 is the initial state0 q2 = q1b + q2b + q3b q3 = q2a Now, we will solve these three equations − q2 = q1b + q2b + q3b = q1b + q2b + (q2a)b (Substituting value of q3) = q1b + q2(b + ab) = q1b (b + ab)* (Applying Arden’s Theorem) q1 = q1a + q3a + ε = q1a + q2aa + ε (Substituting value of q3) = q1a + q1b(b + ab*)aa + ε (Substituting value of q2) = q1(a + b(b + ab)*aa) + ε = ε (a+ b(b + ab)*aa)* = (a + b(b + ab)*aa)* Hence, the regular expression is (a + b(b + ab)*aa)*. Problem Construct a regular expression corresponding to the automata given below − Solution − Here the initial state is q1 and the final state is q2 Now we write down the equations − q1 = q10 + ε q2 = q11 + q20 q3 = q21 + q30 + q31 Now, we will solve these three equations − q1 = ε0* [As, εR = R] So, q1 = 0* q2 = 0*1 + q20 So, q2 = 0*1(0)* [By Arden’s theorem] Hence, the regular expression is 0*10*. Learning working make money