Chomsky Normal Form A CFG is in Chomsky Normal Form if the Productions are in the following forms − A → a A → BC S → ε where A, B, and C are non-terminals and a is terminal. Algorithm to Convert into Chomsky Normal Form − Step 1 − If the start symbol S occurs on some right side, create a new start symbol S’ and a new production S’→ S. Step 2 − Remove Null productions. (Using the Null production removal algorithm discussed earlier) Step 3 − Remove unit productions. (Using the Unit production removal algorithm discussed earlier) Step 4 − Replace each production A → B1…Bn where n > 2 with A → B1C where C → B2 …Bn. Repeat this step for all productions having two or more symbols in the right side. Step 5 − If the right side of any production is in the form A → aB where a is a terminal and A, B are non-terminal, then the production is replaced by A → XB and X → a. Repeat this step for every production which is in the form A → aB. Problem Convert the following CFG into CNF S → ASA | aB, A → B | S, B → b | ε Solution (1) Since S appears in R.H.S, we add a new state S0 and S0→S is added to the production set and it becomes − S0→S, S→ ASA | aB, A → B | S, B → b | ∈ (2) Now we will remove the null productions − B → ∈ and A → ∈ After removing B → ε, the production set becomes − S0→S, S→ ASA | aB | a, A → B | S | ∈, B → b After removing A → ∈, the production set becomes − S0→S, S→ ASA | aB | a | AS | SA | S, A → B | S, B → b (3) Now we will remove the unit productions. After removing S → S, the production set becomes − S0→S, S→ ASA | aB | a | AS | SA, A → B | S, B → b After removing S0→ S, the production set becomes − S0→ ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA A → B | S, B → b After removing A→ B, the production set becomes − S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA A → S | b B → b After removing A→ S, the production set becomes − S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA A → b |ASA | aB | a | AS | SA, B → b (4) Now we will find out more than two variables in the R.H.S Here, S0→ ASA, S → ASA, A→ ASA violates two Non-terminals in R.H.S. Hence we will apply step 4 and step 5 to get the following final production set which is in CNF − S0→ AX | aB | a | AS | SA S→ AX | aB | a | AS | SA A → b |AX | aB | a | AS | SA B → b X → SA (5) We have to change the productions S0→ aB, S→ aB, A→ aB And the final production set becomes − S0→ AX | YB | a | AS | SA S→ AX | YB | a | AS | SA A → b A → b |AX | YB | a | AS | SA B → b X → SA Y → a Learning working make money
Category: automata Theory
Multi-tape Turing Machine Multi-tape Turing Machines have multiple tapes where each tape is accessed with a separate head. Each head can move independently of the other heads. Initially the input is on tape 1 and others are blank. At first, the first tape is occupied by the input and the other tapes are kept blank. Next, the machine reads consecutive symbols under its heads and the TM prints a symbol on each tape and moves its heads. A Multi-tape Turing machine can be formally described as a 6-tuple (Q, X, B, δ, q0, F) where − Q is a finite set of states X is the tape alphabet B is the blank symbol δ is a relation on states and symbols where δ: Q × Xk → Q × (X × {Left_shift, Right_shift, No_shift })k where there is k number of tapes q0 is the initial state F is the set of final states Note − Every Multi-tape Turing machine has an equivalent single-tape Turing machine. Learning working make money
Ambiguity in Context-Free Grammars If a context free grammar G has more than one derivation tree for some string w ∈ L(G), it is called an ambiguous grammar. There exist multiple right-most or left-most derivations for some string generated from that grammar. Problem Check whether the grammar G with production rules − X → X+X | X*X |X| a is ambiguous or not. Solution Let’s find out the derivation tree for the string “a+a*a”. It has two leftmost derivations. Derivation 1 − X → X+X → a +X → a+ X*X → a+a*X → a+a*a Parse tree 1 − Derivation 2 − X → X*X → X+X*X → a+ X*X → a+a*X → a+a*a Parse tree 2 − Since there are two parse trees for a single string “a+a*a”, the grammar G is ambiguous. Learning working make money
CFL Closure Property Context-free languages are closed under − Union Concatenation Kleene Star operation Union Let L1 and L2 be two context free languages. Then L1 ∪ L2 is also context free. Example Let L1 = { anbn , n > 0}. Corresponding grammar G1 will have P: S1 → aAb|ab Let L2 = { cmdm , m ≥ 0}. Corresponding grammar G2 will have P: S2 → cBb| ε Union of L1 and L2, L = L1 ∪ L2 = { anbn } ∪ { cmdm } The corresponding grammar G will have the additional production S → S1 | S2 Concatenation If L1 and L2 are context free languages, then L1L2 is also context free. Example Union of the languages L1 and L2, L = L1L2 = { anbncmdm } The corresponding grammar G will have the additional production S → S1 S2 Kleene Star If L is a context free language, then L* is also context free. Example Let L = { anbn , n ≥ 0}. Corresponding grammar G will have P: S → aAb| ε Kleene Star L1 = { anbn }* The corresponding grammar G1 will have additional productions S1 → SS1 | ε Context-free languages are not closed under − Intersection − If L1 and L2 are context free languages, then L1 ∩ L2 is not necessarily context free. Intersection with Regular Language − If L1 is a regular language and L2 is a context free language, then L1 ∩ L2 is a context free language. Complement − If L1 is a context free language, then L1’ may not be context free. Learning working make money
Introduction to Grammars n the literary sense of the term, grammars denote syntactical rules for conversation in natural languages. Linguistics have attempted to define grammars since the inception of natural languages like English, Sanskrit, Mandarin, etc. The theory of formal languages finds its applicability extensively in the fields of Computer Science. Noam Chomsky gave a mathematical model of grammar in 1956 which is effective for writing computer languages. Grammar A grammar G can be formally written as a 4-tuple (N, T, S, P) where − N or VN is a set of variables or non-terminal symbols. T or ∑ is a set of Terminal symbols. S is a special variable called the Start symbol, S ∈ N P is Production rules for Terminals and Non-terminals. A production rule has the form α → β, where α and β are strings on VN ∪ ∑ and least one symbol of α belongs to VN. Example Grammar G1 − ({S, A, B}, {a, b}, S, {S → AB, A → a, B → b}) Here, S, A, and B are Non-terminal symbols; a and b are Terminal symbols S is the Start symbol, S ∈ N Productions, P : S → AB, A → a, B → b Example Grammar G2 − (({S, A}, {a, b}, S,{S → aAb, aA → aaAb, A → ε } ) Here, S and A are Non-terminal symbols. a and b are Terminal symbols. ε is an empty string. S is the Start symbol, S ∈ N Production P : S → aAb, aA → aaAb, A → ε Derivations from a Grammar Strings may be derived from other strings using the productions in a grammar. If a grammar G has a production α → β, we can say that x α y derives x β y in G. This derivation is written as − x α y ⇒G x β y Example Let us consider the grammar − G2 = ({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε } ) Some of the strings that can be derived are − S ⇒ aAb using production S → aAb ⇒ aaAbb using production aA → aAb ⇒ aaaAbbb using production aA → aaAb ⇒ aaabbb using production A → ε Learning working make money
Rice Theorem Rice theorem states that any non-trivial semantic property of a language which is recognized by a Turing machine is undecidable. A property, P, is the language of all Turing machines that satisfy that property. Formal Definition If P is a non-trivial property, and the language holding the property, Lp , is recognized by Turing machine M, then Lp = {<M> | L(M) ∈ P} is undecidable. Description and Properties Property of languages, P, is simply a set of languages. If any language belongs to P (L ∈ P), it is said that L satisfies the property P. A property is called to be trivial if either it is not satisfied by any recursively enumerable languages, or if it is satisfied by all recursively enumerable languages. A non-trivial property is satisfied by some recursively enumerable languages and are not satisfied by others. Formally speaking, in a non-trivial property, where L ∈ P, both the following properties hold: Property 1 − There exists Turing Machines, M1 and M2 that recognize the same language, i.e. either ( <M1>, <M2> ∈ L ) or ( <M1>,<M2> ∉ L ) Property 2 − There exists Turing Machines M1 and M2, where M1 recognizes the language while M2 does not, i.e. <M1> ∈ L and <M2> ∉ L Proof Suppose, a property P is non-trivial and φ ∈ P. Since, P is non-trivial, at least one language satisfies P, i.e., L(M0) ∈ P , ∋ Turing Machine M0. Let, w be an input in a particular instant and N is a Turing Machine which follows − On input x Run M on w If M does not accept (or doesn”t halt), then do not accept x (or do not halt) If M accepts w then run M0 on x. If M0 accepts x, then accept x. A function that maps an instance ATM = {<M,w>| M accepts input w} to a N such that If M accepts w and N accepts the same language as M0, Then L(M) = L(M0) ∈ p If M does not accept w and N accepts φ, Then L(N) = φ ∉ p Since ATM is undecidable and it can be reduced to Lp, Lp is also undecidable. Learning working make money
Pushdown Automata Acceptance There are two different ways to define PDA acceptability. Final State Acceptability In final state acceptability, a PDA accepts a string when, after reading the entire string, the PDA is in a final state. From the starting state, we can make moves that end up in a final state with any stack values. The stack values are irrelevant as long as we end up in a final state. For a PDA (Q, ∑, S, δ, q0, I, F), the language accepted by the set of final states F is − L(PDA) = {w | (q0, w, I) ⊢* (q, ε, x), q ∈ F} for any input stack string x. Empty Stack Acceptability Here a PDA accepts a string when, after reading the entire string, the PDA has emptied its stack. For a PDA (Q, ∑, S, δ, q0, I, F), the language accepted by the empty stack is − L(PDA) = {w | (q0, w, I) ⊢* (q, ε, ε), q ∈ Q} Example Construct a PDA that accepts L = {0n 1n | n ≥ 0} Solution This language accepts L = {ε, 01, 0011, 000111, ……………………….. } Here, in this example, the number of ‘a’ and ‘b’ have to be same. Initially we put a special symbol ‘$’ into the empty stack. Then at state q2, if we encounter input 0 and top is Null, we push 0 into stack. This may iterate. And if we encounter input 1 and top is 0, we pop this 0. Then at state q3, if we encounter input 1 and top is 0, we pop this 0. This may also iterate. And if we encounter input 1 and top is 0, we pop the top element. If the special symbol ‘$’ is encountered at top of the stack, it is popped out and it finally goes to the accepting state q4. Example Construct a PDA that accepts L = { wwR | w = (a+b)* } Solution Initially we put a special symbol ‘$’ into the empty stack. At state q2, the w is being read. In state q3, each 0 or 1 is popped when it matches the input. If any other input is given, the PDA will go to a dead state. When we reach that special symbol ‘$’, we go to the accepting state q4. Learning working make money
Context-Free Grammar Introduction Definition − A context-free grammar (CFG) consisting of a finite set of grammar rules is a quadruple (N, T, P, S) where N is a set of non-terminal symbols. T is a set of terminals where N ∩ T = NULL. P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of the production rule P does have any right context or left context. S is the start symbol. Example The grammar ({A}, {a, b, c}, P, A), P : A → aA, A → abc. The grammar ({S, a, b}, {a, b}, P, S), P: S → aSa, S → bSb, S → ε The grammar ({S, F}, {0, 1}, P, S), P: S → 00S | 11F, F → 00F | ε Generation of Derivation Tree A derivation tree or parse tree is an ordered rooted tree that graphically represents the semantic information a string derived from a context-free grammar. Representation Technique Root vertex − Must be labeled by the start symbol. Vertex − Labeled by a non-terminal symbol. Leaves − Labeled by a terminal symbol or ε. If S → x1x2 …… xn is a production rule in a CFG, then the parse tree / derivation tree will be as follows − There are two different approaches to draw a derivation tree − Top-down Approach − Starts with the starting symbol S Goes down to tree leaves using productions Bottom-up Approach − Starts from tree leaves Proceeds upward to the root which is the starting symbol S Derivation or Yield of a Tree The derivation or the yield of a parse tree is the final string obtained by concatenating the labels of the leaves of the tree from left to right, ignoring the Nulls. However, if all the leaves are Null, derivation is Null. Example Let a CFG {N,T,P,S} be N = {S}, T = {a, b}, Starting symbol = S, P = S → SS | aSb | ε One derivation from the above CFG is “abaabb” S → SS → aSbS → abS → abaSb → abaaSbb → abaabb Sentential Form and Partial Derivation Tree A partial derivation tree is a sub-tree of a derivation tree/parse tree such that either all of its children are in the sub-tree or none of them are in the sub-tree. Example If in any CFG the productions are − S → AB, A → aaA | ε, B → Bb| ε the partial derivation tree can be the following − If a partial derivation tree contains the root S, it is called a sentential form. The above sub-tree is also in sentential form. Leftmost and Rightmost Derivation of a String Leftmost derivation − A leftmost derivation is obtained by applying production to the leftmost variable in each step. Rightmost derivation − A rightmost derivation is obtained by applying production to the rightmost variable in each step. Example Let any set of production rules in a CFG be X → X+X | X*X |X| a over an alphabet {a}. The leftmost derivation for the string “a+a*a” may be − X → X+X → a+X → a + X*X → a+a*X → a+a*a The stepwise derivation of the above string is shown as below − The rightmost derivation for the above string “a+a*a” may be − X → X*X → X*a → X+X*a → X+a*a → a+a*a The stepwise derivation of the above string is shown as below − Left and Right Recursive Grammars In a context-free grammar G, if there is a production in the form X → Xa where X is a non-terminal and ‘a’ is a string of terminals, it is called a left recursive production. The grammar having a left recursive production is called a left recursive grammar. And if in a context-free grammar G, if there is a production is in the form X → aX where X is a non-terminal and ‘a’ is a string of terminals, it is called a right recursive production. The grammar having a right recursive production is called a right recursive grammar. Learning working make money
Arden”s Theorem In order to find out a regular expression of a Finite Automaton, we use Arden’s Theorem along with the properties of regular expressions. Statement − Let P and Q be two regular expressions. If P does not contain null string, then R = Q + RP has a unique solution that is R = QP* Proof − R = Q + (Q + RP)P [After putting the value R = Q + RP] = Q + QP + RPP When we put the value of R recursively again and again, we get the following equation − R = Q + QP + QP2 + QP3….. R = Q (ε + P + P2 + P3 + …. ) R = QP* [As P* represents (ε + P + P2 + P3 + ….) ] Hence, proved. Assumptions for Applying Arden’s Theorem The transition diagram must not have NULL transitions It must have only one initial state Method Step 1 − Create equations as the following form for all the states of the DFA having n states with initial state q1. q1 = q1R11 + q2R21 + … + qnRn1 + ε q2 = q1R12 + q2R22 + … + qnRn2 ………………………… ………………………… ………………………… ………………………… qn = q1R1n + q2R2n + … + qnRnn Rij represents the set of labels of edges from qi to qj, if no such edge exists, then Rij = ∅ Step 2 − Solve these equations to get the equation for the final state in terms of Rij Problem Construct a regular expression corresponding to the automata given below − Solution − Here the initial state and final state is q1. The equations for the three states q1, q2, and q3 are as follows − q1 = q1a + q3a + ε (ε move is because q1 is the initial state0 q2 = q1b + q2b + q3b q3 = q2a Now, we will solve these three equations − q2 = q1b + q2b + q3b = q1b + q2b + (q2a)b (Substituting value of q3) = q1b + q2(b + ab) = q1b (b + ab)* (Applying Arden’s Theorem) q1 = q1a + q3a + ε = q1a + q2aa + ε (Substituting value of q3) = q1a + q1b(b + ab*)aa + ε (Substituting value of q2) = q1(a + b(b + ab)*aa) + ε = ε (a+ b(b + ab)*aa)* = (a + b(b + ab)*aa)* Hence, the regular expression is (a + b(b + ab)*aa)*. Problem Construct a regular expression corresponding to the automata given below − Solution − Here the initial state is q1 and the final state is q2 Now we write down the equations − q1 = q10 + ε q2 = q11 + q20 q3 = q21 + q30 + q31 Now, we will solve these three equations − q1 = ε0* [As, εR = R] So, q1 = 0* q2 = 0*1 + q20 So, q2 = 0*1(0)* [By Arden’s theorem] Hence, the regular expression is 0*10*. Learning working make money
Deterministic Finite Automaton Finite Automaton can be classified into two types − Deterministic Finite Automaton (DFA) Non-deterministic Finite Automaton (NDFA / NFA) Deterministic Finite Automaton (DFA) In DFA, for each input symbol, one can determine the state to which the machine will move. Hence, it is called Deterministic Automaton. As it has a finite number of states, the machine is called Deterministic Finite Machine or Deterministic Finite Automaton. Formal Definition of a DFA A DFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where − Q is a finite set of states. ∑ is a finite set of symbols called the alphabet. δ is the transition function where δ: Q × ∑ → Q q0 is the initial state from where any input is processed (q0 ∈ Q). F is a set of final state/states of Q (F ⊆ Q). Graphical Representation of a DFA A DFA is represented by digraphs called state diagram. The vertices represent the states. The arcs labeled with an input alphabet show the transitions. The initial state is denoted by an empty single incoming arc. The final state is indicated by double circles. Example Let a deterministic finite automaton be → Q = {a, b, c}, ∑ = {0, 1}, q0 = {a}, F = {c}, and Transition function δ as shown by the following table − Present State Next State for Input 0 Next State for Input 1 a a b b c a c b c Its graphical representation would be as follows − Learning working make money