Python – Chunk Classification




Classification-based chunking involves classifying text as groups of words rather than as individual words. A simple scenario is tagging the text in sentences. We will use a corpus to demonstrate the classification: conll2000, which contains data from the Wall Street Journal (WSJ) corpus and is used for noun phrase-based chunking.
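To make the idea of "a group of words" concrete, here is a minimal sketch that groups tokens into chunks. It assumes each token carries an IOB chunk tag (B- begins a chunk, I- continues it, O is outside any chunk), which is the convention the CoNLL-2000 data follows. The sample tokens are written by hand for illustration, and the helper name group_chunks is hypothetical, not part of NLTK.

```python
# Hand-written sample in (word, POS tag, IOB chunk tag) form; illustrative only.
tagged = [
    ("Confidence", "NN", "B-NP"),
    ("in", "IN", "B-PP"),
    ("the", "DT", "B-NP"),
    ("pound", "NN", "I-NP"),
    ("is", "VBZ", "B-VP"),
    ("widely", "RB", "I-VP"),
    ("expected", "VBN", "I-VP"),
    (".", ".", "O"),
]


def group_chunks(tokens):
    """Group IOB-tagged tokens into (chunk_type, [words]) pairs."""
    chunks = []
    for word, _pos, iob in tokens:
        if iob == "O":
            # Tokens outside any chunk are kept as single-word groups.
            chunks.append(("O", [word]))
        elif iob.startswith("B-") or not chunks or chunks[-1][0] != iob[2:]:
            # A B- tag (or an I- tag with nothing to continue) opens a new chunk.
            chunks.append((iob[2:], [word]))
        else:
            # An I- tag that continues the current chunk.
            chunks[-1][1].append(word)
    return chunks


chunks = group_chunks(tagged)
for chunk_type, words in chunks:
    print(chunk_type, words)
```

Here "the pound" comes out as one noun phrase (NP) and "is widely expected" as one verb phrase (VP), which is the kind of grouping a chunker is meant to produce.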

First, we add the corpus to our environment using the following command.

import nltk
nltk.download('conll2000')

Let's have a look at the first few sentences in this corpus.

from nltk.corpus import conll2000

# Each sentence is returned as a list of word tokens
x = conll2000.sents()
for i in range(3):
    print(x[i])
    print('\n')

When we run the above program we get the following output −


['Confidence', 'in', 'the', 'pound', 'is', 'widely', 'expected', 'to', 'take', 'another', 'sharp', 'dive', 'if', 'trade', 'figures', 'for', 'September', ',', 'due', 'for', 'release', 'tomorrow', ',', 'fail', 'to', 'show', 'a', 'substantial', 'improvement', 'from', 'July', 'and', 'August', "'s", 'near-record', 'deficits', '.']


['Chancellor', 'of', 'the', 'Exchequer', 'Nigel', 'Lawson', "'s", 'restated', 'commitment', 'to', 'a', 'firm', 'monetary', 'policy', 'has', 'helped', 'to', 'prevent', 'a', 'freefall', 'in', 'sterling', 'over', 'the', 'past', 'week', '.']


['But', 'analysts', 'reckon', 'underlying', 'support', 'for', 'sterling', 'has', 'been', 'eroded', 'by', 'the', 'chancellor', "'s", 'failure', 'to', 'announce', 'any', 'new', 'policy', 'measures', 'in', 'his', 'Mansion', 'House', 'speech', 'last', 'Thursday', '.']

Next, we use the function tagged_sents() to get the same sentences with each word paired with its part-of-speech tag.

from nltk.corpus import conll2000

# Each sentence is returned as a list of (word, POS tag) pairs
x = conll2000.tagged_sents()
for i in range(3):
    print(x[i])
    print('\n')

When we run the above program we get the following output −

[('Confidence', 'NN'), ('in', 'IN'), ('the', 'DT'), ('pound', 'NN'), ('is', 'VBZ'), ('widely', 'RB'), ('expected', 'VBN'), ('to', 'TO'), ('take', 'VB'), ('another', 'DT'), ('sharp', 'JJ'), ('dive', 'NN'), ('if', 'IN'), ('trade', 'NN'), ('figures', 'NNS'), ('for', 'IN'), ('September', 'NNP'), (',', ','), ('due', 'JJ'), ('for', 'IN'), ('release', 'NN'), ('tomorrow', 'NN'), (',', ','), ('fail', 'VB'), ('to', 'TO'), ('show', 'VB'), ('a', 'DT'), ('substantial', 'JJ'), ('improvement', 'NN'), ('from', 'IN'), ('July', 'NNP'), ('and', 'CC'), ('August', 'NNP'), ("'s", 'POS'), ('near-record', 'JJ'), ('deficits', 'NNS'), ('.', '.')]


[('Chancellor', 'NNP'), ('of', 'IN'), ('the', 'DT'), ('Exchequer', 'NNP'), ('Nigel', 'NNP'), ('Lawson', 'NNP'), ("'s", 'POS'), ('restated', 'VBN'), ('commitment', 'NN'), ('to', 'TO'), ('a', 'DT'), ('firm', 'NN'), ('monetary', 'JJ'), ('policy', 'NN'), ('has', 'VBZ'), ('helped', 'VBN'), ('to', 'TO'), ('prevent', 'VB'), ('a', 'DT'), ('freefall', 'NN'), ('in', 'IN'), ('sterling', 'NN'), ('over', 'IN'), ('the', 'DT'), ('past', 'JJ'), ('week', 'NN'), ('.', '.')]


[('But', 'CC'), ('analysts', 'NNS'), ('reckon', 'VBP'), ('underlying', 'VBG'), ('support', 'NN'), ('for', 'IN'), ('sterling', 'NN'), ('has', 'VBZ'), ('been', 'VBN'), ('eroded', 'VBN'), ('by', 'IN'), ('the', 'DT'), ('chancellor', 'NN'), ("'s", 'POS'), ('failure', 'NN'), ('to', 'TO'), ('announce', 'VB'), ('any', 'DT'), ('new', 'JJ'), ('policy', 'NN'), ('measures', 'NNS'), ('in', 'IN'), ('his', 'PRP$'), ('Mansion', 'NNP'), ('House', 'NNP'), ('speech', 'NN'), ('last', 'JJ'), ('Thursday', 'NNP'), ('.', '.')]
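The part-of-speech tags above are the kind of feature a chunk classifier can learn from. As a rough sketch of the idea, the snippet below learns the most frequent chunk tag for each POS tag and uses it to label a new sentence. This is a deliberately simple stand-in for a real classifier, not the method of any particular library; the training pairs are written by hand for illustration, and the names train_unigram_chunker and chunk_tags are hypothetical.

```python
from collections import Counter, defaultdict

# Tiny hand-labelled training set: one list of (POS tag, IOB chunk tag)
# pairs per sentence. Illustrative only, not taken from the corpus.
train = [
    [("DT", "B-NP"), ("NN", "I-NP"), ("VBZ", "B-VP"),
     ("DT", "B-NP"), ("JJ", "I-NP"), ("NN", "I-NP"), (".", "O")],
    [("NNS", "B-NP"), ("VBP", "B-VP"), ("IN", "B-PP"),
     ("DT", "B-NP"), ("NN", "I-NP"), (".", "O")],
]


def train_unigram_chunker(sentences):
    """Map each POS tag to the chunk tag it most often carries."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for pos, iob in sent:
            counts[pos][iob] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def chunk_tags(model, tagged_sentence, default="O"):
    """Label (word, POS) pairs with a predicted chunk tag."""
    # POS tags never seen in training fall back to the default tag.
    return [(word, pos, model.get(pos, default))
            for word, pos in tagged_sentence]


model = train_unigram_chunker(train)
sentence = [("the", "DT"), ("chancellor", "NN"), ("has", "VBZ"),
            ("helped", "VBN"), (".", ".")]
result = chunk_tags(model, sentence)
for row in result:
    print(row)
```

Because VBN never appears in the toy training data, "helped" falls back to the default tag O. With so little data the model is crude; a larger labelled corpus and richer features (neighbouring words and tags) would be needed for useful accuracy.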
