Python – Frequency Distribution




Counting how often each word occurs in a body of text is a common step in text processing. One way to do this is to split the text into tokens with the word_tokenize() function, append the tokens to a list, and then count how many times each token appears in that list, as shown in the program below. To keep the output short, the program looks only at the first 50 tokens.

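from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

# Load the raw text of Blake's poems from the Gutenberg corpus.
sample = gutenberg.raw("blake-poems.txt")

# Split the text into word and punctuation tokens.
token = word_tokenize(sample)
wlist = []

# Keep only the first 50 tokens.
for i in range(50):
    wlist.append(token[i])

# For each token, count how often it occurs among those 50 tokens.
wordfreq = [wlist.count(w) for w in wlist]

# zip() is lazy in Python 3, so build a list before converting to a string.
print("Pairs\n" + str(list(zip(wlist, wordfreq))))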

When we run the above program, we get the following output −

Pairs
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1), ('BOOK', 1), ('of', 2), ('THEL', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('INTRODUCTION', 1), ('Piping', 2), ('down', 1), ('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2), ('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3), ('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1), ('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1), ('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]
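
Counting with list.count() rescans the whole list once per token, so a repeated token such as OF shows up several times in the result. As an alternative, here is a minimal sketch that tallies each distinct token once using Python's standard collections.Counter. The sample sentence and the whitespace split are assumptions made to keep the example self-contained; in practice the tokens would come from word_tokenize().

```python
from collections import Counter

# Hypothetical sample text; a real run would use tokens from word_tokenize().
text = "piping down the valleys wild piping songs of pleasant glee"
tokens = text.split()

# Counter tallies every token in a single pass over the list.
freq = Counter(tokens)

# most_common(n) returns the n most frequent (token, count) pairs.
top = freq.most_common(1)
print(top)
```

NLTK's own FreqDist class exposes a similar interface, including most_common(), so the same pattern should carry over to tokenized corpus text.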

Conditional Frequency Distribution

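A conditional frequency distribution is used when we want to count words separately for each condition, such as the category of text in which they occur. The program below pairs every word in the Brown corpus with its genre, then tabulates how often four modal verbs appear in three of those genres.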

import nltk
from nltk.corpus import brown

# Build one frequency distribution per genre from (genre, word) pairs.
cfd = nltk.ConditionalFreqDist(
          (genre, word)
          for genre in brown.categories()
          for word in brown.words(categories=genre))

# Limit the table to three genres and four modal verbs.
categories = ['hobbies', 'romance', 'humor']
searchwords = ['may', 'might', 'must', 'will']
cfd.tabulate(conditions=categories, samples=searchwords)

When we run the above program, we get the following output −

          may might  must  will 
hobbies   131    22    83   264 
romance    11    51    45    43 
  humor     8     8     9    13 
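
To make the idea concrete without downloading a corpus, the following is a rough sketch of what a conditional frequency distribution does, written with standard-library containers only. It is not NLTK's implementation, and the (condition, word) pairs are made-up data standing in for the (genre, word) pairs above.

```python
from collections import Counter, defaultdict

# Hypothetical (condition, word) pairs standing in for (genre, word) pairs.
pairs = [
    ("romance", "might"), ("romance", "will"), ("romance", "might"),
    ("humor", "will"), ("humor", "must"),
]

# Keep one Counter per condition; each Counter is an ordinary
# frequency distribution over the words seen under that condition.
cfd = defaultdict(Counter)
for condition, word in pairs:
    cfd[condition][word] += 1

# Print one row per condition, with a column for each word of interest.
samples = ("might", "must", "will")
for condition in ("romance", "humor"):
    print(condition, [cfd[condition][w] for w in samples])
```

Each row is an independent count, which is why the same word can have very different frequencies under different conditions.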
