Beautiful Soup – decompose Method




Method Description

The decompose() method destroys the current element along with its children: the element is removed from the tree, and it and everything beneath it are wiped out. To check whether an element has been decomposed, use its `decomposed` property, which returns True if the element has been destroyed and False otherwise.
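As a minimal sketch of the behaviour described above, the snippet below keeps references to a tag and one of its children, decomposes the parent, and then reads `decomposed` on both. The markup and the variable names (`first_p`, `bold`) are made up for illustration, and the `decomposed` property is assumed to be available in the installed Beautiful Soup release.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the examples in this tutorial
soup = BeautifulSoup("<div><p>one <b>bold</b></p><p>two</p></div>", "html.parser")

first_p = soup.p   # first <p> tag
bold = soup.b      # <b> tag nested inside the first <p>

print(first_p.decomposed, bold.decomposed)   # False False

# Destroy the first <p>; its children are destroyed with it
first_p.decompose()

print(first_p.decomposed, bold.decomposed)   # True True
print(soup)                                  # <div><p>two</p></div>
```

Both the parent and the nested child report True afterwards, while the sibling <p> is untouched.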

Syntax


decompose()

Parameters

No parameters are defined for this method.

Return Type

The method doesn't return any object.
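To illustrate the point about the return type, here is a short sketch that stores the result of decompose() in a variable. The markup and the name `result` are placeholders chosen for this illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup only
soup = BeautifulSoup("<p>Keep<b>drop me</b></p>", "html.parser")

# decompose() destroys the <b> tag in place and gives nothing back
result = soup.b.decompose()

print(result)   # None
print(soup)     # <p>Keep</p>
```

Because nothing is returned, the removed element cannot be reused afterwards; the call is only useful for its effect on the tree.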

Example 1

When we call the decompose() method on the BeautifulSoup object itself, the entire content is destroyed.


from bs4 import BeautifulSoup

html = '''
<html>
   <body>
      <p>The quick, brown fox jumps over a lazy dog.</p>
      <p>DJs flock by when MTV ax quiz prog.</p>
      <p>Junk MTV quiz graced by fox whelps.</p>
      <p>Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
'''

soup = BeautifulSoup(html, "html.parser")
# Decomposing the BeautifulSoup object destroys the whole document
soup.decompose()
print("decomposed:", soup.decomposed)
# Printing the destroyed object fails, as the output shows
print("document:", soup)

Output


decomposed: True
document: Traceback (most recent call last):
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str

Because the soup object has been decomposed, its decomposed property returns True. However, printing the decomposed object raises a TypeError, as shown above.

Example 2

The code below uses the decompose() method to remove every <p> tag from the HTML string.


from bs4 import BeautifulSoup

html = '''
<html>
   <body>
      <p>The quick, brown fox jumps over a lazy dog.</p>
      <p>DJs flock by when MTV ax quiz prog.</p>
      <p>Junk MTV quiz graced by fox whelps.</p>
      <p>Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
'''

soup = BeautifulSoup(html, "html.parser")
# find_all() returns a list, so it is safe to modify the tree while looping
p_all = soup.find_all('p')
for p in p_all:
    p.decompose()

print("document:", soup)

Output

The rest of the HTML document, with all <p> tags removed, is printed.


document: 
<html>
<body>

</body>
</html>
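The same loop can be narrowed to a subset of tags by passing a filter to find_all(). The sketch below is an illustration under assumed markup: the class name "ad" and the variable `ads` are hypothetical and do not come from the example above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two paragraphs are marked with class "ad"
html = (
    '<div>'
    '<p class="ad">Buy now</p>'
    '<p>Real content</p>'
    '<p class="ad">Subscribe</p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Collect only the <p> tags whose class is "ad", then destroy each one
ads = soup.find_all("p", class_="ad")
for tag in ads:
    tag.decompose()

print(soup)   # <div><p>Real content</p></div>
```

Only the matching paragraphs are destroyed; the unmarked <p> stays in the tree.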

Example 3

Here, we find the <body> tag in the HTML document tree and decompose the tag that precedes it, which happens to be the <title> tag. The resulting document tree omits the <title> tag.


from bs4 import BeautifulSoup

html = '''
<html>
   <head>
      <title>TutorialsPoint</title>
   </head>
   <body>
      Hello World
   </body>
</html>
'''

soup = BeautifulSoup(html, "html.parser")
tag = soup.body
# find_previous() with no arguments returns the closest tag before <body>
tag.find_previous().decompose()

print("document:", soup)

Output


document: 
<html>
<head>

</head>
<body>
Hello World
</body>
</html>
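Relying on whichever tag happens to come first can be fragile if the markup changes. As a hedged variation on this example (not the tutorial's original code), the sketch below passes the tag name to find_previous() so the target is explicit. The compact markup and the names `title_tag` and `name_before` are chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Compact version of the document, without indentation whitespace
html = "<html><head><title>TutorialsPoint</title></head><body>Hello World</body></html>"
soup = BeautifulSoup(html, "html.parser")

# Ask explicitly for the nearest <title> tag that comes before <body>
title_tag = soup.body.find_previous("title")
name_before = title_tag.name   # read the name before the tag is destroyed
print(name_before)             # title

title_tag.decompose()
print(soup)   # <html><head></head><body>Hello World</body></html>
```

The tag name is read before calling decompose() because a destroyed element should not be used afterwards.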
