Beautiful Soup – extract() Method




Method Description

The extract() method in the Beautiful Soup library removes a tag or a string from the document tree and returns the object that was removed. In that respect it resembles the pop() method of a Python list. Unlike pop(), it is called on the element to be removed rather than being given a position.

Syntax


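extract()

Parameters

  • None in normal use. The method is called on the tag or string to be removed. Recent Beautiful Soup releases also accept an internal _self_index argument (None by default); it is a performance hint giving the element's position in its parent's contents, not a way to choose which element is removed.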

Return Type

The extract() method returns the element that has been removed from the document tree.
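
As a minimal sketch of the return value (the markup and variable names here are illustrative and not part of the examples that follow), the extracted tag is the same object the method was called on, and it no longer has a parent once it is detached:

```python
from bs4 import BeautifulSoup

html = "<div><p>Hello</p><p>Python</p></div>"
soup = BeautifulSoup(html, "html.parser")

first_p = soup.find("p")
removed = first_p.extract()   # called on the element itself, no position needed

print(removed is first_p)     # True: the same object is returned
print(removed.parent)         # None: the tag is detached from the tree
print(soup)                   # <div><p>Python</p></div>
```

Because the removed tag stays usable, it can be inspected or inserted elsewhere later.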

Example 1


from bs4 import BeautifulSoup

html = '''
   <div>
      <p>Hello Python</p>
   </div>
'''

soup = BeautifulSoup(html, 'html.parser')

tag1 = soup.find("div")
tag2 = tag1.find("p")
# extract() detaches the <p> tag from the tree and returns it
ret = tag2.extract()
print('Extracted:', ret)
print('original:', soup)

Output


Extracted: <p>Hello Python</p>
original:
<div>
</div>

Example 2

Consider the following HTML markup −


<html>
   <body>
      <p> The quick, brown fox jumps over a lazy dog.</p>
      <p> DJs flock by when MTV ax quiz prog.</p>
      <p> Junk MTV quiz graced by fox whelps.</p>
      <p> Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>

Here is the code −


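from bs4 import BeautifulSoup

# Parse the markup shown above, saved as index.html
with open('index.html') as fp:
   soup = BeautifulSoup(fp, 'html.parser')

# find_all() with no arguments returns every tag in the document
tags = soup.find_all()
for tag in tags:
   obj = tag.extract()
   print("Extracted:", obj)

print(soup)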

Output


Extracted: <html>
<body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
Extracted: <body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
Extracted: <p> The quick, brown fox jumps over a lazy dog.</p>
Extracted: <p> DJs flock by when MTV ax quiz prog.</p>
Extracted: <p> Junk MTV quiz graced by fox whelps.</p>
Extracted: <p> Bawds jog, flick quartz, vex nymphs.</p>
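
The file-based example above needs an index.html on disk. A self-contained variant is sketched here, assuming a shortened version of the same markup is supplied as a string and only the <p> tags are wanted. The names paragraphs and html are illustrative.

```python
from bs4 import BeautifulSoup

html = """<html><body>
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list of matches collected up front,
# so extracting each one while iterating does not skip any tags
paragraphs = [p.extract() for p in soup.find_all("p")]

for p in paragraphs:
    print("Extracted:", p)

# Only the <html> and <body> tags (plus whitespace) are left
print("Remaining:", soup)
```

Restricting find_all() to "p" leaves the surrounding <html> and <body> tags in place, whereas the loop above empties the whole document on its first iteration.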

Example 3

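You can also combine extract() with the find_next() and find_previous() methods and the next_element and previous_element properties. In the following example, the next_element of the first <b> tag is its text, so only the string Hello is removed and the empty <b> tag stays in the tree.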


from bs4 import BeautifulSoup

html = '''
<div>
<p><b>Hello</b><b>Python</b></p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

tag1 = soup.find("b")
# next_element of the first <b> tag is the string inside it
ret = tag1.next_element.extract()
print('Extracted:', ret)
print('original:', soup)

Output


Extracted: Hello
original:
<div>
<p><b></b><b>Python</b></p>
</div>
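
The same idea with find_next() can be sketched as follows, assuming the same markup written on one line. Here the second <b> tag is located from the first and then removed as a whole tag rather than as a string.

```python
from bs4 import BeautifulSoup

html = "<div><p><b>Hello</b><b>Python</b></p></div>"
soup = BeautifulSoup(html, "html.parser")

first_b = soup.find("b")
# find_next("b") returns the next <b> tag after first_b in document order
second_b = first_b.find_next("b")
ret = second_b.extract()

print("Extracted:", ret)    # Extracted: <b>Python</b>
print("original:", soup)    # original: <div><p><b>Hello</b></p></div>
```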
