Method Description
The extract() method in the Beautiful Soup library removes a tag or a string from the document tree and returns the removed object. This is similar to how the pop() method of a Python list works.
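The short sketch below illustrates the pop()-like behaviour for both a tag and a string. It assumes bs4 is installed, and the markup is made up for illustration. Each extract() call detaches one node and hands it back, so the tree shrinks step by step.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = "<p>Hello <b>Python</b> world</p>"
soup = BeautifulSoup(html, "html.parser")

# Remove a tag: extract() returns the <b> element itself, much like list.pop()
removed_tag = soup.b.extract()
print(removed_tag)   # <b>Python</b>
print(soup)          # <p>Hello  world</p>

# Remove a string: NavigableString objects also have extract()
removed_text = soup.p.contents[0].extract()
print(soup)          # <p> world</p>
```

Note that removing the <b> tag leaves the two neighbouring strings side by side without merging them. That is why contents[0] is still "Hello ".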
Syntax
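extract()
Parameters
None in normal use: the method is called on the element to be removed. Beautiful Soup's source also defines an optional, underscore-prefixed _self_index argument (the element's position in its parent's .contents, None by default). It only serves as an internal performance shortcut.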
Return Type
The extract() method returns the element that has been removed from the document tree.
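Because the returned element is the same object that was removed, now detached from the tree, it can be inspected or inserted somewhere else. The following is a minimal sketch with made-up markup; the ids "a" and "b" are purely illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <p> inside the first <div>, an empty second <div>
html = "<div id='a'><p>Move me</p></div><div id='b'></div>"
soup = BeautifulSoup(html, "html.parser")

p = soup.find("p")
moved = p.extract()

# The returned object is the very same element, no longer attached to a parent
print(moved is p)        # True
print(moved.parent)      # None

# Since it is detached, it can be appended to another tag
soup.find("div", id="b").append(moved)
print(soup)              # <div id="a"></div><div id="b"><p>Move me</p></div>
```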
Example 1
from bs4 import BeautifulSoup

html = '''
<div>
<p>Hello Python</p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
tag1 = soup.find("div")
tag2 = tag1.find("p")
# Remove the <p> tag from the tree and keep a reference to it
ret = tag2.extract()
print('Extracted:', ret)
print('original:', soup)
Output
Extracted: <p>Hello Python</p>
original:
<div>

</div>
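A common use of extract() is stripping every occurrence of an unwanted tag, such as <script>, from a page. This sketch assumes hypothetical markup. It relies on find_all() returning a list, so the loop is not disturbed when tags are removed from the tree.

```python
from bs4 import BeautifulSoup

# Hypothetical page containing tags we want to strip out
html = """<html><head><script>var x = 1;</script></head>
<body><p>Keep this</p><script>alert('hi')</script></body></html>"""
soup = BeautifulSoup(html, "html.parser")

# Each extract() call detaches one <script> element and returns it
removed = [tag.extract() for tag in soup.find_all("script")]

print(len(removed))   # 2
print(soup)
```

Keeping the returned tags in a list is optional. If they are not needed afterwards, the decompose() method is an alternative that destroys the tag instead of returning it.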
Example 2
Consider the following HTML markup, saved in a file named index.html −
<html>
   <body>
      <p> The quick, brown fox jumps over a lazy dog.</p>
      <p> DJs flock by when MTV ax quiz prog.</p>
      <p> Junk MTV quiz graced by fox whelps.</p>
      <p> Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
Here is the code −
from bs4 import BeautifulSoup

with open('index.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# find_all() with no arguments returns every tag, in document order
tags = soup.find_all()
for tag in tags:
    obj = tag.extract()
    print("Extracted:", obj)

print(soup)
Output
Extracted: <html>
   <body>
      <p> The quick, brown fox jumps over a lazy dog.</p>
      <p> DJs flock by when MTV ax quiz prog.</p>
      <p> Junk MTV quiz graced by fox whelps.</p>
      <p> Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
Extracted: <body>
      <p> The quick, brown fox jumps over a lazy dog.</p>
      <p> DJs flock by when MTV ax quiz prog.</p>
      <p> Junk MTV quiz graced by fox whelps.</p>
      <p> Bawds jog, flick quartz, vex nymphs.</p>
   </body>
Extracted: <p> The quick, brown fox jumps over a lazy dog.</p>
Extracted: <p> DJs flock by when MTV ax quiz prog.</p>
Extracted: <p> Junk MTV quiz graced by fox whelps.</p>
Extracted: <p> Bawds jog, flick quartz, vex nymphs.</p>
The <html> tag is extracted and printed first, so its output still contains the whole document. Each later tag is then removed from that detached subtree. Once the loop finishes no tags remain in soup, so the final print(soup) shows an essentially empty document.
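To remove only the paragraphs and keep the <html> and <body> skeleton, pass a tag name to find_all(). This self-contained sketch holds the same sentences in a string instead of reading index.html, which is an assumption made so it runs without the file.

```python
from bs4 import BeautifulSoup

# The sentences from the markup above, held in a string for a self-contained run
html = """<html><body>
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
<p>Junk MTV quiz graced by fox whelps.</p>
<p>Bawds jog, flick quartz, vex nymphs.</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# Extract only the <p> tags and keep their text; <html> and <body> stay in place
texts = [p.extract().get_text() for p in soup.find_all("p")]

for t in texts:
    print("Extracted:", t)
print(soup)
```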
Example 3
You can also use the extract() method together with the find_next() and find_previous() methods and the next_element and previous_element properties.
from bs4 import BeautifulSoup

html = '''
<div>
<p><b>Hello</b><b>Python</b></p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
tag1 = soup.find("b")
# next_element of the first <b> is the string "Hello" inside it
ret = tag1.next_element.extract()
print('Extracted:', ret)
print('original:', soup)
Output
Extracted: Hello
original:
<div>
<p><b></b><b>Python</b></p>
</div>
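The same markup can also be trimmed with find_next() and find_previous(). These search forward and backward in document order and return a tag, which can then be extracted. This is a minimal sketch; the whitespace is dropped from the markup only to keep the printed result compact.

```python
from bs4 import BeautifulSoup

html = "<div><p><b>Hello</b><b>Python</b></p></div>"
soup = BeautifulSoup(html, "html.parser")
first_b = soup.find("b")

# find_next("b") searches forward from the first <b> and returns <b>Python</b>
ret = first_b.find_next("b").extract()
print("Extracted:", ret)    # Extracted: <b>Python</b>
print("original:", soup)    # original: <div><p><b>Hello</b></p></div>

# find_previous("div") searches backward from <b>Hello</b> to the enclosing <div>;
# extracting it removes everything, leaving an empty document
div = first_b.find_previous("div")
div.extract()
print("after:", soup)
```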