One of the powerful features of the Beautiful Soup library is the ability to manipulate the parsed HTML or XML document and modify its contents.
Beautiful Soup library has different functions to perform the following operations −
- Add contents or a new tag to an existing tag of the document
- Insert contents before or after an existing tag or string
- Clear the contents of an already existing tag
- Modify the contents of a tag element
Add Contents
You can add to the content of an existing tag by using the append() method on a Tag object. It works like the append() method of Python's list object.
In the following example, the HTML script has a <p> tag. With append(), additional text is appended.
Example
from bs4 import BeautifulSoup

markup = '<p>Hello</p>'
soup = BeautifulSoup(markup, 'html.parser')
print(soup)

tag = soup.p
tag.append(" World")
print(soup)
Output
<p>Hello</p>
<p>Hello World</p>
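The comparison with Python's list.append() can be made visible by looking at the tag's contents list before and after the call. The following is a minimal sketch; the variable names children_before and children_after are only illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello</p>", "html.parser")
tag = soup.p

# Before appending, the tag has a single child string.
children_before = [str(child) for child in tag.contents]

tag.append(" World")

# append() adds the new string as the last child, much like list.append().
children_after = [str(child) for child in tag.contents]

print(children_before)  # ['Hello']
print(children_after)   # ['Hello', ' World']
print(tag.get_text())   # Hello World
```

Note that the two strings remain separate children of the tag; they are only joined when the document is rendered as text.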
With the append() method, you can add a new tag at the end of an existing tag. First create a new Tag object with new_tag() method and then pass it to the append() method.
Example
from bs4 import BeautifulSoup

markup = '<b>Hello</b>'
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.b

# Create a new <i> tag, set its text and append it to <b>
tag1 = soup.new_tag('i')
tag1.string = 'World'
tag.append(tag1)
print(soup.prettify())
Output
<b>
 Hello
 <i>
  World
 </i>
</b>
If you have to add a string to the document, you can append a NavigableString object.
Example
from bs4 import BeautifulSoup, NavigableString

markup = '<b>Hello</b>'
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.b

new_string = NavigableString(" World")
tag.append(new_string)
print(soup.prettify())
Output
<b>
 Hello
 World
</b>
From Beautiful Soup version 4.7 onwards, the extend() method has been added to Tag class. It adds all the elements in a list to the tag.
Example
from bs4 import BeautifulSoup

markup = '<b>Hello</b>'
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.b

vals = ['World.', 'Welcome to ', 'TutorialsPoint']
tag.extend(vals)
print(soup.prettify())
Output
<b>
 Hello
 World.
 Welcome to
 TutorialsPoint
</b>
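The list passed to extend() does not have to contain strings only. As a rough sketch, assuming Beautiful Soup 4.7 or later, the following builds a few <li> tags and adds them to an empty <ul> in one call. The item names are made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul></ul>", "html.parser")
ul = soup.ul

# Build a few <li> tags first (the item names are hypothetical).
items = []
for name in ["Python", "Java", "Rust"]:
    li = soup.new_tag("li")
    li.string = name
    items.append(li)

# extend() appends every element of the list, in order.
ul.extend(items)

print(soup)  # <ul><li>Python</li><li>Java</li><li>Rust</li></ul>
```

This is equivalent to calling append() once for each element of the list.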
Insert Contents
Instead of adding a new element at the end, you can use the insert() method to add an element at a given position in the list of children of a Tag element. The insert() method in Beautiful Soup behaves similarly to insert() on a Python list object.
In the following example, a new string is added to the <b> tag at position 1, and the modified document is printed.
Example
from bs4 import BeautifulSoup

markup = '<b>Excellent </b><u>from TutorialsPoint</u>'
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.b

# Position 0 holds "Excellent ", so the new string becomes the second child
tag.insert(1, "Tutorial ")
print(soup.prettify())
Output
<b>
 Excellent
 Tutorial
</b>
<u>
 from TutorialsPoint
</u>
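To see how the position argument relates to the list of children, the sketch below inserts a string at index 0 and a new tag at index 2, then lists the children. The markup and the variable names are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>one<b>two</b>three</p>", "html.parser")
tag = soup.p

# Children at this point: 'one' (0), <b>two</b> (1), 'three' (2).
tag.insert(0, "zero ")

# Children now: 'zero ' (0), 'one' (1), <b>two</b> (2), 'three' (3).
# Inserting at index 2 places the new tag just before <b>two</b>.
em = soup.new_tag("i")
em.string = "inserted"
tag.insert(2, em)

children = [str(child) for child in tag.contents]
print(children)
# ['zero ', 'one', '<i>inserted</i>', '<b>two</b>', 'three']
print(soup)
# <p>zero one<i>inserted</i><b>two</b>three</p>
```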
Beautiful Soup also has insert_before() and insert_after() methods. Their respective purpose is to insert a tag or a string before or after a given Tag object. The following code shows that a string “Python Tutorial” is added after the <b> tag.
Example
from bs4 import BeautifulSoup

markup = '<b>Excellent </b><u>from TutorialsPoint</u>'
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.b

# The new string becomes the next sibling of <b>
tag.insert_after("Python Tutorial")
print(soup.prettify())
Output
<b>
 Excellent
</b>
Python Tutorial
<u>
 from TutorialsPoint
</u>
Similarly, the insert_before() method is used below to add the text “Here is an ” before the <b> tag. The statements continue from the previous example.
Example
tag.insert_before("Here is an ")
print(soup.prettify())
Output
Here is an
<b>
 Excellent
</b>
Python Tutorial
<u>
 from TutorialsPoint
</u>
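Both methods accept a Tag as well as a string. The following sketch, with markup invented for illustration, places a new <i> tag before a <b> tag and a plain string after it, then checks the result through the sibling attributes.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>Beautiful Soup</b></p>", "html.parser")
b = soup.b

# A new tag inserted before <b> becomes its previous sibling.
label = soup.new_tag("i")
label.string = "Library: "
b.insert_before(label)

# A string inserted after <b> becomes its next sibling.
b.insert_after(" (version 4)")

print(soup)
# <p><i>Library: </i><b>Beautiful Soup</b> (version 4)</p>
print(b.previous_sibling)  # <i>Library: </i>
print(repr(str(b.next_sibling)))  # ' (version 4)'
```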
Clear the Contents
Beautiful Soup provides more than one way to remove the contents of an element from the document tree. Each of these methods behaves differently.
The clear() method is the most straightforward. It simply removes the contents of the specified Tag element. The following example shows its usage.
Example
from bs4 import BeautifulSoup

markup = '<b>Excellent </b><u>from TutorialsPoint</u>'
soup = BeautifulSoup(markup, 'html.parser')

# Remove everything inside <u>, but keep the tag itself
tag = soup.find('u')
tag.clear()
print(soup.prettify())
Output
<b>
 Excellent
</b>
<u>
</u>
It can be seen that the clear() method removes the contents, keeping the tag intact.
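Because the tag itself survives, its attributes remain in place and new content can be added afterwards. A minimal sketch, using made-up markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div id="box"><p>Old paragraph</p> and old text</div>',
                     "html.parser")
div = soup.div

# clear() removes child tags and strings alike; the id attribute is untouched.
div.clear()
after_clear = str(soup)
print(after_clear)  # <div id="box"></div>

# The emptied tag can be filled again.
div.append("New text")
print(soup)  # <div id="box">New text</div>
```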
In the next example, we parse the following HTML document, saved as index.html, and call the clear() method on all tags.
<html>
<body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
Here is the Python code that uses the clear() method.
Example
from bs4 import BeautifulSoup

with open('index.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# find_all() without arguments returns every tag in the document
tags = soup.find_all()
for tag in tags:
    tag.clear()
print(soup.prettify())
Output
<html>
</html>
The extract() method removes either a tag or a string from the document tree, and returns the object that was removed.
Example
from bs4 import BeautifulSoup

with open('index.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')

tags = soup.find_all()
for tag in tags:
    # extract() detaches the tag from the tree and returns it
    obj = tag.extract()
    print("Extracted:", obj)
print(soup)
Output
Extracted: <html>
<body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
Extracted: <body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
Extracted: <p> The quick, brown fox jumps over a lazy dog.</p>
Extracted: <p> DJs flock by when MTV ax quiz prog.</p>
Extracted: <p> Junk MTV quiz graced by fox whelps.</p>
Extracted: <p> Bawds jog, flick quartz, vex nymphs.</p>
You can extract either a tag or a string. The following example shows a tag being extracted.
Example
from bs4 import BeautifulSoup

html = '''
<ol id="HR">
<li>Rani</li>
<li>Ankita</li>
</ol>
'''

soup = BeautifulSoup(html, 'html.parser')
obj = soup.find('ol')

# find_next() returns the first tag after <ol>, which is the first <li>
obj.find_next().extract()
print(soup)
Output
<ol id="HR">
<li>Ankita</li>
</ol>
Change the extract() statement as shown below to remove only the inner text of the first <li> element. The tag itself stays in the document as an empty element.
Example
obj.find_next().string.extract()
Output
<ol id="HR">
<li></li>
<li>Ankita</li>
</ol>
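Since extract() returns the removed object, a common use is moving an element to another place in the tree. The sketch below is only an illustration with invented markup: it detaches a <p> tag from one <div> and appends it to another.

```python
from bs4 import BeautifulSoup

markup = "<div id='a'><p>Move me</p></div><div id='b'></div>"
soup = BeautifulSoup(markup, "html.parser")

# After extract(), the tag is the root of its own small tree.
p = soup.p.extract()
parent_after_extract = p.parent
print(parent_after_extract)  # None

# The detached tag can be attached somewhere else.
soup.find("div", id="b").append(p)
print(soup)
# <div id="a"></div><div id="b"><p>Move me</p></div>
```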
There is another method, decompose(), that removes a tag from the tree and then completely destroys it and its contents −
Example
from bs4 import BeautifulSoup

html = '''
<ol id="HR">
<li>Rani</li>
<li>Ankita</li>
</ol>
'''

soup = BeautifulSoup(html, 'html.parser')

# Destroy the first <li> tag along with its contents
tag2 = soup.find('li')
tag2.decompose()
print(soup)
print(tag2.decomposed)
Output
<ol id="HR">
<li>Ankita</li>
</ol>
True
The decomposed property returns True or False, indicating whether an element has been decomposed.
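The difference between extract() and decompose() can be sketched side by side. This example assumes a Beautiful Soup version recent enough to provide the decomposed property used above; the markup is invented for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>one</li><li>two</li></ul>", "html.parser")
first, second = soup.find_all("li")

# extract() hands the removed tag back, so it stays usable.
kept = first.extract()

# decompose() removes the tag and destroys it.
second.decompose()

print(kept)               # <li>one</li>
print(kept.decomposed)    # False
print(second.decomposed)  # True
print(soup)               # <ul></ul>
```

Use extract() when the removed element is still needed, and decompose() when it should simply be discarded.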
Modify the Contents
We shall look at the replace_with() method that allows contents of a tag to be replaced.
Like a Python string, which is immutable, a NavigableString can't be modified in place. However, you can use replace_with() to replace the inner string of a tag with another.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h2 id='message'>Hello, Tutorialspoint!</h2>", 'html.parser')
tag = soup.h2
tag.string.replace_with("OnLine Tutorials Library")
print(tag.string)
Output
OnLine Tutorials Library
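The replace_with() method is not limited to strings. As a small sketch with invented markup, the following swaps a <b> tag for a newly created <strong> tag and keeps the element that was replaced, which replace_with() returns.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Use <b>bold</b> text</p>", "html.parser")

# Build the replacement tag.
new_tag = soup.new_tag("strong")
new_tag.string = "strong"

# replace_with() returns the element that was replaced.
old = soup.b.replace_with(new_tag)

print(soup)  # <p>Use <strong>strong</strong> text</p>
print(old)   # <b>bold</b>
```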
Here is another example to show the use of replace_with(). Two parsed documents can be combined if you pass a BeautifulSoup object as an argument to a method such as replace_with().
Example
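from bs4 import BeautifulSoup

# Both the "xml" and "lxml" parsers require the lxml package to be installed
obj1 = BeautifulSoup("<book><title>Python</title></book>", features="xml")
obj2 = BeautifulSoup("<b>Beautiful Soup parser</b>", "lxml")

# Replace the <b> tag of the second document with the first document
obj2.find('b').replace_with(obj1)
print(obj2)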
Output
<html><body><book><title>Python</title></book></body></html>
The wrap() method wraps an element in the tag you specify. It returns the new wrapper.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello Python</p>", 'html.parser')
tag = soup.p
newtag = soup.new_tag('b')
tag.string.wrap(newtag)
print(soup)
Output
<p><b>Hello Python</b></p>
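wrap() can also be called on a tag rather than on a string. The sketch below wraps a whole <p> element in a new <div> and uses the returned wrapper; the choice of <div> is arbitrary.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello Python</p>", "html.parser")

# wrap() places <p> inside the new tag and returns that new tag.
wrapper = soup.p.wrap(soup.new_tag("div"))

print(wrapper.name)  # div
print(soup)          # <div><p>Hello Python</p></div>
```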
On the other hand, the unwrap() method replaces a tag with whatever's inside that tag. It's good for stripping out markup.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>Python</b></p>", 'html.parser')
tag = soup.p
tag.b.unwrap()
print(soup)
Output
<p>Hello Python</p>
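To strip a kind of markup from a whole document, unwrap() can be applied to every matching tag. A minimal sketch, assuming you want to drop all <b> and <i> tags while keeping their text:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>A <b>bold</b> and <i>italic</i> word</p>", "html.parser")

# find_all() returns a list, so the tree can be modified while looping over it.
for tag in soup.find_all(["b", "i"]):
    tag.unwrap()

print(soup)  # <p>A bold and italic word</p>
```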