The Beautiful Soup API has three main types of objects: the soup (BeautifulSoup) object, the Tag object, and the NavigableString object. Let us find out how to convert each of these objects to a string. In Python, a string is a str object.
Assume that we have the following HTML document:
html = '''
<p>Hello <b>World</b></p>
'''
Let us pass this string as the argument to the BeautifulSoup constructor. The soup object is then converted to a string with Python's built-in str() function.
The parse tree built from this HTML string depends on which parser you use. The built-in html.parser doesn't add the <html> and <body> tags.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
print(str(soup))
Output
<p>Hello <b>World</b></p>
On the other hand, the html5lib parser constructs the tree after inserting the formal tags such as <html>, <head> and <body>:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html5lib')
print(str(soup))
Output
<html><head></head><body><p>Hello <b>World</b></p> </body></html>
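The parser comparison above can be sketched in a single script. This is a minimal illustration, not a definitive recipe: the helper name render is hypothetical, and the code assumes html5lib may or may not be installed, so it reports a missing parser instead of failing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

html = "<p>Hello <b>World</b></p>"

def render(markup, parser):
    """Parse markup with the named parser and return str(soup).

    Returns None when the requested parser is not installed
    (hypothetical helper, written for this illustration only).
    """
    try:
        return str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        return None

# Convert the same document to a string with each parser.
results = {name: render(html, name) for name in ("html.parser", "html5lib")}

for name, text in results.items():
    print(name, "->", text if text is not None else "(parser not installed)")
```

Whichever parser is used, str(soup) returns an ordinary Python str, so the result can be written to a file or compared like any other string.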
The Tag object has a string property. When the tag contains a single piece of text, as <b> does here, the property returns that text as a NavigableString object.
tag = soup.find('b')
obj = tag.string
print(type(obj), obj)
Output
<class 'bs4.element.NavigableString'> World
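The introduction also mentions converting Tag and NavigableString objects to strings. A minimal sketch of both conversions follows, assuming the first HTML document and the html.parser parser; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = "<p>Hello <b>World</b></p>"
soup = BeautifulSoup(html, "html.parser")

b_tag = soup.find("b")

# str() on a Tag returns its markup, including the tag itself.
tag_markup = str(b_tag)

# str() on a NavigableString returns a plain Python str with the same text.
nav_string = b_tag.string
plain_text = str(nav_string)

# <p> has two children (the text "Hello " and the <b> tag),
# so its string property is None rather than a NavigableString.
p_string = soup.find("p").string

print(tag_markup)
print(type(plain_text), plain_text)
print(p_string)
```

Checking for None before calling str() on the string property avoids ending up with the literal text "None" when a tag has more than one child.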
There is also a text property defined for the Tag object. It returns all the text contained in the tag as a str, stripping off the inner tags and attributes.
If the HTML string is:
html = '''
<p>Hello <div id='id'>World</div></p>
'''
We parse it again with html.parser and obtain the text property of the <p> tag:
soup = BeautifulSoup(html, 'html.parser')
tag = soup.find('p')
obj = tag.text
print(type(obj), obj)
Output
<class 'str'> Hello World
You can also use the get_text() method, which returns a string representing the text inside the tag. The text property is in fact a shorthand for calling get_text() with no arguments, so both get rid of the inner tags and attributes and return a string.
obj = tag.get_text()
print(type(obj), obj)
Output
<class 'str'> Hello World
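Unlike the text property, get_text() accepts optional arguments. The sketch below assumes the second HTML document parsed with html.parser, and uses the separator argument and the strip keyword to control how the pieces of text are joined. The "|" separator is an arbitrary choice for illustration.

```python
from bs4 import BeautifulSoup

html = "<p>Hello <div id='id'>World</div></p>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("p")

# With no arguments, the pieces of text are concatenated unchanged.
default_text = tag.get_text()

# With a separator and strip=True, each piece is stripped of surrounding
# whitespace and the pieces are joined with the separator.
joined_text = tag.get_text("|", strip=True)

print(repr(default_text))
print(repr(joined_text))
```

The text property is enough when the concatenated text is all you need; get_text() is the one to call when the boundaries between pieces of text matter.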