
Beautiful Soup – Output Formatting


If the HTML string passed to the BeautifulSoup constructor contains HTML entities, they are converted to Unicode characters.

An HTML entity is a string that begins with an ampersand (&) and ends with a semicolon (;). Entities are used to display reserved characters, which would otherwise be interpreted as HTML code. Some examples of HTML entities are −

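Character	Description	Entity name	Entity number
<	less than	&lt;	&#60;
>	greater than	&gt;	&#62;
&	ampersand	&amp;	&#38;
"	double quote	&quot;	&#34;
'	single quote	&apos;	&#39;
“	left double quote	&ldquo;	&#8220;
”	right double quote	&rdquo;	&#8221;
£	pound	&pound;	&#163;
¥	yen	&yen;	&#165;
€	euro	&euro;	&#8364;
©	copyright	&copy;	&#169;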
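
The relationship between the entity name and entity number columns can be checked with a short sketch. It uses the standard library's html module rather than Beautiful Soup, which is a choice made here for illustration and not something the tutorial itself relies on; the names rows and decoded are hypothetical.

```python
import html

# (entity name, entity number, expected character) for a few table rows
rows = [
    ("&lt;", "&#60;", "<"),
    ("&amp;", "&#38;", "&"),
    ("&ldquo;", "&#8220;", "\u201c"),
    ("&pound;", "&#163;", "\u00a3"),
    ("&euro;", "&#8364;", "\u20ac"),
    ("&copy;", "&#169;", "\u00a9"),
]

# Decode both forms of each entity; they should yield the same character.
decoded = [(html.unescape(name), html.unescape(number)) for name, number, _ in rows]

for (from_name, from_number), (name, number, char) in zip(decoded, rows):
    print(name, number, "->", from_name, from_number)
```

Both forms of an entity decode to the same character, so either one can appear in the markup handed to Beautiful Soup.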

By default, the only characters that are escaped on output are bare ampersands and angle brackets. These are turned into "&amp;", "&lt;", and "&gt;".

All other entities are converted to Unicode characters.
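
Both rules can be seen side by side in a small sketch. The markup string and the variable names (markup, text, out) are made up for illustration; the calls are the ordinary BeautifulSoup constructor, get_text() and str().

```python
from bs4 import BeautifulSoup

# Hypothetical input mixing reserved characters with a non-reserved entity.
markup = "<p>Fish &amp; chips &lt;3 &copy; 2024</p>"
soup = BeautifulSoup(markup, "html.parser")

# Inside the parse tree every entity has become a plain Unicode character.
text = soup.p.get_text()
print(text)   # Fish & chips <3 © 2024

# On output only the ampersand and the angle bracket are escaped again;
# the copyright sign stays a Unicode character.
out = str(soup)
print(out)    # <p>Fish &amp; chips &lt;3 © 2024</p>
```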

Example


from bs4 import BeautifulSoup

soup = BeautifulSoup("Hello &ldquo;World!&rdquo;", 'html.parser')
print(str(soup))

Output


Hello “World!”

If you then convert the document to a bytestring, the Unicode characters will be encoded as UTF-8. You won't get the HTML entities back −

Example


from bs4 import BeautifulSoup

soup = BeautifulSoup("Hello &ldquo;World!&rdquo;", 'html.parser')
print(soup.encode())

Output


b'Hello \xe2\x80\x9cWorld!\xe2\x80\x9d'
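
As a quick check of the claim above, the following sketch compares the encoded document with the UTF-8 encoding of the same text. The variable names raw and expected are illustrative only.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("Hello &ldquo;World!&rdquo;", "html.parser")

# encode() uses UTF-8 unless another encoding is requested.
raw = soup.encode()

# The same text written with literal curly quotes, encoded by hand.
expected = "Hello \u201cWorld!\u201d".encode("utf-8")

print(raw == expected)        # True
print(b"&ldquo;" in raw)      # False: the entity name is not restored
print(raw.decode("utf-8"))    # Hello “World!”
```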

To change this behavior, pass a value for the formatter argument to the prettify() method. The formatter argument accepts the following values −

formatter="minimal" − This is the default. Strings will only be processed enough to ensure that Beautiful Soup generates valid HTML/XML.

formatter="html" − Beautiful Soup will convert Unicode characters to HTML entities whenever possible.

formatter="html5" − This is similar to formatter="html", but Beautiful Soup will omit the closing slash in HTML void tags like "br".

formatter=None − Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML.

Example


from bs4 import BeautifulSoup

french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'html.parser')
print("minimal: ")
print(soup.prettify(formatter="minimal"))
print("html: ")
print(soup.prettify(formatter="html"))
print("None: ")
print(soup.prettify(formatter=None))

Output


minimal:
<p>
 Il a dit &lt;&lt;Sacré bleu!&gt;&gt;
</p>

html:
<p>
 Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;
</p>

None:
<p>
 Il a dit <<Sacré bleu!>>
</p>
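
The example above does not exercise formatter="html5". A minimal sketch of the difference for a void tag follows; it uses decode(), which also accepts the formatter argument, instead of prettify(), so that the result is a single line. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# A document consisting of a single void tag.
br = BeautifulSoup("<br>", "html.parser").br

as_html = br.decode(formatter="html")    # closing slash is kept
as_html5 = br.decode(formatter="html5")  # closing slash is omitted

print(as_html)    # <br/>
print(as_html5)   # <br>
```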

In addition, the Beautiful Soup library provides formatter classes. You can pass an instance of either class as the formatter argument to the prettify() method.

HTMLFormatter class − Used to customize the formatting rules for HTML documents.

XMLFormatter class − Used to customize the formatting rules for XML documents.
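
A minimal sketch of a custom formatter, assuming a Beautiful Soup release recent enough to ship the bs4.formatter module. The shout function is a hypothetical helper written for this illustration; it is passed as the entity_substitution argument, so it receives each string in the document and returns the text to emit.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter

def shout(text):
    # Hypothetical substitution rule: upper-case every string on output.
    # No entity escaping is done here, so reserved characters pass through.
    return text.upper()

soup = BeautifulSoup(
    "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>", "html.parser"
)

formatter = HTMLFormatter(entity_substitution=shout)

# The same formatter object can be passed to prettify(), decode() or encode().
result = soup.p.decode(formatter=formatter)
print(result)
print(soup.prettify(formatter=formatter))
```

Because the custom function replaces the built-in substitution, the angle brackets inside the text are no longer escaped, so a rule like this one can produce invalid HTML, much like formatter=None.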
