The arrangement of tags or elements in an HTML document is hierarchical in nature, and tags can be nested several levels deep. For example, the <head> and <body> tags are nested inside the <html> tag. Similarly, one or more <li> tags may be nested inside a <ul> tag. In this chapter, we shall find out how to scrape a tag that has one or more child tags nested in it.
Let us consider the following HTML document −
<div id="outer">
   <div id="inner">
      <p>Hello<b>World</b></p>
      <img src='logo.jpg'>
   </div>
</div>
In this document, the two <div> tags and the <p> tag each have one or more child elements nested inside them, whereas the <img> and <b> tags do not have any child tags.
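The findChildren() method returns a ResultSet of all the tags nested under a tag, at every depth, not only its immediate children. (findChildren() is an older name for find_all(), whose recursive argument defaults to True.) If a tag has no nested tags, the ResultSet is empty and prints as [].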
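The short sketch below checks both claims on the same document: a leaf tag such as <b> yields an empty ResultSet, and for a tag with nested tags findChildren() gives the same result as find_all(). It assumes a current bs4 installation; newer releases may emit a DeprecationWarning for the camelCase name findChildren, which does not affect the result.

```python
from bs4 import BeautifulSoup

html = "<div id='outer'><div id='inner'><p>Hello<b>World</b></p><img src='logo.jpg'></div></div>"
soup = BeautifulSoup(html, "html.parser")

# A tag with no nested tags: the result is an empty ResultSet, printed as [].
leaf = soup.find("b")
leaf_children = leaf.findChildren()
print(leaf_children)  # []

# A tag with nested tags: every descendant tag is returned, at any depth.
inner = soup.find("div", id="inner")
print([t.name for t in inner.findChildren()])  # ['p', 'b', 'img']

# findChildren() and find_all() return the same tags.
print(inner.findChildren() == inner.find_all())  # True
```

Note that <b> appears in the list for the inner <div> even though it sits one level further down, inside <p>.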
Using this method, the following code lists every tag in the document along with its attributes and the tags nested inside it.
Example
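html = """ <div id="outer"> <div id="inner"> <p>Hello<b>World</b></p> <img src='logo.jpg'> </div> </div> """

from bs4 import BeautifulSoup

# Parse the document with Python's built-in HTML parser
soup = BeautifulSoup(html, 'html.parser')

# find_all() with no arguments returns every tag in the document, in document order
for tag in soup.find_all():
   print("Tag: {} attributes: {}".format(tag.name, tag.attrs))
   # findChildren() returns every tag nested below this one (empty if there are none)
   print("Child tags:", tag.findChildren())
   print()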
Output
Tag: div attributes: {'id': 'outer'}
Child tags: [<div id="inner"> <p>Hello<b>World</b></p> <img src="logo.jpg"/> </div>, <p>Hello<b>World</b></p>, <b>World</b>, <img src="logo.jpg"/>]

Tag: div attributes: {'id': 'inner'}
Child tags: [<p>Hello<b>World</b></p>, <b>World</b>, <img src="logo.jpg"/>]

Tag: p attributes: {}
Child tags: [<b>World</b>]

Tag: b attributes: {}
Child tags: []

Tag: img attributes: {'src': 'logo.jpg'}
Child tags: []
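The output above lists every nested tag, so the outer <div> also reports <p>, <b> and <img>. If only the immediate child tags are wanted, one option is to pass recursive=False to find_all(); another is to iterate over the .children attribute and keep only Tag objects, since .children also yields the whitespace text nodes between tags. The sketch below shows both on the same document; the results list is just an illustrative name.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """ <div id="outer"> <div id="inner"> <p>Hello<b>World</b></p> <img src='logo.jpg'> </div> </div> """
soup = BeautifulSoup(html, "html.parser")

results = []
for tag in soup.find_all():
    # recursive=False restricts the search to the tag's immediate children
    names = [child.name for child in tag.find_all(recursive=False)]
    results.append((tag.name, names))
    print("Tag:", tag.name, "direct child tags:", names)

# .children yields immediate children, including whitespace strings,
# so keep only Tag objects to get the elements
inner = soup.find("div", id="inner")
element_children = [child.name for child in inner.children if isinstance(child, Tag)]
print(element_children)  # ['p', 'img']
```

With recursive=False the outer <div> reports only the inner <div>, and the inner <div> reports only <p> and <img>; <b> is listed only under <p>.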