Beautiful Soup – Navigating by Tags

Tags are among the most important elements of any HTML document. A tag may contain other tags and strings (the tag's children). Beautiful Soup provides several ways to navigate and iterate over a tag's children.

The easiest way to navigate the parse tree is to refer to a tag by its name, as an attribute of the BeautifulSoup object or of another tag.

soup.head

The soup.head attribute returns the first <head> element of an HTML page as a Tag object, including everything placed inside <head> .. </head>. Consider the following HTML page (saved as index.html) to be scraped:

    <html>
    <head>
    <title>TutorialsPoint</title>
    <script>
    document.write("Welcome to TutorialsPoint");
    </script>
    </head>
    <body>
    <h1>Tutorialspoint Online Library</h1>
    <p><b>It's all Free</b></p>
    </body>
    </html>

The following code extracts the <head> element.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    print(soup.head)

Output

    <head>
    <title>TutorialsPoint</title>
    <script>
    document.write("Welcome to TutorialsPoint");
    </script>
    </head>

soup.body

Similarly, to return the body part of the HTML page, use soup.body.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    print(soup.body)

Output

    <body>
    <h1>Tutorialspoint Online Library</h1>
    <p><b>It's all Free</b></p>
    </body>

You can also chain tag names to extract a specific tag, such as the first <h1> tag inside the <body> tag.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    print(soup.body.h1)

Output

    <h1>Tutorialspoint Online Library</h1>

soup.p

Our HTML file contains a <p> tag. soup.p returns the first <p> tag in the document, together with its contents.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    print(soup.p)

Output

    <p><b>It's all Free</b></p>

Tag.contents

A Tag object may contain one or more PageElement objects (tags and strings). The Tag object's contents property returns a list of its direct children. Let us find the elements in the <head> tag of our index.html file.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tag = soup.head
    print(tag.contents)

Output

    ['\n', <title>TutorialsPoint</title>, '\n', <script>
    document.write("Welcome to TutorialsPoint");
    </script>, '\n']

The '\n' entries are the newline strings that separate the tags in the file; they count as children too.

Tag.children

The structure of tags in an HTML document is hierarchical: elements are nested one inside the other. For example, the top-level <html> tag includes the <head> and <body> tags, and each of these may contain other tags.

The Tag object has a children property that returns an iterator over the enclosed PageElement objects, that is, the tag's direct children.

To demonstrate the children property, we shall use the following HTML document (index.html). In the <body> section, a top-level <ul> list holds two department items, each followed by a nested <ul> list of employees.

    <html>
    <head>
    <title>TutorialsPoint</title>
    </head>
    <body>
    <h2>Departmentwise Employees</h2>
    <ul>
    <li>Accounts</li>
    <ul>
    <li>Anand</li>
    <li>Mahesh</li>
    </ul>
    <li>HR</li>
    <ul>
    <li>Rani</li>
    <li>Ankita</li>
    </ul>
    </ul>
    </body>
    </html>

The following Python code lists all the child elements of the top-level <ul> tag.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tag = soup.ul
    print(list(tag.children))

Output

    ['\n', <li>Accounts</li>, '\n', <ul>
    <li>Anand</li>
    <li>Mahesh</li>
    </ul>, '\n', <li>HR</li>, '\n', <ul>
    <li>Rani</li>
    <li>Ankita</li>
    </ul>, '\n']

Since the children property returns an iterator, we can use a for loop to visit each child in turn.

Example

    # tag is soup.ul from the previous example
    for child in tag.children:
        print(child)

Output

    <li>Accounts</li>
    <ul>
    <li>Anand</li>
    <li>Mahesh</li>
    </ul>
    <li>HR</li>
    <ul>
    <li>Rani</li>
    <li>Ankita</li>
    </ul>

The newline strings between the tags are printed as well, as blank lines; they are omitted from the output shown here.

Tag.find_all()

This method returns a ResultSet (a list-like object) containing every tag inside the calling object that matches the given argument, such as a tag name.
Let us consider the following HTML page (index.html):

    <html>
    <body>
    <h1>Tutorialspoint Online Library</h1>
    <p><b>It's all Free</b></p>
    <a class="prog" href="https://www.tutorialspoint.com/java/java_overview.htm" id="link1">Java</a>
    <a class="prog" href="https://www.tutorialspoint.com/cprogramming/index.htm" id="link2">C</a>
    <a class="prog" href="https://www.tutorialspoint.com/python/index.htm" id="link3">Python</a>
    <a class="prog" href="https://www.tutorialspoint.com/javascript/javascript_overview.htm" id="link4">JavaScript</a>
    <a class="prog" href="https://www.tutorialspoint.com/ruby/index.htm" id="link5">Ruby</a>
    </body>
    </html>

The following code lists all the <a> tags in the page.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    result = soup.find_all("a")
    print(result)

Output (printed on a single line; wrapped here for readability)

    [<a class="prog" href="https://www.tutorialspoint.com/java/java_overview.htm" id="link1">Java</a>,
     <a class="prog" href="https://www.tutorialspoint.com/cprogramming/index.htm" id="link2">C</a>,
     <a class="prog" href="https://www.tutorialspoint.com/python/index.htm" id="link3">Python</a>,
     <a class="prog" href="https://www.tutorialspoint.com/javascript/javascript_overview.htm" id="link4">JavaScript</a>,
     <a class="prog" href="https://www.tutorialspoint.com/ruby/index.htm" id="link5">Ruby</a>]
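
The techniques covered on this page can be combined, roughly as in the sketch below. It parses a shortened copy of the page from a string rather than from index.html; that is an assumption made only so the snippet runs on its own, and the variable names are illustrative. It shows that navigating by a tag name that does not exist gives None, that each tag returned by find_all() exposes its attributes like a dictionary, and that the newline strings can be filtered out of children by keeping only Tag objects.

```python
from bs4 import BeautifulSoup, Tag

# Inline stand-in for index.html (an assumption, so the sketch is self-contained).
HTML = """<html>
<body>
<h1>Tutorialspoint Online Library</h1>
<p><b>It's all Free</b></p>
<a class="prog" href="https://www.tutorialspoint.com/java/java_overview.htm" id="link1">Java</a>
<a class="prog" href="https://www.tutorialspoint.com/python/index.htm" id="link3">Python</a>
</body>
</html>"""

soup = BeautifulSoup(HTML, "html.parser")

# Navigating by tag name returns the first matching tag,
# or None when no such tag exists.
heading = soup.body.h1
missing = soup.body.table

# find_all() returns every matching tag; attributes are read like dict keys.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

# children yields strings as well as tags; keep only the Tag objects.
child_tags = [child.name for child in soup.body.children if isinstance(child, Tag)]

print(heading.get_text())
print(missing)
print(links)
print(child_tags)
```

Checking for None before chaining further (for example, before asking for soup.body.table.tr) avoids an AttributeError on pages where the tag is absent.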