In this chapter, we shall discuss different methods in Beautiful Soup for navigating the HTML document tree in different directions – going up and down, sideways, and back and forth.
We shall use the following HTML string in all the examples in this chapter −
html = """
<html><head><title>TutorialsPoint</title></head>
<body>
<p class="title"><b>Online Tutorials Library</b></p>
<p class="story">TutorialsPoint has an excellent collection of tutorials on:
<a href="https://tutorialspoint.com/Python" class="lang" id="link1">Python</a>,
<a href="https://tutorialspoint.com/Java" class="lang" id="link2">Java</a> and
<a href="https://tutorialspoint.com/PHP" class="lang" id="link3">PHP</a>; Enhance your Programming skills.</p>
<p class="tutorial">...</p>
"""
The simplest way to navigate the parse tree is to use the name of the tag you want. For example, soup.head fetches the <head> element −
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
print(soup.head.prettify())
Output
<head>
 <title>
  TutorialsPoint
 </title>
</head>
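Accessing a tag by name is shorthand for calling find() with that name, so it returns only the first matching tag. The sketch below uses a trimmed copy of the sample document (shortened here, as an assumption, to keep the block self-contained) to show that behaviour, chained access, and the None result for a tag name that does not occur.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the chapter's sample document, kept short for this sketch.
html = """<html><head><title>TutorialsPoint</title></head>
<body><p class="title"><b>Online Tutorials Library</b></p>
<p class="story"><a id="link1">Python</a>, <a id="link2">Java</a></p></body></html>"""

soup = BeautifulSoup(html, "html.parser")

# soup.a behaves like soup.find("a"): only the FIRST <a> tag is returned.
first_link = soup.a
print(first_link["id"])                # link1
print(first_link is soup.find("a"))    # True

# Tag names can be chained to walk down several levels in one expression.
print(soup.body.p.b.string)            # Online Tutorials Library

# A name that does not occur anywhere in the document gives None.
print(soup.table)                      # None

# To get every matching tag instead of just the first, use find_all().
print([a["id"] for a in soup.find_all("a")])   # ['link1', 'link2']
```

Dot access is convenient for quick exploration; when a document may contain several tags with the same name, find_all() makes the intent explicit.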
Going down
A tag may contain strings or other tags. The .contents property of a Tag object returns a list of all the direct children of that tag.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag = soup.head
print(tag.contents)
Output
[<title>TutorialsPoint</title>]
The returned object is a list, even though in this case the <head> element encloses only a single child tag.
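Because .contents is an ordinary Python list, children can be counted and picked out by position, and strings appear in it alongside tags. A minimal sketch, assuming a cut-down document that contains only the <head> part of the sample:

```python
from bs4 import BeautifulSoup, NavigableString

html = "<html><head><title>TutorialsPoint</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

head_children = soup.head.contents    # an ordinary Python list
print(type(head_children).__name__)   # list
print(len(head_children))             # 1

# Because it is a list, a child can be picked out by its position.
title = head_children[0]
print(title.name)                     # title

# .contents also includes strings: the <title> tag's only child is text.
text = title.contents[0]
print(isinstance(text, NavigableString), text)   # True TutorialsPoint
```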
.children
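The .children property gives the same direct children, but as an iterator rather than a list; wrapping it in list() collects them all at once. Below, all the children of the body tag are shown as a list, including the newline strings between the tags.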
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag = soup.body
print(list(tag.children))
Output
['\n', <p class="title"><b>Online Tutorials Library</b></p>, '\n', <p class="story">TutorialsPoint has an excellent collection of tutorials on: <a class="lang" href="https://tutorialspoint.com/Python" id="link1">Python</a>, <a class="lang" href="https://tutorialspoint.com/Java" id="link2">Java</a> and <a class="lang" href="https://tutorialspoint.com/PHP" id="link3">PHP</a>; Enhance your Programming skills.</p>, '\n', <p class="tutorial">...</p>, '\n']
Instead of converting them to a list, you can iterate over a tag's children directly with a for loop over .children −
Example
tag = soup.body
for child in tag.children:
    print(child)
Output
<p class="title"><b>Online Tutorials Library</b></p>
<p class="story">TutorialsPoint has an excellent collection of tutorials on: <a class="lang" href="https://tutorialspoint.com/Python" id="link1">Python</a>, <a class="lang" href="https://tutorialspoint.com/Java" id="link2">Java</a> and <a class="lang" href="https://tutorialspoint.com/PHP" id="link3">PHP</a>; Enhance your Programming skills.</p>
<p class="tutorial">...</p>
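Two practical consequences of .children being an iterator are sketched below: it can be consumed only once, and the newline strings between tags come along with the tags unless they are filtered out. The short <body> snippet is an assumption made for illustration; filtering on the Tag class is one common way to skip strings, not the only one.

```python
from bs4 import BeautifulSoup, Tag

html = """<body>
<p class="title"><b>Online Tutorials Library</b></p>
<p class="story">Tutorials on Python, Java and PHP</p>
<p class="tutorial">...</p>
</body>"""
soup = BeautifulSoup(html, "html.parser")

# .children is a one-pass iterator: a second pass over the same object is empty.
children = soup.body.children
first_pass = list(children)
second_pass = list(children)
print(len(first_pass), len(second_pass))   # 7 0

# Keep only the Tag children, skipping the "\n" strings between paragraphs.
p_classes = [child["class"][0] for child in soup.body.children if isinstance(child, Tag)]
print(p_classes)   # ['title', 'story', 'tutorial']
```

Each access to soup.body.children creates a fresh iterator, which is why the list comprehension still sees every child.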
.descendants
The .contents and .children attributes only consider a tag's direct children. The .descendants attribute lets you iterate over all of a tag's children recursively: its direct children, the children of its direct children, and so on.
The BeautifulSoup object is at the top of the hierarchy of all the tags. Hence its .descendants property includes all the elements in the HTML string.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Prints the generator object itself, not the elements it yields
print(soup.descendants)
The .descendants attribute returns a generator, which can be iterated over with a for loop. Here, we list the descendants of the head tag.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag = soup.head
for element in tag.descendants:
    print(element)
Output
<title>TutorialsPoint</title>
TutorialsPoint
The <head> tag contains a <title> tag, which in turn encloses the NavigableString object TutorialsPoint. So the <head> tag has only one child but two descendants: the <title> tag and the <title> tag's child. Similarly, the BeautifulSoup object has only one child tag (the <html> tag), but it has many descendants.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tags = list(soup.descendants)
print(len(tags))
Output
27
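The count above mixes tags and strings. The sketch below, using a small hypothetical document with no whitespace between tags, contrasts the direct children of <html> with its descendants and shows that .descendants visits elements depth-first in document order, so tags and strings can be separated with an isinstance check.

```python
from bs4 import BeautifulSoup, Tag

html = "<html><head><title>TutorialsPoint</title></head><body><p><b>Hi</b> there</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Direct children of <html>: just <head> and <body>.
print(len(list(soup.html.children)))      # 2
# All descendants of <html>: every tag and string below it, at any depth.
print(len(list(soup.html.descendants)))   # 8

# .descendants walks the tree depth-first, in document order.
tag_names = [d.name for d in soup.descendants if isinstance(d, Tag)]
print(tag_names)   # ['html', 'head', 'title', 'body', 'p', 'b']

# Strings are descendants too; this collects every piece of text.
strings = [d for d in soup.descendants if not isinstance(d, Tag)]
print(strings)
```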
Going up
Just as you navigate down a document with the .children and .descendants properties, Beautiful Soup offers the .parent and .parents properties to navigate up from a tag.
.parent
Every tag and every string has a parent tag that contains it. You can access an element's parent with the .parent attribute. In our example, the <head> tag is the parent of the <title> tag.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag = soup.title
print(tag.parent)
Output
<head><title>TutorialsPoint</title></head>
The <title> tag contains a string (a NavigableString object), and the parent of that string is the <title> tag itself.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag = soup.title
string = tag.string
print(string.parent)
Output
<title>TutorialsPoint</title>
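The chain of parents ends at the BeautifulSoup object. As a small sketch (the one-line document is an assumption for illustration), the parent of the top-level <html> tag is the BeautifulSoup object itself, and that object's own .parent is None.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><head><title>TutorialsPoint</title></head></html>", "html.parser")

# The parent of the top-level <html> tag is the BeautifulSoup object itself.
print(type(soup.html.parent).__name__)   # BeautifulSoup
print(soup.html.parent is soup)          # True

# The BeautifulSoup object sits at the very top, so it has no parent.
print(soup.parent)                       # None

# A string's parent is the tag that directly encloses it.
print(soup.title.string.parent.name)     # title
```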
.parents
You can iterate over all of an element's ancestors with .parents. In the following code, we use .parents to travel from the first <a> tag in the example HTML string, buried deep within the document, all the way up to the top of the document.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag = soup.a
print(tag.string)
for parent in tag.parents:
    print(parent.name)
Output
Python
p
body
html
[document]
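A common use of .parents is to stop at the nearest ancestor that meets some condition. The sketch below, built on a shortened copy of the story paragraph (an assumption for illustration), does this with next() over a generator, compares it with the find_parent() search method, and shows that .parents yields the same sequence as following .parent by hand until None.

```python
from bs4 import BeautifulSoup

html = """<html><body>
<p class="story">Tutorials on:
<a href="https://tutorialspoint.com/Python" class="lang" id="link1">Python</a>,
<a href="https://tutorialspoint.com/Java" class="lang" id="link2">Java</a>
</p></body></html>"""
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", id="link2")

# Stop at the nearest ancestor that satisfies a condition.
story = next(parent for parent in link.parents if parent.name == "p")
print(story["class"])                   # ['story']

# find_parent() performs the same upward search by tag name.
print(link.find_parent("p") is story)   # True

# .parents is equivalent to following .parent repeatedly until it is None.
manual = []
node = link.parent
while node is not None:
    manual.append(node.name)
    node = node.parent
print(manual)                                    # ['p', 'body', 'html', '[document]']
print(manual == [p.name for p in link.parents])  # True
```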
Sideways
Tags that share the same parent are called siblings; when the document is pretty-printed, they appear at the same indentation level. Consider the following HTML snippet −
<p>
   <b>
      Hello
   </b>
   <i>
      Python
   </i>
</p>
Inside the outer <p> tag, the <b> and <i> tags have the same parent, so they are siblings. Beautiful Soup makes it possible to navigate between siblings.
.next_sibling and .previous_sibling
These attributes return the next and the previous sibling of an element, respectively. A sibling can be a string (including whitespace between tags) as well as a tag.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup("<p><b>Hello</b><i>Python</i></p>", 'html.parser')
tag1 = soup.b
print("next:", tag1.next_sibling)
tag2 = soup.i
print("previous:", tag2.previous_sibling)
Output
next: <i>Python</i>
previous: <b>Hello</b>
Since the <b> tag has no sibling to its left and the <i> tag has no sibling to its right, .previous_sibling and .next_sibling return None in these two cases.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup("<p><b>Hello</b><i>Python</i></p>", 'html.parser')
tag1 = soup.b
print("previous:", tag1.previous_sibling)
tag2 = soup.i
print("next:", tag2.next_sibling)
Output
previous: None
next: None
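The snippets above have no whitespace between the tags. In real documents there usually is, and that whitespace becomes a string sibling. The sketch below adds a single space between </b> and <i> (an assumption chosen to show the effect) and uses find_next_sibling() to skip over the string.

```python
from bs4 import BeautifulSoup

# Same snippet as above, but with a space between </b> and <i>.
soup = BeautifulSoup("<p><b>Hello</b> <i>Python</i></p>", "html.parser")
b = soup.b

sibling = b.next_sibling
print(sibling.name)                   # None (strings have no tag name)
print(sibling == " ")                 # True

# The <i> tag is now two steps away.
print(b.next_sibling.next_sibling)    # <i>Python</i>

# find_next_sibling() skips over strings and returns the next matching tag.
print(b.find_next_sibling("i"))       # <i>Python</i>
```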
.next_siblings and .previous_siblings
If there are two or more siblings to the right or left of a tag, they can be navigated with the .next_siblings and .previous_siblings attributes, respectively. Both return generators, so a for loop can be used to iterate over them.
Let us use the following HTML snippet for this purpose −
<p>
   <b>
      Excellent
   </b>
   <i>
      Python
   </i>
   <u>
      Tutorial
   </u>
</p>
Use the following code to traverse next and previous sibling tags.
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup("<p><b>Excellent</b><i>Python</i><u>Tutorial</u></p>", 'html.parser')
tag1 = soup.b
print("next siblings:")
for tag in tag1.next_siblings:
    print(tag)
print("previous siblings:")
tag2 = soup.u
for tag in tag2.previous_siblings:
    print(tag)
Output
next siblings:
<i>Python</i>
<u>Tutorial</u>
previous siblings:
<i>Python</i>
<b>Excellent</b>
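As the output shows, .previous_siblings starts from the nearest sibling and moves outward, so it runs in reverse document order. The sketch below writes the snippet with spaces between the tags (an assumption chosen to show that those strings are siblings too), keeps only the tags, and reverses the result to recover document order.

```python
from bs4 import BeautifulSoup, Tag

soup = BeautifulSoup("<p><b>Excellent</b> <i>Python</i> <u>Tutorial</u></p>", "html.parser")

# next_siblings includes the " " strings; keep only the tags.
after_b = [s.name for s in soup.b.next_siblings if isinstance(s, Tag)]
print(after_b)          # ['i', 'u']

# previous_siblings walks outward from the tag, so the nearest sibling comes first.
before_u = [s.name for s in soup.u.previous_siblings if isinstance(s, Tag)]
print(before_u)         # ['i', 'b']

# Reverse it to list the siblings in document order.
print(before_u[::-1])   # ['b', 'i']

# Counting every sibling after <b>, strings included.
print(len(list(soup.b.next_siblings)))   # 4
```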
Back and forth
In Beautiful Soup, the .next_element property returns the string or tag that was parsed immediately after the current one, and .previous_element returns the one parsed immediately before it. These sometimes coincide with .next_sibling and .previous_sibling, but not always: for a tag that has children, .next_element is its first child, whereas .next_sibling is whatever follows the tag's closing tag.
.next_element and .previous_element
Example
html = """
<html><head><title>TutorialsPoint</title></head>
<body>
<p class="title"><b>Online Tutorials Library</b></p>
<p class="story">TutorialsPoint has an excellent collection of tutorials on:
<a href="https://tutorialspoint.com/Python" class="lang" id="link1">Python</a>,
<a href="https://tutorialspoint.com/Java" class="lang" id="link2">Java</a> and
<a href="https://tutorialspoint.com/PHP" class="lang" id="link3">PHP</a>; Enhance your Programming skills.</p>
<p class="tutorial">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag = soup.find("a", id="link3")
print(tag.next_element)
tag = soup.find("a", id="link1")
print(tag.previous_element)
Output
PHP
TutorialsPoint has an excellent collection of tutorials on:
The .next_element of the <a> tag with id="link3" is the string PHP, which is the tag's first child. Similarly, .previous_element returns the string parsed just before the <a> tag with id="link1".
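To see where .next_element and .next_sibling part ways, the sketch below compares them on a shortened copy of the story paragraph (an assumption for illustration). Going forward, .next_element steps into the tag's text while .next_sibling jumps past the closing </a>; going backward, both land on the same string here.

```python
from bs4 import BeautifulSoup

html = '<p class="story">tutorials on: <a id="link1">Python</a>, <a id="link2">Java</a></p>'
soup = BeautifulSoup(html, "html.parser")
link1 = soup.find("a", id="link1")

# next_element is the next thing the parser saw: the text INSIDE the tag.
print(link1.next_element)   # Python
# next_sibling skips past the tag's contents to the string ", " after </a>.
print(link1.next_sibling)

# The text before <a> is both its previous element and its previous sibling.
print(link1.previous_element == link1.previous_sibling)   # True
```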
.next_elements and .previous_elements
These attributes return generators that yield, respectively, every tag and string parsed after and before the element.
Next elements example
tag = soup.find("a", id="link1")
for element in tag.next_elements:
    print(element)
Output
Python
,
<a class="lang" href="https://tutorialspoint.com/Java" id="link2">Java</a>
Java
 and
<a class="lang" href="https://tutorialspoint.com/PHP" id="link3">PHP</a>
PHP
; Enhance your Programming skills.
<p class="tutorial">...</p>
...
Previous elements example
tag = soup.find("body")
for element in tag.previous_elements:
    print(element)
Output
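TutorialsPoint
<title>TutorialsPoint</title>
<head><title>TutorialsPoint</title></head>
<html><head><title>TutorialsPoint</title></head> ... </html>
The elements are listed in reverse document order, starting with the one parsed just before <body>. Newline strings print as blank lines and are omitted above, and the <html> tag, which prints the entire document, is abbreviated.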
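Because .next_elements and .previous_elements mix tags and strings in parse order, it is often handy to split them by type. A minimal sketch on a hypothetical two-paragraph document: collect the strings and the tag names that follow the first link, then walk backward from the second link.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

html = ('<p class="story">tutorials on: <a id="link1">Python</a>, '
        '<a id="link2">Java</a> and PHP</p><p class="tutorial">...</p>')
soup = BeautifulSoup(html, "html.parser")
link1 = soup.find("a", id="link1")

# Everything after link1 in parse order, split into strings and tags.
later = list(link1.next_elements)
later_strings = [e for e in later if isinstance(e, NavigableString)]
later_tags = [e.name for e in later if isinstance(e, Tag)]
print(later_strings)
print(later_tags)    # ['a', 'p']

# previous_elements runs in reverse parse order, nearest element first.
link2 = soup.find("a", id="link2")
earlier_tags = [e.name for e in link2.previous_elements if isinstance(e, Tag)]
print(earlier_tags)  # ['a', 'p']
```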