”;
It is time to test our Beautiful Soup package in one of the html pages (taking web page – https://www.tutorialspoint.com/index.htm, you can choose any-other web page you want) and extract some information from it.
In the below code, we are trying to extract the title from the webpage −
Example
from bs4 import BeautifulSoup import requests url = "https://www.tutorialspoint.com/index.htm" req = requests.get(url) soup = BeautifulSoup(req.content, "html.parser") print(soup.title)
Output
<title>Online Courses and eBooks Library<title>
One common task is to extract all the URLs within a webpage. For that we just need to add the below line of code −
for link in soup.find_all(''a''): print(link.get(''href''))
Output
Shown below is the partial output of the above loop −
https://www.tutorialspoint.com/index.htm https://www.tutorialspoint.com/codingground.htm https://www.tutorialspoint.com/about/about_careers.htm https://www.tutorialspoint.com/whiteboard.htm https://www.tutorialspoint.com/online_dev_tools.htm https://www.tutorialspoint.com/business/index.asp https://www.tutorialspoint.com/market/teach_with_us.jsp https://www.facebook.com/tutorialspointindia https://www.instagram.com/tutorialspoint_/ https://twitter.com/tutorialspoint https://www.youtube.com/channel/UCVLbzhxVTiTLiVKeGV7WEBg https://www.tutorialspoint.com/categories/development https://www.tutorialspoint.com/categories/it_and_software https://www.tutorialspoint.com/categories/data_science_and_ai_ml https://www.tutorialspoint.com/categories/cyber_security https://www.tutorialspoint.com/categories/marketing https://www.tutorialspoint.com/categories/office_productivity https://www.tutorialspoint.com/categories/business https://www.tutorialspoint.com/categories/lifestyle https://www.tutorialspoint.com/latest/prime-packs https://www.tutorialspoint.com/market/index.asp https://www.tutorialspoint.com/latest/ebooks … …
To parse a web page stored locally in the current working directory, obtain the file object pointing to the html file, and use it as argument to the BeautifulSoup() constructor.
Example
from bs4 import BeautifulSoup with open("index.html") as fp: soup = BeautifulSoup(fp, ''html.parser'') print(soup)
Output
<html> <head> <title>Hello World</title> </head> <body> <h1 style="text-align:center;">Hello World</h1> </body> </html>
You can also use a string that contains HTML script as constructor”s argument as follows −
from bs4 import BeautifulSoup html = '''''' <html> <head> <title>Hello World</title> </head> <body> <h1 style="text-align:center;">Hello World</h1> </body> </html> '''''' soup = BeautifulSoup(html, ''html.parser'') print(soup)
Beautiful Soup uses the best available parser to parse the document. It will use an HTML parser unless specified otherwise.
”;