Beautiful Soup Archives - Page 5 of 11 - Donotsad

Aug 09

Beautiful Soup – Find all Children of an Element

The tags in an HTML document form a hierarchy: elements are nested one inside another. For example, the top-level <html> tag contains the <head> and <body> tags, each of which may contain other tags. The enclosing element is called the parent, and the elements nested directly inside it are its children. This chapter shows how to obtain the children of an HTML element with Beautiful Soup.

A Tag object offers two ways to fetch child elements:

The .children property
The findChildren() method

The examples in this chapter use the following HTML document (index.html):

    <html>
    <head>
    <title>TutorialsPoint</title>
    </head>
    <body>
    <h2>Departmentwise Employees</h2>
    <ul id="dept">
    <li>Accounts</li>
    <ul id="acc">
    <li>Anand</li>
    <li>Mahesh</li>
    </ul>
    <li>HR</li>
    <ul id="HR">
    <li>Rani</li>
    <li>Ankita</li>
    </ul>
    </ul>
    </body>
    </html>

Using .children property

The .children property of a Tag object returns an iterator over its direct children only; it does not descend into nested elements. Text nodes count as children, so the line breaks between tags show up as '\n' strings. The following code lists the children of the top-level <ul> tag. We first obtain the Tag object for the <ul> element and then read its .children property.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tag = soup.ul
    print(list(tag.children))

Output

    ['\n', <li>Accounts</li>, '\n', <ul id="acc">
    <li>Anand</li>
    <li>Mahesh</li>
    </ul>, '\n', <li>HR</li>, '\n', <ul id="HR">
    <li>Rani</li>
    <li>Ankita</li>
    </ul>, '\n']

Since .children returns an iterator, we can also traverse it with a for loop.

    for child in tag.children:
        print(child)

Output (the blank lines printed for the '\n' text nodes are omitted)

    <li>Accounts</li>
    <ul id="acc">
    <li>Anand</li>
    <li>Mahesh</li>
    </ul>
    <li>HR</li>
    <ul id="HR">
    <li>Rani</li>
    <li>Ankita</li>
    </ul>

Using findChildren() method

The findChildren() method offers a more comprehensive alternative.
It is a legacy alias of find_all(), so by default it returns every tag nested under the element at any depth, not only its direct children. Pass recursive=False to restrict the search to direct children.

In the index.html document, we have two nested unordered lists. The top-level <ul> element has id="dept", and the two enclosed lists have id="acc" and id="HR" respectively.

In the following example, we first obtain the Tag object for the top-level <ul> element and then extract the list of tags under it.

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tag = soup.find("ul", {"id": "dept"})
    children = tag.findChildren()
    for child in children:
        print(child)

Note that the result set is built recursively. Hence, in the following output, each inner list appears in full, followed by the individual elements inside it.

    <li>Accounts</li>
    <ul id="acc">
    <li>Anand</li>
    <li>Mahesh</li>
    </ul>
    <li>Anand</li>
    <li>Mahesh</li>
    <li>HR</li>
    <ul id="HR">
    <li>Rani</li>
    <li>Ankita</li>
    </ul>
    <li>Rani</li>
    <li>Ankita</li>

Let us extract the children of the inner <ul> element with id="acc".

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tag = soup.find("ul", {"id": "acc"})
    children = tag.findChildren()
    for child in children:
        print(child)

When the above program is run, it prints the <li> elements under the <ul> whose id is acc.

Output

    <li>Anand</li>
    <li>Mahesh</li>

Thus, Beautiful Soup makes it easy to retrieve the child elements of any HTML element.
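The difference between direct children and all descendants can be sketched as follows. This is a minimal illustration that uses a shortened, inline copy of the list markup rather than index.html; the variable names are chosen only for this example.

```python
from bs4 import BeautifulSoup, Tag

# Shortened inline copy of the nested list (an assumption for this sketch).
html = """
<ul id="dept">
<li>Accounts</li>
<ul id="acc">
<li>Anand</li>
<li>Mahesh</li>
</ul>
<li>HR</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
dept = soup.find("ul", {"id": "dept"})

# .children yields direct children only, including whitespace text nodes,
# so keep just the Tag objects.
direct = [child.name for child in dept.children if isinstance(child, Tag)]

# find_all(recursive=False) also restricts the search to direct children.
direct_via_find_all = [t.name for t in dept.find_all(recursive=False)]

# find_all() with the default recursive=True visits every descendant tag.
descendants = [t.name for t in dept.find_all()]

print(direct)
print(direct_via_find_all)
print(descendants)
```

Filtering on isinstance(child, Tag) is one way to drop the '\n' text nodes; find_all(recursive=False) gives the same tags without the extra check.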

Aug 09

Beautiful Soup – Find Element using CSS Selectors

In the Beautiful Soup library, the select() method is an important tool for scraping an HTML/XML document. Like find() and the other find_*() methods, select() locates elements that satisfy given criteria. The difference is in how the criteria are expressed: the find_*() methods search by tag name and attributes, whereas select() searches the document tree for elements matching a CSS selector.

Beautiful Soup also has a select_one() method. select() returns a ResultSet of all the elements that match the CSS selector, whereas select_one() returns only the first matching element.

Prior to Beautiful Soup version 4.7, select() supported only the most common CSS selectors. With version 4.7, Beautiful Soup was integrated with the Soup Sieve CSS selector library, so many more selectors can now be used. In version 4.12, a .css property was added alongside the existing convenience methods select() and select_one().

The parameters of the select() method are as follows:

    select(selector, namespaces=None, limit=None, **kwargs)

selector: A string containing a CSS selector.
namespaces: A dictionary mapping namespace prefixes used in the selector to namespace URIs.
limit: After finding this number of results, stop looking.
kwargs: Keyword arguments passed on to Soup Sieve.

Setting limit=1 finds the same element as select_one(), but the return types differ: select() still returns a ResultSet (containing at most one Tag), while select_one() returns the Tag itself, or None if nothing matches.

Soup Sieve Library

Soup Sieve is a CSS selector library. It is a dependency of Beautiful Soup 4, so it is installed along with the Beautiful Soup package. It provides the ability to select, match, and filter the tags of the document tree using modern CSS selectors.
Soup Sieve currently implements most of the CSS selectors from the CSS level 1 specification up to CSS level 4, except for some that are not yet implemented.

Soup Sieve supports several types of CSS selectors. The basic ones are described below.

Type selector

Matches elements by node name. For example:

    tags = soup.select("div")

Example

    from bs4 import BeautifulSoup

    markup = """
    <div id="Languages">
    <p>Java</p>
    <p>Python</p>
    <p>C++</p>
    </div>
    """
    soup = BeautifulSoup(markup, "html.parser")
    tags = soup.select("div")
    print(tags)

Output

    [<div id="Languages">
    <p>Java</p>
    <p>Python</p>
    <p>C++</p>
    </div>]

Universal selector (*)

Matches elements of any type. For example:

    tags = soup.select("*")

ID selector

Matches an element based on its id attribute. The # symbol denotes the ID selector. For example:

    tags = soup.select("#nm")

Example

    from bs4 import BeautifulSoup

    html = """
    <form>
    <input type = "text" id = "nm" name = "name">
    <input type = "text" id = "age" name = "age">
    <input type = "text" id = "marks" name = "marks">
    </form>
    """
    soup = BeautifulSoup(html, "html.parser")
    obj = soup.select("#nm")
    print(obj)

Output

    [<input id="nm" name="name" type="text"/>]

Class selector

Matches an element based on the values contained in its class attribute. A . symbol prefixed to the class name denotes the class selector. For example:

    tags = soup.select(".submenu")

Example

    from bs4 import BeautifulSoup

    markup = """
    <div id="Languages">
    <p>Java</p>
    <p class="submenu">Python</p>
    <p>C++</p>
    </div>
    """
    soup = BeautifulSoup(markup, "html.parser")
    tags = soup.select(".submenu")
    print(tags)

Output

    [<p class="submenu">Python</p>]

Attribute Selectors

The attribute selector matches an element based on its attributes.
    soup.select("[attr]")

Example

    from bs4 import BeautifulSoup

    html = """
    <h1>Tutorialspoint Online Library</h1>
    <p><b>It's all Free</b></p>
    <a class="prog" href="https://www.tutorialspoint.com/java/java_overview.htm" id="link1">Java</a>
    <a class="prog" href="https://www.tutorialspoint.com/cprogramming/index.htm" id="link2">C</a>
    """
    soup = BeautifulSoup(html, "html5lib")
    print(soup.select("[href]"))

Output

    [<a class="prog" href="https://www.tutorialspoint.com/java/java_overview.htm" id="link1">Java</a>, <a class="prog" href="https://www.tutorialspoint.com/cprogramming/index.htm" id="link2">C</a>]

Pseudo Classes

The CSS specification defines a number of pseudo-classes. A pseudo-class is a keyword added to a selector to specify a special state or position of the selected elements. For example, :link selects a link (every <a> and <area> element with an href attribute) that has not yet been visited.

The pseudo-class selectors :nth-of-type() and :nth-child() are very widely used.

:nth-of-type()

The selector :nth-of-type() matches elements of a given type based on their position among siblings of the same type. The keywords even and odd select the elements at even or odd positions within that group of same-type siblings.

In the following example, the second element of <p> type is selected.

Example

    from bs4 import BeautifulSoup

    html = """
    <p id="0"></p>
    <p id="1"></p>
    <span id="2"></span>
    <span id="3"></span>
    """
    soup = BeautifulSoup(html, "html5lib")
    print(soup.select("p:nth-of-type(2)"))

Output

    [<p id="1"></p>]

:nth-child()

This selector matches elements based on their position among all their siblings, regardless of type. The keywords even and odd select the elements whose position in the group of siblings is even or odd, respectively.
Usage

    :nth-child(even)
    :nth-child(odd)
    :nth-child(2)

Example

    from bs4 import BeautifulSoup

    markup = """
    <div id="Languages">
    <p>Java</p>
    <p>Python</p>
    <p>C++</p>
    </div>
    """
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.div
    child = tag.select_one(":nth-child(2)")
    print(child)

Output

    <p>Python</p>
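As a rough side-by-side of the selector types and of select() versus select_one(), the sketch below runs several selectors against one small document. The markup, the relative URLs and the variable names are made up for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this sketch.
html = """
<div id="menu">
<a class="prog" href="/java">Java</a>
<a class="prog" href="/c">C</a>
<a class="other">No link</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

by_class = soup.select(".prog")        # class selector
by_attr = soup.select("a[href]")       # attribute selector: <a> tags that have href
first = soup.select_one("a")           # a single Tag (or None if nothing matches)
limited = soup.select("a", limit=1)    # a ResultSet holding at most one Tag
second_child = soup.select_one("#menu > a:nth-child(2)")
missing = soup.select_one(".absent")   # no match, so None

print([a.get_text() for a in by_class])
print([a.get_text() for a in by_attr])
print(first.get_text(), len(limited), second_child.get_text(), missing)
```

Because select_one() can return None, code that chains attribute access onto its result usually needs a None check first.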

Aug 09

Beautiful Soup – Find all Comments

Inserting comments in code is considered good programming practice. Comments help readers understand the logic of a program and also serve as documentation. You can put comments in an HTML or XML document, just as in a program written in C, Java, Python and so on. The Beautiful Soup API can be used to identify all the comments in an HTML document.

In HTML and XML, the comment text is written between the <!-- and --> markers.

    <!-- Comment Text -->

The BeautifulSoup package, whose internal name is bs4, defines a Comment class. A Comment object is a special type of NavigableString object. Hence, when the only content of a Tag is the text between <!-- and -->, the .string property of that Tag is a Comment.

Example

    from bs4 import BeautifulSoup

    markup = "<b><!--This is a comment text in HTML--></b>"
    soup = BeautifulSoup(markup, "html.parser")
    comment = soup.b.string
    print(comment, type(comment))

Output

    This is a comment text in HTML <class 'bs4.element.Comment'>

To search for all the occurrences of comments in an HTML document, we shall use the find_all() method. Without any argument, find_all() returns all the tags in the parsed HTML document. You can pass a keyword argument string to find_all() to search text nodes instead. We shall pass the function iscomment itself (not its return value) as that argument, so that find_all() calls it on each string.

    comments = soup.find_all(string=iscomment)

The iscomment() function checks whether a string is a Comment object, with the help of the isinstance() function.

    def iscomment(elem):
        return isinstance(elem, Comment)

The comments variable then holds all the comment strings found in the given HTML document.
We shall use the following index.html file in the example code:

    <html>
    <head>
    <!-- Title of document -->
    <title>TutorialsPoint</title>
    </head>
    <body>
    <!-- Page heading -->
    <h2>Departmentwise Employees</h2>
    <!-- top level list-->
    <ul id="dept">
    <li>Accounts</li>
    <ul id="acc">
    <!-- first inner list -->
    <li>Anand</li>
    <li>Mahesh</li>
    </ul>
    <li>HR</li>
    <ul id="HR">
    <!-- second inner list -->
    <li>Rani</li>
    <li>Ankita</li>
    </ul>
    </ul>
    </body>
    </html>

The following Python program parses the above HTML document and finds all the comments in it.

Example

    from bs4 import BeautifulSoup, Comment

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    def iscomment(elem):
        return isinstance(elem, Comment)

    comments = soup.find_all(string=iscomment)
    print(comments)

Output

    [' Title of document ', ' Page heading ', ' top level list', ' first inner list ', ' second inner list ']

The above output shows a list of all comments. We can also use a for loop over the collection of comments.

Example

    for i, comment in enumerate(comments, start=1):
        print(i, ".", comment)

Output

    1 .  Title of document
    2 .  Page heading
    3 .  top level list
    4 .  first inner list
    5 .  second inner list

In this chapter, we learned how to extract all the comment strings in an HTML document.
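Once the comments have been collected, a common follow-up is to strip them from the tree. The sketch below is one possible way to do that, assuming a short inline document instead of index.html; it uses a lambda in place of the iscomment() function and the extract() method to detach each comment.

```python
from bs4 import BeautifulSoup, Comment

# Inline markup chosen for this sketch (not the chapter's index.html).
html = "<div><!-- top note --><p>Text<!-- inline note --></p></div>"
soup = BeautifulSoup(html, "html.parser")

# The lambda plays the same role as iscomment() above.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
texts = [c.strip() for c in comments]

# extract() removes each comment node from the parse tree.
for c in comments:
    c.extract()
cleaned = str(soup)

print(texts)
print(cleaned)
```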

Aug 09

Beautiful Soup – Find Elements by ID

In an HTML document, an element can be assigned a unique id. This lets front-end code, such as a JavaScript function, look up the element and read its value. With BeautifulSoup, you can likewise find a given element by its id. This can be done in two ways: with the find() and find_all() methods, or with the select() and select_one() methods.

Using find() method

The find() method of the BeautifulSoup object returns the first element that satisfies the criteria given as arguments.

Let us use the following HTML document (saved as index.html) for the purpose:

    <html>
    <head>
    <title>TutorialsPoint</title>
    </head>
    <body>
    <form>
    <input type = "text" id = "nm" name = "name">
    <input type = "text" id = "age" name = "age">
    <input type = "text" id = "marks" name = "marks">
    </form>
    </body>
    </html>

The following Python code finds the element whose id is nm.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    obj = soup.find(id="nm")
    print(obj)

Output

    <input id="nm" name="name" type="text"/>

Using find_all()

The find_all() method also accepts a filter argument. It returns a list of all the elements with the given id. An HTML document normally contains only a single element with a particular id, so find() is preferable to find_all() when searching for a given id.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    obj = soup.find_all(id="nm")
    print(obj)

Output

    [<input id="nm" name="name" type="text"/>]

Note that the find_all() method returns a list. The find_all() method also has a limit parameter. Passing limit=1 to find_all() finds the same element as find(), but still returns it inside a one-element list.

    obj = soup.find_all(id="nm", limit=1)

Using select() method

The select() method of the BeautifulSoup class accepts a CSS selector as its argument. The # symbol is the CSS selector for id: the # followed by the required id value is passed to select().
It works like the find_all() method.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    obj = soup.select("#nm")
    print(obj)

Output

    [<input id="nm" name="name" type="text"/>]

Using select_one()

Like the find_all() method, the select() method returns a list. There is also a select_one() method, which returns the first tag matching the given selector.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    obj = soup.select_one("#nm")
    print(obj)

Output

    <input id="nm" name="name" type="text"/>
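Both find() and select_one() return None when no element has the requested id, so lookups are usually guarded. The sketch below shows one way to do this; name_for_id is a hypothetical helper written for this illustration, not part of Beautiful Soup, and the form markup is an inline copy of the one used above.

```python
from bs4 import BeautifulSoup

html = """
<form>
<input type="text" id="nm" name="name">
<input type="text" id="age" name="age">
</form>
"""
soup = BeautifulSoup(html, "html.parser")


def name_for_id(soup, element_id):
    """Hypothetical helper: return the name attribute of the element
    with the given id, or None if there is no such element."""
    tag = soup.find(id=element_id)
    if tag is None:
        return None
    return tag.get("name")


print(name_for_id(soup, "nm"))
print(name_for_id(soup, "missing"))

# find(id=...) and select_one("#...") locate the same element.
same = soup.find(id="age") == soup.select_one("#age")
print(same)
```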

Aug 09

Beautiful Soup – Scraping Paragraphs from HTML

One of the most frequently appearing tags in an HTML document is the <p> tag, which marks a paragraph of text. With Beautiful Soup, you can easily extract paragraphs from the parsed document tree. In this chapter, we shall discuss the following ways of scraping paragraphs with the help of the BeautifulSoup library.

Scraping HTML paragraph with <p> tag
Scraping HTML paragraph with find_all() method
Scraping HTML paragraph with select() method

We shall use the following HTML document (index.html) for these exercises:

    <html>
    <head>
    <title>BeautifulSoup - Scraping Paragraph</title>
    </head>
    <body>
    <p id="para1">The quick, brown fox jumps over a lazy dog.</p>
    <h2>Hello</h2>
    <p>DJs flock by when MTV ax quiz prog.</p>
    <p>Junk MTV quiz graced by fox whelps.</p>
    <p>Bawds jog, flick quartz, vex nymphs.</p>
    </body>
    </html>

Scraping by <p> tag

The easiest way to search a parse tree is to look up a tag by its name. The expression soup.p refers to the first <p> tag in the parsed document.

    para = soup.p

To fetch all the subsequent <p> tags, you can call find_next("p") in a loop until it returns None. The following program displays the prettified output of all the paragraph tags.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    para = soup.p
    print(para.prettify())
    while True:
        p = para.find_next("p")
        if p is None:
            break
        print(p.prettify())
        para = p

Output

    <p id="para1">
     The quick, brown fox jumps over a lazy dog.
    </p>
    <p>
     DJs flock by when MTV ax quiz prog.
    </p>
    <p>
     Junk MTV quiz graced by fox whelps.
    </p>
    <p>
     Bawds jog, flick quartz, vex nymphs.
    </p>

Using find_all() method

The find_all() method is more comprehensive. You can pass it various types of filters, such as a tag name, attributes or a string. In this case, we want to fetch the contents of the <p> tags. In the following code, find_all() returns a list of all the <p> elements.
Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    paras = soup.find_all("p")
    for para in paras:
        print(para.prettify())

Output

    <p id="para1">
     The quick, brown fox jumps over a lazy dog.
    </p>
    <p>
     DJs flock by when MTV ax quiz prog.
    </p>
    <p>
     Junk MTV quiz graced by fox whelps.
    </p>
    <p>
     Bawds jog, flick quartz, vex nymphs.
    </p>

We can use another approach to find all <p> tags. First obtain the list of all tags using find_all(), then keep those whose Tag.name equals "p".

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tags = soup.find_all()
    paras = [tag.contents for tag in tags if tag.name == "p"]
    print(paras)

The find_all() method also has an attrs parameter. It is useful when you want to extract the <p> tags with specific attributes. For example, in the given document, the first <p> element has id="para1". To fetch it, call find_all() as follows:

    paras = soup.find_all("p", attrs={"id": "para1"})

Using select() method

The select() method obtains elements using a CSS selector. A bare tag name is itself a valid CSS selector, so we can pass "p" to select() to match every <p> tag. The select_one() method is also available; it fetches the first occurrence of the <p> tag.

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    paras = soup.select("p")
    print(paras)

Output

    [<p id="para1">The quick, brown fox jumps over a lazy dog.</p>, <p>DJs flock by when MTV ax quiz prog.</p>, <p>Junk MTV quiz graced by fox whelps.</p>, <p>Bawds jog, flick quartz, vex nymphs.</p>]

To pick out the <p> tags with a certain id, use a for loop as follows:

Example

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tags = soup.select("p")
    for tag in tags:
        if tag.has_attr("id") and tag["id"] == "para1":
            print(tag.contents)

Output

    ['The quick, brown fox jumps over a lazy dog.']
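When only the paragraph text is needed rather than the tags, get_text() can be applied to each match. The following sketch assumes a shortened inline copy of the document above; it also shows that the id filter from the loop can be written directly into the CSS selector.

```python
from bs4 import BeautifulSoup

# Shortened inline copy of the chapter's document (an assumption for this sketch).
html = """
<body>
<p id="para1">The quick, brown fox jumps over a lazy dog.</p>
<h2>Hello</h2>
<p>DJs flock by when MTV ax quiz prog.</p>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Plain text of every paragraph, with surrounding whitespace stripped.
texts = [p.get_text(strip=True) for p in soup.find_all("p")]

# "p#para1" combines the type selector and the ID selector in one step.
first = soup.select_one("p#para1")

# The attrs-based lookup finds the same single element.
by_attrs = soup.find_all("p", attrs={"id": "para1"})

print(texts)
print(first.get_text())
print(len(by_attrs))
```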

Aug 09

Beautiful Soup – Get all HTML Tags

Tags in HTML are like keywords in a traditional programming language such as Python or Java. Each tag has a predefined behaviour according to which its content is rendered by the browser. With Beautiful Soup, it is possible to collect all the tags in a given HTML document.

The simplest way to obtain a list of tags is to parse the web page into a soup object and call the find_all() method without any argument. It returns a ResultSet (a list) of all the tags in the document.

Let us extract the list of all tags in Google's homepage. The exact output depends on the page content at the time of the request.

Example

    from bs4 import BeautifulSoup
    import requests

    url = "https://www.google.com/"
    req = requests.get(url)
    soup = BeautifulSoup(req.content, "html.parser")
    tags = soup.find_all()
    print([tag.name for tag in tags])

Output

    ['html', 'head', 'meta', 'meta', 'title', 'script', 'style', 'style', 'script', 'body', 'script', 'div', 'div', 'nobr', 'b', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'u', 'div', 'nobr', 'span', 'span', 'span', 'a', 'a', 'a', 'div', 'div', 'center', 'br', 'div', 'img', 'br', 'br', 'form', 'table', 'tr', 'td', 'td', 'input', 'input', 'input', 'input', 'input', 'div', 'input', 'br', 'span', 'span', 'input', 'span', 'span', 'input', 'script', 'input', 'td', 'a', 'input', 'script', 'div', 'div', 'br', 'div', 'style', 'div', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'span', 'div', 'div', 'a', 'a', 'a', 'a', 'p', 'a', 'a', 'script', 'script', 'script']

Naturally, a given tag may appear more than once in such a list. To obtain the unique tag names (avoiding duplication), build a set instead of a list.
Change the print statement in the above code as follows.

Example

    print({tag.name for tag in tags})

Output

    {'body', 'head', 'p', 'a', 'meta', 'tr', 'nobr', 'script', 'br', 'img', 'b', 'form', 'center', 'span', 'div', 'input', 'u', 'title', 'style', 'td', 'table', 'html'}

To obtain the tags that have some text associated with them, check the string property and print the tag if it is not None.

    tags = soup.find_all()
    for tag in tags:
        if tag.string is not None:
            print(tag.name, tag.string)

Some tags, such as <img>, carry no text but do have one or more attributes; they can be found by testing tag.attrs instead of tag.string.

In the following code, the HTML string is not a complete HTML document, in the sense that the <html> and <body> tags are not given. The html5lib and lxml parsers add these tags on their own while building the document tree. Hence, when we extract the tag list, the additional tags will also be seen.

Example

    from bs4 import BeautifulSoup

    html = """
    <h1 style="color:blue;text-align:center;">This is a heading</h1>
    <p style="color:red;">This is a paragraph.</p>
    <p>This is another paragraph</p>
    """
    soup = BeautifulSoup(html, "html5lib")
    tags = soup.find_all()
    print({tag.name for tag in tags})

Output

    {'head', 'html', 'p', 'h1', 'body'}
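A tag census that also counts repetitions, and a filter for tags that have attributes but no text, might look like the sketch below. It is an offline illustration: the markup is inline, the logo.png file name is invented, and html.parser is used so that no <html>, <head> or <body> tags are added.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Inline fragment; "logo.png" is a made-up file name for this sketch.
html = """<h1 style="color:blue;">This is a heading</h1>
<p style="color:red;">This is a paragraph.</p>
<p>This is another paragraph</p>
<img src="logo.png">"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all()

# Count how many times each tag name occurs.
counts = Counter(tag.name for tag in tags)

# Tags that have at least one attribute but no text content, such as <img>.
attr_only = [
    tag.name for tag in tags
    if tag.attrs and not tag.get_text(strip=True)
]

print(counts)
print(attr_only)
```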

Aug 09

Beautiful Soup – find_next_sibling Method

Method Description

The find_next_sibling() method in Beautiful Soup finds the closest sibling of this PageElement, at the same level of the tree, that matches the given criteria and appears later in the document. It is similar to the next_sibling property, except that it accepts filters and, when called without arguments, returns the next sibling tag rather than an intervening text node.

Syntax

    find_next_sibling(name, attrs, string, **kwargs)

Parameters

name: A filter on tag name.
attrs: A dictionary of filters on attribute values.
string: The string to search for (rather than tag).
kwargs: A dictionary of filters on attribute values.

Return Type

The find_next_sibling() method returns a Tag object or a NavigableString object, or None if no sibling matches.

Example 1

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<p><b>Hello</b><i>Python</i></p>", "html.parser")
    tag1 = soup.find("b")
    print("next:", tag1.find_next_sibling())

Output

    next: <i>Python</i>

Example 2

If no next sibling exists, the method returns None.

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<p><b>Hello</b><i>Python</i></p>", "html.parser")
    tag1 = soup.find("i")
    print("next:", tag1.find_next_sibling())

Output

    next: None
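The contrast with the next_sibling property can be sketched as follows. The markup below is invented for this illustration and deliberately puts a space between <b> and <i>, so that the immediate sibling is a text node.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup with a whitespace text node after <b>.
html = "<p><b>Hello</b> <i>Python</i><u>World</u></p>"
soup = BeautifulSoup(html, "html.parser")
b = soup.b

raw = b.next_sibling                    # the whitespace text node
tag = b.find_next_sibling()             # skips text, returns the next tag
u = b.find_next_sibling("u")            # filter by tag name
none = b.find_next_sibling("span")      # no such sibling, so None

print(repr(raw), tag, u, none)
```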

Aug 09

Beautiful Soup – find_all_next Method

Method Description

The find_all_next() method in Beautiful Soup finds all PageElements that match the given criteria and appear after this element in the document. It returns tags or NavigableString objects and takes the same filter parameters as find_all(), except that it has no recursive parameter.

Syntax

    find_all_next(name, attrs, string, limit, **kwargs)

Parameters

name: A filter on tag name.
attrs: A dictionary of filters on attribute values.
string: The string to search for (rather than tag).
limit: Stop looking after the specified number of occurrences have been found.
kwargs: A dictionary of filters on attribute values.

Return Value

This method returns a ResultSet containing PageElements (Tags or NavigableString objects).

Example 1

Using index.html as the HTML document for this example, we first locate the <form> tag and then collect all the elements after it with the find_all_next() method.

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tag = soup.form
    tags = tag.find_all_next()
    print(tags)

Output

    [<input id="nm" name="name" type="text"/>, <input id="age" name="age" type="text"/>, <input id="marks" name="marks" type="text"/>]

Example 2

Here, we apply a filter to the find_all_next() method to collect all the tags after <form> whose id is nm or age.

    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tag = soup.form
    tags = tag.find_all_next(id=["nm", "age"])
    print(tags)

Output

    [<input id="nm" name="name" type="text"/>, <input id="age" name="age" type="text"/>]

Example 3

If we check the tags following the <body> tag, the result includes an <h1> tag as well as the <form> tag, followed by the three input elements that the form contains.
    from bs4 import BeautifulSoup

    with open("index.html") as fp:
        soup = BeautifulSoup(fp, "html.parser")

    tag = soup.body
    tags = tag.find_all_next()
    print(tags)

Output

    [<h1>TutorialsPoint</h1>, <form>
    <input id="nm" name="name" type="text"/>
    <input id="age" name="age" type="text"/>
    <input id="marks" name="marks" type="text"/>
    </form>, <input id="nm" name="name" type="text"/>, <input id="age" name="age" type="text"/>, <input id="marks" name="marks" type="text"/>]
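Unlike find_all(), which looks only inside an element, find_all_next() keeps going through the rest of the document. The sketch below illustrates this with an inline document invented for the purpose, in which a <p> follows the form.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <p> comes after the form, not inside it.
html = (
    "<div><h1>Title</h1>"
    "<form><input id='nm'><input id='age'></form>"
    "<p>After</p></div>"
)
soup = BeautifulSoup(html, "html.parser")
form = soup.form

inside = [t.name for t in form.find_all()]         # descendants of <form> only
after = [t.name for t in form.find_all_next()]     # everything later in the document
first_two = [t.name for t in form.find_all_next(limit=2)]
ids = [t.get("id") for t in form.find_all_next(id=["nm", "age"])]

print(inside)
print(after)
print(first_two)
print(ids)
```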

Aug 09

Beautiful Soup – smooth Method

Method Description

After calling a number of methods that modify the parse tree, you may end up with two or more NavigableString objects next to each other. The smooth() method smooths out this element's children by consolidating consecutive strings. This makes pretty-printed output look more natural after many operations that modified the tree.

Syntax

    smooth()

Parameters

This method has no parameters.

Return Type

This method modifies the tag in place and returns None.

Example 1

    from bs4 import BeautifulSoup

    html = """<html>
    <head>
    <title>TutorialsPoint</title>
    </head>
    <body>
    Some Text
    <div></div>
    <p></p>
    <div>Some more text</div>
    <b></b>
    <i></i> # COMMENT
    </body>
    </html>"""

    soup = BeautifulSoup(html, "html.parser")
    # Replace every tag that has no text with an empty string,
    # then merge the adjacent strings left behind in its parent.
    for item in soup.find_all():
        if not item.get_text(strip=True):
            p = item.parent
            item.replace_with("")
            p.smooth()
    print(soup.prettify())

Output

    <html>
     <head>
      <title>
       TutorialsPoint
      </title>
     </head>
     <body>
      Some Text
      <div>
       Some more text
      </div>
      # COMMENT
     </body>
    </html>

Example 2

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<p>Hello</p>", "html.parser")
    soup.p.append(", World")
    soup.smooth()
    print(soup.p.contents)
    print(soup.p.prettify())

Output

    ['Hello, World']
    <p>
     Hello, World
    </p>
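To see what smooth() actually changes, it helps to inspect the tag's contents before and after the call. The sketch below repeats Example 2 with that extra step; the variable names are chosen only for this illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello</p>", "html.parser")
soup.p.append(", World")

# append() adds a second, separate string node next to "Hello".
before = [str(s) for s in soup.p.contents]

# smooth() merges the adjacent strings into a single node.
soup.smooth()
after = [str(s) for s in soup.p.contents]

print(before)
print(after)
```

The rendered markup is the same in both states; the difference is in the tree, where .string and similar single-child accessors only work once the strings have been merged.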

Aug 09

Beautiful Soup – Useful Resources

The following resources contain additional information on Beautiful Soup. Please use them to get more in-depth knowledge of the library.

Useful Video Courses

Learn Python 3 - Online course (79 lectures, 17.5 hours, Joseph Delgadillo)
The Complete Python 3 Course: From Beginner to Advanced (147 lectures, 18 hours, Joseph Delgadillo)
Web Scraping using API, Beautiful Soup using Python (39 lectures, 3.5 hours, Chandramouli Jayendran)
A-Z Python Bootcamp- Basics To Data Science (50+ Hours) (436 lectures, 46 hours, Chandramouli Jayendran)
Beautiful Soup in Action - Web Scraping a Car Dealer Website (7 lectures, 1 hour, AlexanderSchlee)
Data Project with Beautiful Soup - Web Scraping E-Commerce (7 lectures, 1 hour, AlexanderSchlee)