Beautiful Soup – find vs find_all




The Beautiful Soup library provides both the find() and find_all() methods, which are among the most frequently used methods when parsing HTML or XML documents. You often need to locate a PageElement in a document tree by its tag type, by its attributes, or by its CSS class. These criteria are passed as arguments to both find() and find_all(). The main difference between the two is that find() returns only the first matching element below the element it is called on, whereas find_all() returns every matching element.

The find() method has the following syntax −

Syntax


find(name, attrs, recursive, string, **kwargs)

The name argument filters on the tag name, and attrs filters on tag attribute values. The recursive argument is True by default, so all descendants are searched; set it to False to search only the direct children. The string argument filters on text content instead of tags. Any additional keyword arguments (kwargs) are treated as filters on attribute values.


soup.find(id='nm')
soup.find(attrs={'name': 'marks'})
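
The filters described above can be tried out in a minimal sketch. The markup here is hypothetical, invented only for illustration; the variable names are likewise assumptions and not part of the original tutorial.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this illustration.
html = """
<form>
  <input type="text" id="nm" name="name"/>
  <input type="text" id="age" name="age"/>
  <input type="text" id="marks" name="marks"/>
  <p>Enter your details</p>
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# Filter on the tag name: the first <input> in document order.
by_tag = soup.find("input")

# A keyword argument filters on the attribute of the same name.
by_id = soup.find(id="age")

# The HTML attribute "name" clashes with find()'s own name argument,
# so it has to be passed through attrs.
by_attrs = soup.find(attrs={"name": "marks"})

# The string argument matches text and returns a NavigableString.
by_string = soup.find(string="Enter your details")

# With recursive=False only the direct children of soup are searched;
# the <input> tags sit inside <form>, so nothing matches.
direct_only = soup.find("input", recursive=False)

print(by_tag["id"])
print(by_id["name"])
print(by_attrs["id"])
print(type(by_string).__name__)
print(direct_only)
```

Passing the filter through attrs is mainly useful when the attribute name cannot be used as a Python keyword argument, as is the case with name here.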

The find_all() method accepts all the arguments of find(), plus a limit argument. It is an integer that restricts the search to the specified number of matches. If it is not set, find_all() returns every match among the descendants of the PageElement it is called on.


soup.find_all('input')
lst = soup.find_all('li', limit=2)
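
As a short sketch of how limit affects the result, the snippet below runs both calls against a small list. The markup is a hypothetical example, not taken from the tutorial's index.html.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this illustration.
html = "<ul><li>One</li><li>Two</li><li>Three</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# Without limit, every matching tag is collected.
all_items = soup.find_all("li")

# With limit=2, the search stops after the second match.
first_two = soup.find_all("li", limit=2)

print(len(all_items))
print([li.get_text() for li in first_two])
```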

If the limit argument of find_all() is set to 1, it locates the same element as find(), but still returns it wrapped in a one-element ResultSet.

The return types of the two methods differ. The find() method returns the first Tag or NavigableString object found, or None if nothing matches. The find_all() method returns a ResultSet (a subclass of list) containing all the PageElements that satisfy the filter criteria; it is empty if nothing matches.
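
The difference in return values matters most when nothing matches. The following is a minimal sketch using hypothetical markup; the select tag is deliberately absent from it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this illustration.
html = '<form><input id="nm" name="name" type="text"/></form>'
soup = BeautifulSoup(html, "html.parser")

# No <select> exists: find() gives None, find_all() gives an empty ResultSet.
missing_one = soup.find("select")
missing_all = soup.find_all("select")
print(missing_one)
print(len(missing_all), isinstance(missing_all, list))

# Because find() may return None, check the result before using it.
field = soup.find("input")
if field is not None:
    print(field.get("name"))

# find_all(limit=1) locates the same tag, but wrapped in a ResultSet.
first = soup.find_all("input", limit=1)
print(len(first), first[0] == field)
```

Since an empty ResultSet is falsy, a find_all() result can be looped over directly without a separate check, whereas a find() result usually needs the None test shown here.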

Here is an example that demonstrates the difference between find and find_all methods.

Example


from bs4 import BeautifulSoup

# Read the HTML document; the with block closes the file afterwards
with open("index.html") as markup:
    soup = BeautifulSoup(markup, 'html.parser')

ret1 = soup.find('input')       # first matching tag only
ret2 = soup.find_all('input')   # every matching tag
print(ret1, 'Return type of find:', type(ret1))
print(ret2)
print('Return type of find_all:', type(ret2))

# set limit=1
ret3 = soup.find_all('input', limit=1)
print('find:', ret1)
print('find_all:', ret3)

Output


<input id="nm" name="name" type="text"/> Return type of find: <class 'bs4.element.Tag'>
[<input id="nm" name="name" type="text"/>, <input id="age" name="age" type="text"/>, <input id="marks" name="marks" type="text"/>]
Return type of find_all: <class 'bs4.element.ResultSet'>
find: <input id="nm" name="name" type="text"/>
find_all: [<input id="nm" name="name" type="text"/>]
