Beautiful Soup – Souping the Page

It is time to test our Beautiful Soup package on an HTML page and extract some information from it. We use the web page https://www.tutorialspoint.com/index.htm here, but you can choose any other web page you want.

In the code below, we extract the title from the web page −

Example

from bs4 import BeautifulSoup
import requests

url = "https://www.tutorialspoint.com/index.htm"
req = requests.get(url)
soup = BeautifulSoup(req.content, "html.parser")
print(soup.title)

Output

<title>Online Courses and eBooks Library</title>

One common task is to extract all the URLs within a web page. For that, we just need to add the following lines of code −

for link in soup.find_all("a"):
    print(link.get("href"))

Output

Shown below is a partial output of the above loop −

https://www.tutorialspoint.com/index.htm
https://www.tutorialspoint.com/codingground.htm
https://www.tutorialspoint.com/about/about_careers.htm
https://www.tutorialspoint.com/whiteboard.htm
https://www.tutorialspoint.com/online_dev_tools.htm
https://www.tutorialspoint.com/business/index.asp
https://www.tutorialspoint.com/market/teach_with_us.jsp
https://www.facebook.com/tutorialspointindia
https://www.instagram.com/tutorialspoint_/
https://twitter.com/tutorialspoint
https://www.youtube.com/channel/UCVLbzhxVTiTLiVKeGV7WEBg
https://www.tutorialspoint.com/categories/development
https://www.tutorialspoint.com/categories/it_and_software
https://www.tutorialspoint.com/categories/data_science_and_ai_ml
https://www.tutorialspoint.com/categories/cyber_security
https://www.tutorialspoint.com/categories/marketing
https://www.tutorialspoint.com/categories/office_productivity
https://www.tutorialspoint.com/categories/business
https://www.tutorialspoint.com/categories/lifestyle
https://www.tutorialspoint.com/latest/prime-packs
https://www.tutorialspoint.com/market/index.asp
https://www.tutorialspoint.com/latest/ebooks
…
…

To parse a web page stored locally in the current working directory, obtain a file object pointing to the HTML file and pass it as the argument to the BeautifulSoup() constructor.

Example

from bs4 import BeautifulSoup

with open("index.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

print(soup)

Output

<html>
<head>
<title>Hello World</title>
</head>
<body>
<h1 style="text-align:center;">Hello World</h1>
</body>
</html>

You can also pass a string containing HTML markup as the constructor's argument, as follows −

from bs4 import BeautifulSoup

html = """
<html>
<head>
<title>Hello World</title>
</head>
<body>
<h1 style="text-align:center;">Hello World</h1>
</body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")
print(soup)

If you do not specify a parser, Beautiful Soup uses the best HTML parser installed on your system and issues a warning suggesting that you name one explicitly. The document is treated as HTML unless you specify an XML parser.
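A live page can change between runs, so the following sketch parses a small, made-up HTML string instead of a downloaded page. It reads the title text with the .string attribute and passes href=True to find_all(), so that <a> tags without an href attribute are skipped instead of being printed as None. The sample markup and its URLs are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for a downloaded document
html = """
<html>
<head><title>Hello World</title></head>
<body>
<a href="https://www.tutorialspoint.com/index.htm">Home</a>
<a href="/about/about_careers.htm">Careers</a>
<a name="top">No link here</a>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.title is the <title> Tag; .string is the single text node inside it
title_text = soup.title.string

# href=True keeps only <a> tags that actually carry an href attribute
links = [a["href"] for a in soup.find_all("a", href=True)]

print(title_text)
print(links)
```

Relative links such as "/about/about_careers.htm" are returned exactly as written in the markup; Beautiful Soup does not resolve them against the page URL.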
Author: user
Beautiful Soup – Overview
In today's world, we have tons of unstructured data and information (mostly web data) available freely. Sometimes the freely available data is easy to read and sometimes not. No matter how your data is available, web scraping is a very useful tool to transform unstructured data into structured data that is easier to read and analyze. In other words, web scraping is a way to collect, organize and analyze this enormous amount of data.

Introduction to Beautiful Soup

Beautiful Soup is a Python library named after the Lewis Carroll poem "Beautiful Soup" from "Alice's Adventures in Wonderland". As a Python package, it parses messy web data, fixes bad HTML, and presents the document to us as an easily traversable tree structure. In short, Beautiful Soup is a Python package that allows us to pull data out of HTML and XML documents.

HTML tree Structure

Before we look into the functionality provided by Beautiful Soup, let us first understand the HTML tree structure. The root element of the document tree is <html>. Every other element has a parent and may have children and siblings, as determined by its position in the tree structure. To move among HTML elements, attributes and text, you move among the nodes of this tree.

Let us suppose the webpage is as shown below −

This page translates to the following HTML document −

<html>
<head>
<title>TutorialsPoint</title>
</head>
<body>
<h1>Tutorialspoint Online Library</h1>
<p><b>It's all Free</b></p>
</body>
</html>

This simply means that, for the above HTML document, we have an HTML tree structure as follows −

html
├── head
│   └── title
└── body
    ├── h1
    └── p
        └── b
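The parent, child and sibling relationships described above can be checked directly in code. The following minimal sketch loads the same sample document, written without whitespace between tags so that every child node is a tag, and walks the tree with the standard .parent, .children, .parents and find_next_sibling() members of a Beautiful Soup Tag.

```python
from bs4 import BeautifulSoup

# The sample document above, without whitespace between tags, so that
# no extra whitespace text nodes appear among the children
html = ("<html><head><title>TutorialsPoint</title></head>"
        "<body><h1>Tutorialspoint Online Library</h1>"
        "<p><b>It's all Free</b></p></body></html>")

soup = BeautifulSoup(html, "html.parser")

root = soup.html
# The root <html> element's parent is the BeautifulSoup object itself
parent_name = root.parent.name
# The children of <html> are <head> and <body>
child_names = [child.name for child in root.children]
# <h1> and <p> share the <body> parent, so they are siblings
sibling_name = soup.h1.find_next_sibling().name
# Walk upward from the innermost tag, <b>, to the top of the tree
ancestors = [parent.name for parent in soup.b.parents]

print(parent_name)    # [document]
print(child_names)    # ['head', 'body']
print(sibling_name)   # p
print(ancestors)      # ['p', 'body', 'html', '[document]']
```

Note that the name "[document]" belongs to the BeautifulSoup object that wraps the whole tree, not to an HTML element.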
Beautiful Soup – Remove all Styles

This chapter explains how to remove all styles from an HTML document.

Cascading Style Sheets (CSS) are used to control the appearance of different aspects of an HTML document. This includes rendering text with a specific font, color, alignment, spacing etc.

CSS is applied to HTML tags in different ways. One is to define styles in a CSS file and include it in the HTML script with the <link> tag in the <head> section of the document. For example,

Example

<html>
<head>
<link rel="stylesheet" href="style.css">
</head>
<body>
. . .
. . .
</body>
</html>

The different tags in the body part of the HTML script will use the definitions in the style.css file.

Another approach is to define the style configuration inside the <head> part of the HTML document itself. Tags in the body part are rendered using the definitions provided internally. Example of internal styling −

<html>
<head>
<style>
p {
text-align: center;
color: red;
}
</style>
</head>
<body>
<p>para1.</p>
<p id="para1">para2</p>
<p>para3</p>
</body>
</html>

In either case, to remove the styles programmatically, simply remove the head tag from the soup object. Note that this also discards everything else inside <head>, such as the <title> tag.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
soup.head.extract()

A third approach is to define the styles inline by including the style attribute in the tag itself. The style attribute may contain one or more style property definitions such as color, size etc. For example

<body>
<h1 style="color:blue;text-align:center;">This is a heading</h1>
<p style="color:red;">This is a paragraph.</p>
</body>

To remove such inline styles from an HTML document, check whether the attrs dictionary of a tag object has a style key and, if it does, delete that key.
tags = soup.find_all()
for tag in tags:
    if tag.has_attr("style"):
        del tag.attrs["style"]
print(soup)

The following code removes the inline styles as well as the head tag itself, so that the resultant HTML tree has no styles left.

from bs4 import BeautifulSoup

html = """
<html>
<head>
<link rel="stylesheet" href="style.css">
</head>
<body>
<h1 style="color:blue;text-align:center;">This is a heading</h1>
<p style="color:red;">This is a paragraph.</p>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
soup.head.extract()

tags = soup.find_all()
for tag in tags:
    if tag.has_attr("style"):
        del tag.attrs["style"]
print(soup.prettify())

Output

<html>
 <body>
  <h1>
   This is a heading
  </h1>
  <p>
   This is a paragraph.
  </p>
 </body>
</html>
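Removing the whole <head> also throws away the title, meta tags and scripts. As a more selective alternative (not the approach used in this chapter), the sketch below deletes only <style> elements, <link rel="stylesheet"> tags and inline style attributes, and keeps everything else. It uses decompose(), which destroys a tag, instead of extract(), which returns it. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<html>
<head>
<title>Styled page</title>
<link rel="stylesheet" href="style.css">
<style>p { color: red; }</style>
</head>
<body>
<h1 style="color:blue;">This is a heading</h1>
<p style="color:red;">This is a paragraph.</p>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. Internal style sheets: delete every <style> element
for tag in soup.find_all("style"):
    tag.decompose()

# 2. External style sheets: delete <link> tags whose rel includes "stylesheet"
for tag in soup.find_all("link", rel="stylesheet"):
    tag.decompose()

# 3. Inline styles: style=True matches any tag that has a style attribute
for tag in soup.find_all(style=True):
    del tag["style"]

print(soup.prettify())
```

Unlike extracting <head>, this keeps the <title> element, so the page title survives the cleanup.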
Beautiful Soup – Inspect Data Source

To scrape a web page with BeautifulSoup and Python, the first step of any web scraping project should be to explore the website you want to scrape. So, first visit the website to understand the site structure before you start extracting the information that's relevant to you.

Let us visit TutorialsPoint's Python Tutorial home page. Open https://www.tutorialspoint.com/python3/index.htm in your browser.

Developer tools can help you understand the structure of a website. All modern browsers come with developer tools installed. If you are using the Chrome browser, open the Developer Tools from the top-right menu button (⋮) by selecting More Tools → Developer Tools.

With Developer tools, you can explore the site's document object model (DOM) to better understand your source. Select the Elements tab in developer tools. You'll see a structure with clickable HTML elements.

The Tutorial page shows the table of contents in the left sidebar. Right click on any chapter and choose the Inspect option. In the Elements tab, locate the tag that corresponds to the TOC list, as shown in the figure below −

Right click on the HTML element, choose Copy → Copy element, and paste it into any editor. The HTML script of the <ul>..</ul> element is now obtained.
<ul class="toc chapters">
<li class="heading">Python 3 Basic Tutorial</li>
<li class="current-chapter"><a href="/python3/index.htm">Python 3 – Home</a></li>
<li><a href="/python3/python3_whatisnew.htm">What is New in Python 3</a></li>
<li><a href="/python3/python_overview.htm">Python 3 – Overview</a></li>
<li><a href="/python3/python_environment.htm">Python 3 – Environment Setup</a></li>
<li><a href="/python3/python_basic_syntax.htm">Python 3 – Basic Syntax</a></li>
<li><a href="/python3/python_variable_types.htm">Python 3 – Variable Types</a></li>
<li><a href="/python3/python_basic_operators.htm">Python 3 – Basic Operators</a></li>
<li><a href="/python3/python_decision_making.htm">Python 3 – Decision Making</a></li>
<li><a href="/python3/python_loops.htm">Python 3 – Loops</a></li>
<li><a href="/python3/python_numbers.htm">Python 3 – Numbers</a></li>
<li><a href="/python3/python_strings.htm">Python 3 – Strings</a></li>
<li><a href="/python3/python_lists.htm">Python 3 – Lists</a></li>
<li><a href="/python3/python_tuples.htm">Python 3 – Tuples</a></li>
<li><a href="/python3/python_dictionary.htm">Python 3 – Dictionary</a></li>
<li><a href="/python3/python_date_time.htm">Python 3 – Date & Time</a></li>
<li><a href="/python3/python_functions.htm">Python 3 – Functions</a></li>
<li><a href="/python3/python_modules.htm">Python 3 – Modules</a></li>
<li><a href="/python3/python_files_io.htm">Python 3 – Files I/O</a></li>
<li><a href="/python3/python_exceptions.htm">Python 3 – Exceptions</a></li>
</ul>

We can now load this script in a BeautifulSoup object to parse the document tree.
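As a first step in parsing the copied TOC markup, the sketch below loads a shortened copy of it (only the first few <li> items, an abbreviation made here for brevity) and collects each chapter's title and link. It locates the list with find() and class_, which matches when "toc" is one of the element's classes. The heading <li> has no <a> tag, so collecting anchors skips it automatically.

```python
from bs4 import BeautifulSoup

# A shortened copy of the TOC markup shown above
toc_html = """
<ul class="toc chapters">
<li class="heading">Python 3 Basic Tutorial</li>
<li class="current-chapter"><a href="/python3/index.htm">Python 3 – Home</a></li>
<li><a href="/python3/python3_whatisnew.htm">What is New in Python 3</a></li>
<li><a href="/python3/python_overview.htm">Python 3 – Overview</a></li>
</ul>
"""

soup = BeautifulSoup(toc_html, "html.parser")

# class_ (with a trailing underscore) filters on the CSS class attribute
toc = soup.find("ul", class_="toc")

# The section heading is the <li> carrying the "heading" class
heading = toc.find("li", class_="heading").get_text()

# One (title, href) pair per chapter link
chapters = [(a.get_text(), a["href"]) for a in toc.find_all("a")]

print(heading)
for title, href in chapters:
    print(title, "->", href)
```

The same code works unchanged on the full list; only the number of collected pairs grows.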
Beautiful Soup – Modifying the Tree

One of the powerful features of the Beautiful Soup library is the ability to manipulate the parsed HTML or XML document and modify its contents.

The Beautiful Soup library has different functions to perform the following operations −

Add contents or a new tag to an existing tag of the document
Insert contents before or after an existing tag or string
Clear the contents of an already existing tag
Modify the contents of a tag element

Add content

You can add to the contents of an existing tag by using the append() method on a Tag object. It works like the append() method of Python's list object.

In the following example, the HTML script has a <p> tag. With append(), additional text is appended.

Example

from bs4 import BeautifulSoup

markup = "<p>Hello</p>"
soup = BeautifulSoup(markup, "html.parser")
print(soup)

tag = soup.p
tag.append(" World")
print(soup)

Output

<p>Hello</p>
<p>Hello World</p>

With the append() method, you can also add a new tag at the end of an existing tag. First create a new Tag object with the new_tag() method and then pass it to the append() method.

Example

from bs4 import BeautifulSoup

markup = "<b>Hello</b>"
soup = BeautifulSoup(markup, "html.parser")
tag = soup.b

tag1 = soup.new_tag("i")
tag1.string = "World"
tag.append(tag1)
print(soup.prettify())

Output

<b>
 Hello
 <i>
  World
 </i>
</b>

If you have to add a string to the document, you can append a NavigableString object.

Example

from bs4 import BeautifulSoup, NavigableString

markup = "<b>Hello</b>"
soup = BeautifulSoup(markup, "html.parser")
tag = soup.b

new_string = NavigableString(" World")
tag.append(new_string)
print(soup.prettify())

Output

<b>
 Hello
 World
</b>

From Beautiful Soup version 4.7 onwards, the extend() method has been added to the Tag class. It adds all the elements in a list to the tag.
Example

from bs4 import BeautifulSoup

markup = "<b>Hello</b>"
soup = BeautifulSoup(markup, "html.parser")
tag = soup.b

vals = ["World.", "Welcome to ", "TutorialsPoint"]
tag.extend(vals)
print(soup.prettify())

Output

<b>
 Hello
 World.
 Welcome to
 TutorialsPoint
</b>

Insert Contents

Instead of adding a new element at the end, you can use the insert() method to add an element at a given position in the list of children of a Tag element. The insert() method in Beautiful Soup behaves similarly to insert() on a Python list object.

In the following example, a new string is added to the <b> tag at position 1. The resultant parsed document shows the result.

Example

from bs4 import BeautifulSoup

markup = "<b>Excellent </b><u>from TutorialsPoint</u>"
soup = BeautifulSoup(markup, "html.parser")
tag = soup.b

tag.insert(1, "Tutorial ")
print(soup.prettify())

Output

<b>
 Excellent
 Tutorial
</b>
<u>
 from TutorialsPoint
</u>

Beautiful Soup also has insert_before() and insert_after() methods, which insert a tag or a string immediately before or after a given Tag object. The following code adds the string "Python Tutorial" after the <b> tag.

Example

from bs4 import BeautifulSoup

markup = "<b>Excellent </b><u>from TutorialsPoint</u>"
soup = BeautifulSoup(markup, "html.parser")
tag = soup.b

tag.insert_after("Python Tutorial")
print(soup.prettify())

Output

<b>
 Excellent
</b>
Python Tutorial
<u>
 from TutorialsPoint
</u>

On the other hand, the insert_before() method is used below to add the text "Here is an " before the <b> tag.

tag.insert_before("Here is an ")
print(soup.prettify())

Output

Here is an
<b>
 Excellent
</b>
Python Tutorial
<u>
 from TutorialsPoint
</u>

Clear the Contents

Beautiful Soup provides more than one way to remove the contents of an element from the document tree. Each of these methods has its own features.

The clear() method is the most straightforward. It simply removes the contents of a specified Tag element.
The following example shows its usage.

Example

from bs4 import BeautifulSoup

markup = "<b>Excellent </b><u>from TutorialsPoint</u>"
soup = BeautifulSoup(markup, "html.parser")

tag = soup.find("u")
tag.clear()
print(soup.prettify())

Output

<b>
 Excellent
</b>
<u>
</u>

It can be seen that the clear() method removes the contents but keeps the tag itself intact.

For the next example, we parse the following HTML document and call the clear() method on all tags.

<html>
<body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>

Here is the Python code using the clear() method −

Example

from bs4 import BeautifulSoup

with open("index.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

tags = soup.find_all()
for tag in tags:
    tag.clear()
print(soup.prettify())

Output

<html>
</html>

The extract() method removes either a tag or a string from the document tree, and returns the object that was removed.

Example

from bs4 import BeautifulSoup

with open("index.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

tags = soup.find_all()
for tag in tags:
    obj = tag.extract()
    print("Extracted:", obj)

print(soup)

Output

Extracted: <html>
<body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
Extracted: <body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
Extracted: <p> The quick, brown fox jumps over a lazy dog.</p>
Extracted: <p> DJs flock by when MTV ax quiz prog.</p>
Extracted: <p> Junk MTV quiz graced by fox whelps.</p>
Extracted: <p> Bawds jog, flick quartz, vex
The profound influence of social media on our daily lives is undeniable. Social platforms have become an integral part of how we discover, connect, and share information. This inseparable presence of social media presents a significant opportunity for businesses to harness the power of trusted recommendations and amplify their marketing efforts.

Social media marketing has emerged as a sound strategy to drive website traffic, build brand awareness, and foster meaningful customer relationships. By leveraging the reach and engagement of social channels such as Facebook, Twitter, LinkedIn, and Instagram, businesses can effectively market their products and services, generate leads, and support their overall SEO initiatives.

"Nothing influences people more than a recommendation from a trusted friend." − Mark Zuckerberg, Founder of Facebook

Social media offers a great opportunity to discover new information, connect and interact with others, and share one's perspectives. The inseparable presence of social media in our daily lives gives a great boost to marketing a business. Social media marketing is a sound way to gain website traffic via social channels such as Facebook, Twitter, Pinterest, etc. The content posted on these channels draws people in, thus advertising your business. Social media marketing also helps you build quality links, thus supporting your SEO efforts.

Importance of Social Media

Social media is important for the following reasons −

Using Different Social Media Platforms

In recent years, the social media landscape has evolved dramatically, with the meteoric rise of TikTok as a dominant force in the digital sphere. As a platform that specializes in short-form, engaging video content, TikTok has captured the attention of a vast and diverse audience, making it a compelling channel for businesses to explore in their social media marketing efforts.
TikTok's unique features and user behavior present both opportunities and challenges for brands looking to leverage this platform effectively. Unlike traditional social media platforms, TikTok is centered around the consumption and creation of entertaining, often trend-driven content. This creates a unique dynamic where authenticity, creativity, and a deep understanding of the platform's culture are critical for successful brand engagement.

One of the key advantages of TikTok is its ability to reach a younger demographic, particularly Gen Z consumers, who have increasingly shifted their attention away from traditional social media platforms. By crafting captivating, on-trend content that resonates with this audience, businesses can effectively tap into a valuable and influential market segment.

Social media platforms vary in many ways. Over 50% of social media users use two or more platforms, and they do so for different interests. Different platforms serve different purposes. You may not have enough time to spend on every platform daily, but you can make the most of each one you use.

Facebook

Facebook is the largest social networking site today, which makes it a prominent channel for business. You can post images, videos, and anything related to your industry. At the same time, you can engage in conversation with your audience by posting and commenting.

To make maximum use of Facebook, create a Facebook business page with an appealing layout. Make efforts to attract people to like and share it. You can post what you have to offer on the page. Include visuals for better results.

Google+

Google+ lets you upload and share visuals. Take advantage of +1 and Google+ circles. Circles let you segment your customers and filter out those who may not be useful to your business. Follow others to learn about current trends.

Pinterest

Pinterest is an emerging social media platform that allows you to showcase what you have to offer.
You can create pinboards for your products and services and invite others to follow you. The pins on pinboards include a link to your website. Post attractive images of your products with their specifications and let people follow you freely.

Twitter

It lets you broadcast any update on the internet. Follow people or companies related to your business and gain followers in return. Use hashtags to reach an audience beyond your followers. Include a link to your site in your tweets to drive traffic to it.

LinkedIn

It is the largest professional networking site, and it lets you connect with other professionals in your field. You can hire or get hired on LinkedIn. You can explore all categories and follow people. Invite others to connect and see what they are up to. You can build a strong business profile to stand out. You can encourage customers to write recommendations, which makes you appear more credible and trustworthy.

Instagram

Instagram has devoted users. It lets you share pictures and videos with family and friends, and it can make your business look interesting and innovative. On Instagram, you post your content in the form of images.

YouTube

YouTube is a video sharing website. You can upload and view videos and comment on them. YouTube can help you build brand awareness quickly.

Social Media Marketing Tips

Here are some social media marketing tips −
As the famous proverb states, "what gets measured, gets managed." This adage holds true in the realm of online marketing, where properly measuring and tracking your efforts can mean the difference between a thriving digital presence and a stagnant one. By understanding and closely monitoring key metrics, businesses can gain invaluable insights that enable them to refine their strategies, optimize their campaigns, and ultimately, drive better results.

In this comprehensive guide, we will explore the essential metrics that every online marketer should be tracking to measure the success of their digital initiatives. From total website visits to conversion rates and return on investment (ROI), we will delve into the data that empowers you to make informed decisions and propel your business forward.

Measuring your online marketing efforts helps you manage your website properly and grow your business. Properly measured metrics give you insight into your data and enable you to predict revenue better. The following metrics measure your online marketing efforts −

Total Visits

Total visits is the overall traffic on your website. It gives you a good idea of how well your campaigns are driving visitors. If it falls, you need to investigate your marketing channels. For a healthy website, the total number of visits should keep rising.

New Sessions

This lets you distinguish new visitors from returning visitors on your website. If new sessions are rising, your website is compelling and informative enough to catch the attention of customers, and sticky enough to bring previous visitors back.

Bounce Rate

It shows how many visitors leave your website without exploring it further. A high bounce rate is a matter of concern; aim to keep it as low as possible.

Channel-Specific Traffic

It shows the source of your traffic. This helps you decide which channels are performing better than others.
Conversions

The number of conversions measures the overall productivity of an online ad. It tells the success story of your overall marketing efforts. A low conversion rate may be due to poor products or services, or to irrelevant visitors.

Cost Per Conversion

It gives you a clear picture of how much you spend to earn each conversion, which helps you decide where to invest further.

Return On Investment (ROI)

ROI reveals profitability. A positive ROI indicates a successful, well-implemented ad campaign built on sound plans and strategies, whereas a negative ROI may result from a weak offering and visitors who bounce. A negative ROI is a matter of concern.

You must check these metrics on a regular basis. This will help you examine your website well and decide which strategies work best. Based on these metrics, you can choose the right strategy to generate enough leads.
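The metrics above reduce to simple ratios. The sketch below computes them from made-up campaign figures. The formulas are common textbook definitions assumed here rather than taken from the text: bounce rate as single-page sessions over all sessions, conversion rate as conversions over sessions, cost per conversion as spend over conversions, and ROI as (revenue − spend) / spend.

```python
def marketing_metrics(sessions, single_page_sessions, conversions, spend, revenue):
    """Return common online-marketing ratios as a dict (percentages where noted)."""
    return {
        # Share of sessions that left after viewing only one page
        "bounce_rate_pct": 100 * single_page_sessions / sessions,
        # Share of sessions that completed the target action
        "conversion_rate_pct": 100 * conversions / sessions,
        # Average spend needed to earn one conversion
        "cost_per_conversion": spend / conversions,
        # Profit relative to spend; a negative value means the campaign lost money
        "roi_pct": 100 * (revenue - spend) / spend,
    }


# Hypothetical campaign figures, for illustration only
m = marketing_metrics(sessions=2000, single_page_sessions=900,
                      conversions=50, spend=1000.0, revenue=1500.0)
print(m)
```

With these numbers the bounce rate is 45%, the conversion rate 2.5%, each conversion costs 20 currency units, and ROI is +50%, which would count as a positive ROI in the sense described above.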
Online marketing has its own unique vocabulary, with a wide range of specialized terms and concepts that are essential for anyone working in the field to understand. From the key players like advertisers and publishers, to important metrics like click-through rate and cost-per-acquisition, navigating the world of digital marketing can be challenging without a solid grasp of the industry lingo.

In this comprehensive guide, we will define and explain the standard terms used in the domain of online marketing. Whether you are a business owner looking to advertise your products and services online, a digital marketer crafting campaigns, or simply someone interested in the mechanics of e-commerce, familiarizing yourself with these common terms will provide you with a valuable foundation for success in the digital realm.

Here is a list of the standard terms used in the domain of online marketing −

Advertiser
It is a person or an organization that places advertisements to drive sales or leads.

Banner
It is an online advertisement in the form of a graphic image that appears on a web page.

Bid
It is the maximum amount an advertiser is ready to pay for a click.

Black Hat and White Hat Tactics
Both are tactics used in online marketing, particularly in search engine optimization. White hat tactics follow search engine guidelines, while black hat tactics violate them to gain rankings.

Breadcrumb Navigation
It is a navigation scheme that reveals the user's location on the website or application. It offers a way to trace the path back to the user's original landing point.

Campaign
It is a series of operations performed to achieve a desired goal in a particular area.

Click Through Rate (CTR)
Click Through Rate = (Clicks / Impressions) × 100%

Conversion
It occurs when a visitor completes a target action.

Cost Per Acquisition (CPA)
It is the cost the advertiser pays only when a desired action is achieved.

Cost Per Click (CPC)
It refers to the amount the advertiser pays when their ad is clicked, bringing a visitor to their website, typically from a search engine in PPC marketing.
Cost per Mille (CPM)
It is the amount paid for every 1000 impressions of an advertisement.

Customer Pain Points
They are annoying, frustrating, and difficult-to-solve things or situations for the customer, which the customer may not have anticipated or cannot verbalize. They need to be addressed urgently.

If This Then That (IFTTT)
It is a web-based service with which users can create chains of simple conditional statements, called recipes. The recipes are triggered by changes to other web services such as Gmail, Facebook, Instagram, etc.

Inbound Link
It is a hyperlink on a third-party web page that points to a web page on your website.

Key Performance Indicator (KPI)
It is a metric that shows whether a business objective is achieved.

Market Reach
It is the total number of people or households exposed at least once to an advertising medium in a given span of time.

Paid Search Advertising
It refers to paid advertising on search engines, sometimes called PPC advertising. The advertiser pays only for each click on the ad.

Publisher
It provides advertisers the required amount of space on its website to publish the advertisement.

Quality Score
It is a rating of the quality and relevance of an advertiser's ads, keywords, and landing pages (used, for example, by Google Ads). It influences how an ad ranks and how much each click costs.

Search Engine Optimization
It is the process of improving a website's ranking in the unpaid results of search engines.

Tracking
It is measuring the effectiveness of an online advertisement by collecting and evaluating statistics.

Web Indexing
It is the method of indexing the contents of a website or of the internet as a whole.
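To make the glossary formulas concrete, here is a small sketch that computes CTR, CPC, CPM and CPA from one set of hypothetical campaign numbers. It assumes a single total cost for the campaign and treats CPA as that cost divided by the number of desired actions (conversions), which is a common simplification.

```python
def ad_metrics(impressions, clicks, conversions, cost):
    """Compute standard ad metrics from raw campaign counts and total cost."""
    return {
        "ctr_pct": 100 * clicks / impressions,   # Click Through Rate, in percent
        "cpc": cost / clicks,                    # Cost Per Click
        "cpm": 1000 * cost / impressions,        # Cost per Mille: cost per 1000 impressions
        "cpa": cost / conversions,               # Cost Per Acquisition
    }


# Hypothetical campaign: 50,000 impressions, 1,000 clicks, 40 conversions, cost 500
stats = ad_metrics(impressions=50000, clicks=1000, conversions=40, cost=500.0)
print(stats)
```

Here CTR is 2%, each click costs 0.50, every 1000 impressions cost 10, and each acquisition costs 12.50.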