One of the most frequently used tags in HTML is the <script> tag. It lets you embed a client-side script, such as JavaScript code, in an HTML document. In this chapter, we will use BeautifulSoup to remove script tags from an HTML document.
The <script> tag has a corresponding </script> closing tag. Between the two, you may either reference an external JavaScript file or include JavaScript code inline in the HTML document itself.
To include an external JavaScript file, the syntax used is −
<head> <script src="javascript.js"></script> </head>
You can then invoke the functions defined in this file from inside HTML.
Instead of referring to an external file, you can put JavaScript code inside the HTML between the <script> and </script> tags. If it is placed in the <head> section of the HTML document, the functionality is available throughout the document tree. If it is placed anywhere in the <body> section, the JavaScript functions are available from that point on.
<body> <p>Hello World</p> <script> alert("Hello World") </script> </body>
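The two forms described above can be told apart in a parsed document: an external script carries a src attribute, while an inline script carries its code as text. The following is a minimal sketch, assuming the bs4 package is installed; the sample markup and the variable names external and inline are illustrative only.

```python
from bs4 import BeautifulSoup

# Illustrative markup with one external and one inline script
html = '''
<html>
<head>
<script src="javascript.js"></script>
</head>
<body>
<p>Hello World</p>
<script>
alert("Hello World")
</script>
</body>
</html>
'''

soup = BeautifulSoup(html, "html.parser")

external = []  # file names referenced through the src attribute
inline = []    # JavaScript source embedded directly in the page

for tag in soup.find_all("script"):
    src = tag.get("src")
    if src is not None:
        external.append(src)
    else:
        # tag.string is None for an empty tag, hence the fallback
        inline.append((tag.string or "").strip())

print(external)  # ['javascript.js']
print(inline)    # ['alert("Hello World")']
```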
Removing all script tags with BeautifulSoup is easy. You collect the list of all script tags from the parsed tree and extract them one by one.
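Example
html = '''
<html>
<head>
<script src="javascript.js"></script>
</head>
<body>
<p>Hello World</p>
<script>
alert("Hello World")
</script>
</body>
</html>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")

# Detach every <script> tag from the parsed tree
for tag in soup.find_all('script'):
    tag.extract()

print(soup)
Output
<html>
<head>

</head>
<body>
<p>Hello World</p>

</body>
</html>
Both script tags are gone, while the rest of the document, including the <p> element, is left intact. The blank lines are the whitespace that surrounded the removed tags.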
You can also use the decompose() method instead of extract(). The difference is that extract() returns the tag that was removed, whereas decompose() simply destroys it. For more concise code, you may also use a list comprehension to remove the script tags from the soup object, as follows −
[tag.decompose() for tag in soup.find_all('script')]
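The difference between the two methods shows up in their return values. The following is a small sketch, assuming the bs4 package is installed; the one-line markup and the names first, second, removed and result are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup with two script tags
html = ('<body><p>Hello World</p>'
        '<script>alert("Hello World")</script>'
        '<script src="javascript.js"></script></body>')
soup = BeautifulSoup(html, "html.parser")

first, second = soup.find_all("script")

# extract() detaches the tag from the tree and returns it,
# so it can still be inspected or inserted elsewhere.
removed = first.extract()
print(removed.string)  # alert("Hello World")

# decompose() detaches the tag, destroys it, and returns None.
result = second.decompose()
print(result)          # None

print(soup)            # <body><p>Hello World</p></body>
```

If the removed tags are not needed afterwards, decompose() is the natural choice; a plain for loop does the same job as the list comprehension without building a throwaway list of None values.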