Python Digital Forensics – Resources

The following resources contain additional information on Python Digital Forensics. Use them to gain more in-depth knowledge of the subject.

Useful Video Courses

Python Flask and SQLAlchemy ORM − 22 lectures, 1.5 hours, Jack Chan
Python and Elixir Programming Bundle Course − 81 lectures, 9.5 hours, Pranjal Srivastava
TKinter Course – Build Python GUI Apps − 49 lectures, 4 hours, John Elder
A Beginner's Guide to Python and Data Science − 81 lectures, 8.5 hours, Datai Team Academy
Deploy Face Recognition Project With Python, Django, And Machine Learning − 93 lectures, 6.5 hours, Srikanth Guskra
Professional Python Web Development with Flask − 80 lectures, 12 hours, Stone River ELearning

Network Forensics-II

The previous chapter dealt with some of the concepts of network forensics using Python. In this chapter, let us understand network forensics using Python at a deeper level.

Web Page Preservation with Beautiful Soup

The World Wide Web (WWW) is a unique source of information. However, its legacy is at high risk because content is being lost at an alarming rate. A number of cultural heritage and academic institutions, non-profit organizations and private businesses have explored the issues involved and contributed to the development of technical solutions for web archiving.

Web page preservation, or web archiving, is the process of gathering data from the World Wide Web, ensuring that the data is preserved in an archive, and making it available to future researchers, historians and the public. Before proceeding further, let us discuss some important issues related to web page preservation:

Change in Web Resources − Web resources change every day, which is a challenge for web page preservation.
Large Quantity of Resources − The sheer quantity of resources to be preserved is another issue.
Integrity − Web pages must be protected from unauthorized amendment, deletion or removal to protect their integrity.
Dealing with multimedia data − Preserved web pages also contain multimedia data, which can cause difficulties during preservation.
Providing access − Besides preservation, the questions of providing access to web resources and of handling ownership also need to be solved.

In this chapter, we are going to use the Python library Beautiful Soup for web page preservation.

What is Beautiful Soup?

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It cannot fetch a web page itself; it needs an input document to create a soup object, so it is typically used together with urllib, which retrieves the page content.
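Returning to the integrity issue listed above: a common way to detect unauthorized changes to preserved content is to record a cryptographic hash at acquisition time and compare it later. The following is a minimal sketch using only the standard library; the helper name sha256_hex and the sample page bytes are illustrative assumptions, not part of the preservation script.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical page content captured at acquisition time.
original = b"<html><body><p>Archived page</p></body></html>"
recorded = sha256_hex(original)

# A later copy that has been altered by a single word.
tampered = original.replace(b"Archived", b"Modified")

print(sha256_hex(original) == recorded)  # unchanged copy matches the record
print(sha256_hex(tampered) == recorded)  # altered copy does not match
```

Storing the recorded digest separately from the archived page is what makes the comparison meaningful, since anyone who can alter the page could otherwise alter the digest as well.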
You can learn more about Beautiful Soup at www.crummy.com/software/BeautifulSoup/bs4/doc/

Note that before using it, we must install the third-party library with the following command:

    pip install bs4

Alternatively, using the Anaconda package manager, we can install Beautiful Soup as follows:

    conda install -c anaconda beautifulsoup4

Python Script for Preserving Web Pages

The Python script for preserving web pages with the third-party library Beautiful Soup is discussed here.

First, import the required libraries as follows:

    from __future__ import print_function
    import argparse
    from bs4 import BeautifulSoup, SoupStrainer
    from datetime import datetime
    import hashlib
    import logging
    import os
    import ssl
    import sys
    from urllib.request import urlopen
    import urllib.error

    logger = logging.getLogger(__name__)

Note that this script takes two positional arguments: the URL to be preserved and the desired output directory, as shown below:

    if __name__ == "__main__":
        parser = argparse.ArgumentParser("Web Page preservation")
        parser.add_argument("DOMAIN", help="Website Domain")
        parser.add_argument("OUTPUT_DIR", help="Preservation Output Directory")
        parser.add_argument("-l", help="Log file path",
                            default=__file__[:-3] + ".log")
        args = parser.parse_args()

Now, set up logging for the script with both a stream handler and a file handler, so that the acquisition process is shown on the console and documented in a log file:

        logger.setLevel(logging.DEBUG)
        msg_fmt = logging.Formatter("%(asctime)-15s %(funcName)-10s "
                                    "%(levelname)-8s %(message)s")
        strhndl = logging.StreamHandler(sys.stderr)
        strhndl.setFormatter(fmt=msg_fmt)
        fhndl = logging.FileHandler(args.l, mode="a")
        fhndl.setFormatter(fmt=msg_fmt)

        logger.addHandler(strhndl)
        logger.addHandler(fhndl)
        logger.info("Starting BS Preservation")
        logger.debug("Supplied arguments: {}".format(sys.argv[1:]))
        logger.debug("System " + sys.platform)
        logger.debug("Version " + sys.version)

Now, validate the desired output directory, creating it if it does not exist, and call main():

        if not os.path.exists(args.OUTPUT_DIR):
            os.makedirs(args.OUTPUT_DIR)
        main(args.DOMAIN, args.OUTPUT_DIR)

Next, define the main() function. It derives the base name of the website by removing the scheme and the "www." prefix, and performs additional validation on the input URL:

    def main(website, output_dir):
        base_name = website.replace("https://", "").replace("http://", "").replace("www.", "")
        link_queue = set()
        if "http://" not in website and "https://" not in website:
            logger.error("Exiting preservation - invalid user input: {}".format(website))
            sys.exit(1)
        logger.info("Accessing {} webpage".format(website))
        # Note: an unverified context disables TLS certificate checks.
        context = ssl._create_unverified_context()

Now, open a connection to the URL with the urlopen() method, wrapped in a try-except block:

        try:
            index = urlopen(website, context=context).read().decode("utf-8")
        except urllib.error.HTTPError as e:
            logger.error("Exiting preservation - unable to access page: {}".format(website))
            sys.exit(2)
        logger.debug("Successfully accessed {}".format(website))

The rest of main() relies on three functions:

write_output() − writes the first web page to the output directory.
find_links() − identifies the links on this web page.
recurse_pages() − iterates through and discovers all links on the web page.
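The URL validation in main() relies on substring checks. As an alternative sketch, not the tutorial's own method, the same check can be expressed with urllib.parse from the standard library. The helper name split_url is hypothetical, and unlike base_name in the script it returns only the host, without any path.

```python
from urllib.parse import urlparse


def split_url(url):
    """Return (scheme, host) for an http(s) URL, or raise ValueError.

    Hypothetical helper for illustration; the host has a leading
    "www." removed, mirroring the intent of base_name in main().
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("expected an http:// or https:// URL: {!r}".format(url))
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return parts.scheme, host


print(split_url("https://www.example.com/index.html"))  # ('https', 'example.com')
```

Parsing the URL means the scheme is checked only where a scheme belongs, so an input such as "example.com/?next=http://other" is rejected rather than accepted because the substring appears somewhere in it.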
The remaining lines of main() call these functions in turn:

        write_output(website, index, output_dir)
        link_queue = find_links(base_name, index, link_queue)
        logger.info("Found {} initial links on webpage".format(len(link_queue)))
        recurse_pages(website, link_queue, context, output_dir)
        logger.info("Completed preservation of {}".format(website))

Now, let us define the write_output() function as follows:

    def write_output(name, data, output_dir, counter=0):
        name = name.replace("http://", "").replace("https://", "").rstrip("//")
        directory = os.path.join(output_dir, os.path.dirname(name))
        if not os.path.exists(directory) and os.path.dirname(name) != "":
            os.makedirs(directory)

Still inside write_output(), we log some details about the web page, log the hash of the data using hash_data(), write the file, and then log the hash of the written file:

        logger.debug("Writing {} to {}".format(name, output_dir))
        logger.debug("Data Hash: {}".format(hash_data(data)))
        path = os.path.join(output_dir, name)
        path = path + "_" + str(counter)
        with open(path, "w") as outfile:
            outfile.write(data)
        logger.debug("Output File Hash: {}".format(hash_file(path)))

Now, define hash_data(), which encodes the data as UTF-8 and generates its SHA-256 hash, and hash_file(), which does the same for the contents of a file:

    def hash_data(data):
        sha256 = hashlib.sha256()
        sha256.update(data.encode("utf-8"))
        return sha256.hexdigest()

    def hash_file(file):
        sha256 = hashlib.sha256()
        with open(file, "rb") as in_file:
            sha256.update(in_file.read())
        return sha256.hexdigest()

Now, let us create a BeautifulSoup object out of the web page data in the find_links() function as follows:

    def find_links(website, page, queue):
        for link in BeautifulSoup(page, "html.parser",
                                  parse_only=SoupStrainer("a", href=True)):
            if website in link.get("href"):
                if not os.path.basename(link.get("href")).startswith("#"):
                    queue.add(link.get("href"))
        return queue

Now, we need to define the recurse_pages() function, which takes the website URL, the current link queue, the unverified SSL context and the output directory as inputs:

    def recurse_pages(website, queue, context, output_dir):
        processed = []
        counter = 0
        while True:
            counter += 1
            if len(processed) == len(queue):
                break
            for link in queue.copy():
                if link in processed:
                    continue
                processed.append(link)
                try:
                    page =

Important Artifacts In Windows-II

This chapter talks about some more important artifacts in Windows and how to extract them using Python.

User Activities

Windows uses the NTUSER.DAT file to store various user activities. Every user profile has its own NTUSER.DAT hive, which stores information and configuration specific to that user. Hence, it is highly useful to forensic analysts during an investigation.

The following Python script parses some of the keys of NTUSER.DAT to explore the actions of a user on the system. Before proceeding further, we need to install the third-party modules Registry, pytsk3, pyewf and Jinja2. We can use pip to install them.

We can follow these steps to extract information from NTUSER.DAT files:

First, search for all NTUSER.DAT files in the system.
Then parse the WordWheelQuery, TypedPaths and RunMRU keys for each NTUSER.DAT file.
Finally, write the processed artifacts to an HTML report using the Jinja2 module.

Python Code

Let us see how to use Python code for this purpose.

First of all, we need to import the following Python modules. Note that StringIO is a Python 2 module (on Python 3 the equivalent in-memory buffer is io.BytesIO), and that TSKUtil comes from a local utility.pytskutil module that the script assumes is available:

    from __future__ import print_function
    from argparse import ArgumentParser
    import os
    import StringIO  # Python 2 only
    import struct

    from utility.pytskutil import TSKUtil
    from Registry import Registry
    import jinja2

Now, provide arguments for the command-line handler.
Here it will accept three arguments: the path to the evidence file, the type of evidence file, and the desired output path for the HTML report, as shown below:

    if __name__ == '__main__':
        # ArgumentParser is imported directly from argparse above.
        parser = ArgumentParser('Information from user activities')
        parser.add_argument('EVIDENCE_FILE', help="Path to evidence file")
        parser.add_argument('IMAGE_TYPE', help="Evidence file format",
                            choices=('ewf', 'raw'))
        parser.add_argument('REPORT', help="Path to report file")
        args = parser.parse_args()
        main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.REPORT)

Now, let us define the main() function, which searches for all NTUSER.DAT files, as shown:

    def main(evidence, image_type, report):
        tsk_util = TSKUtil(evidence, image_type)
        tsk_ntuser_hives = tsk_util.recurse_files('ntuser.dat', '/Users', 'equals')

        nt_rec = {
            'wordwheel': {'data': [], 'title': 'WordWheel Query'},
            'typed_path': {'data': [], 'title': 'Typed Paths'},
            'run_mru': {'data': [], 'title': 'Run MRU'}
        }

Next, for each hive, we try to locate the Explorer key. If it is found, it is passed to the user-processing functions defined further below; otherwise the hive is skipped:

        for ntuser in tsk_ntuser_hives:
            uname = ntuser[1].split("/")
            open_ntuser = open_file_as_reg(ntuser[2])
            try:
                explorer_key = (open_ntuser.root().find_key("Software")
                                .find_key("Microsoft").find_key("Windows")
                                .find_key("CurrentVersion").find_key("Explorer"))
            except Registry.RegistryKeyNotFoundException:
                continue
            nt_rec['wordwheel']['data'] += parse_wordwheel(explorer_key, uname)
            nt_rec['typed_path']['data'] += parse_typed_paths(explorer_key, uname)
            nt_rec['run_mru']['data'] += parse_run_mru(explorer_key, uname)

        nt_rec['wordwheel']['headers'] = nt_rec['wordwheel']['data'][0].keys()
        nt_rec['typed_path']['headers'] = nt_rec['typed_path']['data'][0].keys()
        nt_rec['run_mru']['headers'] = nt_rec['run_mru']['data'][0].keys()

Now, pass the report path and the dictionary object to the write_html() function as follows:

        write_html(report, nt_rec)

Now, define a function that takes a pytsk file handle and reads it into the Registry class via the StringIO class:

    def open_file_as_reg(reg_file):
        file_size = reg_file.info.meta.size
        file_content = reg_file.read_random(0, file_size)
        # StringIO is the Python 2 in-memory buffer; io.BytesIO is the
        # Python 3 equivalent for binary data.
        file_like_obj = StringIO.StringIO(file_content)
        return Registry.Registry(file_like_obj)

Now, we will define the function that parses and handles the WordWheelQuery key from the NTUSER.DAT file as follows:

    def parse_wordwheel(explorer_key, username):
        try:
            wwq = explorer_key.find_key("WordWheelQuery")
        except Registry.RegistryKeyNotFoundException:
            return []

        mru_list = wwq.value("MRUListEx").value()
        mru_order = []

        # Read the MRU order two bytes at a time as signed shorts.
        for i in range(0, len(mru_list), 2):
            order_val = struct.unpack('h', mru_list[i:i + 2])[0]
            if order_val in mru_order and order_val in (0, -1):
                break
            else:
                mru_order.append(order_val)

        search_list = []
        for count, val in enumerate(mru_order):
            ts = "N/A"
            if count == 0:
                # Only the most recent entry can be tied to the key timestamp.
                ts = wwq.timestamp()
            search_list.append({
                'timestamp': ts,
                'username': username,
                'order': count,
                'value_name': str(val),
                'search': wwq.value(str(val)).value().decode("UTF-16").strip("\x00")
            })
        return search_list

Now, we will define the function that parses and handles the TypedPaths key from the NTUSER.DAT file as follows:

    def parse_typed_paths(explorer_key, username):
        try:
            typed_paths = explorer_key.find_key("TypedPaths")
        except Registry.RegistryKeyNotFoundException:
            return []

        typed_path_details = []
        for val in typed_paths.values():
            typed_path_details.append({
                "username": username,
                "value_name": val.name(),
                "path": val.value()
            })
        return typed_path_details

Now, we will define the function that parses and handles the RunMRU key from the NTUSER.DAT file as follows:

    def parse_run_mru(explorer_key, username):
        try:
            run_mru = explorer_key.find_key("RunMRU")
        except Registry.RegistryKeyNotFoundException:
            return []

        if len(run_mru.values()) == 0:
            return []

        mru_list = run_mru.value("MRUList").value()
        mru_order = []
        for i in mru_list:
            mru_order.append(i)

        mru_details = []
        for count, val in enumerate(mru_order):
            ts = "N/A"
            if count == 0:
                ts = run_mru.timestamp()
            mru_details.append({
                "username": username,
                "timestamp": ts,
                "order": count,
                "value_name": val,
                "run_statement": run_mru.value(val).value()
            })
        return mru_details

Now, the following function handles the creation of the HTML report. It expects a Jinja2 template named user_activity.html in the same directory as the script:

    def write_html(outfile, data_dict):
        cwd = os.path.dirname(os.path.abspath(__file__))
        env = jinja2.Environment(loader=jinja2.FileSystemLoader(cwd))
        template = env.get_template("user_activity.html")
        rendering = template.render(nt_data=data_dict)
        with open(outfile, 'w') as open_outfile:
            open_outfile.write(rendering)

After running the above script, we will get the information from the NTUSER.DAT files as an HTML report.

LINK files

Shortcut files are created when a user or the operating system creates shortcuts to files that are frequently used, double-clicked or accessed from system drives such as attached storage. Such shortcut files are called link files. By examining these link files, an investigator can reconstruct Windows activity, such as the time at which these files were accessed and the location they were accessed from.

Let us discuss the Python script that we can use to get information from these Windows LINK files.

For this Python script, install the third-party modules pylnk, pytsk3 and pyewf.

We can follow these steps to extract information from lnk files:

First, search for lnk files within the system.
Then, iterate through the files and extract the information from each one.
Finally, write this information to a CSV report.

Python Code

Let us see how to use Python code for this purpose.

First, import the following Python libraries. As before, StringIO is a Python 2 module and TSKUtil comes from a local utility.pytskutil module:

    from __future__ import print_function
    from argparse import ArgumentParser
    import csv
    import StringIO  # Python 2 only

    from utility.pytskutil import TSKUtil
    import pylnk

Now, provide the arguments for the command-line handler.
Here it will accept three arguments: the path to the evidence file, the type of evidence file, and the desired output path for the CSV report, as shown below:

    if __name__ == '__main__':
        # ArgumentParser is imported directly from argparse above.
        parser = ArgumentParser('Parsing LNK files')
        parser.add_argument('EVIDENCE_FILE', help="Path to evidence file")
        parser.add_argument('IMAGE_TYPE', help="Evidence file format",
                            choices=('ewf', 'raw'))
        parser.add_argument('CSV_REPORT', help="Path to CSV report")
        args = parser.parse_args()
        main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.CSV_REPORT)

Now, interpret

Python Digital Forensics – Discussion

Digital forensics is the branch of forensic science that analyzes, examines, identifies and recovers digital evidence from electronic devices. It is commonly used in criminal law and private investigation. This tutorial will make you comfortable with performing digital forensics in Python on Windows-based digital devices. In this tutorial, you will learn various concepts and coding techniques for carrying out digital forensics in Python.

Quick Guide

Python Digital Forensics – Introduction

This chapter gives you an introduction to what digital forensics is all about, along with a brief historical review. You will also learn where digital forensics is applied in real life, and what its limitations are.

What is Digital Forensics?

Digital forensics may be defined as the branch of forensic science that analyzes, examines, identifies and recovers digital evidence residing on electronic devices. It is commonly used in criminal law and private investigations.

For example, you can rely on digital forensics to extract evidence when somebody steals data from an electronic device.

Brief Historical Review of Digital Forensics

The history of computer crime and of digital forensics is outlined below.

1970s-1980s: First Computer Crime

Before this period, no computer crime had been recognized; any that did occur was dealt with under the then-existing laws. In 1978, the first computer crime was recognized in the Florida Computer Crime Act, which included legislation against unauthorized modification or deletion of data on a computer system. Over time, as technology advanced, the range of computer crimes being committed also increased. Various other laws were passed to deal with crimes related to copyright, privacy and child pornography.

1980s-1990s: Development Decade

This was the development decade for digital forensics, spurred by the first ever investigation (1986), in which Cliff Stoll tracked the hacker Markus Hess. During this period, two kinds of digital forensics disciplines developed: the first grew out of ad-hoc tools and techniques created by practitioners who took it up as a hobby, while the second was developed by the scientific community. In 1992, the term "Computer Forensics" was used in academic literature.
2000s-2010s: Decade of Standardization

Once digital forensics had developed to a certain level, specific standards were needed that could be followed while performing investigations. Accordingly, various scientific agencies and bodies published guidelines for digital forensics. In 2002, the Scientific Working Group on Digital Evidence (SWGDE) published a paper named "Best practices for Computer Forensics". Another milestone was a European-led international treaty, "The Convention on Cybercrime", which was signed by 43 nations and ratified by 16 nations. Even with such standards in place, some issues identified by researchers still need to be resolved.

Process of Digital Forensics

Since the first computer crime in 1978, there has been a huge increase in digital criminal activity. Because of this increase, a structured way of dealing with such crimes is needed. In 1984, a formalized process was introduced, and since then a great number of new and improved computer forensics investigation processes have been developed.

A computer forensics investigation process involves three major phases, as explained below.

Phase 1: Acquisition or Imaging of Exhibits

The first phase of digital forensics involves saving the state of the digital system so that it can be analyzed later. It is very similar to taking photographs, blood samples and so on from a crime scene. For example, it involves capturing an image of the allocated and unallocated areas of a hard disk or of RAM.

Phase 2: Analysis

The input of this phase is the data acquired in the acquisition phase. Here, the data is examined to identify evidence. This phase yields three kinds of evidence:

Inculpatory evidence − Evidence that supports a given hypothesis.
Exculpatory evidence − Evidence that contradicts a given hypothesis.
Evidence of tampering − Evidence showing that the system was tampered with to avoid identification.
The analysis phase also includes examining files and directory contents to recover deleted files.

Phase 3: Presentation or Reporting

As the name suggests, this phase presents the conclusions of the investigation and the corresponding evidence.

Applications of Digital Forensics

Digital forensics deals with gathering, analyzing and preserving the evidence contained in any digital device. How it is used depends on the application. As mentioned earlier, it is used mainly in the following two applications.

Criminal Law

In criminal law, evidence is collected to support or oppose a hypothesis in court. Forensic procedures are very similar to those used in criminal investigations, but with different legal requirements and limitations.

Private Investigation

Digital forensics for private investigation is used mainly in the corporate world. It is used when a company suspects that employees may be performing an illegal activity on their computers that is against company policy. Digital forensics provides one of the best routes for a company or person to take when investigating someone for digital misconduct.

Branches of Digital Forensics

Digital crime is not restricted to computers alone; hackers and criminals also use small digital devices such as tablets and smartphones on a very large scale. Some devices have volatile memory, while others have non-volatile memory. Hence, depending on the type of device, digital forensics has the following branches.

Computer Forensics

This branch of digital forensics deals with computers, embedded systems and static memories such as USB drives. A wide range of information, from logs to actual files on a drive, can be investigated in computer forensics.

Mobile Forensics

This deals with the investigation of data from mobile devices.
Mobile forensics differs from computer forensics in that mobile devices have an inbuilt communication system, which can provide useful information related to location.

Network Forensics

This deals with the monitoring and analysis of computer network traffic, both local and WAN (wide area network), for the purposes of information gathering, evidence collection, or intrusion detection.

Database Forensics

This branch of digital forensics deals with the forensic study of databases and their metadata.

Skills Required for Digital Forensics Investigation

Digital forensics examiners help to track hackers, recover stolen data, follow computer attacks back to their source, and aid in other types of investigations involving computers. Some of the key skills required to become digital forensics

Investigation Using Emails

The previous chapters discussed the importance and the process of network forensics and the concepts involved. In this chapter, let us learn about the role of emails in digital forensics and their investigation using Python.

Role of Email in Investigation

Emails play a very important role in business communications and have emerged as one of the most important applications on the internet. They are a convenient way of sending messages as well as documents, not only from computers but also from other electronic gadgets such as mobile phones and tablets.

The negative side of emails is that criminals may use them to leak important information about their company. Hence, the role of emails in digital forensics has increased in recent years. In digital forensics, emails are considered crucial evidence, and email header analysis has become important for collecting evidence during the forensic process.

An investigator has the following goals while performing email forensics:

To identify the main criminal
To collect the necessary evidence
To present the findings
To build the case

Challenges in Email Forensics

Email forensics plays a very important role in investigation, as most communication in the present era relies on emails. However, an email forensic investigator may face the following challenges during an investigation.

Fake Emails

The biggest challenge in email forensics is the use of fake emails that are created by manipulating and scripting headers. In this category, criminals also use temporary email, a service that allows a registered user to receive email at a temporary address that expires after a certain time period.

Spoofing

Another challenge in email forensics is spoofing, in which criminals present an email as someone else's. In this case the machine will receive both the fake and the original IP address.
Anonymous Re-emailing

Here, the email server strips identifying information from the email message before forwarding it further. This leads to another big challenge for email investigations.

Techniques Used in Email Forensic Investigation

Email forensics is the study of the source and content of email as evidence, in order to identify the actual sender and recipient of a message along with other information such as the date and time of transmission and the intention of the sender. It involves investigating metadata, port scanning and keyword searching.

Some of the common techniques that can be used for email forensic investigation are:

Header Analysis
Server Investigation
Network Device Investigation
Sender Mailer Fingerprints
Software Embedded Identifiers

In the following sections, we are going to learn how to fetch information using Python for the purpose of email investigation.

Extraction of Information from EML files

EML files are emails stored in file format and are widely used for storing email messages. They are structured text files that are compatible across multiple email clients such as Microsoft Outlook, Outlook Express and Windows Live Mail.

An EML file stores email headers, body content and attachment data as plain text. It uses base64 to encode binary data and Quoted-Printable (QP) encoding to store content information. The Python script that can be used to extract information from an EML file is given below.

First, import the following Python libraries as shown below:

    from __future__ import print_function
    from argparse import ArgumentParser, FileType
    from email import message_from_file
    import os
    import quopri
    import base64

Of the above libraries, quopri is used to decode the QP-encoded values from EML files. Any base64-encoded data can be decoded with the help of the base64 library.

Next, let us provide an argument for the command-line handler.
Note that here it will accept only one argument, the path to the EML file, as shown below:

    if __name__ == '__main__':
        parser = ArgumentParser('Extracting information from EML file')
        parser.add_argument("EML_FILE", help="Path to EML File",
                            type=FileType('r'))
        args = parser.parse_args()
        main(args.EML_FILE)

Now, we need to define the main() function, in which we use message_from_file() from the email library to read the file-like object. We then access the headers, body content, attachments and other payload information through the resulting variable, emlfile, as shown in the code given below:

    def main(input_file):
        emlfile = message_from_file(input_file)
        # items() returns the (name, value) header pairs in order.
        for key, value in emlfile.items():
            print("{}: {}".format(key, value))
        print("\nBody\n")
        if emlfile.is_multipart():
            for part in emlfile.get_payload():
                process_payload(part)
        else:
            # A non-multipart message is itself the single payload.
            process_payload(emlfile)

Now, we need to define the process_payload() function, in which we extract the message body content using the get_payload() method. We decode QP-encoded data using the quopri.decodestring() function. We also check the content MIME type so that each part of the email is stored appropriately.
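Before looking at the full function, the header access and Quoted-Printable decoding described above can be tried in isolation. This is a minimal sketch using only the standard library; the message text, addresses and variable names are made up for illustration, and message_from_string() is used instead of message_from_file() so that no file is needed.

```python
from email import message_from_string
import quopri

# Hypothetical single-part message with a QP-encoded UTF-8 body.
RAW_EML = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Total =3D 42 caf=C3=A9s\n"
)

msg = message_from_string(RAW_EML)

# Header analysis: each header is available as a (name, value) pair.
for key, value in msg.items():
    print("{}: {}".format(key, value))

# get_payload() returns the body still in its QP-encoded form.
raw_body = msg.get_payload()
charset = msg.get_content_charset() or "utf-8"
decoded = quopri.decodestring(raw_body.encode("ascii")).decode(charset)
print(decoded)
```

The email library can also perform the transfer decoding itself through get_payload(decode=True), which returns bytes; decoding by hand with quopri, as the script does, simply makes the step visible.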
The process_payload() function is given below:

    def process_payload(payload):
        print(payload.get_content_type() + "\n" + "=" * len(payload.get_content_type()))
        body = quopri.decodestring(payload.get_payload())
        # get_content_charset() reads the charset parameter of the
        # Content-Type header, or returns None if there is none.
        if payload.get_content_charset():
            body = body.decode(payload.get_content_charset())
        else:
            try:
                body = body.decode()
            except UnicodeDecodeError:
                body = body.decode('cp1252')

        if payload.get_content_type() == "text/html":
            outfile = os.path.basename(args.EML_FILE.name) + ".html"
            with open(outfile, 'w') as html_file:
                html_file.write(body)
        elif payload.get_content_type().startswith('application'):
            outfile = open(payload.get_filename(), 'wb')
            body = base64.b64decode(payload.get_payload())
            outfile.write(body)
            outfile.close()
            print("Exported: {}\n".format(outfile.name))
        else:
            print(body)

After executing the above script, we will get the header information along with the various payloads on the console.

Analyzing MSG Files using Python

Email messages come in many different formats. MSG is one such format, used by Microsoft Outlook and Exchange. Files with the MSG extension may contain plain ASCII text for the headers and the main message body, as well as hyperlinks and attachments.

In this section, we will learn how to extract information from an MSG file using the Outlook API. Note that the following Python script will work only on Windows. For this, we need to install the third-party Python library pywin32 as follows:

    pip install pywin32

Now, import the following libraries:

    from __future__ import print_function
    from argparse import ArgumentParser
    import os
    import win32com.client
    import pywintypes

Now, let us provide arguments for the command-line handler. Here it will accept two arguments: the path to the MSG file and the desired output folder, as follows:

    if __name__ == '__main__':
        parser = ArgumentParser('Extracting information from MSG file')
        parser.add_argument("MSG_FILE", help="Path to MSG file")
        parser.add_argument("OUTPUT_DIR", help="Path to output

Artifact Report

Now that you are comfortable with installing and running Python commands on your local system, let us move on to the concepts of forensics in detail. This chapter explains various concepts involved in dealing with artifacts in Python digital forensics.

Need of Report Creation

The process of digital forensics includes reporting as its third phase. This is one of the most important parts of the digital forensic process. Report creation is necessary for the following reasons:

It is the document in which the digital forensic examiner outlines the investigation process and its findings.
A good digital forensic report can be referenced by another examiner to reach the same results given the same repositories.
It is a technical and scientific document that contains facts found within the 1s and 0s of digital evidence.

General Guidelines for Report Creation

Reports are written to provide information to the reader and must start from a solid foundation. Investigators can face difficulties in presenting their findings efficiently if the report is prepared without some general guidelines or standards. Some general guidelines that must be followed while creating digital forensic reports are given below:

Summary − The report must contain a brief summary of the information so that the reader can ascertain the report's purpose.
Tools used − We must mention the tools that were used in carrying out the digital forensics process, including their purpose.
Repository − Suppose we investigated someone's computer; then the summary of evidence and the analysis of relevant material such as email, internal search history and so on must be included in the report so that the case is clearly presented.
Recommendations for counsel − The report must include recommendations for counsel on whether to continue or cease the investigation, based on the findings in the report.
Creating Different Types of Reports

In the above section, we came to know about the importance of reports in digital forensics, along with the guidelines for creating them. Some of the formats available in Python for creating different kinds of reports are discussed below −

CSV Reports

One of the most common output formats for reports is a CSV spreadsheet. You can create a CSV report of processed data using the Python code shown below −

First, import useful libraries for writing the spreadsheet −

    from __future__ import print_function
    import csv
    import os
    import sys

We are using the following global variable to represent sample data −

    TEST_DATA_LIST = [["Ram", 32, "Bhopal", "Manager"],
                      ["Raman", 42, "Indore", "Engg."],
                      ["Mohan", 25, "Chandigarh", "HR"],
                      ["Parkash", 45, "Delhi", "IT"]]

Next, let us define the method that writes the report. We open the file in "w" mode and set the newline keyword argument to an empty string, so that the csv module controls the line endings itself. The header is written with writerow(), and the data rows, one per record, with writerows().

    def Write_csv(data, header, output_directory, name=None):
        if name is None:
            name = "report1.csv"
        print("[+] Writing {} to {}".format(name, output_directory))
        with open(os.path.join(output_directory, name), "w", newline="") as csvfile:
            writer = csv.writer(csvfile)
            writer.writerow(header)   # one header row
            writer.writerows(data)    # one row per record

Now, call the method −

    Write_csv(TEST_DATA_LIST, ["Name", "Age", "City", "Job description"], os.getcwd())

If you run the above script, you will get the following details stored in the report1.csv file.

    Name      Age   City         Job description
    Ram       32    Bhopal       Manager
    Raman     42    Indore       Engg.
    Mohan     25    Chandigarh   HR
    Parkash   45    Delhi        IT

Excel Reports

Another common output format for reports is an Excel (.xlsx) spreadsheet. With Excel, we can create tables and also plot graphs. We can create a report of processed data in Excel format using the Python code shown below −

First, import the XlsxWriter module for creating the spreadsheet −

    import xlsxwriter

Now, create a workbook object. For this, we need to use the Workbook() constructor.
    workbook = xlsxwriter.Workbook("report2.xlsx")

Now, create a new worksheet by using the add_worksheet() method.

    worksheet = workbook.add_worksheet()

Next, write the following data into the worksheet −

    report2 = (["Ram", 32, "Bhopal"],
               ["Mohan", 25, "Chandigarh"],
               ["Parkash", 45, "Delhi"])
    row = 0
    col = 0

You can iterate over this data and write it as follows −

    for name, age, city in report2:
        worksheet.write(row, col, name)
        worksheet.write(row, col + 1, age)
        worksheet.write(row, col + 2, city)
        row += 1

Now, let us close this Excel file by using the close() method.

    workbook.close()

The above script will create an Excel file named report2.xlsx containing the following data −

    Ram       32   Bhopal
    Mohan     25   Chandigarh
    Parkash   45   Delhi

Investigation Acquisition Media

It is important for an investigator to keep detailed investigative notes in order to accurately recall the findings or put together all the pieces of an investigation. A screenshot is very useful for keeping track of the steps taken in a particular investigation. With the help of the following Python code, we can take a screenshot and save it on the hard disk for future use.

First, install the Python module named pyscreenshot by using the following command −

    pip install pyscreenshot

Now, import the necessary module as shown −

    import pyscreenshot as ImageGrab

Use the following line of code to get the screenshot −

    image = ImageGrab.grab()

Use the following line of code to save the screenshot to the given location −

    image.save("d:/image123.png")

Now, if you want to display the screenshot in a plot window, you can use the following Python code −

    import matplotlib.pyplot as plt
    import pyscreenshot as ImageGrab

    image = ImageGrab.grab()
    plt.imshow(image, cmap="gray", interpolation="bilinear")
    plt.show()
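The CSV report earlier in this chapter takes its rows as lists. When the processed data is held as dictionaries instead, the standard library's csv.DictWriter can produce the same kind of report. The following is only a minimal sketch of that alternative, not the method used in this chapter: the function name write_dict_report and the sample records are hypothetical, and the report is written to an in-memory buffer rather than to a file so that the result can be inspected directly.

```python
import csv
import io


def write_dict_report(records, fieldnames, stream):
    """Write a header row, then one CSV row per dictionary in records."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


# Hypothetical sample records, mirroring the sample data used in this chapter
records = [
    {"Name": "Ram", "Age": 32, "City": "Bhopal", "Job description": "Manager"},
    {"Name": "Mohan", "Age": 25, "City": "Chandigarh", "Job description": "HR"},
]

buffer = io.StringIO()
write_dict_report(records, ["Name", "Age", "City", "Job description"], buffer)
report_text = buffer.getvalue()
print(report_text)
```

To write to disk instead, the same function can be passed a file opened with open(path, "w", newline=""). Keying each value by column name makes it harder to put a value in the wrong column when the number of fields grows.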

Introduction

This chapter gives an introduction to what digital forensics is all about, along with a brief historical review. You will also understand where digital forensics can be applied in real life, and its limitations.

What is Digital Forensics?

Digital forensics may be defined as the branch of forensic science that analyzes, examines, identifies and recovers digital evidence residing on electronic devices. It is commonly used in criminal law and private investigations. For example, you can rely on digital forensics to extract evidence when somebody steals data from an electronic device.

Brief Historical Review of Digital Forensics

The history of computer crime and of digital forensics is reviewed in this section −

1970s-1980s: First Computer Crime

Prior to this period, computer crime was not recognized as such; when offenses did occur, they were dealt with under the then-existing laws. Later, in 1978, the first computer crime was recognized in the Florida Computer Crime Act, which included legislation against unauthorized modification or deletion of data on a computer system. Over time, with the advancement of technology, the range of computer crimes being committed also increased. Various other laws were passed to deal with crimes related to copyright, privacy and child pornography.

1980s-1990s: Development Decade

This was the development decade for digital forensics, prompted by the first ever investigation (1986), in which Cliff Stoll tracked the hacker named Markus Hess. During this period, two kinds of digital forensics disciplines developed: the first grew out of ad-hoc tools and techniques developed by practitioners who took it up as a hobby, while the second was developed by the scientific community. In 1992, the term "Computer Forensics" was used in academic literature.
2000s-2010s: Decade of Standardization

Once digital forensics had developed to a certain level, there was a need for specific standards that could be followed while performing investigations. Accordingly, various scientific agencies and bodies published guidelines for digital forensics. In 2002, the Scientific Working Group on Digital Evidence (SWGDE) published a paper named "Best practices for Computer Forensics". Another milestone was a European-led international treaty, "The Convention on Cybercrime", which was signed by 43 nations and ratified by 16. Even with such standards in place, some issues identified by researchers still need to be resolved.

Process of Digital Forensics

Since the first computer crime in 1978, there has been a huge increase in digital criminal activity. Because of this increase, a structured way of dealing with such activity is needed. In 1984, a formalized process was introduced, and since then a great number of new and improved computer forensics investigation processes have been developed.

A computer forensics investigation process involves three major phases, as explained below −

Phase 1: Acquisition or Imaging of Exhibits

The first phase of digital forensics involves saving the state of the digital system so that it can be analyzed later. It is very similar to taking photographs, blood samples, etc. from a crime scene. For example, it involves capturing an image of the allocated and unallocated areas of a hard disk or RAM.

Phase 2: Analysis

The input to this phase is the data acquired in the acquisition phase. Here, this data is examined to identify evidence. This phase yields three kinds of evidence, as follows −

Inculpatory evidence − This evidence supports a given history.
Exculpatory evidence − This evidence contradicts a given history.
Evidence of tampering − This evidence shows that the system was tampered with to avoid identification.
This phase includes examining file and directory contents and recovering deleted files.

Phase 3: Presentation or Reporting

As the name suggests, this phase presents the conclusions and the corresponding evidence from the investigation.

Applications of Digital Forensics

Digital forensics deals with gathering, analyzing and preserving the evidence contained in any digital device. The use of digital forensics depends on the application. As mentioned earlier, it is used mainly in the following two applications −

Criminal Law

In criminal law, evidence is collected to support or oppose a hypothesis in court. Forensics procedures are very similar to those used in criminal investigations, but with different legal requirements and limitations.

Private Investigation

The corporate world is the main user of digital forensics for private investigation. It is used when a company suspects that employees may be performing an illegal activity on their computers, or one that is against company policy. Digital forensics provides one of the best routes for a company or person to take when investigating someone for digital misconduct.

Branches of Digital Forensics

Digital crime is not restricted to computers alone; hackers and criminals also use small digital devices such as tablets and smartphones on a very large scale. Some of these devices have volatile memory, while others have non-volatile memory. Hence, depending upon the type of device, digital forensics has the following branches −

Computer Forensics

This branch of digital forensics deals with computers, embedded systems and static memories such as USB drives. A wide range of information, from logs to the actual files on a drive, can be investigated in computer forensics.

Mobile Forensics

This deals with the investigation of data from mobile devices.
This branch differs from computer forensics in that mobile devices have an inbuilt communication system, which can provide useful information related to location.

Network Forensics

This deals with the monitoring and analysis of computer network traffic, both local and WAN (wide area network), for the purposes of information gathering, evidence collection, or intrusion detection.

Database Forensics

This branch of digital forensics deals with the forensic study of databases and their metadata.

Skills Required for Digital Forensics Investigation

Digital forensics examiners help to track hackers, recover stolen data, follow computer attacks back to their source, and aid in other types of investigations involving computers. Some of the key skills required to become a digital forensics examiner are discussed below − Outstanding

Python Digital Forensics – Home

Digital forensics is the branch of forensic science that analyzes, examines, identifies and recovers digital evidence from electronic devices. It is commonly used in criminal law and private investigation. This tutorial will make you comfortable with performing digital forensics in Python on Windows-based digital devices. In this tutorial, you will learn various concepts and coding techniques for carrying out digital forensics in Python.

Audience

This tutorial will be useful for graduates, postgraduates, and research students who either have an interest in this subject or have it as a part of their curriculum. Any reader who is enthusiastic about gaining knowledge of digital forensics using the Python programming language can also pick up this tutorial.

Prerequisites

This tutorial assumes that the reader has a basic knowledge of operating systems and computer networks. You are also expected to have a basic knowledge of Python programming. If you are a novice in any of these subjects or concepts, we strongly suggest you go through tutorials on them before you start your journey with this tutorial.

Investigating Embedded Metadata

In this chapter, we will learn in detail about investigating embedded metadata using Python digital forensics.

Introduction

Embedded metadata is information about data that is stored in the same file as the object it describes. In other words, it is information about a digital asset stored in the digital file itself. It travels with the file and is not kept separately from it.

In digital forensics, we cannot always extract all the information about a particular file. Embedded metadata, however, can provide information critical to the investigation. For example, a text file's metadata may contain information about the author, the file's length, the date it was written and even a short summary of the document. A digital image may include metadata such as the length of the image, the shutter speed, etc.

Artifacts Containing Metadata Attributes and their Extraction

In this section, we will learn about various artifacts containing metadata attributes and how to extract them using Python.

Audio and Video

These are two very common artifacts that carry embedded metadata, which can be extracted for the purpose of investigation.

You can use the following Python script to extract common attributes or metadata from an audio (MP3) file and a video (MP4) file.

Note that for this script, we need to install a third-party Python library named mutagen, which allows us to extract metadata from audio and video files. It can be installed with the help of the following command −

    pip install mutagen

Some of the useful libraries we need to import for this Python script are as follows −

    from __future__ import print_function
    import argparse
    import json
    import mutagen

The command line handler will take one argument, which represents the path to the MP3 or MP4 file.
Then, we will use the mutagen.File() method to open a handle to the file as follows −

    if __name__ == "__main__":
        parser = argparse.ArgumentParser("Python Metadata Extractor")
        parser.add_argument("AV_FILE", help="File to extract metadata from")
        args = parser.parse_args()
        av_file = mutagen.File(args.AV_FILE)
        file_ext = args.AV_FILE.rsplit(".", 1)[-1]
        if file_ext.lower() == "mp3":
            handle_id3(av_file)
        elif file_ext.lower() == "mp4":
            handle_mp4(av_file)

Now, we need two handlers, one to extract the data from an MP3 file and one to extract the data from an MP4 file. In the script, they must be defined above the __main__ block that calls them. We can define these handlers as follows −

    def handle_id3(id3_file):
        # Map ID3 frame IDs to readable names
        id3_frames = {"TIT2": "Title", "TPE1": "Artist", "TALB": "Album",
                      "TXXX": "Custom", "TCON": "Content Type",
                      "TDRL": "Date released", "COMM": "Comments",
                      "TDRC": "Recording Date"}
        print("{:15} | {:15} | {:38} | {}".format("Frame", "Description", "Text", "Value"))
        print("-" * 85)
        for frames in id3_file.tags.values():
            frame_name = id3_frames.get(frames.FrameID, frames.FrameID)
            desc = getattr(frames, "desc", "N/A")
            text = getattr(frames, "text", ["N/A"])[0]
            value = getattr(frames, "value", "N/A")
            if "date" in frame_name.lower():
                text = str(text)
            print("{:15} | {:15} | {:38} | {}".format(frame_name, desc, text, value))

    def handle_mp4(mp4_file):
        cp_sym = u"\u00A9"   # the copyright sign that prefixes several QuickTime tags
        qt_tag = {
            cp_sym + "nam": "Title", cp_sym + "art": "Artist",
            cp_sym + "alb": "Album", cp_sym + "gen": "Genre",
            "cpil": "Compilation", cp_sym + "day": "Creation Date",
            "cnID": "Apple Store Content ID", "atID": "Album Title ID",
            "plID": "Playlist ID", "geID": "Genre ID", "pcst": "Podcast",
            "purl": "Podcast URL", "egid": "Episode Global ID",
            "cmID": "Camera ID", "sfID": "Apple Store Country",
            "desc": "Description", "ldes": "Long Description"}
        genre_ids = json.load(open("apple_genres.json"))

Now, still inside handle_mp4(), we need to iterate through the tags of this MP4 file as follows −

        print("{:22} | {}".format("Name", "Value"))
        print("-" * 40)
        for name, value in mp4_file.tags.items():
            tag_name = qt_tag.get(name, name)
            if isinstance(value, list):
                value = "; ".join([str(x) for x in value])
            if name == "geID":
                value = "{}: {}".format(
                    value, genre_ids[str(value)].replace("|", " - "))
            print("{:22} | {}".format(tag_name, value))

The above script will give us additional information about MP3 as well as MP4 files.

Images

Images may contain different kinds of metadata depending upon their file format. Many images also embed GPS information, which we can extract by using third-party Python libraries. The following Python script can be used to do so −

First, install Pillow, the maintained fork of the Python Imaging Library (PIL), as follows −

    pip install pillow

This will help us to extract metadata from images.

We can also write the GPS details embedded in images to a KML file, but for this we need to install the third-party Python library named simplekml as follows −

    pip install simplekml

In this script, first we need to import the following libraries −

    from __future__ import print_function
    import argparse
    from PIL import Image
    from PIL.ExifTags import TAGS
    import simplekml
    import sys

Now, the command line handler will accept one positional argument, which represents the file path of the photo.

    parser = argparse.ArgumentParser("Metadata from images")
    parser.add_argument("PICTURE_FILE", help="Path to picture")
    args = parser.parse_args()

Now, we need to specify the URL templates that will be populated with the coordinate information, named gmaps and open_maps. We also need a function to convert the degrees-minutes-seconds (DMS) tuple coordinate provided by the PIL library into decimal degrees. It can be done as follows −

    gmaps = "https://www.google.com/maps?q={},{}"
    open_maps = "http://www.openstreetmap.org/?mlat={}&mlon={}"

    def process_coords(coord):
        # Assumes each of the three values is a (numerator, denominator) pair;
        # degrees are divided by 1, minutes by 60 and seconds by 3600.
        coord_deg = 0
        for count, values in enumerate(coord):
            coord_deg += (float(values[0]) / values[1]) / 60**count
        return coord_deg

Now, we will use the Image.open() function to open the file as a PIL object.
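Before moving on, the DMS-to-decimal conversion can be checked in isolation. The sketch below repeats process_coords() as defined above and applies it to made-up coordinates; the sample values and the helper apply_reference() are hypothetical and only illustrate how the hemisphere letter turns into a sign. It assumes, as the function above does, that each of the three values arrives as a (numerator, denominator) pair.

```python
def process_coords(coord):
    """Convert three (numerator, denominator) pairs (deg, min, sec) to decimal degrees."""
    coord_deg = 0.0
    for count, (numerator, denominator) in enumerate(coord):
        # count is 0 for degrees, 1 for minutes, 2 for seconds
        coord_deg += (float(numerator) / denominator) / 60 ** count
    return coord_deg


def apply_reference(decimal_degrees, reference):
    """Hypothetical helper: negate values in the southern and western hemispheres."""
    return -decimal_degrees if reference in ("S", "W") else decimal_degrees


# Hypothetical sample values, not taken from a real image
sample_lat = ((28, 1), (36, 1), (504, 10))   # 28 degrees, 36 minutes, 50.4 seconds
sample_lon = ((77, 1), (12, 1), (324, 10))   # 77 degrees, 12 minutes, 32.4 seconds

lat = apply_reference(process_coords(sample_lat), "N")
lon = apply_reference(process_coords(sample_lon), "W")
print("Latitude: {:.4f}, Longitude: {:.4f}".format(lat, lon))
```

Here the latitude works out to 28 + 36/60 + 50.4/3600, about 28.614, and the western longitude to about -77.209. The script itself then continues by opening the picture and looking for the GPSInfo tag −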
    img_file = Image.open(args.PICTURE_FILE)
    exif_data = img_file._getexif()
    if exif_data is None:
        print("No EXIF data found")
        sys.exit()
    for name, value in exif_data.items():
        gps_tag = TAGS.get(name, name)
        # Compare strings with !=, not "is not", which tests object identity
        if gps_tag != "GPSInfo":
            continue

After finding the GPSInfo tag, and still inside the loop, we will store the GPS reference and process the coordinates with the process_coords() method.

        lat_ref = value[1] == u"N"
        lat = process_coords(value[2])
        if not lat_ref:
            lat = lat * -1
        lon_ref = value[3] == u"E"
        lon = process_coords(value[4])
        if not lon_ref:
            lon = lon * -1

Now, create a kml object from the simplekml library as follows −

        kml = simplekml.Kml()
        kml.newpoint(name=args.PICTURE_FILE, coords=[(lon, lat)])
        kml.save(args.PICTURE_FILE + ".kml")

We can now print the coordinates from the processed information as follows −

        print("GPS Coordinates: {}, {}".format(lat, lon))
        print("Google Maps URL: {}".format(gmaps.format(lat, lon)))
        print("OpenStreetMap URL: {}".format(open_maps.format(lat, lon)))
        print("KML File {} created".format(args.PICTURE_FILE + ".kml"))

PDF Documents

PDF documents have a