Cleaning Data with Text Functions ”; Previous Next The data that you obtain from different sources many not be in a form ready for analysis. In this chapter, you will understand how to prepare your data that is in the form of text for analysis. Initially, you need to clean the data. Data cleaning includes removing unwanted characters from text. Next, you need to structure the data in the form you require for further analysis. You can do the same by − Finding required text patterns with the text functions. Extracting data values from text. Formatting data with text functions. Executing data operations with the text functions. Removing Unwanted Characters from Text When you import data from another application, it can have nonprintable characters and/or excess spaces. The excess spaces can be − leading spaces, and/or extra spaces between words. If you sort or analyze such data, you will get erroneous results. Consider the following example − This is the raw data that you have obtained on product information containing the Product ID, Product description and the price. The character “|” separates the field in each row. When you import this data into Excel worksheet, it looks as follows − As you observe, the entire data is in a single column. You need to structure this data to perform data analysis. However, initially you need to clean the data. You need to remove any nonprintable characters and excess spaces that might be present in the data. You can use the CLEAN function and TRIM function for this purpose. S.No. Function & Description 1. CLEAN Removes all nonprintable characters from text 2. TRIM Removes spaces from text Select the Cells C3 – C11. Type =TRIM (CLEAN (B3)) and then press CTRL + Enter. The formula is filled in the cells C3 – C11. The result will be as shown below − Finding required Text Patterns with the Text Functions To structure your data, you might have to do certain Text Pattern matching based on which you can extract the Data Values. Some of the Text Functions that are useful for this purpose are − S.No. Function & Description 1. EXACT Checks to see if two text values are identical 2. FIND Finds one text value within another (case-sensitive) 3. SEARCH Finds one text value within another (not case-sensitive) Extracting Data Values from Text You need to extract the required data from text in order to structure the same. In the above example, say, you need to place the data in three columns – ProductID, Product_Description and Price. You can extract data in one of the following ways − Extracting Data Values with Convert Text to Columns Wizard Extracting Data Values with Text Functions Extracting Data Values with Flash Fill Extracting Data Values with Convert Text to Columns Wizard You can use the Convert Text to Columns Wizard to extract Data Values into Excel columns if your fields are − Delimited by a character, or Aligned in columns with spaces between each field. In the above example, the fields are delimited by the character “|”. Hence, you can use the Convert Text to Columns wizard. Select the data. Copy and paste values in the same place. Otherwise, Convert Text to Columns takes the functions rather than the data itself as the input. Select the data. Click on Text to Columns in the Data Tools group under Data Tab on the Ribbon. Step 1 − Convert Text to Columns Wizard – Step 1 of 3 appears. Select Delimited. Click Next. Step 2 − Convert Text to Columns Wizard – Step 2 of 3 appears. Under Delimiters, select Other. In the box next to Other, type the character | Click Next. Step 3 − Convert Text to Columns Wizard – Step 3 of 3 appears. In this screen, you can select each column of your data in the wizard and set the format for that column. For Destination, select the cell D3. You can click Advanced, and set Decimal Separator and Thousands Separator in the Advanced Text Import Settings dialog box that appears. Click Finish. Your data, which is converted to columns appears in the three Columns – D, E and F. Name the Column headers as ProductID, Product_Description and Price. Extracting Data Values with Text Functions Suppose the fields in your data neither are delimited by a character nor are aligned in columns with spaces between each field, you can use text functions to extract data values. Even in the case the fields are delimited, you can still use text functions to extract data. Some of the text functions that are useful for this purpose are − S.No. Function & Description 1. LEFT Returns the leftmost characters from a text value 2. RIGHT Returns the rightmost characters from a text value 3. MID Returns a specific number of characters from a text string starting at the position you specify 4. LEN Returns the number of characters in a text string You can also combine two or more of these text functions as per the data you have at hand, to extract the required data values. For example, using a combination of LEFT, RIGHT and VALUE functions or using a combination of FIND, LEFT, LEN and MID functions. In the above example, All the characters left to the first | give the name ProductID. All the characters right to the second | give the name Price. All the characters that lie between the first | and second | give the name Product_Description. Each | has a space before and after. Observing this information, you can extract the data values with the following steps − Find the Position of First | – First | Position You can use FIND function Find the Position of Second | – Second | Position You can use FIND function again Beginning to (First | Position – 2) Characters of the Text give ProductID You can use LEFT Function (First | Position + 2) to (Second