How to clean data in Excel? – eternal problem and not only in Excel! Data cleansing is a key question either we create a simple analysis. In today’s article we will go through the most important methods. String manipulations in Excel are very important! Besides the well-known methods we will introduce special VBA codes that will make the process quicker and more effective. Furthermore, in the last section of article we’ll introduce our free excel add-in.
Effective ways to clean data in Excel
There are many data cleansing methods, although we often forget about this obvious method. Let’s see the basics first!
Sorting data to check extremes and peaks
It’s possible to find errors in a list using a simple sorting? Yes! We have nothing else to do but arrange the data either growing or decreasing order. Right-click on the cell and choose ‘Sort Largest To Smallest’ option from the menu. Seldom there can be find extremely small or large, maybe peeking data.
Look at the picture below:
Who would ever thought that these kinds of mistakes can be found in a several million-line data? Another great reason that data cleansing can never be forgotten about.
Remove duplicates in Excel
Excel supports many options to eliminate duplicates. The filtering of unique values and the removal of duplicated data are two tightly joined operation. The result is the same in both cases: a list of unique values. To filter for unique values (filtered list), use the Advanced command in the Sort & Filter group on the Data tab.
Select the range of cells! Then go to the Data tab! In the Sort & Filter group, click Advanced. Choose the Unique records only check box and click OK.
Explanation: However, there is one major difference that we cannot ignore. In the process of filtering unique values, it temporarily hides the duplicated values. The keyword is: temporarily! You can back your original list to undo this operation.
If you are using the ‘Remove Duplicates’ tool on the ribbon, you’ll delete duplicates permanently! Pay attention to this! The removed, deleted items will be last after saving the workbook. So be very careful.
Let us see the how the duplicate removal values tool works! First select the range. Go to the Data tab! In the Data Tools group, click Remove Duplicates.
Select one or more columns then click OK.
A popup window is displayed show how many duplicate values were removed and how many unique values remain.
Use the find and replace function
The advantage of the Find & Replace function is that we can work with it relatively fast in any size data table. Let’s see how to clean data using this function. To find and replace data in a worksheet, follow these steps:
Press Ctrl+H and the Find and Replace dialog box appears.
In the Find What box, enter the data you want to locate. In the Replace With box, enter the data with which you want to replace the found data.
To replace all occurrences at the same time, click Replace All.
Check the Type of Data in a Cell
Wait a minute before we do any changes to the raw data. We have to check the type of data is in the cell. We can do this by using the TYPE function. The function is one of a group of information functions. Numeric or text data? On the picture below you can see what kind of results the formula brings in case of different kinds of data.
If the function return with 1, the data type is numeric. If it’s return with 2, the data type is text.
Convert Numbers Stored as Text into Numbers
If we are importing data from a text file or from an external data source into Excel an often occurrence that the numbers being stored as text. This can be the source of many problems but there is ways to avoid mistakes. Follow the next description step by step!
Select a blank cell then type 1. Press Control + C to select the text range which we would like to convert to numbers.
Right click and choose Paste and Paste Special. In the dialogue box select Multiply.
Eliminate blank cells in a list or range
Now we’ll highlight the whole range that also contains the empty lines. Press the F5 button and from the popup window choose the special option.
All the cells in the selection that are not blank are deselected, leaving only the blank cells selected.
In the ‘Cells’ section of the ‘Home’ tab, click ‘Delete’.
Finally select ‘Delete Sheet Rows’ from the drop-down menu.
Clean Data to Split text into columns
There is a possibility for the text of one or many cells to divide into several cells. When can this be useful? Let’s suppose that our data set contains our clients’ first and family names in one cell. We would like to split the names that the first names and family names be in different columns.
Start the ‘Convert Text to Columns Wizard’ and follow the steps below:
Select the range which contains names.
Click Text to Columns in the Data Tools group.
Select the Delimiters checkbox and click Next.
Leave all the check boxes empty under Delimiters section. Now check the Space check box (in our example)
Clean Data using Built-in Excel Functions
Concatenate text using TEXTJOIN function
And finally let’s see the opposite of the previous section. Now we would like to join the text from several cells into one. How can we solve this problem quickly?
The TEXTJOIN function joins (what a surprise) values using a selected delimiter. It’s provide better functionality than the CONCAT. This function allows you to supply a range of cells, and has a setting to ignore empty values.
Check the formula bar! You can use space as delimiter. The second parameter is TRUE because we would like to ignore the empty cells.
If you are using earlier version of Excel you should to install 3rd party add-in.
Change Text to Lower – Upper – Proper Case
Let’s see what kind of Excel text transformation formulas we can use for the transformation of the text. It is worth to use them during data cleansing.
The LOWER() formula transforms the given text into lower case letters. Example: LOWER(‘EXCEL’), Result: ‘excel’
The UPPER() formula transforms the given text into upper case letters. Example: UPPER(‘excel’), Result: ‘EXCEL’
The PROPER() formula transforms the first letter of the character string, and the letters standing after none-letters into upper case letters. And transforms all the other letter into lower case letters. Example: PROPER(‘Information is BEAUTIFUL’)
The result of the conversion is: ‘Information Is Beautiful’. Not too often but we might use these kinds of transformations also.
Remove non-printable characters – The CLEAN formula
This is one of the most special field of data cleaning! In general, we can say that work is relatively difficult with the use of special text formulas. We have made some short VBA codes for several possibilities.
To remove non-printing characters use the CLEAN() formula!
Let’s see how it works with the help of the following example! You can check the result using LEN() formula.
Remove unnecessary characters from text using macros
If we only want to keep special characters of a string, the following macro enabled workbook can help. Download this Workbook to remove alpha, numeric or non-numeric characters in seconds.
It’s easy to clean data with these smart vba codes. Go to the developer tab and click Macros.
Clean extra spaces using TRIM function
Removing extra spaces from the text is a task that needs more caution. Spaces can be at the beginning, in the middle or at the end of the text. Hardest to recognize are the letter ones because they are invisible for the naked eye.
We’ll explain the two reasons they have to be removed. First that they can cause mistakes at the use of formulas. Imagine for example that the result of the VLOOKUP() formula is incorrect. It may sound funny, but it is not. Some formula mistakes may influence the outcome of the final calculation.
The other reason is the size of the file. Although it is true that one-character surplus is not baneful. Problems occur when the Excel data base contains several million records.
Yes, there are some like this, just think about the external data sources and Power Query output. In this case the size increase can even be 5% – 10%. We don’t have to depict the negative effects of this….
Use the TRIM() formula and guaranteed that the list will be clean.
Automated Data Cleansing in Excel – Free Add-in
Before we take a deep-dive into this topic, allow us to tell a little story. We would like to summon an old memory from those times when I didn’t even know what Excel was. After college studies I got employed at a small company with limited HR resources.
As a beginner they didn’t dish out the premium tasks for me. Clean data in Excel… I got the seemingly very boring tasks.
This wouldn’t have been a problem in its own. Perfect execution versus deadlines? I have never heard of the Excel string manipulation function, so the solution wasn’t simple. These data cleansing related tasks consumed many night shifts.
We have developed a simple and lightweight VBA application especially for data cleaning.
You can reach this tool from the ribbon.
We would like that the text manipulation and Excel data processing would be a child’s play for you. Today we publish a universal Excel add-in tool. From now string manipulation and quick data cleansing will be very simple!
Clean Data using Text Functions in Excel – Change Case
The first function is widely used with Excel text styled data. We can also use the classical Excel formulas but there are cases when this is just not enough. On the following picture you can see what kind of Excel string manipulation functions we have built in.
As a first step, highlight the range that you would like to apply the text transformation. After that start the program from the ribbon! From the list choose the Change Case element and select your preferred operation.
I think these functions are not requires further explanations. Apply the selected function on the highlighted data range and the task is done.
Insert text after specific nth characters
With the use of some Excel functions we can resolve that we insert some arbitrary characters in front of or behind a given string.
The problem starts when we have to perform this operation on an Excel database. By choosing the ‘Add Text’ option we can manipulate a huge data sets in a matter of seconds! Execution this task manually is the real mission impossible case using simple Excel formulas.
From the Excel string manipulation formulas we have to emphasize the SEARCH(), FIND(), MID(), LEFT(), RIGHT() functions.
Now imagine that comfortable situation that you don’t have to create endless formulas ever again! Simply start the functions can be seen on the picture.
Delete text after nth character
Out of the so far introduced text transformations some can be done with the use of previously described formulas. The following will be a real treat for the professionals working in the field of business. Let’s stay a little more with the string manipulation functions. Opposite to the previous paragraph here we don’t insert the characters but remove them.
OK, at this point all people who good at Excel would say the following sentence: Let’s cut the characters from the front and back.
And what if we have to do this from the middle of the text or starting from the fifth character? Probably this is the moment when the average user would throw the keyboard from the desk…
Please try the remove by position function; it is a very effective working tool. Look through the list and you blood pressure will quickly be restored.
Remove spaces to clean data
The next thing to do is to remove the spaces from the text. This is not complicated, simply can be done by using the Excel TRIM function.
In this case we’ve done nothing else then automatized this function. You don’t need to apply any formulas, just use the Remove Spaces and All Spaces command.
Remove special characters from text
Finally let’s see the special characters. We distinguish three different types. The first on is the non-printable characters, the second is the alpha characters and the third one is the numerical data. The tools can be seen on the picture are capable to remove only one type of characters from the text.
Imagine how useful can be this Excel application if we have to cleanse a database containing phone numbers. The cells have numbers and other characters also.
Let’s say we need data that only can contain numbers. And the list of phone numbers five thousand rows. What do you think how long would it take to remove the hyphens and other non-numerical characters? It takes too long only thinking about it!
Use Excel and automated data cleaning functions! The execution time narrows down to seconds. It’s great!
Final Thoughts: Use Effective Data Cleansing Tools
Let’s summarize what have we learned today! We have introduced the most important text transformations and their field of use. The free data cleansing add-in will take a lot of weight off your shoulders. In case you would like to increase your knowledge of Excel visit the well-known Excel forums.
There is no shame in learning from the professionals. That’s what we have done in the early days. The Excel string manipulation function is a very large subject that needs time and attention. As we can see from the article Excel offers many possibilities to improve data quality. Organize your data before create an excel dashboard!
Some functions we made with VBA macros, you should check these out closely. We hope that with the help of the introduced data cleansing techniques you will create flawless analysis.
Download Text Tools add-in for free
We have a goal to make a complex add-in for you that will save you a lot of time. We have named it Excel productivity tool. Presently we are at the final stages of development.
We already know what kind of modules will be in the program. Our plan is to implement even more functions to the add-in in the future. We are working on it!