Notepad++ remove duplicates, remove blank lines and sort data in one operation

This post is part 1 of 5 in the series Efficient work using Notepad++

Notepad++ with the TextFX plugin makes it quick and easy to remove duplicates, remove blank lines and sort data in one operation. This is a typical job for Excel, but since I always have Notepad++ open, I can get the results I want in just a few seconds.

Remove duplicates, remove blank lines and sort data in one operation

Paste text into Notepad++
1. Paste the text into Notepad++ (CTRL+V). In this example there were 473 lines, and about half of them were blank.

Mark all the text and click Sort outputs only UNIQUE
2. Select all the text (CTRL+A). Click TextFX → TextFX Tools → check “+Sort outputs only UNIQUE (at column) lines” (if it is not already checked).

Click Sort lines case insensitive
3. Click TextFX → TextFX Tools → Sort lines case insensitive (at column).

Remove duplicates and sort data alphabetically
4. Duplicates and blank lines have been removed, and the data has been sorted alphabetically. (If the first line appears empty, it contains a space. A space counts as a character, so that line is kept as one of the unique values.)
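For readers who want the same result outside Notepad++, here is a minimal Python sketch of what steps 2 to 4 appear to do. It is an approximation, not TextFX's actual implementation: it assumes lines count as duplicates only when they are identical (so a trailing space or different capitalization keeps two lines apart), drops only completely empty lines, and orders the rest case-insensitively. The function name `unique_sorted_lines` is hypothetical.

```python
def unique_sorted_lines(text: str) -> list[str]:
    """Rough equivalent of TextFX "Sort outputs only UNIQUE" + "Sort lines case insensitive".

    Assumption: uniqueness is exact (case-sensitive, whitespace-sensitive);
    only the ordering ignores case.
    """
    # A set keeps one copy of each distinct line; all blank lines collapse into "".
    unique = set(text.splitlines())
    # Drop the empty line. A line holding only a space is NOT empty and is kept,
    # matching the note in step 4.
    unique.discard("")
    # Case-insensitive order; the raw string breaks ties so the result is deterministic.
    return sorted(unique, key=lambda line: (line.casefold(), line))


if __name__ == "__main__":
    sample = "banana\n\napple\nBanana\napple\n\n \n"
    for line in unique_sorted_lines(sample):
        print(repr(line))
```

With the sample input the result is `[' ', 'apple', 'Banana', 'banana']`: the blank lines and the repeated "apple" are gone, the space-only line sorts first, and "Banana" and "banana" both survive because they differ in case.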

This is a handy feature somewhat hidden away in the menu, and it has saved me a lot of time already.

Who is Cathrine Wilhelmsen?

Cathrine is a Microsoft Data Platform MVP, BimlHero, speaker, blogger and chronic volunteer. She currently works as a Community Evangelist for PASS and coordinates all SQLSaturdays around the world, but has previously worked as a SQL Server Data Warehouse architect and Business Intelligence developer. She loves sci-fi, chocolate, cat gifs and smilies :)

57 thoughts on “Notepad++ remove duplicates, remove blank lines and sort data in one operation”

This is great! I have been using Notepad++ for years, but I never noticed this feature before. Thank you!

Seconding Mike :)

TextFX is not compatible with the current version of Notepad++

I have Notepad++ version 6.1.7 and TextFX is compatible with it.

Hi Cathrine, thank you for this helpful post!
Notepad++ 6.8.6 on Windows 10 Pro here. I can confirm that the plug-in is compatible (although it sorted the lines, it failed to remove duplicates). Still, this is huge progress.

Thanks for the tip, but it doesn’t remove duplicates.

Are you certain the option “+Sort outputs only UNIQUE (at column) lines” was checked when you tried? If so, are you certain the lines are actual duplicates? A word with a space at the end and a word without the space at the end are not counted as duplicate lines.
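To illustrate the point about trailing spaces, here is a small Python sketch (not part of TextFX; the function name is hypothetical) that strips trailing whitespace before comparing, so that “word ” and “word” are treated as the same line. Whether you want this normalization depends on your data.

```python
def unique_ignoring_trailing_whitespace(lines: list[str]) -> list[str]:
    """Remove duplicates after stripping trailing whitespace, keeping the first occurrence."""
    # rstrip() removes trailing spaces and tabs, so "word " and "word" become equal.
    normalized = (line.rstrip() for line in lines)
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(normalized))


raw = ["word ", "word", "other\t", "other"]
print(len(set(raw)))                            # 4: exact comparison sees four distinct lines
print(unique_ignoring_trailing_whitespace(raw))  # ['word', 'other']
```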

Hi Cathrine, I also have version 6.1.7, but I don’t have this TextFX option in my Notepad++. How do I get it? Please reply soon via email.

Hi Louie. Try to open Plugins → Plugin Manager → Show Plugin Manager. Locate TextFX Characters and install it. You should now get the TextFX menu in Notepad++.

Hi Cathrine

Nice post to find duplicate lines on NP++

I want (but don’t know how) to go further: instead of comparing whole lines, compare a single column, so that I end up with a unique ID field.

example:

I have
date, country, name, email

I want to look at the email column only, because the same person can log in on different dates, or even use a different name. The lines are then different, so this solution doesn’t work for me. Can you please help?

thanks in advance
Daniel

Hi Daniel. Unfortunately I don’t know how to achieve this in Notepad++ right now, but it should be easy in a spreadsheet application or a database. Perhaps someone else knows how to do this? I’m interested in a solution myself :)

For removing duplicates on one matching column, just get a copy of Excel 2010; it has a “Remove Duplicates” function where you can choose which columns to match.

No idea how to do that with NP++ though.
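Daniel's case (keep one row per email address, even when the date or name differs) can be sketched in Python with the standard `csv` module. This assumes the data is comma-separated with a header row `date, country, name, email`, that the first row seen for each address should be kept, and that addresses should be compared case-insensitively; the function name and sample data are hypothetical.

```python
import csv
import io


def dedupe_by_column(csv_text: str, column: str) -> list[dict[str, str]]:
    """Keep the first row for each distinct value in `column`; drop later rows with the same value."""
    # skipinitialspace=True ignores the space after each comma, in the header too.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    seen: set[str] = set()
    kept: list[dict[str, str]] = []
    for row in reader:
        # Assumption: addresses differing only in case or surrounding spaces are the same person.
        key = row[column].strip().casefold()
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


data = """date, country, name, email
2016-01-04, Norway, Ann, ann@example.com
2016-01-05, Sweden, Annie, ANN@example.com
2016-01-05, Denmark, Bob, bob@example.com
"""
for row in dedupe_by_column(data, "email"):
    print(row["date"], row["name"], row["email"])
```

Here the "Annie" row is dropped because its address matches Ann's once case is ignored; drop the `.casefold()` call if you need an exact match instead.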

@Cathrine, thank you! TextFX is a really nifty tool, wow!

@Daniel:

> I want to look on Column email only, considering that the same person can logging in different dates,
> or even use a different name, and the lines are then different so this solution doesn’t work for me. Can you please help?

You can try using regular expressions (https://goo.gl/004RHF). Daniel, could you perhaps post an example of the pattern that you need to address here?

Thank you, Cathrine!

You need to click ‘Sort lines case sensitive…’ after making sure both sort options are checked, in order to remove duplicates and sort.

Thanks. Worked perfectly.

Thanks! This is what I was looking for :).

excellent tip, notepad++ did faster job than excel. Awesome feature.

You saved me big time. Thank you, Cathrine.

Hi,

I followed the steps in the post with the option “+Sort outputs only UNIQUE (at column) lines” checked. It only sorts and does not delete duplicates. Is this working for anybody?

Hi guys, I am also a Notepad++ user and I find this technique very useful, but I still don’t understand in which version I can find this TextFX option :(

Hi Louie. Try to open Plugins → Plugin Manager → Show Plugin Manager. Locate TextFX Characters and install it. You should now get the TextFX menu in Notepad++.

TextFX Tools → Sort lines case sensitive (at column) will sort the lines alphabetically.
TextFX is a plugin for Notepad++. You can install it through the “Plugin Manager”.

Hi! It Works excellent for me, and it saved me a lot of time!!

Thanks!

It just sorts the duplicate lines but doesn’t remove them automatically. If I have 2000 lines… that makes me :(

excellent tip – Thanks!
it saved me a lot of time!!
Thanks

Just use Find and Replace with this regular expression, and replace it with nothing:

^(.*?)$\s+?^(?=.*^\1$)

(remember to check “. matches newline”)

That doesn’t change the order of the rows.
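The regular expression above removes duplicates in place, without sorting. As far as I can tell, it deletes a line whenever an identical line appears further down, so the last copy of each line survives. The same idea in Python, as a sketch (the function name is hypothetical), relies on dict keys being unique and keeping insertion order:

```python
def dedupe_keep_order(lines: list[str], keep_last: bool = False) -> list[str]:
    """Remove duplicate lines without sorting.

    keep_last=False keeps the first copy of each line; keep_last=True keeps the
    last copy, which mirrors what the regex above appears to do.
    """
    if keep_last:
        # Walk the list backwards so the last copy is the one that gets kept,
        # then reverse the result back into document order.
        return list(dict.fromkeys(reversed(lines)))[::-1]
    return list(dict.fromkeys(lines))


lines = ["a", "b", "a", "c", "b"]
print(dedupe_keep_order(lines))                  # ['a', 'b', 'c']
print(dedupe_keep_order(lines, keep_last=True))  # ['a', 'c', 'b']
```

This also answers the "without sorting" question further down: neither variant reorders the surviving lines.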

Hi Cathrine, if I have 300 email addresses and want to remove the duplicate emails, how would I do it?

Cathrine, I am unable to remove extra spaces within a line. I have tried everything I can think of.

Hi Sandra, have you tried using TextFX → TextFX Edit → Kill unquoted whitespace? You could also try a regular expression: Use Replace and select Regular Expression in the Search Mode. Type in “ +” (a space and a plus sign, don’t include the quotation marks) in the Find What field, and “ ” (a space, don’t include the quotation marks) in the Replace With field. You can select just a line and check the In Selection box if you only need to do this operation on a single line. Does that help?
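The same Replace operation can be expressed with Python's `re` module, as a sketch: the pattern ` +` (one or more spaces) is replaced by a single space. It only touches spaces, not tabs; `[ \t]+` would cover both. The function name is hypothetical.

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of one or more spaces with a single space."""
    # Same pattern as in the Notepad++ Replace dialog: " +" means one or more spaces.
    return re.sub(r" +", " ", text)


print(collapse_spaces("too    many   spaces here"))  # too many spaces here
```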

thanks a lot. good

Great tip about removing duplicates!

Really awesome! Thanks!

Hi guys, I have more than 300,000 lines of code, but I can’t delete the duplicate lines automatically. Can you help me?

Hi Ghost. Have you checked the “Sort outputs only UNIQUE” option, then selected all the text in the file, and then run “Sort lines”? I have successfully deleted duplicate lines in files with several million rows this way.

Perfect!! Thank you very much

Thanks a lot!!

NeO83666
Andres

Reading this has helped a lot, but I cannot find a way to separate the output alphabetically with titles, for example by adding a couple of blank lines to split it into alphabetical segments, with each segment labelled “— A —”, then a 3-line break before “B” showing “— B —”. I would like the headers to be indented 10 spaces. Every line in the list starts with a letter from a to z. I can do this by switching from one tool to another, but Notepad++ can probably do it all.
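What this comment describes can be sketched in Python, outside Notepad++ and TextFX. The function name, the `--- A ---` header written with plain hyphens, and the defaults (10-space indent, 2 blank lines between segments) are assumptions: sort the lines case-insensitively, group them by first letter, and write an indented header before each group.

```python
from itertools import groupby


def add_letter_headers(lines: list[str], indent: int = 10, gap: int = 2) -> str:
    """Sort lines and insert an indented '--- X ---' header before each first-letter group."""
    # Ignore empty lines; sort case-insensitively so "apple" and "Avocado" end up together.
    sorted_lines = sorted((line for line in lines if line), key=str.casefold)
    out: list[str] = []
    # groupby needs sorted input: it groups consecutive lines with the same first letter.
    for letter, group in groupby(sorted_lines, key=lambda line: line[0].upper()):
        if out:
            out.extend([""] * gap)  # blank lines separating the segments
        out.append(" " * indent + f"--- {letter} ---")
        out.extend(group)
    return "\n".join(out)


print(add_letter_headers(["banana", "Avocado", "apple", "cherry"]))
```

Set `gap=3` for the 3-line break mentioned in the comment.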

Really superb! Thanks for sharing it.
I shared it with my friends too.

Brilliant !! Thank you very much for this. We used to use a proprietary program to remove duplicates. This is easier and faster.

Very nice little trick. I’ve been using Notepad++ for quite a while, but didn’t know this simple technique. Furthermore, the program is lightning fast, so it handles these operations quickly.

Thanks for the steps! Just what I was looking for!

Hello,

I have a file with 127,900 rows. I am trying to identify the duplicate values in a specific column. How do I do that using Notepad++?
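Outside Notepad++, a short script is one option for this. A minimal Python sketch, assuming a delimited file where the column is picked by a zero-based index (the function name, default delimiter and sample data are hypothetical): it counts each value with `collections.Counter` and reports the ones that occur more than once.

```python
import csv
import io
from collections import Counter


def duplicate_values(text: str, column_index: int, delimiter: str = ",") -> dict[str, int]:
    """Return {value: count} for values that appear more than once in the given column."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip rows too short to have the column (e.g. blank lines).
    counts = Counter(row[column_index] for row in reader if len(row) > column_index)
    return {value: n for value, n in counts.items() if n > 1}


sample = "1,alice\n2,bob\n3,alice\n4,carol\n5,bob\n6,alice\n"
print(duplicate_values(sample, 1))  # {'alice': 3, 'bob': 2}
```

For a real file, read its contents with `open(path, newline="")` and pass the text in; use `delimiter="\t"` for tab-separated data.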

Wow, it works, Cathrine!!! Thanks for sharing; it really saved a lot of time. My hubby and I were trying to remove duplicates from more than 39 lakh (3.9 million) records, and this option saved us time, as we were struggling with Excel because of its limitations.

Thanks for sharing

Hi, I tried installing this plugin via Plugin Manager, and after restarting Notepad++ it doesn’t appear as installed.
I have tried installing it several times, but it hasn’t worked; the package doesn’t get installed.
Note that the install process doesn’t throw an error.
It would be great if you could help with this.

Hi Dan, this is something you have to ask Notepad++ support about, and not me :)

I did, but no answer yet. I thought I would give it a try here. Thanks anyway!

thank you very much

Ta very muchly.

Thanks very much! Great tip :)

Is there a way to remove duplicate lines without TextFX?!

I’m on v7.3, and I don’t see TextFX, nor do I see Plugin Manager under Plugins.

How do I delete duplicate lines without sorting?

Thank you very much. It works great.

Thank you…