
Notepad++ Remove Duplicates, Blank Lines and Sort Data in One Operation

This post is part 1 of 5 in the series Notepad++ Tips and Tricks

You can use the 32-bit version of Notepad++ with the TextFX plugin to quickly remove duplicates, remove blank lines, and sort data – in one operation! This is a fast and easy way to get the results you want in just a few seconds. And as with any kind of automation: the more data you work with, the more time you save :)

Animation showing how to remove duplicates, remove blank lines, and sorting data in Notepad++

Remove Duplicates, Remove Blank Lines and Sort Data in One Operation

In this example, we have a list of data types used in a SQL Server table. We want to find all the unique data types used, and also sort them alphabetically.

1. Make sure that you have the Sort outputs only UNIQUE (at column) lines option enabled:

Screenshot of enabling option "Sort outputs only UNIQUE (at column) lines"
Click TextFX → TextFX Tools → Enable Sort outputs only UNIQUE (at column) lines

2. In the Notepad++ window, paste the text that you want to remove duplicates and blank lines from. In this example, we have 500 lines, and half of them are blank:

Screenshot showing 500 lines before removing duplicates

3. Select all the text and use either Sort lines case sensitive (at column) or Sort lines case insensitive (at column):

Screenshot of clicking option "Sort lines case insensitive (at column)"
Select the text (CTRL+A) and click TextFX → TextFX Tools → Sort lines case insensitive (at column)

4. Tadaaa! We have now removed duplicates and blank lines, and the data has been sorted alphabetically. In this example, we ended up with only 15 lines:

Screenshot showing 15 lines after removing duplicates and blank lines
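For readers who would rather script this, the same end result can be sketched in Python. This is not how TextFX works internally. It is only a minimal illustration that assumes blank lines should be dropped entirely and that duplicates must match exactly. The function name unique_sorted_lines and the sample data are made up for this example.

```python
def unique_sorted_lines(text: str, case_sensitive: bool = False) -> list[str]:
    """Drop blank lines and exact duplicates, then sort what is left."""
    # Keep only lines that contain something other than whitespace.
    non_blank = [line for line in text.splitlines() if line.strip()]
    # A set removes exact duplicates; "int" and "int " stay distinct.
    unique = set(non_blank)
    if case_sensitive:
        return sorted(unique)
    # Ignore case when ordering; the second key element keeps ties stable.
    return sorted(unique, key=lambda line: (line.lower(), line))


# Hypothetical sample: data types with blank lines and repeats.
sample = "varchar\n\nint\n\ndatetime\nint\n\nvarchar\n"
print(unique_sorted_lines(sample))
```

Calling unique_sorted_lines(text, case_sensitive=True) corresponds roughly to choosing the case sensitive sort in step 3.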

Summary

In Notepad++, you can use the TextFX plugin to quickly remove duplicates, remove blank lines, and sort data – in one operation! First, make sure that you have the Sort outputs only UNIQUE option enabled. Then, use the Sort lines case [in]sensitive feature.

It’s as simple as that :) This is a very handy feature, but it is somewhat hidden away in the menu. Once you find it, however, it can save you a lot of time!

About the Author

Cathrine Wilhelmsen is a Microsoft Data Platform MVP, BimlHero Certified Expert, Microsoft Certified Solutions Expert, international speaker, author, blogger, and chronic volunteer who loves teaching and sharing knowledge. She works as a Senior Business Intelligence Consultant at Inmeta, focusing on Azure Data and the Microsoft Data Platform. She loves sci-fi, chocolate, coffee, craft beers, ciders, cat gifs and smilies :)

Comments

Hi! This is Cathrine. Thank you so much for visiting my blog. I'd love to hear your thoughts, but please keep in mind that I'm not technical support for any products mentioned in this post :) Off-topic questions, comments and discussions may be moderated. Be kind to each other. Thanks!

This is great! I have been using Notepad++ for years, but I never noticed this feature before. Thank you!

Seconding Mike :)

thanks for the tip but it doesn’t remove duplicates

Are you certain the option “+Sort outputs only UNIQUE (at column) lines” was checked when you tried? If so, are you certain the lines are actual duplicates? A word with a space at the end and a word without the space at the end are not counted as duplicate lines.
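If it is hard to tell whether lines differ only by trailing spaces, a small script can point them out. The sketch below is just one way to check, written in Python outside of Notepad++. The helper name find_near_duplicates and the sample lines are assumptions for illustration.

```python
def find_near_duplicates(lines):
    """Group lines that become identical once trailing whitespace is removed."""
    groups = {}
    for line in lines:
        # Lines such as "int" and "int " end up under the same key.
        groups.setdefault(line.rstrip(), []).append(line)
    # Report only the groups whose members are not byte-for-byte identical.
    return {key: found for key, found in groups.items() if len(set(found)) > 1}


# Hypothetical sample: the second "int" has a trailing space.
lines = ["int", "int ", "varchar", "varchar"]
print(find_near_duplicates(lines))
```

Here "varchar" is not reported, because both copies are exact duplicates that a unique sort would already merge.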

The x64 version doesn’t have the plug-in manager.

Hi Cathrine

Nice post to find duplicate lines on NP++

I want (and don’t know how) to go further and compare lines on a single column only, in order to have a unique ID field.

example:

I have
date, country, name, email

I want to look at the email column only, considering that the same person can log in on different dates, or even use a different name. The lines are then different, so this solution doesn’t work for me. Can you please help?

thanks in advance
Daniel

Hi Daniel. Unfortunately I don’t know how to achieve this in Notepad++ right now, but it should be easy in a spreadsheet application or a database. Perhaps someone else knows how to do this? I’m interested in a solution myself :)

For removing duplicates on one matching column just get a copy of Excel 2010, there’s a “Remove Duplicates” function where you can choose the columns to match.

No idea how to do that with NP++ though.
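Outside of Notepad++ and Excel, removing duplicates on a single column can also be sketched with a few lines of Python. This is only an illustration under several assumptions: the data is comma-separated, the email is the fourth column (index 3), emails are compared case-insensitively, and the first row seen for each email is the one to keep. The function name dedupe_by_column and the sample rows are invented for the example.

```python
import csv
import io


def dedupe_by_column(text, column=3):
    """Keep the first row for each distinct value in the given column."""
    seen = set()
    kept = []
    # skipinitialspace drops the space that follows each comma.
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) <= column:
            continue  # assumption: blank or short rows are simply skipped
        key = row[column].strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample in the order: date, country, name, email
sample = (
    "2017-01-01, NO, Ann, ann@example.com\n"
    "2017-01-02, SE, Annie, ann@example.com\n"
    "2017-01-03, DK, Bob, bob@example.com\n"
)
for row in dedupe_by_column(sample):
    print(", ".join(row))
```

The second row is dropped even though the date, country and name differ, because only the email column is compared.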

@Cathrine, thank you! TextFX is a really nifty tool, wow!

@Daniel:

> I want to look on Column email only, considering that the same person can logging in different dates,
> or even use a different name, and the lines are then different so this solution doesn’t work for me. Can you please help?

You can try using regular expressions https://www.cloudinsidr.com/content/tip-of-the-day-how-to-extract-domain-names-from-email-addresses-using-regular-expressions-regex/. Daniel, could you perhaps post an example of the pattern that you need to address here?

Thank you, Cathrine!

You need to click ‘Sort lines case sensitive…’ after making sure both [sort] options are checked, in order to remove duplicates and sort.

Thanks. Worked perfectly.

Thanks! This is what I was looking for :).

excellent tip, notepad++ did faster job than excel. Awesome feature.

You saved me big time, Thank you Catherine.

Hi,

I followed the same steps as mentioned in post with option “+Sort outputs only UNIQUE (at column) lines” It is only sorting but not deleting duplicates. Is this working for anybody?

Hi guys, I am also a Notepad++ user and I find this technique very useful, but I still don’t understand in which version I can find this TextFX option :(

Hi Louie. Try to open Plugins → Plugin Manager → Show Plugin Manager. Locate TextFX Characters and install it. You should now get the TextFX menu in Notepad++.

TextFX → TextFX Tools → Sort lines case sensitive (at column) sorts the lines alphabetically.
TextFX is a plugin for Notepad++. You can download “Plugin Manager”.

Hi! It Works excellent for me, and it saved me a lot of time!!

Thanks!

It just sorts the duplicate lines, but it can’t remove them automatically. If I have 2000 lines… make me :(

excellent tip – Thanks!
it saved me a lot of time!!
Thanks

Just find and replace with nothing, using this regular expression:

^(.*?)$\s+?^(?=.*^\1$)

(remember to check the “. matches newline” option)

That doesn’t change the order of the rows.

Catherine, I am unable to remove extra spaces within a line. I tried everything I can think of.

Hi Sandra, have you tried using TextFX → TextFX Edit → Kill unquoted whitespace? You could also try a regular expression: Use Replace and select Regular Expression in the Search Mode. Type in " +" (a space followed by a plus sign, without the quotation marks) in the Find What field, and " " (a single space, again without the quotation marks) in the Replace With field. You can select just a line and check the In Selection box if you only need to do this operation on a single line. Does that help?
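The same replacement can be tried outside of Notepad++. The sketch below uses Python's re module with the pattern from the reply above (a space followed by a plus sign). The function name collapse_spaces and the sample line are only for illustration.

```python
import re


def collapse_spaces(line: str) -> str:
    """Replace every run of one or more spaces with a single space."""
    return re.sub(r" +", " ", line)


# Hypothetical sample line with extra spaces between the words.
print(collapse_spaces("Sandra   has  extra    spaces"))
```

Tabs are left untouched here, because the pattern matches the space character only.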

thanks a lot. good

Great tip about removing duplicates!

Really awesome! Thanks!

Hi guys, I have more than 300000 lines of code, but this can’t delete the duplicate lines automatically. Can you help me?

Hi Ghost. Have you checked the “Sort outputs only UNIQUE” option, then selected all the text in the file, and then run “Sort lines”? I have successfully deleted duplicate lines in files with several million rows this way.

Perfecto!! Muchas gracias

Thanks a lot!!


He Really Superb…. Thanks for sharing it ..
I shared with my friends too….

Brilliant !! Thank you very much for this. We used to use a proprietary program to remove duplicates. This is easier and faster.

Very nice little trick. I’ve been using Notepad++ for quite a while, but didn’t know this simple technique. Furthermore, the program is lightning fast, so it can handle the orders quite fast.


Thanks for the steps! Just what I was looking for!

Woow it works Catherine!!! Thanks for sharing, it really saved lot of time. Me and my hubby was trying to remove duplicate for around 39+ lakh data and this option helped us saving our time as we were struggling with excel because of its limitations…..

Thanks for sharing

Hi, I tried installing this plugin via Plugin Manager, and after rebooting Notepad it doesn’t appear as having been installed.
I have tried installing it several times but it hasn’t worked, the package doesn’t get installed.
Note that the install process doesn’t throw an error.
Would be great if you could help with this.

Hi Dan, this is something you have to ask Notepad++ support about, and not me :)

I did, no answer yet. Thought I would give it a try here. Thanks anyway!

thank you very much

Ta very muchly.

Thanks very much! Great tip :)


thank you very much. it work great.

Thank you…

is there a way to remove duplicate without sorting?

Hi Ben, I just found this answer and it requires no extra plugins, or sorting. You’ll be amazed as it’s done through REGEX. https://stackoverflow.com/a/16293580/1190051
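As an alternative outside of Notepad++, removing duplicates without sorting takes only a line or two of Python. This is not the regex approach from the linked answer. It is a separate sketch that keeps the first occurrence of each line and preserves the original order. The function name dedupe_keep_order and the sample list are made up for the example.

```python
def dedupe_keep_order(lines):
    """Remove exact duplicates while keeping the original line order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so the first occurrence of each line is the one that survives.
    return list(dict.fromkeys(lines))


# Hypothetical sample with repeats scattered through the list.
print(dedupe_keep_order(["b", "a", "b", "c", "a"]))
```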

It’s been some years since I’ve used NotePad++. It’s a great programming editor, but I haven’t used it in years. I really appreciate your articles and help familiarizing me with the programing and utilization of these special features.

Excellent! Thanks a lot.

I want to remove duplicates from some texts that include delimiters in Notepad++. Can you please suggest ways to do that in a single click?

Since Notepad++ Version 6 you can use this regex in the search and replace dialogue:

^(.*?)$\s+?^(?=.*^\1$)
and replace with nothing. For each set of duplicate rows, this keeps only the last occurrence in the file.

No sorting is needed for that and the duplicate rows can be anywhere in the file!

You need to check the options “Regular expression” and “. matches newline”:

https://i.imgur.com/dY3LCMD.png

^ matches the start of the line.

(.*?) matches any characters 0 or more times, but as few as possible (it matches exactly one row; this is needed because of the “. matches newline” option). The matched row is stored, because of the parentheses around it, and is accessible using \1

$ matches the end of the line.

\s+?^ this part matches all whitespace characters (newlines!) up to the start of the next row ==> This removes the newlines after the matched row, so that no empty row is left after the replacement.

(?=.*^\1$) this is a positive lookahead assertion. This is the important part in this regex, a row is only matched (and removed), when there is exactly the same row following somewhere else in the file.
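To see the pattern in action without opening Notepad++, it can be tried in Python's re module. This is only a sketch: Python uses a different regex engine than Notepad++, so identical behavior in every edge case is an assumption, and the sample assumes plain \n line endings. The names DUPLICATE_LINE and drop_earlier_duplicates are invented for the example.

```python
import re

# Same pattern as above. re.MULTILINE lets ^ and $ match at line boundaries,
# and re.DOTALL plays the role of the ". matches newline" option.
DUPLICATE_LINE = re.compile(r"^(.*?)$\s+?^(?=.*^\1$)", re.MULTILINE | re.DOTALL)


def drop_earlier_duplicates(text: str) -> str:
    """Delete each row that appears again later, keeping the last occurrence."""
    return DUPLICATE_LINE.sub("", text)


# Hypothetical sample: "int" and "varchar" each appear twice.
sample = "int\nvarchar\nint\ndate\nvarchar\n"
print(drop_earlier_duplicates(sample))
```

In this sample the first "int" and the first "varchar" are removed, so the remaining rows are int, date, varchar. Because the lookahead rescans the rest of the text for every row, this approach can be slow on large inputs.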

This is a handy tool to have in the toolbox. TY.

“Paste the text that you want to remove duplicates and blank lines fr” Paste it where???

Paste your text into the Notepad++ window.

Thanks for giving this information. It is very helpful information.

