Skip to content

Table Partitioning in SQL Server – The Basics

This post is part 1 of 2 in the series Table Partitioning in SQL Server
SQL Server Table Partitioning Cheat Sheet

There are many benefits of partitioning large tables. You can speed up loading and archiving of data, you can perform maintenance operations on individual partitions instead of the whole table, and you may be able to improve query performance. However, implementing table partitioning is not a trivial task and you need a good understanding of how it works to implement and use it correctly.

Being a business intelligence and data warehouse developer, not a DBA, it took me a while to understand table partitioning. I had to read a lot, get plenty of hands-on experience and make some mistakes along the way. (The illustration to the left is my Table Partitioning Cheat Sheet.) One of my favorite ways to learn something is to figure out how to explain it to others, so I recently did a webinar about table partitioning. I wanted to follow that up with focused blog posts that included answers to questions I received during the webinar. This post covers the basics of partitioned tables, partition columns, partition functions and partition schemes.

What is Table Partitioning?

Partitioned Table

Table partitioning is a way to divide a large table into smaller, more manageable parts without having to create separate tables for each part. Data in a partitioned table is physically stored in groups of rows called partitions and each partition can be accessed and maintained separately. Partitioning is not visible to end users, a partitioned table behaves like one logical table when queried.

This example illustration is used throughout this blog post to explain basic concepts. The table contains data from every day in 2012, 2013, 2014 and 2015, and there is one partition per year. To simplify the example, only the first and last day in each year is shown.

An alternative to partitioned tables (for those who don’t have Enterprise Edition) is to create separate tables for each group of rows, union the tables in a view and then query the view instead of the tables. This is called a partitioned view. (Partitioned views are not covered in this blog post.)

What is a Partition Column?

Partition Column (Partition Key)

Data in a partitioned table is partitioned based on a single column, the partition column, often called the partition key. Only one column can be used as the partition column, but it is possible to use a computed column.

In the example illustration the date column is used as the partition column. SQL Server places rows in the correct partition based on the values in the date column. All rows with dates before or in 2012 are placed in the first partition, all rows with dates in 2013 are placed in the second partition, all rows with dates in 2014 are placed in the third partition, and all rows with dates in 2015 or after are placed in the fourth partition. If the partition column value is NULL, the rows are placed in the first partition.

It is important to select a partition column that is almost always used as a filter in queries. When the partition column is used as a filter in queries, SQL Server can access only the relevant partitions. This is called partition elimination and can greatly improve performance when querying large tables.

What is a Partition Function?

Partition Function

The partition function defines how to partition data based on the partition column. The partition function does not explicitly define the partitions and which rows are placed in each partition. Instead, the partition function specifies boundary values, the points between partitions. The total number of partitions is always the total number of boundary values + 1.

In the example illustration there are three boundary values. The first boundary value is between 2012 and 2013, the second boundary value is between 2013 and 2014, and the third boundary value is between 2014 and 2015. The three boundary values create four partitions. (The first partition also includes all rows with dates before 2012 and the last partition also includes all rows after 2015, but the example is kept simple with only four years for now.)

But what are the actual boundary values used in the example? How do you know which date values are the points between two years? Is it December 31st or January 1st? The answer is that it can actually be either December 31st or January 1st, it depends on whether you use a range left or a range right partition function.

Range Left and Range Right

Partition functions are created as either range left or range right to specify whether the boundary values belong to their left or right partitions:

  • Range left means that the actual boundary value belongs to its left partition, it is the last value in the left partition.
  • Range right means that the actual boundary value belongs to its right partition, it is the first value in the right partition.

Left and right partitions make more sense if the table is rotated:

Partition Function Range Left and Range Right Partition Function Range Left and Range Right Partition Function Range Left and Range Right

Range Left and Range Right using Dates

The first boundary value is between 2012 and 2013. This can be created in two ways, either by specifying a range left partition function with December 31st as the boundary value, or as a range right partition function with January 1st as the boundary value:

Partition Function Range Left and Range Right Partition Function Range Left and Range Right Partition Function Range Left and Range Right

Partition functions are created as either range left or range right, it is not possible to combine both in the same partition function. In a range left partition function, all boundary values are upper boundaries, they are the last values in the partitions. If you partition by year, you use December 31st. If you partition by month, you use January 31st, February 28th / 29th, March 31st, April 30th and so on. In a range right partition function, all boundary values are lower boundaries, they are the first values in the partitions. If you partition by year, you use January 1st. If you partition by month, you use January 1st, February 1st, March 1st, April 1st and so on:

Partition Function Range Left and Range Right Partition Function Range Left and Range Right

Range Left and Range Right using the Wrong Dates

If the wrong dates are used as boundary values, the partitions incorrectly span two time periods:

Partition Function Range Left and Range Right Partition Function Range Left and Range Right

What is a Partition Scheme?

Partition Scheme

The partition scheme maps the logical partitions to physical filegroups. It is possible to map each partition to its own filegroup or all partitions to one filegroup.

A filegroup contains one or more data files that can be spread on one or more disks. Filegroups can be set to read-only, and filegroups can be backed up and restored individually. There are many benefits of mapping each partition to its own filegroup. Less frequently accessed data can be placed on slower disks and more frequently accessed data can be placed on faster disks. Historical, unchanging data can be set to read-only and then be excluded from regular backups. If data needs to be restored it is possible to restore the partitions with the most critical data first.

How do I create a Partitioned Table?

The following script (for SQL Server 2012 and higher) first creates a numbers table function created by Itzik Ben-Gan that is used to insert test data. The script then creates a partition function, a partition scheme and a partitioned table. (It is important to notice that this script is meant to demonstrate the basic concepts of table partitioning, it does not create any indexes or constraints and it maps all partitions to the [PRIMARY] filegroup. This script is not meant to be used in a real-world project.) Finally it inserts test data and shows information about the partitioned table.

/* --------------------------------------------------
-- Create helper function GetNums by Itzik Ben-Gan
-- https://www.itprotoday.com/sql-server/virtual-auxiliary-table-numbers
-- GetNums is used to insert test data
-------------------------------------------------- */

-- Drop helper function if it already exists
IF OBJECT_ID('GetNums') IS NOT NULL
	DROP FUNCTION GetNums;
GO

-- Create helper function
CREATE FUNCTION GetNums(@n AS BIGINT) RETURNS TABLE AS RETURN
  WITH
  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
  SELECT TOP (@n) n FROM Nums ORDER BY n;
GO

/* ------------------------------------------------------------
-- Create example Partitioned Table (Heap)
-- The Partition Column is a DATE column
-- The Partition Function is RANGE RIGHT
-- The Partition Scheme maps all partitions to [PRIMARY]
------------------------------------------------------------ */

-- Drop objects if they already exist
IF EXISTS (SELECT * FROM sys.tables WHERE name = N'Sales')
	DROP TABLE Sales;
IF EXISTS (SELECT * FROM sys.partition_schemes WHERE name = N'psSales')
	DROP PARTITION SCHEME psSales;
IF EXISTS (SELECT * FROM sys.partition_functions WHERE name = N'pfSales')
	DROP PARTITION FUNCTION pfSales;

-- Create the Partition Function 
CREATE PARTITION FUNCTION pfSales (DATE)
AS RANGE RIGHT FOR VALUES 
('2013-01-01', '2014-01-01', '2015-01-01');

-- Create the Partition Scheme
CREATE PARTITION SCHEME psSales
AS PARTITION pfSales 
ALL TO ([Primary]);

-- Create the Partitioned Table (Heap) on the Partition Scheme
CREATE TABLE Sales (
	SalesDate DATE,
	Quantity INT
) ON psSales(SalesDate);

-- Insert test data
INSERT INTO Sales(SalesDate, Quantity)
SELECT DATEADD(DAY,dates.n-1,'2012-01-01') AS SalesDate, qty.n AS Quantity
FROM GetNums(DATEDIFF(DD,'2012-01-01','2016-01-01')) dates
CROSS JOIN GetNums(1000) AS qty;

-- View Partitioned Table information
SELECT
	OBJECT_SCHEMA_NAME(pstats.object_id) AS SchemaName
	,OBJECT_NAME(pstats.object_id) AS TableName
	,ps.name AS PartitionSchemeName
	,ds.name AS PartitionFilegroupName
	,pf.name AS PartitionFunctionName
	,CASE pf.boundary_value_on_right WHEN 0 THEN 'Range Left' ELSE 'Range Right' END AS PartitionFunctionRange
	,CASE pf.boundary_value_on_right WHEN 0 THEN 'Upper Boundary' ELSE 'Lower Boundary' END AS PartitionBoundary
	,prv.value AS PartitionBoundaryValue
	,c.name AS PartitionKey
	,CASE 
		WHEN pf.boundary_value_on_right = 0 
		THEN c.name + ' > ' + CAST(ISNULL(LAG(prv.value) OVER(PARTITION BY pstats.object_id ORDER BY pstats.object_id, pstats.partition_number), 'Infinity') AS VARCHAR(100)) + ' and ' + c.name + ' <= ' + CAST(ISNULL(prv.value, 'Infinity') AS VARCHAR(100)) 
		ELSE c.name + ' >= ' + CAST(ISNULL(prv.value, 'Infinity') AS VARCHAR(100))  + ' and ' + c.name + ' < ' + CAST(ISNULL(LEAD(prv.value) OVER(PARTITION BY pstats.object_id ORDER BY pstats.object_id, pstats.partition_number), 'Infinity') AS VARCHAR(100))
	END AS PartitionRange
	,pstats.partition_number AS PartitionNumber
	,pstats.row_count AS PartitionRowCount
	,p.data_compression_desc AS DataCompression
FROM sys.dm_db_partition_stats AS pstats
INNER JOIN sys.partitions AS p ON pstats.partition_id = p.partition_id
INNER JOIN sys.destination_data_spaces AS dds ON pstats.partition_number = dds.destination_id
INNER JOIN sys.data_spaces AS ds ON dds.data_space_id = ds.data_space_id
INNER JOIN sys.partition_schemes AS ps ON dds.partition_scheme_id = ps.data_space_id
INNER JOIN sys.partition_functions AS pf ON ps.function_id = pf.function_id
INNER JOIN sys.indexes AS i ON pstats.object_id = i.object_id AND pstats.index_id = i.index_id AND dds.partition_scheme_id = i.data_space_id AND i.type <= 1 /* Heap or Clustered Index */
INNER JOIN sys.index_columns AS ic ON i.index_id = ic.index_id AND i.object_id = ic.object_id AND ic.partition_ordinal > 0
INNER JOIN sys.columns AS c ON pstats.object_id = c.object_id AND ic.column_id = c.column_id
LEFT JOIN sys.partition_range_values AS prv ON pf.function_id = prv.function_id AND pstats.partition_number = (CASE pf.boundary_value_on_right WHEN 0 THEN prv.boundary_id ELSE (prv.boundary_id+1) END)
WHERE pstats.object_id = OBJECT_ID('Sales')
ORDER BY TableName, PartitionNumber;

Summary

SQL Server Table Partitioning Cheat Sheet

The partition function defines how to partition a table based on the values in the partition column. The partitioned table is created on the partition scheme that uses the partition function to map the logical partitions to physical filegroups.

If each partition is mapped to a separate filegroup, partitions can be placed on slower or faster disks based on how frequently they are accessed, historical partitions can be set to read-only, and partitions can be backed up and restored individually based on how critical the data is.

This post is the first in a series of Table Partitioning in SQL Server blog posts. It covers the basics of partitioned tables, partition columns, partition functions and partition schemes. Future blog posts in this series will build upon this information and these examples to explain other and more advanced concepts.

About the Author

Cathrine Wilhelmsen is a Microsoft Data Platform MVP, BimlHero Certified Expert, Microsoft Certified Solutions Expert, international speaker, author, blogger, and chronic volunteer who loves teaching and sharing knowledge. She works as a Senior Business Intelligence Consultant at Inmeta, focusing on Azure Data and the Microsoft Data Platform. She loves sci-fi, chocolate, coffee, craft beers, ciders, cat gifs and smilies :)

Comments

Hi! This is Cathrine. Thank you so much for visiting my blog. I'd love to hear your thoughts, but please keep in mind that I'm not technical support for any products mentioned in this post :) Off-topic questions, comments and discussions may be moderated. Be kind to each other. Thanks!

Beautiful write-up!

Nicely done. Thank you!

Hi Cathrine,

congrats! Excellent work. Thank you.

Pingback: SQL New Blogger Digest – Week 2 | The Rest is Just Code

Well done summary and examples and very clear! Regards

Nice write up. I have been doing this for quite some time and this was clear and concise. One thing I ran across recently (SQL Server 2012) was when we put a database in replication and started getting an error in one of our table partition switch statements. Normally we pass the table name and partition as parameters to a procedure that does the switching using dynamic SQL (which is the workaround they describe). However, when you do not use dynamic SQL for the alter table statement and the partition passed is a parameter, you will get an error. Here is link describing my confusing way of explaining the problem.
https://support.microsoft.com/en-us/help/2002474/

Nice write up but I’m missing any hints about the possible problems, if you plan using partitioning on an existing databases.

The main problem for me is the fact, that the partition key must be part of the primary key and EVERY unique index on the table. Most of us are using f.e. order tables that have an ID column (IDENTY or UNID). You could add the order date to the PK but then you could get duplicate values in order_id (for different dates) except you prevent this by an additional INSERT trigger.

And of course you would have to change any foreign key from / to the ORDER table (and in the most cases the linked tables) -> add the ORDER_DATE to the ORDER_POSITIONS table (where it is usually absolute irrelevant) and change the FK from ORDER_ID to ORDER_ID + ORDER_DATE. Do the same for ORDER_ADDRESSES, INVOICES and about 5 other files too (and don’t forget to change your application to apply this changes).

An alternative way (if you are using an continual integer ID and not an UNID) would be to write a SQL Agent job that gets the highest ID every month / year (or whatever you are using as partition spane), adds a new file group and do a split on this id. The drawback for this methode is that you’ll need a separate partition function / schema for every table. Or (if the number of orders is relative low) you could prefix the ORDER_ID with the YYYYMM (or YYMM) so that you are using ID 2015040000 to 2015049999 for April 2015) and change you IDENTY column / table every month to appy the new prefix.

And of course would every query that does not includes the partition key in the where or JOIN condition be slower (f.e. if you gather all orders by CUSTOMER_ID).

Someone said once that MS gave us a very nice / powerful partition function but sprinkled it with sharp-edged glass shards…

Help full Thanks!

Great Work, Catherine!
Excellent diagrams and writeups. Very concise and extremely helpful.
Thank you!

great work i love it

Nice Explanation. Like it.

thank you very much

thank you

you are awesome Catherine!!! very nice article!!!!! :)
very very knowledgeable article.

This is very helpful Catherine for Partitioning beginners. Thank you.

very very help ful keep posting some more scnerios on like this

Pingback: Tuan Vo (dot) info | Table Partitioning in SQL Server – The Basics

Very nice article. Easy to understand the concepts about partitioning of tables and related concepts.

Excellent work, Thanks

quick way to tell if you have enterprise edition:

select serverproperty(‘Edition’)

Pingback: How To Partitioned Table In Sql | Information

great post thanks catherine!

One quick update, if people using SQL 2016- Table Partitioning is now available in Standard Edition, starting with SP1.
Well done article; thanks much!

Thank you very much, for nice explanation

Explained complex structure in an effective simple manner..Thanks.

What if I want to add another year in the future? Do I need to partition the table from the begining?

no, you just create a new filegroup, add it to the partition scheme and execute a split on the partition function

Brilliant explanation, cheers! I’ve been looking for a clear and concise tutorial for ages. I wonder if the Left & Right Range section might be simplified the arrow direction on your diagram was flipped? i.e. the arrows point left-to-right instead of right-to-left. To my mind it becomes more intuitive to think as the left as top and the right as bottom in that case.

Very well explained. Thank you! :)

Thank you Catherine. Simple and clear explanation.

Best explanation and demoing of partitioning I’ve seen.

Superb article!!!

Excellent Cathrine, dat super excellent

Thank you for this. It is probably the most clear article that I have read so far on this subject.

Nice article, waiting for the second part.

Hi Catherine. Great article on Partition Switching, nice and simple. Any ideas on this scenario please? 1 parent and 1 child table joined by an FK in Schema ‘1’. AND then an exact replica of the parent and the child table joined by an FK in Schema ‘2’. I want to switch data from the Parent table in Schema 1 to the schema 2 parent table and then Switch the data from child table in Schema ‘1’ to the schema ‘2’ child table. The switching works in that the tables are identical in their opposite schemas, including the FK’s and PK’s. However, the fact that in trying to switch the parent data you would orphan the child record means you get a check constraint error which causes a failure. Trying to Switch the Child record first results in an error similar, because you have no parent for the child in Schema 2 once you have switched it. Have you any experience of switching when using tables which are connected using Keys? Any ideas for FK’s allowing orphans for example or preventing the constraint checks that will allow my scenario to work for the transfer? I have the idea of ‘dropping all the constraints and then reapplying them after the switch’ as an option on the table(by design this won’t break), but I wish to consider other options as this seems like a work around to me. My goal is essentially to transfer data from multiple ‘stage’ tables to another schema of data store tables using switch (performance gain of instant transfer). The destination tables have to have PK’s and FK’s on them, but the source tables don’t have to (I have only put them on to make them the same as the destination tables, so if you know of another way this could work?). Any ideas? Thanks!

Hi cat,

Is it possible to crete table partition in a different location storage (in 1 server or more) in ms sql server ?

yes, you can add other filegroups and files to your database (which are stored on other disks / SAN drives) and assign this filegroups to your partitioning schema.

For example we have a FG_2016, FG_2017, FG_2018 and so on which are used to save the corresponding year partitions.

Brilliant article. It helped me understand SQL partitioning very well.

great explanation thank you

Thank you for taking the time to explain the concept in detail. Helps a lot

Excellent write up.. Thanks

is this mandatory ? that partition key column must be primary key column or at least part of the primary key columns ?

yes, it must be part of the primary key AND of every other UNIQUE key AND you have to specify it in each query (WHERE or JOIN), otherwise the SQL server could not eliminate partitions. Without elimination it would be the same, as if you would have a tbl1, tbl2, tbl3, tbl4 … and select all of those with an UNION ALL – it would be fast enough for smaller tables with only a few partitions, but slow for bigger tables with many partitions.

If you need the partitioning only for storage reasons (and do not need the partition switching stuff) you could create a partioned clustered index (e.g. over a date column) and create a non aligned (not partitioned) primary key over the id column.

I am missing 1 important point and probably the most important that was not demonstrated. The avoiding of locking due to the dividing of the boundaries over multiple filegroups and per filegroup at least 1 datafile. In thus example everything is still in 1 datafile. Locking processes will still exist.

i don’t want to be a sourpoez , still a great article, thumbs up

Thanks well done!

This is a great article that clarified left/right range brilliantly. I was confused when reading other posts but this made things clear.

Thank you very much.

Very nicely explained

Hi Catherine,

In my scenario, I have about ten tables for which i need archive data. My doubt is as far as i understand the partition boundaries are same to N number of tables. If i want to move data from one table and then use sliding window to manage the new boundaries, what happen to the other table boundaries and its data which i do not want to move?If i am not clear in explaining here please let me know so that i can explain further

Thanks

as long you do not merge the partitions (while there are still data in other tables), nothing would happen to them. You switch only in / out the partition of a single table at once. When you do splits on the partitions, you should make it before you insert any data of the new range in any of the tables, that uses this partition schema (e.g. split on the 31th January, before you insert February data in any table, otherwise it would have to move those data physical)

BTW: there is an option for TRUNCATE TABLE that allows you to truncate only a single or a range partitions instead of the whole table. This way you would not need a staging table to switch it out, when you are only want to delete old data.

Thanks for the explanation. To get more clarity, I have three tables all under different retention period(number of months the data needs to be retained and not to be archived) and contain data for different periods (say one table have data from Jan2020 to dec2020 and the other from Jan2019 to Oct 2019 and so on..) I am switching out the data and inserting to archival tables and then truncating/dropping the staging tables. Now my concern is after archiving step if i do split and merge on month last then for all the tables which might not have records for that period, it is a problem right ?
Or please suggest me can i have different PF and PS for each and every table and then manage their boundaries after archive the respective tables.?

of course you can have a separate pf / ps for each table. If you want, you can e.g. have a pf/ps_tbl1 and a pf/ps_tbl1_archiv. The first one is used for your “hot” data.

Then you switch out the old month into a staging table (which is not partiotioned), then you add a check constraint (to ensure the boundaries) to the staging table and switch it into the archive table.

The same for tbl2, tbl3 … tbl_n.

If you have to do this regularly, you should of course write some procedures that makes all the creating / splitting / merging / switching stuff for you….

Thanks Thomas. I have now created RANGE RIGHT PF and PS for each table and then archived the records to ARC tables. I have a scenario where i had my partitions created in 2010 and now being in 2020 I have written procedure to add more partitions from 2010 till date using NEXT USED and SPLIT. For NEXT used i have used the filegroup where the data of 2nd partition resides(FG2 in my case) and then i have split to add more partitions in the right. My doubt now is sliding window works fine but i see all the new partitions that i have created in above step resides in same FG FG2. Is this advisable and if not please suggest me a wise and simple method to do the same.

I’d suggest to create separate filegroups at least for every year, so you can place the current data on a fast SSD while the old stuff resists on the slow HDDs. Furthermore it allows you to remove a whole filegroup (first the data, then the files, then the FG) when you want to delete older years.

When everything is placed in a single filegroup, you would end up with a lot of empty space in this filegroup (after deleting old data), have problems to handle out-of-disk-space-problems, can’t do a partial restore of your database (restore the the hot data first and let the users start to use them, while you are still restoring the older filegroups)…

Furthermore I would not suggest to use the SPLIT on ranges, that already contains data. Better create a complete new table plus already splitted partition functions, check the indexes (maybe columnstore would be an option) and copy the data in smaller chunks (e.g. on day or month basis) into the new table. This is much more fail safe (and often faster) than to split a whole table (or worser multiple tables, when the partition function is used multible times) at once.

PS: small drawback of having multiple filegroups: you can’t switch partitions between different filegroups.

Wonderful explanations

It is a great article, excellent and easy explanation on partitioning data in sql server, pictures directly goes and stores permanently in my mind.

Hi! This is Cathrine (again). Just a reminder. I'd love to hear your thoughts, but please keep in mind that I'm not technical support for any products mentioned in this post :) Off-topic questions, comments and discussions may be moderated. Be kind to each other. Thanks!

Leave a Reply to veeracious Cancel reply

Secured By miniOrange