Skip to content

Using a Numbers Table in SQL Server to insert test data (T-SQL Tuesday #65)

T-SQL Tuesday

T-SQL Tuesday #65 is hosted by Mike Donnelly (@SQLMD). There is no specific topic to write about this month, Mike simply wants us to learn something new and then write a blog post to teach it to others. I want to share something that I only recently learned, something I wish I had known about years ago, something that became part of my toolbox as soon as I discovered it: the Numbers Table. It is so simple and solves so many problems that everyone should know about it :)

A Numbers Table (perhaps most known as a Tally Table, sometimes called an Auxiliary Table of Numbers and even referred to as the Swiss Army Knife of SQL Server) is a one-column helper table that contains the numbers 1, 2, 3, 4, 5 and so on all the way up to the-highest-number-you-could-possibly-need.

It can be used to replace slower loops and row-by-row operations with faster set-based operations, generate dates, split strings, find gaps in data sets, expand data sets, insert test data and probably hundreds of other things. There are so many great and detailed articles already published about this topic, so I will stick to the T-SQL Tuesday topic of “Teach Something New” and share the two most recent things I learned: different ways to create a numbers table, and different ways to quickly insert test data by using a numbers table.

Create a Numbers Table

My preferred way of creating a numbers table is to use Itzik Ben-Gan’s solution described in the Virtual Auxiliary Table of Numbers article. The solution uses CROSS JOINs to generate rows and then a ROW_NUMBER() to create the actual sequence of numbers.

You can use Itzik Ben-Gan’s solution to generate a virtual numbers table when needed, you can create a function that returns a numbers table with a certain number of rows, or you can create an actual numbers table that live in your database.

By creating the function or table in your model database, they will automatically be created in new databases for you. Handy, right? :)

/* --------------------------------------------------
-- Virtual Numbers Table
-------------------------------------------------- */
WITH
	L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
	L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
	L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
	L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
	L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
	L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
	Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
SELECT TOP (1000) n FROM Nums ORDER BY n;

/* --------------------------------------------------
-- Numbers Table
-------------------------------------------------- */
USE model;
GO

IF OBJECT_ID('dbo.Numbers') IS NOT NULL
  DROP TABLE dbo.Numbers;
GO

CREATE TABLE Numbers (
	n BIGINT NOT NULL,
	CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (n) WITH FILLFACTOR = 100
);
GO

WITH
	L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
	L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
	L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
	L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
	L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
	L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
	Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
INSERT INTO dbo.Numbers (n)
SELECT TOP (100000) n FROM Nums ORDER BY n; -- Insert as many numbers as you need
GO

SELECT n FROM dbo.Numbers WHERE n <= 1000;
GO

/* --------------------------------------------------
-- Numbers Table Function
-------------------------------------------------- */
USE model;
GO

IF OBJECT_ID('dbo.GetNums') IS NOT NULL
  DROP FUNCTION dbo.GetNums;
GO
 
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE AS RETURN
  WITH
  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
  SELECT TOP (@n) n FROM Nums ORDER BY n;
GO

SELECT n FROM dbo.GetNums(1000);
GO

Insert Test Data

I learned about the numbers table while researching table partitioning. All I needed was to insert a specified number of rows in each partition so I could test things like partition switching, splitting and merging. I didn’t want my tests to depend on existing data.

Writing every single insert statement was not an option. I first tried GO [count] to insert a row a specified number of times. The syntax is easy to remember and it works ok on a small amount of data:





INSERT INTO Sales (SalesDate, Quantity) VALUES ('2012-01-01', 10)
GO 1000
INSERT INTO Sales (SalesDate, Quantity) VALUES ('2013-01-01', 10)
GO 1000
INSERT INTO Sales (SalesDate, Quantity) VALUES ('2014-01-01', 10)
GO 1000
INSERT INTO Sales (SalesDate, Quantity) VALUES ('2015-01-01', 10)
GO 1000

However, GO [count] works like a WHILE loop, it executes a batch a specified number of times. The more times you execute a batch, the longer it takes. On average, inserting 1000 rows took 0-1 seconds. Inserting 10000 rows took 3-4 seconds. Inserting 100000 rows took 30-35 seconds. And inserting 1 million rows took 5.5 minutes!

After some research I discovered the numbers table, created the Numbers Table Function, and rewrote my queries:

INSERT INTO Sales (SalesDate, Quantity) SELECT '2012-01-01', 10 FROM dbo.GetNums(1000);
INSERT INTO Sales (SalesDate, Quantity) SELECT '2013-01-01', 10 FROM dbo.GetNums(1000);
INSERT INTO Sales (SalesDate, Quantity) SELECT '2014-01-01', 10 FROM dbo.GetNums(1000);
INSERT INTO Sales (SalesDate, Quantity) SELECT '2015-01-01', 10 FROM dbo.GetNums(1000);

What a difference! It was blazing fast compared to using GO [count]. On average, inserting 1000 rows took 2 milliseconds. Inserting 10000 rows took 25 milliseconds. Inserting 100000 rows took 140 milliseconds. And inserting 1 million rows took 1.5 seconds instead of 5.5 minutes!

This worked great until I wanted to insert 1000 rows per day instead of year. I didn’t have to write every single insert statement, but I still had to write an insert statement for every single day. That was not an option, so I rewrote my queries yet again. This time I used the same CROSS JOIN logic as Itzik Ben-Gan used in his numbers table solution. I first created rows for all the days and then cross joined to get 1000 rows for each day:

/* --------------------------------------------------
-- Insert from Virtual Numbers Table
-------------------------------------------------- */
WITH
	L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
	L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
	L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
	L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
	L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
	L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
	Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
INSERT INTO dbo.Sales (SalesDate, Quantity)
SELECT DATEADD(DAY,dates.n-1,'2012-01-01') AS SalesDate, qty.n AS Quantity
FROM Nums AS dates
CROSS JOIN Nums AS qty
WHERE dates.n <= DATEDIFF(DD,'2012-01-01','2016-01-01')
AND qty.n <= 1000;
GO

/* --------------------------------------------------
-- Insert from Numbers Table
-------------------------------------------------- */
INSERT INTO dbo.Sales (SalesDate, Quantity)
SELECT DATEADD(DAY,dates.n-1,'2012-01-01') AS SalesDate, qty.n AS Quantity 
FROM dbo.Numbers dates
CROSS JOIN dbo.Numbers AS qty
WHERE dates.n <= DATEDIFF(DD,'2012-01-01','2016-01-01')
AND qty.n <= 1000;

/* --------------------------------------------------
-- Insert from Numbers Table Function
-------------------------------------------------- */
INSERT INTO dbo.Sales (SalesDate, Quantity)
SELECT DATEADD(DAY,dates.n-1,'2012-01-01') AS SalesDate, qty.n AS Quantity
FROM dbo.GetNums(DATEDIFF(DD,'2012-01-01','2016-01-01')) dates
CROSS JOIN dbo.GetNums(1000) AS qty;

Each of these queries only take a couple of seconds. My preferred way of inserting test data is the last query that uses the Numbers Table Function. It is short and the syntax is easy to remember.

Summary

Create a Numbers Table and / or a Numbers Table Function in the model database so they are instantly created in all new databases. Use CROSS JOIN logic to write short and powerful queries to insert test data. Finally, inserting data is just one use case, have fun and play around with these things to see if you can come up with completely new solutions to solve old problems!

And as a bonus, if you thought this article was interesting, you should absolutely watch Itzik Ben-Gan’s Boost your T-SQL with the APPLY Operator session from SQLBits XIV. He explains how to use APPLY instead of JOIN for more advanced use cases, and I promise some really cool demos. Thank you so much, Itzik! :)

About the Author

Cathrine Wilhelmsen is a Microsoft Data Platform MVP, BimlHero Certified Expert, Microsoft Certified Solutions Expert, international speaker, author, blogger, and chronic volunteer who loves teaching and sharing knowledge. She works as a Senior Business Intelligence Consultant at Inmeta, focusing on Azure Data and the Microsoft Data Platform. She loves sci-fi, chocolate, coffee, craft beers, ciders, cat gifs and smilies :)

Comments

Hi! This is Cathrine. Thank you so much for visiting my blog. I'd love to hear your thoughts, but please keep in mind that I'm not technical support for any products mentioned in this post :) Off-topic questions, comments and discussions may be moderated. Be kind to each other. Thanks!

Pingback: Using a Numbers Table in SQL Server to insert test data | just dave info

Pingback: T-SQL Tuesday #065 – Teach Something New (Roundup) | Mike Donnelly, SQLMD

hi catherine, what is the ‘n’ here, i couldn’t understand. i couldn’t understand the expression
dateadd(day, dates.n-1,’2012-01-01′) could you please explain the two part ‘n’ convention?

N is the number that you get from the Numbers table or GetNums function. I have aliased these as dates, which is why I use dates.n. I encourage you to pick apart the code examples and work through them piece by piece to understand the details :)

Pingback: Partions in sql | iaskalyani45

Pingback: SQL New Blogger Digest – Week 2 | The Rest is Just Code

Hi! This is Cathrine (again). Just a reminder. I'd love to hear your thoughts, but please keep in mind that I'm not technical support for any products mentioned in this post :) Off-topic questions, comments and discussions may be moderated. Be kind to each other. Thanks!

Leave a Reply to Cathrine Cancel reply