Part 1 | Part 2 | Part 3 | Part 4
The foregoing parts explained how to create the SSIS package to import
RSS feeds. The 4th and last part of this series covers how to setup the database
tables and stored procedures, which are called be the import package.
The headline table
The table 'fan.TutorialHeadline' is the destination of the RSS feed import. See
it's definition below.
-------------------------------------
-- final import destination
-------------------------------------
CREATE TABLE fan.TutorialHeadline
(
id int PRIMARY KEY IDENTITY (0,1),
publisher varchar(50),
title varchar(255) NOT NULL,
content varchar(MAX) NULL,
import_date datetime DEFAULT GETDATE(),
linkurl varchar(255) NOT NULL,
merge_hash binary(16) NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [Ix_merge_hash] ON [fan].[TutorialHeadline]
(
[merge_hash] ASC
)
GO
The import buffer table
To avoid duplicate entries we do not update the headline table directly. We
import the RSS feed data into another import table and merge just the missing
rows into the headline table. In this tutorial the definition of both the
headline and the import tables are identical. See the definition of the import
table below.
-------------------------------------
-- import buffer table
-------------------------------------
CREATE TABLE fan.TutorialImport
(
id int PRIMARY KEY IDENTITY (0,1),
publisher varchar(50),
title varchar(255) NOT NULL,
content varchar(MAX) NULL,
import_date datetime DEFAULT GETDATE(),
linkurl varchar(255) NOT NULL,
merge_hash binary(16) NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [Ix_merge_hash] ON [fan].[TutorialImport]
(
[merge_hash] ASC
)
GO
The hashing function
You may have noticed the indexed column merge_hash in both tables. Hence RSS
doesn't provide a unique ID to identify a RSS data item, I create a unique
hashcode, which is calculated from the feed title and content using MD5. The
function is used by the stored procedure which imports the data. See the
function definition below.
-----------------------------------
--- hash function
-----------------------------------
CREATE FUNCTION fan.TutorialHash
(@title varchar(100), @content varchar(MAX))
RETURNS binary(16)
AS
BEGIN
-- create merge hash
DECLARE @hash binary(16);
DECLARE @tc8000 varchar(8000); SET @tc8000 = CONVERT(varchar(8000), cast(@title as varchar(MAX)) + ' ' + @content);
SET @hash = CAST(HASHBYTES('MD5',@tc8000) AS binary(16));
RETURN @hash
END
GO
The import procedure
The stored procedure 'fan.TutorialAddHeadline' is called by SSIS package
created in the previous parts of this tutorial. The procedure is executed for
each item in a RSS feed. It creates the unique hashcode and creates a new row in
the import buffer table.
------------------------------------------
-- TutorialAddHeadline procedure
------------------------------------------
CREATE PROCEDURE fan.TutorialAddHeadline
@title varchar(100), @content varchar(MAX), @publisher varchar(50), @linkurl varchar(255)
AS
BEGIN
-- create merge hash
DECLARE @hash binary(16);
SET @hash = fan.TutorialHash(@title, @content);
-- create headline INSERT INTO fan.TutorialImport (publisher, title, content, linkurl, merge_hash)
VALUES (@publisher, @title, @content, @linkurl, @hash);
END
GO
Merge recently imported and existing data
The stored procedure 'fan.TutorialMerge' copies all rows from the import
buffer table to the destination table. It uses the MERGE command, which is new
in MS SQL 2008. The MERGE command uses the calculated hash code to find missing
rows in the destination table and inserts only new rows. After the merge the
import buffer table is cleared.
This procedure has to be called by SSIS package after the import. The next
step describes how to add this last task to the SSIS package. See the merge
procedure code listing below.
---------------------------------------
-- merge imported and existing data
---------------------------------------
CREATE PROCEDURE fan.TutorialMerge
AS
BEGIN
MERGE fan.TutorialHeadline h USING fan.TutorialImport m
ON h.merge_hash = m.merge_hash
WHEN NOT MATCHED BY TARGET THEN
INSERT (publisher, title, content, import_date, linkurl, merge_hash)
VALUES (m.publisher, m.title, m.content, m.import_date, m.linkurl, m.merge_hash);
DELETE FROM fan.TutorialImport;
END
GO
Adding the merge procedure to the SSIS package
The merge procedure 'fan.TutorialMerge' has to be executed after the import
to update the RSS data. Follow the steps below to add it to the SSIS package.
1. Add a new 'Execute SQL' taks to the SSIS package
2. Click on the foreach loop container and wait for the arrow to appear. Then
drag it on the 'Execute SQL' task
3. Right click the 'Execute SQL' task and choose edit
4. Make sure the task connects to your database and then edit the attribute
SQLStatement and enter the following code. Replace 'CelebrityDB' by your
database name.
USE CelebrityDB -- replace db name
GO
EXEC fan.TutorialMerge
5. Click OK to close the editor
The SSIS package is now complete. You can now deploy it on the server. When
you execute the package, it should look like the one on the image below.
Executed SSIS package:
Final Result
After the import the RSS feed data is stored in the table 'fan.TutorialHeadline'.
You can read the RSS feeds now with your SQL Server Management Studio.
Result of the RSS import:
Thank you for reading this tutorial!
More tutorials are coming soon. Your feedback helps me a lot to improve this
blog. Please leave a comment!