Mittwoch, 1. Juni 2011

Tutorial 1: How to import RSS feeds into MS SQL 2008 (Part 1)

Part 1 | Part 2 | Part 3 Part 4
RSS 
or Really Simple Syndication is a standardized XML format for publishing web 
blogs or frequently updated web sites. MS SQL 2008, out of the box is capable of importing XML files. Though the built in tools raise errors when trying to import RSS feeds. This article offers a brief tutorial on how to download and import RSS feeds from a webpage using the SQL Server Business Intelligence Development Studio.



What can you do with RSS feeds?

First of all there are numerous applications which can download and view RSS
feeds. So what can you do with SQL? The main reason why you would want to import
a RSS feed into a database is to do further processing, send notifications or
apply filters on one or more feeds. Media monitoring is one possible use case
for importing RSS feeds into a database. SQL server allows you to search in RSS
feeds for brand names, company names, subjects matters, persons and any other
keyword and use the information for report generation.

What do you have to consider when processing RSS feeds?

When you consume RSS feeds you have to consider the terms and conditions of
the RSS publisher and the copyright laws of the publishers country. In many
cases the publisher may not want you to publish the feed data on a website or
intranet portal. Though in countries like Austria and Germany a company is
allowed to create paper copies of the publishers data for their own use. So
aggregating the data and showing them on a printable report is a possible
solution.

Tutorial: How to download and import RSS feeds into MS SQL 2008
This tutorial is written for developers and IT professionals who are familiar
with SQL Server Business Intelligence Studio and SQL. It shows first how to
download and parse the RSS feed using a SSIS package and shows then how to setup
the database table and procedures.

Downloading RSS feeds

1. Create a new Integragtion Services Project using the SQL Server Business
Intelligence Studio.

2. Then add a new script task and edit its source code. I use C#

3. On top of the script place the import of the System.Net Namespace if it's
not already there.
using System.Net

4. Add the method DownloadFeed below the Main() method and replace the URL
and filename. The URL is the RSS feed URL. The filename is the destination,
where it will be stored. In this tutorial all feeds will be saved in the same
folder. The import routine then reads all RSS XML files in the folder.
public bool DownloadFeed()
{
  WebClient web = new WebClient();
  try
  {
    web.DownloadFile(
      @"http://feeds.thehollywoodgossip.com/TheHollywoodGossip?format=xml",
      @"e:\import\TheHollywoodGossip.xml");
    return true;
  }
  catch (WebException)
  {
    return false;
  }
  finally
  {
     web.Dispose();
  }
}

5. Insert the following code into the Main() method
public void Main()
{
  // TODO: Add your code here
  if (DownloadFeed())
    Dts.TaskResult = (int)ScriptResults.Success;
  else
    Dts.TaskResult = (int)ScriptResults.Failure;
}

6.Repeat steps 1 to 5 for each xml feed you want to download and save your
package. The main advantage of creating seperate script tasks for each RSS feed
is that they can be executed parallel. The final result should look like the
image below.


The created script tasks:




Next: Tutorial Part 2 

Learn in the next part how to parse the RSS xml files after downloading.

Kommentare:

  1. Nice one. Thanks for the share. Keep posting such kind of information on your blog. I bookmarked it for continuous visit.
    youtube html5 player| html5 converter

    AntwortenLöschen
  2. What is the difference between DELETE and TRUNCATE commands?

    The best medical website design and marketing solutions for physicians in America.

    AntwortenLöschen