Using REST API in SSIS to Extract Top 100 User Tweets

Abstract

The growing popularity of the use of social networks by businesses indicate a possible change in ETL requirements. Part of this change is that developers had to revise existing ETL model such that it caters for sourcing of data off the internet in addition to the traditional flat files or OLE/ODBC Connections. In this article, I will demonstrate extracting user tweets using Twitter\’s REST API inside an HTTP Connection Manager.

Requirements

Article
Let’s begin by launching Microsoft Visual Studio and creating a new Integration Services Project which is located under Business Intelligence Projects category. (If you are using SQL Server 2012 or later, you will have to launch SQL Server Data Tools – SSDT).
After you have assigned a project name, proceed to click and drag the Data Flow Task (DFT) into the Control Flow pane from the toolbox (SSIS Toolbox in SSDT). I decided to name my DFT as DFT – Retrieve Tweets
Add the following two connections:

  • HTTP Connection Manager
  • OLE DB Connection

I have configured my HTTP Connection as follows:

  • Server URL = http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=mafiswana&count=100

Note that you can replace the value of parameter screen_name with your Twitter user account instead of mafiswana.
I have configured my OLE DB Connection as follows:

  • Server = Localhost
  • Database Name = selectSifiso

So far the package should look as below:
\"\"
Let\’s go back and edit the Data Flow Task.
Add a Script Component (SC) which is located under Data Flow Transformations.
Select Source as a script component type.
Edit the script SC and click on the Connection Managers tab on the left of the Script Transformation Editor. Under the Connection Manager column click to select the HTTP Connection Manager (it should be the only connection available).
Let\’s move on to the Inputs and Outputs tab in the Script Transformation Editor. Here we will configure the source output columns.
Collapse the Output 0 node and click on Output Columns. Click Add Column and the following columns with properties configured as below:

  • Name = sS_Tweets; DataType = string [DT_STR]; Length = 200
  • Name = sS_TwitterDate; DataType = string [DT_STR]; Length = 200

Now let\’s move on to the Script tab in the Script Transformation Editor.
In Object Explorer, right-click on References and Click Add. Under the .NET tab add the following references:

  • System.ServiceModel
  • System.ServiceModel.Web

If one or none of the above references are not available for selection, please ensure that you have .NET 3.5 Framework selected. You can do that by clicking on Properties – Application – Target Framework – .NET Framework 3.5
In addition to the already referenced namespaces, add the following namespaces:

  • using System.Xml;
  • using System.ServiceModel.Syndication;

Declare the following variables in the public class ScriptMain : UserComponent

  • private SyndicationFeed sS_Tweets = null;
  • private XmlReader sS_XmlReader = null;

Under the PreExecute() method, add the following:

  • sS_XmlReader = XmlReader.Create(Connections.Connection.ConnectionString);
  • sS_Tweets = SyndicationFeed.Load(sS_XmlReader);

Under the public override void CreateNewOutputRows() method, add the following:
if (sS_Tweets != null)
{
for each (var item in sS_Tweets.Items)
{
Output0Buffer.AddRow();
Output0Buffer.sS_Tweets= item.Title.Text;
Output0Buffer.sS_TwitterDate= item.PublishDate.ToString();
}
Output0Buffer.SetEndOfRowset();
}
The complete script code can be found here – tweets_ssis
Now that we have configured the Script Component as Source let us add an OLE DB Destination Component (ODD) and connect Script Component to the Destination Component.
I have configured the OLE DB connection manager of the ODD to use the selectSifiso connection. The data access mode is a Table or view – fast load. Under the Name of the table or the view, click New and create and output table. Click on the Mappings tab and ensure that the source-to-destination mappings are correct.
Your complete package data flow task should look as follows:
\"\"
Conclusion
All left to do is to run the package and your tweets will be extracted and stored in SQL Server.
It’s that simple.
Till next time folks, cheers.
Sifiso
CodeProject

4 thoughts on “Using REST API in SSIS to Extract Top 100 User Tweets”

  1. I followed the steps and when I run the package, I get this message:
    [Script Component [2]] Error: System.Xml.XmlException: The element with name ‘html’ and namespace ” is not an allowed feed format.
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PreExecute()
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)
    Any ideas?

  2. I am extremely impressed along with your writing abilities as well as with
    the layout on your weblog. Is that this a paid topic
    or did you customize it yourself? Either
    way stay up the nice quality writing, it is rare to look a great weblog like this one today.
    .

  3. Hi,
    Many thanks for posting this blog, it is very useful, i tried to replicate the same but for a reatime example at work. I am a newbie when it comes to scripting, i managed to add the below script to consume data from an api, for some reason the output side is not working. can you please shed you wisdom on this when you got a min, Many thanks
    Anbu
    /* Microsoft SQL Server Integration Services Script Component
    * Write scripts using Microsoft Visual C# 2008.
    * ScriptMain is the entry point class of the script.*/
    using System;
    using System.Data;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
    using Microsoft.SqlServer.Dts.Runtime.Wrapper;
    using System.Xml;
    using System.ServiceModel.Syndication;
    [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
    public class ScriptMain : UserComponent
    {
    private SyndicationFeed ContactId = null;
    private XmlReader XmlReader = null;
    public override void PreExecute()
    {
    base.PreExecute();
    XmlReader = XmlReader.Create(Connections.Connection.ConnectionString);
    ContactId = SyndicationFeed.Load(XmlReader);
    /*
    Add your code here for preprocessing or remove if not needed
    */
    }
    public override void PostExecute()
    {
    base.PostExecute();
    /*
    Add your code here for postprocessing or remove if not needed
    You can set read/write variables here, for example:
    Variables.MyIntVar = 100
    */
    }
    public override void CreateNewOutputRows()
    {
    if (ContactId != null)
    {
    foreach (var item in ContactId.Items)
    {
    Output0Buffer.AddRow();
    Output0Buffer.ContactId = item.ContactId.int32;
    Output0Buffer.Email = item.Email.ToString();
    Output0Buffer.NumOpens = item.NumOpens.int32;
    Output0Buffer.NumPageViews = item.NumPageViews.int32;
    Output0Buffer.NumClicks = item.NumClicks.int32;
    Output0Buffer.NumForwards = item.NumForwards.int32;
    Output0Buffer.NumEstimatedForwards = item.NumEstimatedForwards.int32;
    Output0Buffer.NumReplies = item.NumReplies.int32;
    Output0Buffer.DateSent = item.DateSent.date;
    Output0Buffer.DateFirstOpened = item.DateFirstOpened.date;
    Output0Buffer.DateLastOpened = item.DateLastOpened.date;
    Output0Buffer.FirstOpenIp = item.FirstOpenIp.ToString();
    Output0Buffer.Unsubscribed = item.Unsubscribed.bol;
    Output0Buffer.SoftBounced = item.SoftBounced.bol;
    Output0Buffer.HardBounced = item.HardBounced.bol;
    }
    Output0Buffer.SetEndOfRowset();
    }
    /*
    Add rows by calling the AddRow method on the member variable named “Buffer”.
    For example, call MyOutputBuffer.AddRow() if your output was named “MyOutput”.
    */
    }
    }

Leave a Comment