With SQL Server 2016, you can run R scripts to retrieve data directly from SQL Server Databases. This has come with new feature with SQL Server 2016.
To work with R Services, Microsoft has introduced a system stored procedure namely Sys.sp_execute_external_script. This stored procedure provides the structure necessary to run R Script against SQL Server data on the SQL Server Environment. In the SQL Server 2016, it supports only the R Language where Phython is supported in the SQL Server 2016.
Before you use this stored procedure, you must enable your SQL Server instance to support external scripts. First, you need to check whether R Services is installed correctly and running. If you have not installed the R Services or not enabled the external scripts and not sure how to go ahead, please refer to MSDN article Set up SQL Server R Services (In-Database).
Once you have R Services ready, you should be able to try with the sp_execute_external_script stored procedure. The following syntax show the procedure basic elements:
sp_execute_external_script
@language = N’language’,
@script = N’script’,
[ @input_data_1 = ‘SelectStatement’ [,] ] [ @input_data_1_name = N’InputDataSet’ [,] ] [ @output_data_1_name = N’OutputDataSet’ [,] ] [ @params = N’@ParameterName DataType [ OUT | OUTPUT ] [ ,…n ]’ [,] ] [ @ParameterName = ‘value’ [ OUT | OUTPUT ] [,] [ ,…n ] ] [ WITH <execute_option> ] [;]<execute_option>::=
{
{ RESULT SETS UNDEFINED }
| { RESULT SETS NONE }
| { RESULT SETS ( <result_sets_definition> ) }
}
Using SQL Server data in R scripts
As fun as it might be to run R scripts alone in SQL Server, the real power that comes with R Services is the ability to incorporate SQL Server data into those scripts. For that, we need to include the procedure’s @input_data_1 parameter, which specifies the SELECT statement to use to retrieve data from our database, as shown in the following example:
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N’OutputDataSet <- InputDataSet;’;
DECLARE @sqlscript NVARCHAR(MAX);
SET @sqlscript = N’
SELECT FirstName, LastName, SalesYTD
FROM Sales.vSalesPerson
WHERE SalesYTD > 2000000
ORDER BY SalesYTD DESC;’;
EXEC sp_execute_external_script
@language = N’R’,
@script = @rscript,
@input_data_1 = @sqlscript;
GO
As with the R script, I’m using a T-SQL variable (@sqlscript) to store the SELECT statement. I then assign that variable to the @input_data_1 parameter.
I’ve also updated the R script itself to assign the InputDataSet value to the OutputDataSet variable. InputDataSet is the default name of the variable used to represent the query specified in the @input_data_1 parameter. In other words, I’m assigning the data returned by the SELECT statement to the OutputDataSet variable, giving us the results shown in the following table:
(No column name) | (No column name) | (No column name) |
Linda | Mitchell | 4251368.5497 |
Jae | Pak | 4116871.2277 |
Michael | Blythe | 3763178.1787 |
Jillian | Carson | 3189418.3662 |
Ranjit | Varkey Chudukatil | 3121616.3202 |
José | Saraiva | 2604540.7172 |
Shu | Ito | 2458535.6169 |
Tsvi | Reiter | 2315185.611 |
All we’ve done here is to pass our T-SQL query results to the R script, which returns those same results as that of the SELECT statement. Of course, this is something we could have done without using an R script; however, by being able to pass our query results to the R script, we can then use the analytical power built into R to do something
with that data, which is the whole point of SQL Server R Services and the sp_execute_external_script stored procedure.
For example, the following R script divides the SalesYTD totals by 7 and then rounds that figure to two decimal points:
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N’
OutputDataSet <- InputDataSet;
OutputDataSet[,3] <- round(InputDataSet$SalesYTD / 7, 2);’;
DECLARE @sqlscript NVARCHAR(MAX);
SET @sqlscript = N’
SELECT FirstName, LastName, SalesYTD
FROM Sales.vSalesPerson
WHERE SalesYTD > 2000000
ORDER BY SalesYTD DESC;’;
EXEC sp_execute_external_script
@language = N’R’,
@script = @rscript,
@input_data_1 = @sqlscript;
GO
The idea here is it find the average monthly sales based on the assumption that seven months have passed so far this year and then round that figure. To do so, I’ve added a second line to the R script. To the left of the assignment operator, I’ve specified OutputDataSet[,3] to indicate that we want to update the third column within data frame.
To the right of the assignment operator, we reference the InputDataSet variable again, but this time add a dollar sign ($) and the name of the source column (SalesYTD). We then divide this amount by 7and use the round function to round the amount to two decimal points, giving us the results shown in the following table:
(No column name) | (No column name) | (No column name) |
Linda | Mitchell | 607338.36 |
Jae | Pak | 588124.46 |
Michael | Blythe | 537596.88 |
Jillian | Carson | 455631.2 |
Ranjit | Varkey Chudukatil | 445945.19 |
José | Saraiva | 372077.25 |
Shu | Ito | 351219.37 |
Tsvi | Reiter | 330740.8 |