[Notes from Pinal]: R Services are very much new thing and trust me, I still have to get my hand dirty on this subject. I know the basics of it, but still I personally do not know enough that I can blog about it and teach others. This is when I reach out to Eduardo, he has this incredible ability to learn new thing and teach others in the lucid easy way. I requested Eduardo to help me out with R Services and Forecasting and he happily accepted.
In this 131st episode of the Notes from the Fields series database expert Eduardo Castro (partner at Linchpin People) shares very interesting conversation related to we can get started with R Services and Forecasting.
One of the new challenges the data professional is to deal with are the new techniques that are emerging for data analysis, those techniques are in the field of Data Science. Recently I was working in a project and a new requirement was brought to the pipeline, we were requested to apply some forecasting algorithms to the data we already had in our database, immediately data mining and Analysis Services came to my mind, but we had a restriction: the algorithm was already written using R Studio by one the company’s data scientists.
Based on that restriction, we asked the project manager authorization to use the new feature in SQL Server 2016 that allows us to include R Scripts inside the database, the transition of the R code in SQL Server 2016 is possible, and in this post I will show you how to do it.
First, you must be using SQL Server 2016 and you must install the Advanced Analytics Extensions also known as R Services, this is done during the installation process, the following image shows what must be selected during setup.
Optionally, you can also install the Microsoft R Server Standalone as a Shared Feature. During installation, you will be asked to install some additional libraries to support the R Services, you can accept the defaults.
Once you have the R Services installed, you must check the R Script that must run inside SQL Server is working properly, the best way to do it is by using R Studio, which a free tool to create R Scripts, you must take a copy of the script and check if it runs inside R Studio, if the script has any problem it must be corrected before trying it inside SQL Server. The following image shows the successful installation of the R Services.
In this blog, we will use a very simple R Script to show how can we ported it to SQL Server, the sample R Script uses a very well know data set called Air Passengers, the dataset contains information about the passenger statistics, then we will use a Forecasting function to predict the next values in the time series.
The following figure shows the sample code inside R Studio.
If we run this script we will get the forecasted values for the passenger data, the output in the R Studio Console is shown in the following image.
Once you have the R Services installed you must configure the SQL Server instance to allow the execution of R Scripts, the R Scripts are run as an external script so we need to activate this feature in SQL Server using the following command, and the restart SQL Server.
sp_configure 'external scripts enabled', 1; RECONFIGURE GO
Is very common that the R Scripts need any specific library and these libraries are not installed by default, for example the “tseries” package must be installed in order the script to run. The installation of the required packages can be done using the RGui which is installed when you choose the Analytics Libraries during the SQL Server Installation, you must run the RGui as an administrator in order to be able to install the libraries. The RGui console is shown below:
If you need to install a package for example “tseries” you must run the following command.
install.packages("tseries", lib = "C:/Program Files/Microsoft SQL Server/MSSQL13.MSSQLSERVER/R_SERVICES/library")
The following image shows the successful installation of the package.
We already had tested our R Script in R Studio so we can start porting it to SQL Server as an in-database script, basically the execution of the R Scripts is done using the EXEC sp_execute_external_script and we must specify that the script language is R. The basic parameters needed to execute R Scripts are explained in the following table:
@language = N'R'
|This parameter specifies that the scripting language is R|
@script = N' '
|This parameter contains the script body, what you must do is to copy the content of the script in R Studio and include it inside this parameter|
@input_data_1 = N' Your T-SQL goes here'
|If you need to read data from a SQL Server table, for example a valid input can be:|
USE [AdventureworksDW2016] GO SELECT [ModelRegion] ,[TimeIndex] ,[Quantity] ,[Amount] ,[CalendarYear] ,[Month] ,[ReportingDate] FROM [dbo].[vTimeSeries] GO
The variable @input_data_1 will contain the result the of query and this variable can be used inside the R Script to manipulate the data.
@output_data_1_name = N' Output Data Frame Name'
|This parameter reference a data frame generated inside the R Script and its intention is to hold the result of the forecasting algorithm.|
WITH RESULT SETS (([U] float , [L] float, [Forecast] float ))
|This last part is optional and it gives you the option the specify the name and the data type of the data frame columns included in the output of the R Script|
The complete ported R Script is shown below:
-- calculate forecasting with R EXEC sp_execute_external_script @language = N'R', @script = N' library(tseries) f <- decompose(AirPassengers); fit <- arima(AirPassengers, order=c(1,0,0), list(order=c(2,1,0), period=12)); fore <- predict(fit, n.ahead=24); U <- fore$pred + 2*fore$se; L <- fore$pred - 2*fore$se; df_U <- data.frame(U,L,fore$pred); ' , @input_data_1 = N'', @output_data_1_name = N'df_U' WITH RESULT SETS (([U] float , [L] float, [Forecast] float ))
If we run the previous script inside the Management Studio we will get the following output.
The above result is the same we got inside R Studio, the main difference is that the script is now part of our database solution and is using the Microsoft R Server and SQL Server 2016 to execute the script.
There are new requirements for the DBA and is more often that they come from the Data Science area, in this blog we had shown how to leverage the new features in SQL Server 2016 to include R Scripts inside the database in an integrated way. We had also shown how to port current R Scripts to SQL Server 2016.
Reference: Pinal Dave (https://blog.sqlauthority.com)