Problem description
I am new to R but am interested in using Shiny to create dynamic charts using data stored in a SQL Server database. To enable interactivity, I want to bring in the raw data from the database and perform calculations within R rather than have the database summarize the data.
I am able to connect to the database using RODBC, execute a query, and receive the results in a data.frame. However, the read time in R is about 12x longer than the same query executed in SQL Server Management Studio (SSMS). SSMS takes ~600 ms, whereas R takes about 7.6 seconds. My question is whether I am doing something wrong, or whether R is just really slow with database access. If it is, are there faster alternatives (e.g. writing the database output to a file and reading the file)?
Some information about the query that may help: The query retrieves about 250K rows with 4 columns. The first column is a date and the other three are numeric values. The machine running R and SSMS is a high-end Win 7 workstation with 32GB of memory. The R command that I am running is:
system.time(df <- sqlQuery(cn, query))
which returns:
   user  system elapsed
   7.17    0.01    7.58
Interestingly, it appears that the data transfer from SQL to my machine is fast, but that R is busy doing things internally for several seconds before returning the data.frame. I see this because network utilization spikes in the first second and almost immediately returns to near 0. Then several seconds later, the R data.frame returns.
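One way to check whether those seconds go into type conversion rather than the transfer is to fetch the columns unconverted and time the parsing step on its own. The sketch below is only an illustration under stated assumptions: `fetch_raw` is a hypothetical helper (not part of RODBC), `cn` and `query` are the objects from the question, and the 250,000 synthetic date strings merely stand in for the real date column. The `as.is = TRUE` argument is passed through `sqlQuery` to RODBC's `sqlGetResults`.

```r
# Hypothetical helper: fetch without RODBC's automatic type conversion.
# It is only defined here, not called, because it needs a live connection.
fetch_raw <- function(cn, query) {
  RODBC::sqlQuery(cn, query, as.is = TRUE, stringsAsFactors = FALSE)
}

# Synthetic stand-in for the date column: 250,000 "YYYY-MM-DD" strings.
n <- 250000
date_strings <- format(as.Date("2000-01-01") + (seq_len(n) - 1) %% 3650)

# Time the parsing step in isolation.
parse_time <- system.time(
  parsed <- as.Date(date_strings, format = "%Y-%m-%d")
)
print(parse_time)
```

Running `system.time(raw <- fetch_raw(cn, query))` against the real database and comparing it with the original 7.6 seconds would show how much of the delay, if any, comes from conversion.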
Recommended answer
I would try RJDBC (http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf)
with these drivers: https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx
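library(RJDBC)
# Load the Microsoft JDBC driver class from the downloaded jar
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver", "/sqljdbc4.jar")
# Open the connection (server, user name and password are placeholders)
con <- dbConnect(drv, "jdbc:sqlserver://server.location", "username", "password")
# Run the query and return the result as a data.frame
dbGetQuery(con, "select column_name from table")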
Other recommended answer
I would make sure that your R time zone (set with, for example, Sys.setenv(TZ='GMT')) is the same as the time zone of the SQL Server you are pulling data from. It could be that the date column takes a long time to interpret, especially if it includes a timestamp.
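To illustrate why the time zone matters when timestamps are interpreted, here is a minimal sketch. It assumes, purely for illustration, that the server stores times in GMT; the timestamp value and the second time zone are made up. It shows only that the same text maps to different instants under different zones, not how much time a mismatch costs.

```r
# Assumption for illustration: the SQL Server keeps its times in GMT.
Sys.setenv(TZ = "GMT")

stamp <- "2015-06-01 12:00:00"  # made-up example value

# The same text parsed under two time zones yields two different instants.
as_gmt <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
as_ny  <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "America/New_York")

# Difference between the two interpretations, in hours.
offset_hours <- as.numeric(difftime(as_ny, as_gmt, units = "hours"))
print(offset_hours)
```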
RJDBC will run quicker because it converts dates to character and everything else to numeric, whereas RODBC tries to preserve the data types of the SQL table.
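If the date does come back as character, it can be converted once after the fetch. The following is a small sketch on a simulated result; the column names and values are invented for the example and the data.frame only imitates the shape described above (one character date column, three numeric columns).

```r
# Simulated query result: date as character, the other columns numeric.
# Column names and values are made up for this example.
df <- data.frame(
  trade_date = c("2015-01-02", "2015-01-05", "2015-01-06"),
  open  = c(10.1, 10.4, 10.2),
  high  = c(10.6, 10.9, 10.5),
  close = c(10.3, 10.8, 10.4),
  stringsAsFactors = FALSE
)

# Convert the character column to Date in one vectorised call.
df$trade_date <- as.Date(df$trade_date, format = "%Y-%m-%d")
str(df)
```

Supplying an explicit `format` means `as.Date` does not have to guess the layout of the strings.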