Title: | DBI Connector to Presto |
---|---|
Description: | Implements a 'DBI' compliant interface to Presto. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes: <https://prestodb.io/>. |
Authors: | Onur Ismail Filiz [aut], Sergey Goder [aut], Jarod G.R. Meng [aut, cre], Thomas J. Leeper [ctb], John Myles White [ctb] |
Maintainer: | Jarod G.R. Meng <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 1.4.6.9000 |
Built: | 2024-11-13 05:29:53 UTC |
Source: | https://github.com/prestodb/rpresto |
This auxiliary function adds a field, if necessary, to a data frame so that each partition of the data frame corresponding to a unique combination of the chunk fields has a size below a given threshold. The resulting data frame can then be safely used in dbAppendTable() because Presto has a size limit on any single INSERT INTO statement.
add_chunk( value, base_chunk_fields = NULL, chunk_size = 1e+06, new_chunk_field_name = "aux_chunk_idx" )
value |
The original data frame. |
base_chunk_fields |
A character vector of existing field names that are used to split the data frame before checking the chunk size. |
chunk_size |
Maximum size (in bytes) of the VALUES statement encoding each unique chunk. Default to 1,000,000 bytes (i.e. 1 MB). |
new_chunk_field_name |
A string indicating the new chunk field name. Default to "aux_chunk_idx". |
## Not run:
# returns the original data frame because it's within size
add_chunk(iris)
# add a new aux_chunk_idx field
add_chunk(iris, chunk_size = 2000)
# the new aux_chunk_idx field is added on top of Species
add_chunk(iris, chunk_size = 2000, base_chunk_fields = c("Species"))
## End(Not run)
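The chunking idea described above can be sketched in base R. This is a minimal illustration, not RPresto's implementation: the helper `assign_chunk_idx()` is hypothetical, and it approximates the encoded size of a row by the byte length of its comma-joined values rather than by the real VALUES encoding.

```r
# Hypothetical helper (not part of RPresto): give each row a chunk index so
# that the approximate encoded size of every chunk stays under chunk_size.
assign_chunk_idx <- function(df, chunk_size = 1e6) {
  # Approximate a row's encoded size by the bytes of its comma-joined values.
  row_text <- do.call(paste, c(unname(lapply(df, as.character)), sep = ","))
  row_bytes <- nchar(row_text, type = "bytes")
  idx <- integer(nrow(df))
  current <- 1L
  running <- 0
  for (i in seq_len(nrow(df))) {
    # Open a new chunk when this row would push the current one over the
    # limit; a single oversized row still gets a chunk of its own.
    if (running > 0 && running + row_bytes[i] > chunk_size) {
      current <- current + 1L
      running <- 0
    }
    running <- running + row_bytes[i]
    idx[i] <- current
  }
  idx
}

chunked <- iris
chunked$aux_chunk_idx <- assign_chunk_idx(iris, chunk_size = 2000)
table(chunked$aux_chunk_idx)
```

With the default threshold the whole of iris fits in one chunk, which mirrors the first documented example where the data frame is returned unchanged.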
dbplyr database methods
## S3 method for class 'PrestoConnection'
db_list_tables(con)

## S3 method for class 'PrestoConnection'
db_has_table(con, table)

## S3 method for class 'PrestoConnection'
db_write_table(
  con,
  table,
  types,
  values,
  temporary = FALSE,
  overwrite = FALSE,
  ...,
  with = NULL
)

## S3 method for class 'PrestoConnection'
db_copy_to(
  con,
  table,
  values,
  overwrite = FALSE,
  types = NULL,
  temporary = TRUE,
  unique_indexes = NULL,
  indexes = NULL,
  analyze = TRUE,
  ...,
  in_transaction = TRUE,
  with = NULL
)

## S3 method for class 'PrestoConnection'
db_compute(
  con,
  table,
  sql,
  temporary = TRUE,
  unique_indexes = list(),
  indexes = list(),
  analyze = TRUE,
  with = NULL,
  ...
)

## S3 method for class 'PrestoConnection'
db_sql_render(con, sql, ..., use_presto_cte = TRUE)
Create a table in the database using a statement
dbCreateTableAs(conn, name, sql, overwrite = FALSE, with = NULL, ...)
conn |
A DBIConnection object, as returned by dbConnect(). |
name |
The table name, passed on to
|
sql |
A character string containing a SQL statement. |
overwrite |
A boolean indicating if an existing table should be overwritten. Default to FALSE. |
with |
An optional WITH clause for the CREATE TABLE statement. |
... |
Other parameters passed on to methods. |
Return the corresponding Presto data type for the given R object
## S4 method for signature 'PrestoDriver'
dbDataType(dbObj, obj, ...)
dbObj |
A PrestoDriver object |
obj |
Any R object |
... |
Extra optional parameters, not currently used |
The default value for unknown classes is ‘VARCHAR’.
A character value corresponding to the Presto type for obj.
drv <- RPresto::Presto()
dbDataType(drv, 1)
dbDataType(drv, NULL)
dbDataType(drv, as.POSIXct("2015-03-01 00:00:00", tz = "UTC"))
dbDataType(drv, Sys.time())
dbDataType(
  drv,
  list(
    c("a" = 1L, "b" = 2L),
    c("a" = 3L, "b" = 4L)
  )
)
dbDataType(
  drv,
  list(
    c(as.Date("2015-03-01"), as.Date("2015-03-02")),
    c(as.Date("2016-03-01"), as.Date("2016-03-02"))
  )
)
dbDataType(drv, iris)
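The fallback rule stated above (unknown classes map to VARCHAR) can be illustrated with a small class-based lookup. The helper `guess_presto_type()` is hypothetical, and the specific class-to-type pairs are illustrative assumptions rather than the mapping RPresto actually uses; only the VARCHAR default comes from the documentation.

```r
# Hypothetical helper (not RPresto's dbDataType): look up a Presto type name
# from the first class of an R object, falling back to VARCHAR.
guess_presto_type <- function(obj) {
  switch(
    class(obj)[1],
    logical = "BOOLEAN",    # assumed pairing, for illustration only
    integer = "INTEGER",    # assumed pairing, for illustration only
    numeric = "DOUBLE",     # assumed pairing, for illustration only
    character = "VARCHAR",
    Date = "DATE",          # assumed pairing, for illustration only
    "VARCHAR"               # documented default for unknown classes
  )
}

guess_presto_type(1L)
guess_presto_type(as.Date("2015-03-01"))
guess_presto_type(structure(list(), class = "some_unknown_class"))
```

A real driver also has to handle nested objects such as lists and data frames, as the documented examples show; this sketch covers atomic vectors only.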
Metadata about database objects
For the PrestoResult object, the implementation returns the additional stats field which can be used to implement things like progress bars. See the examples section.
## S4 method for signature 'PrestoDriver'
dbGetInfo(dbObj)

## S4 method for signature 'PrestoConnection'
dbGetInfo(dbObj)

## S4 method for signature 'PrestoResult'
dbGetInfo(dbObj)
dbObj |
A PrestoDriver, PrestoConnection or PrestoResult object |
PrestoResult: A list() with elements
The SQL sent to the database
Number of rows fetched so far
Whether all data has been fetched
Current stats on the query
## Not run:
conn <- dbConnect(Presto(), "localhost", 7777, "onur", "datascience")
result <- dbSendQuery(conn, "SELECT * FROM jonchang_iris")
iris <- data.frame()
progress.bar <- NULL
while (!dbHasCompleted(result)) {
  chunk <- dbFetch(result)
  if (!NROW(iris)) {
    iris <- chunk
  } else if (NROW(chunk)) {
    iris <- rbind(iris, chunk)
  }
  stats <- dbGetInfo(result)[["stats"]]
  if (is.null(progress.bar)) {
    progress.bar <- txtProgressBar(0, stats[["totalSplits"]], style = 3)
  } else {
    setTxtProgressBar(progress.bar, stats[["completedSplits"]])
  }
}
close(progress.bar)
## End(Not run)
Report the dbplyr edition used by this package
## S3 method for class 'PrestoConnection'
dbplyr_edition(con)
con |
A DBIConnection object. |
Rename a table
dbRenameTable(conn, name, new_name, ...)
conn |
A PrestoConnection. |
name |
Existing table's name. |
new_name |
New table name. |
... |
Extra arguments passed to dbExecute. |
The configs specify the authentication protocol and additional settings.
kerberos_configs(user = "", password = "", service_name = "presto")
user |
User name to pass to httr::authenticate(). Default to "". |
password |
Password to pass to httr::authenticate(). Default to "". |
service_name |
The service name. Default to "presto". |
A httr::config() output that can be passed to the request.config argument of dbConnect().
Connect to a Presto database
Presto(...)

## S4 method for signature 'PrestoDriver'
dbConnect(
  drv,
  catalog,
  schema,
  user,
  host = "localhost",
  port = 8080,
  source = methods::getPackageName(),
  session.timezone = "",
  output.timezone = "",
  parameters = list(),
  ctes = list(),
  request.config = httr::config(),
  use.trino.headers = FALSE,
  extra.credentials = "",
  bigint = c("integer", "integer64", "numeric", "character"),
  ...
)

## S4 method for signature 'PrestoConnection'
dbDisconnect(conn)
... |
currently ignored |
drv |
A driver object generated by Presto(). |
catalog |
The catalog to be used |
schema |
The schema to be used |
user |
The current user |
host |
The Presto host to connect to |
port |
Port to use for the connection |
source |
Source to specify for the connection |
session.timezone |
Time zone of the Presto server. Presto returns timestamps without time zones with respect to this value. The time arithmetic (e.g. adding hours) will also be done in the given time zone. This value is passed to Presto server via the request headers. |
output.timezone |
The time zone in which TIME WITH TZ and TIMESTAMP values in the output should be represented. Default to the Presto server timezone (use |
parameters |
A |
ctes |
A list of common table expressions (CTEs) that can be used in the
WITH clause. See |
request.config |
An optional config list, as returned by httr::config(). |
use.trino.headers |
A boolean to indicate whether Trino request headers should be used. Default to FALSE. |
extra.credentials |
Extra credentials to be passed in the X-Presto-Extra-Credential or X-Trino-Extra-Credential header (depending on the value of the use.trino.headers argument). Default to an empty string. |
bigint |
The R type that Presto's 64-bit integer ( |
conn |
A PrestoConnection object |
Presto: A PrestoDriver object
dbConnect: A PrestoConnection object
dbDisconnect: A logical() value indicating success
## Not run:
conn <- dbConnect(Presto(),
  catalog = "hive", schema = "default",
  user = "onur", host = "localhost", port = 8080,
  session.timezone = "US/Eastern", bigint = "character"
)
dbListTables(conn, "%_iris")
dbDisconnect(conn)
## End(Not run)
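The bigint argument offers four R types because R has no native 64-bit integer. The base R sketch below, which does not touch RPresto at all, shows what happens to one large value under three of those representations; the variable names are made up for the illustration, and the "integer64" option (presumably the class from the bit64 package) is left out to keep the example dependency-free.

```r
# A value just above 2^53, written as text.
big <- "9007199254740993"

# R's native integer type is 32-bit, so the conversion yields NA (with a warning).
as_integer <- suppressWarnings(as.integer(big))

# A double has a 53-bit significand, so this value cannot be stored exactly.
as_numeric <- as.numeric(big)

# Text keeps every digit but must be parsed before any arithmetic.
as_character <- big

.Machine$integer.max                  # largest native integer
is.na(as_integer)                     # TRUE
sprintf("%.0f", as_numeric) == big    # FALSE
```

This is why the documented example passes bigint = "character" when exact values matter more than convenient arithmetic.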
RPresto examples and tests connect to a default database via dbConnect(Presto(), ...). presto_has_default() checks if that database is available, and if not, displays an informative message. presto_default() works similarly but returns a connection on success and throws a testthat skip condition on failure, making it suitable for use in tests.
presto_default(...) presto_has_default(...)
... |
Additional arguments passed on to dbConnect(). |
if (presto_has_default()) {
  db <- presto_default()
  print(dbListTables(db))
  dbDisconnect(db)
} else {
  message("No database connection.")
}
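The "check availability, report if missing" behaviour can be sketched with tryCatch() in base R. Everything here is a hypothetical stand-in: `has_default()` is not an RPresto function, and `connect` is any zero-argument function that either returns a connection or signals an error.

```r
# Hypothetical helper (not part of RPresto): TRUE if connect() succeeds,
# otherwise print an informative message and return FALSE.
has_default <- function(connect) {
  tryCatch(
    {
      connect()
      TRUE
    },
    error = function(e) {
      message("Could not connect: ", conditionMessage(e))
      FALSE
    }
  )
}

# Fake connectors standing in for a real dbConnect() call.
failing_connect <- function() stop("server unreachable")
working_connect <- function() structure(list(), class = "fake_connection")

has_default(failing_connect)
has_default(working_connect)
```

A real implementation would also close the connection it opened during the check; the sketch omits that because the fake connection holds no resources.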
dbplyr SQL methods
## S3 method for class 'PrestoConnection'
sql_query_save(con, sql, name, temporary = TRUE, ..., with = NULL)
con |
A database connection. |
sql |
A character string containing a SQL statement. |
name |
The table name, passed on to
|
temporary |
If a temporary table should be created. Default to TRUE in
the |
... |
Other arguments used by individual methods. |
with |
An optional WITH clause for the CREATE TABLE statement. |
Compose query to create a simple table using a statement
sqlCreateTableAs(con, name, sql, with = NULL, ...)
con |
A database connection. |
name |
The table name, passed on to
|
sql |
A character string containing a SQL statement. |
with |
An optional WITH clause for the CREATE TABLE statement. |
... |
Other arguments used by individual methods. |
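As a rough picture of what composing such a query involves, the sketch below builds a CREATE TABLE ... AS statement from the same three pieces: a name, a SELECT statement and an optional WITH clause. The helper `compose_ctas()` is hypothetical and is not how sqlCreateTableAs() is implemented; in particular, it assumes that `with` is supplied as a complete clause including the WITH keyword, and it quotes the name with plain double quotes instead of going through DBI.

```r
# Hypothetical helper (not RPresto's sqlCreateTableAs): assemble a
# CREATE TABLE ... AS statement as a single string.
compose_ctas <- function(name, sql, with = NULL) {
  # Quote the identifier, doubling any embedded double quotes.
  quoted <- paste0('"', gsub('"', '""', name, fixed = TRUE), '"')
  # A NULL `with` is dropped by c(), so the clause is simply omitted.
  parts <- c(paste("CREATE TABLE", quoted), with, "AS", sql)
  paste(parts, collapse = "\n")
}

cat(compose_ctas("iris_copy", "SELECT * FROM iris"), "\n")
cat(
  compose_ctas(
    "iris_orc",
    "SELECT * FROM iris",
    with = "WITH (format = 'ORC')"
  ),
  "\n"
)
```

In Presto's grammar the table properties in WITH (...) come before the AS keyword, which is why the clause is placed between the table name and the query.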
Allows you to connect to an existing database through a Presto connection.
src_presto(
  catalog = NULL,
  schema = NULL,
  user = NULL,
  host = NULL,
  port = NULL,
  source = NULL,
  session.timezone = NULL,
  parameters = NULL,
  bigint = c("integer", "integer64", "numeric", "character"),
  con = NULL,
  ...
)
catalog |
Catalog to use in the connection |
schema |
Schema to use in the connection |
user |
User name to use in the connection |
host |
Host name to connect to the database |
port |
Port number to use with the host name |
source |
Source to specify for the connection |
session.timezone |
Time zone for the connection |
parameters |
Additional parameters to pass to the connection |
bigint |
The R type that Presto's 64-bit integer ( |
con |
An object that inherits from PrestoConnection, typically generated by DBI::dbConnect. When a valid connection object is supplied, other arguments are ignored. |
... |
For |
## Not run:
# To connect to a database
my_db <- src_presto(
  catalog = "memory",
  schema = "default",
  user = Sys.getenv("USER"),
  host = "http://localhost",
  port = 8080,
  session.timezone = "Asia/Kathmandu"
)

# Use a PrestoConnection
my_con <- DBI::dbConnect(
  RPresto::Presto(),
  catalog = "memory",
  schema = "default",
  user = Sys.getenv("USER"),
  host = "http://localhost",
  port = 8080,
  session.timezone = "Asia/Kathmandu"
)
my_db2 <- src_presto(con = my_con)
## End(Not run)