R DBI::dbGetQuery where clause interprets string as a column name
Question
I am using RStudio with the DBI and odbc packages, connected to Amazon Redshift. I need the records of retirees from a leavers table. If I use a where clause like this:
leavers <- dbGetQuery(con,
'select distinct
"employee number",
"leaving reason"
from leaverstable where "employee number" = 12345')
This works. But if I change the where clause to where "leaving reason" = Retirement' or where "leaving reason" = "Retirement"', I get the error message [SQLState 42703] ERROR: column "retirement" does not exist. I also tried having instead of where, but I get the same error.
(As an aside: if I swap the single quotes and speech marks, i.e. the speech marks are on the outside and single quotes on the inside ('employee number'), then it does not work even with the first example.)
I can pull in the whole table and then filter out the records other than Retirement in R, but I am pulling in 3 million records for the sake of a couple of thousand. I know very little about SQL, so apologies if this is something trivial, but could someone help, please?
Answer 1
Score: 2
Quoted identifiers and quoted string literals are different between R and SQL.
| Lang | Identifier | String Literal |
|------|---------------|--------------------------------|
| R | `Some Column` | 'Some string' or "Some string" |
| SQL | "Some Column" | 'Some string' |
In R, we typically only use backticks on column names when they contain a space, start with a number, or otherwise violate R's "normal name rules". We can always use them, as in mtcars$`cyl` (equivalent to mtcars$cyl), but when there's no need to, it is not usually done.
Try
leavers <- dbGetQuery(con,
'select distinct
"employee number",
"leaving reason"
from leaverstable where "employee number" = 12345
or "leaving reason" = \'Retirement\'
')
(Notice we need to escape the quotes, not a big deal.)
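If hand-escaping the quotes feels error-prone, DBI's quoting helpers can produce both kinds of quoted text. This is only a minimal sketch: DBI::ANSI() is a stand-in connection that applies generic ANSI quoting rules, and with a live connection you would pass con instead.

```r
library(DBI)

# DBI::ANSI() is a dummy connection used only for its quoting rules;
# with a real database, pass your own `con` here.
ansi <- ANSI()

# Identifier (column name): wrapped in double quotes for SQL.
col <- dbQuoteIdentifier(ansi, "leaving reason")

# String literal (value): wrapped in single quotes for SQL.
val <- dbQuoteString(ansi, "Retirement")

# Assemble the query text without any manual escaping.
sql <- paste0(
  'select distinct "employee number", ', as.character(col),
  " from leaverstable where ", as.character(col), " = ", as.character(val)
)
cat(sql, "\n")
```

The resulting string can be handed to dbGetQuery(con, sql) in the usual way.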
Better yet though is to use bound parameters. Among other things, they provide safety from inadvertent SQL Injection (a slight stretch of the term). Namely,
leavers <- dbGetQuery(con,
'select distinct
"employee number",
"leaving reason"
from leaverstable where "employee number" = ?
or "leaving reason" = ?
', params = list(12345, "Retirement"))
Here, the double quotes are in R-land, not SQL-land, because from R we are passing a simple string. We can use "Retirement" or 'Retirement'; they are the same, because the string is passed to the database as a value, separately from the SQL text. The database is told it is a string, so it treats it as a string literal and does not need to parse it out of the SQL query. (There are other advantages to using bound parameters; see parameterized queries.)
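To try bound parameters without touching Redshift, the same idea can be run against a throwaway database. The sketch below assumes the RSQLite package is installed and uses an in-memory SQLite database in place of the real connection; the sample rows are invented purely for illustration.

```r
library(DBI)

# Assumption: RSQLite is installed. The in-memory database stands in for
# the Redshift connection, and the rows are made-up sample data.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

sample_rows <- data.frame(
  "employee number" = c(12345, 23456, 34567),
  "leaving reason"  = c("Resignation", "Retirement", "Retirement"),
  check.names = FALSE, stringsAsFactors = FALSE
)
dbWriteTable(con, "leaverstable", sample_rows)

# The value is sent separately from the SQL text, so it needs no SQL quoting.
retirees <- dbGetQuery(con,
  'select distinct "employee number", "leaving reason"
     from leaverstable
    where "leaving reason" = ?',
  params = list("Retirement"))

print(retirees)
dbDisconnect(con)
```

Only the matching rows come back from the database, which is the point of filtering in SQL rather than pulling the whole table into R.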
The ? placeholder for a bound parameter does vary between DBMSes: ODBC always (I believe) uses ?, regardless of the DBMS type; postgres uses $1 (and $2, ...); sqlite uses ?, ?1, or :name; and SQL Server (native client) uses ?.
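As an illustration of one of those variants, here is a rough sketch of SQLite's :name placeholder, again assuming RSQLite is installed and using invented sample rows. With odbc and Redshift you would stay with ? as in the answer above.

```r
library(DBI)

# Assumption: RSQLite is installed; the sample rows are invented.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "leaverstable", data.frame(
  "employee number" = c(12345, 23456),
  "leaving reason"  = c("Resignation", "Retirement"),
  check.names = FALSE, stringsAsFactors = FALSE
))

# Named placeholder: :reason is matched to the element named `reason`.
by_name <- dbGetQuery(con,
  'select "employee number" from leaverstable where "leaving reason" = :reason',
  params = list(reason = "Retirement"))

print(by_name)
dbDisconnect(con)
```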