英文:
Error while delete all the rows and keep a selected rows in SQL table
问题
I have a SQL file and I want to remove all the rows and just select one based on a condition.
My SQL look like this:
dput(brain_cortex_gtex_n[1:5,c(1,2,6)])
structure(list(gene = c("ENSG00000130943.6", "ENSG00000130943.6",
"ENSG00000130943.6", "ENSG00000130943.6", "ENSG00000130943.6"
), rsid = c("rs6007530", "rs7364174", "rs6007567", "rs112187135",
"rs144824037"), weight = c(0.0671656140385751, 0.065292861996415,
0.0088563732779068, -0.3156608259595, 0.0128894490022945)), row.names = 8154:8158, class = "data.frame")
I want to select only the row in which rsid is equal to "rs3087501".
The code that I wrote is:
x <- which(gtex_n$rsid %in% "rs3087501") ##x is 8222
#copy original file db to another
#use that for modifications:
library("RSQLite")
sqlite <- dbDriver("SQLite")
dbname <- "gtex_v7_models_filtered_signif.db"
db = dbConnect(sqlite,dbname)
sqlite <- dbDriver("SQLite")
myfile <- paste0("gtex_filtered_brain_cortex_modelsn","_","rs3087501",".db")
dbname1 <- myfile
db1 = dbConnect(sqlite,dbname1)
sqliteCopyDatabase(db, db1)
dbDisconnect(db)
brain_cortex_gtex_n_1 <- dbGetQuery(db1,"SELECT * FROM weights") ##dim 9453
#dbExecute(db1,"DELETE FROM weights WHERE rsid=rsid[i]")
dbExecute(db1,"SELECT * FROM weights WHERE rsid='rs3087501'")
dbDisconnect(db1)
But I get an error:
Error: unexpected symbol in "dbExecute(db1,"SELECT * FROM weights WHERE rsid='rs3087501'"
Does anyone know how to solve this or provide any guidance for it.
Thank you.
英文:
I have a SQL file and I want to remove all the rows and just select one based on a condition.
My SQL look like this:
dput(brain_cortex_gtex_n[1:5,c(1,2,6)])
structure(list(gene = c("ENSG00000130943.6", "ENSG00000130943.6",
"ENSG00000130943.6", "ENSG00000130943.6", "ENSG00000130943.6"
), rsid = c("rs6007530", "rs7364174", "rs6007567", "rs112187135",
"rs144824037"), weight = c(0.0671656140385751, 0.065292861996415,
0.0088563732779068, -0.3156608259595, 0.0128894490022945)), row.names = 8154:8158, class = "data.frame")
I want to select only the row in which rsid is equal to "rs3087501"
The code that I wrote is:
x <- which(gtex_n$rsid %in% "rs3087501") ##x is 8222
#copy original file db to another
#use that for modifications:
library("RSQLite")
sqlite <- dbDriver("SQLite")
dbname <- "gtex_v7_models_filtered_signif.db"
db = dbConnect(sqlite,dbname)
sqlite <- dbDriver("SQLite")
myfile <- paste0("gtex_filtered_brain_cortex_modelsn","_","rs3087501",".db")
dbname1 <- myfile
db1 = dbConnect(sqlite,dbname1)
sqliteCopyDatabase(db, db1)
dbDisconnect(db)
brain_cortex_gtex_n_1 <- dbGetQuery(db1,"SELECT * FROM weights") ##dim 9453
#dbExecute(db1,"DELETE FROM weights WHERE rsid=rsid[i] ")
dbExecute(db1,"SELECT * FROM weights WHERE rsid="rs3087501")
dbDisconnect(db1)
But I get an error:
> Error: unexpected symbol in "dbExecute(db1,"SELECT * FROM weights WHERE rsid="rs3087501"
Does anyone know how to solve this or provide any guidance for it.
Thank you.
答案1
得分: 1
-
由于引号不匹配,出现了错误:所有的双引号必须匹配,如果你需要嵌套的双引号,那么有两种选择:要么从单引号开始(因为在R中,
'hello'
和"hello"
是完全相同的),然后在其中使用双引号;要么使用反斜杠双引号。例如,'"hello" world' # [1] "\"hello\" world" "\"hello\" world" # [1] "\"hello\" world" identical('"hello" world', "\"hello\" world") # [1] TRUE
这意味着你可以将你的字符串转换为以下其中一种(尽管它们仍然不正确,请继续阅读):
dbExecute(db1,'SELECT * FROM weights WHERE rsid="rs3087501"') dbExecute(db1,"SELECT * FROM weights WHERE rsid=\"rs3087501\"")
-
SQL不将单引号和双引号视为同义词,通常双引号用于_标识符_(表名、列名),而单引号用于_字符串文字_。所以,应该使用(尽管仍然不完全正确):
dbExecute(db1,"SELECT * FROM weights WHERE rsid='rs3087501'")
-
如果你确实是想要
SELECT
,那么你应该使用dbGetQuery
,就像你之前做的那样。我怀疑你在对dbExecute
的调用中意思是使用DELETE
,那么它会转化成(仍然不是理想的答案,但接近了):dbExecute(db1,"DELETE FROM weights WHERE rsid='rs3087501'")
-
直接将“数据”粘贴或格式化到SQL查询中通常是不鼓励的,有很多原因。你可能要考虑使用参数化查询来代替手动构建查询字符串。除了对恶意的 SQL注入(例如,XKCD的 Exploits of a Mom,即“Little Bobby Tables”)的安全问题之外,还有关于格式不正确的字符串或Unicode与ANSI的错误的问题,即使只有一个数据分析师运行查询也是一个问题。
DBI
支持 参数化查询。这意味着你可以这样做:
dbExecute(db1,"DELETE FROM weights WHERE rsid=?", params = list("rs3087501"))
请注意,在这种情况下,我们_不_在
?
标记周围放置引号,而在list(.)
内使用的引号仅用于R,因此无论你使用单引号还是双引号都无所谓。 -
你说“保留这个特定的rsid并删除所有不匹配的”,这意味着删除其他所有的记录...这意味着以下两种方法中的一种(不需要同时使用)都可以工作:
dbExecute(db1,"DELETE FROM weights WHERE not rsid=?", params = list("rs3087501")) dbExecute(db1,"DELETE FROM weights WHERE rsid <> ?", params = list("rs3087501"))
完全可重现的示例:
sqlite <- DBI::dbConnect(RSQLite::SQLite())
DBI::dbWriteTable(sqlite, "mt", head(mtcars))
DBI::dbGetQuery(sqlite, "select * from mt")
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
DBI::dbExecute(sqlite, "delete from mt where not cyl=?", params=list(6))
# [1] 2
DBI::dbGetQuery(sqlite, "select * from mt")
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# 3 21.4 6 258 110 3.08 3.215 19.44 1 0 3
<details>
<summary>英文:</summary>
**Note**: this answer builds and corrects one piece at a time, not all interim code blocks are completely correct. Go to the bottom for the final recommend(s).
1. The error is because you have mismatched quotes: all double-quotes must be matched, and if you need embedded double-quotes, then you have two options: start with single quotes (since in R, `'hello'` and `"hello"` are perfectly identical) and use double quotes within; or use backslashed double quotes. For instance,
```r
'"hello" world'
# [1] "\"hello\" world"
"\"hello\" world"
# [1] "\"hello\" world"
identical('"hello" world', "\"hello\" world")
# [1] TRUE
```
This means you could convert your string into one of these (though they are still wrong, keep reading):
```r
dbExecute(db1,'SELECT * FROM weights WHERE rsid="rs3087501"')
dbExecute(db1,"SELECT * FROM weights WHERE rsid=\"rs3087501\"")
```
2. SQL does not treat single-quotes and double-quotes synonymously, typically double-quotes are for _identifiers_ (tables, column names) and single-quotes are for _string literals_. Instead, use (though still not completely correct):
```r
dbExecute(db1,"SELECT * FROM weights WHERE rsid='rs3087501'")
```
3. If you really did mean `SELECT`, then you should be using `dbGetQuery` as you did earlier. I suspect you meant to use `DELETE` in that call to `dbExecute`, which would then translate into (still not the ideal answer, though getting closer):
```r
dbExecute(db1,"DELETE FROM weights WHERE rsid='rs3087501'")
```
4. It is generally discouraged to `paste` or `sprintf` "data" things directly into a SQL query, for many reasons. You may want to consider parameterized queries vice constructing query strings manually. In addition to security concerns about *malicious* [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) (e.g., XKCD's [*Exploits of a Mom*](https://xkcd.com/327/) aka "Little Bobby Tables"), it is also a concern for malformed strings or Unicode-vs-ANSI mistakes, even if it's one data analyst running the query. `DBI` supports [parameterized queries](https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries).
This means you can do:
```r
dbExecute(db1,"DELETE FROM weights WHERE rsid=?", params = list("rs3087501"))
```
Note that in this case, we do _not_ put quotes around the `?` marker, and the quotes used within the `list(.)` are solely for R, so it does not matter if you use single-quotes or double-quotes.
5. You say *"keep this particular rsid and remove all the unmatched one"*, which suggests deleting all others ... which means either (no need for both) of the following will work:
```r
dbExecute(db1,"DELETE FROM weights WHERE not rsid=?", params = list("rs3087501"))
dbExecute(db1,"DELETE FROM weights WHERE rsid <> ?", params = list("rs3087501"))
```
Fully reproducible example:
```r
sqlite <- DBI::dbConnect(RSQLite::SQLite())
DBI::dbWriteTable(sqlite, "mt", head(mtcars))
DBI::dbGetQuery(sqlite, "select * from mt")
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
DBI::dbExecute(sqlite, "delete from mt where not cyl=?", params=list(6))
# [1] 2
DBI::dbGetQuery(sqlite, "select * from mt")
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# 3 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 4 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
答案2
得分: 0
只是注意,你的示例中缺少要过滤的 rsid。如果它在那里,这种方法将适用于在 SQL 查询之后过滤数据。对于这个示例,我替换了第一个 rsid 为指定的 rsid。
df <- structure(list(gene = c("ENSG00000130943.6", "ENSG00000130943.6",
"ENSG00000130943.6", "ENSG00000130943.6", "ENSG00000130943.6"
), rsid = c("rs3087501", "rs7364174", "rs6007567", "rs112187135",
"rs144824037"), weight = c(0.0671656140385751, 0.065292861996415,
0.0088563732779068, -0.3156608259595, 0.0128894490022945)), row.names = 8154:8158, class = "data.frame")
output <- df %>%
filter(rsid == "rs3087501")
请注意,代码部分没有翻译。
英文:
Just note your example is missing the rsid you want to filter for. If it was in there, this approach will work for filtering the data after the SQL query. For this example I replaced the first rsid with the one specified.
df <- structure(list(gene = c("ENSG00000130943.6", "ENSG00000130943.6",
"ENSG00000130943.6", "ENSG00000130943.6", "ENSG00000130943.6"
), rsid = c("rs3087501", "rs7364174", "rs6007567", "rs112187135",
"rs144824037"), weight = c(0.0671656140385751, 0.065292861996415,
0.0088563732779068, -0.3156608259595, 0.0128894490022945)), row.names = 8154:8158, class = "data.frame")
output <- df %>%
filter(rsid == "rs3087501")
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论