英文:
How to copy/join few columns from a csv file to an existing SQL table in R?
问题
我在SQL中创建了一个名为"SQL_table"的表,并使用下面提供的代码将各种现有的.csv文件内容复制到R中的"SQL_table"中。
代码:
library(RPostgres)
library(DBI)
# 步骤2:创建包含所有文件的数据框
file_names1 <- dir("D:/Data/", full.names = TRUE, recursive = T) # 指定文件所在路径
my_data_frame <- do.call(rbind, lapply(file_names1, read.csv))
# 步骤3:使用RPostgres建立R和PostgreSQL连接
dsn_database = "...." # 指定您的数据库名称
dsn_hostname = "localhost" # 指定主机名
dsn_port = "...." # 指定端口号,例如98939
dsn_uid = "...." # 指定用户名,例如"admin"
dsn_pwd = "...." # 指定密码,例如"xxx"
tryCatch({
drv <- dbDriver("Postgres")
print("连接到数据库中...")
connec <- dbConnect(drv,
dbname = dsn_database,
host = dsn_hostname,
port = dsn_port,
user = dsn_uid,
password = dsn_pwd)
print("数据库已连接!")
},
error=function(cond) {
print("无法连接到数据库。")
})
# 步骤4:使用RPostgres运行
dbWriteTable(connec, "SQL_table", my_data_frame, create = TRUE)
dbAppendTable(connec, "SQL_table", my_data_frame, create = TRUE)
dbReadTable(connec, "SQL_table")
我的SQL_table的结构如下:
Y_m_d_Time Location_ID Popul Literacy Mortality
2001-01-01 00:00:00 NAM 5000 77 8.8
2002-01-01 00:00:00 NAM 8700 58 7.7
2003-01-01 00:00:00 NAM 3410 98 9.8
..
..
1990-01-01 00:00:00 KEN 2000 87 9.8
1991-01-01 00:00:00 KEN 3200 88 9.7
1991-01-01 00:00:00 KEN 4910 78 8.8
..
..
2010-01-01 00:00:00 PUB 6000 97 9.8
2011-01-01 00:00:00 PUB 4200 89 9.7
2012-01-01 00:00:00 PUB 5910 88 8.8
..
..
..
我有另一个.csv文件"CSV_1",我想要从"CSV_1"复制一些列(纬度,经度,高度,起始日期,结束日期)到"SQL_table"中。两个表之间的公共列是"SQL_table"中的Location_ID和"CSV_1"中的City。
CSV文件的结构如下:
City Latitude Longitude Altitude Start_Date End_Date No. of Events Event_Type
NAM 35 79 218 3/1/2001 10 Flood
KEN 30 81 129 2/1/1990 5/31/1999 5 Earthquake
PUB 22 76 220 1/1/2010 11 Landslide
..
..
我的期望输出如下:
Y_m_d_Time Location_ID Popul Literacy Mortality Latitude Longitude Altitude Start_Date End_Date
2001-01-01 00:00:00 NAM 5000 77 8.8 35 79 218 3/1/2001
2002-01-01 00:00:00 NAM 8700 58 7.7 35 79 218 3/1/2001
2003-01-01 00:00:00 NAM 3410 98 9.8 35 79 218 3/1/2001
..
..
1990-01-01 00:00:00 KEN 2000 87 9.8 30 81 129 2/1/1990 5/31/1999
1991-01-01 00:00:00 KEN 3200 88 9.7 30 81 129 2/1/1990 5/31/1999
1991-01-01 00:00:00 KEN 4910 78 8.8 30 81 129 2/1/1990 5/31/1999
..
..
2010-01-01 00:00:00 PUB 6000 97 9.8 22 76 220 1/1/2010
2011-01-01 00:00:00 PUB 4200 89 9.7 22 76 220 1/1/2010
2012-01-01 00:00:00 PUB 5910 88 8.8 22 76 220 1/1/2010
..
..
..
请问是否有人可以帮助我扩展R中的代码以获得所需的结果?
英文:
I have a created a table "SQL_table" in SQL and copied contents from various existing .csv files to "SQL_table" in R using the code given below.
Code:
library(RPostgres)
library(DBI)
#Step 2: make a dataframe of all files
file_names1 <- dir("D:/Data/", full.names = TRUE, recursive = T) #where you have your files
my_data_frame <- do.call(rbind,lapply(file_names1,read.csv))
#Step 3: Establish R & PostgreSQL Connection using RPostgres
dsn_database = "...." # Specify the name of your Database
dsn_hostname = "localhost" # Specify host name
dsn_port = "...." # Specify your port number. e.g. 98939
dsn_uid = "...." # Specify your username. e.g. "admin"
dsn_pwd = "...." # Specify your password. e.g. "xxx"
tryCatch({
drv <- dbDriver("Postgres")
print("Connecting to Database…")
connec <- dbConnect(drv,
dbname = dsn_database,
host = dsn_hostname,
port = dsn_port,
user = dsn_uid,
password = dsn_pwd)
print("Database Connected!")
},
error=function(cond) {
print("Unable to connect to Database.")
})
#Step 4: Run using RPostgres
dbWriteTable(connec, "SQL_table", my_data_frame, create = TRUE)
dbAppendTable(connec, "SQL_table", my_data_frame, create = TRUE)
dbReadTable(connec, "SQL_table")
The structure of my SQL_table is as follows:
Y_m_d_Time Location_ID Popul Literacy Mortality
2001-01-01 00:00:00 NAM 5000 77 8.8
2002-01-01 00:00:00 NAM 8700 58 7.7
2003-01-01 00:00:00 NAM 3410 98 9.8
..
..
1990-01-01 00:00:00 KEN 2000 87 9.8
1991-01-01 00:00:00 KEN 3200 88 9.7
1991-01-01 00:00:00 KEN 4910 78 8.8
..
..
2010-01-01 00:00:00 PUB 6000 97 9.8
2011-01-01 00:00:00 PUB 4200 89 9.7
2012-01-01 00:00:00 PUB 5910 88 8.8
..
..
..
I have another .csv file "CSV_1" and I want to copy few columns (Latitude, Longitude, Altitude, Start_Date, End_Date) from "CSV_1" to "SQL_table" using R. The common column between both the tables is Location_ID in "SQL_table" and City in "CSV_1".
The structure of CSV file is as follows:
City Latitude Longitude Altitude Start_Date End_Date No. of Events Event_Type
NAM 35 79 218 3/1/2001 10 Flood
KEN 30 81 129 2/1/1990 5/31/1999 5 Earthquake
PUB 22 76 220 1/1/2010 11 Landslide
..
..
My desired output is as follows:
Y_m_d_Time Location_ID Popul Literacy Mortality Latitude Longitude Altitude Start_Date End_Date
2001-01-01 00:00:00 NAM 5000 77 8.8 35 79 218 3/1/2001
2002-01-01 00:00:00 NAM 8700 58 7.7 35 79 218 3/1/2001
2003-01-01 00:00:00 NAM 3410 98 9.8 35 79 218 3/1/2001
..
..
1990-01-01 00:00:00 KEN 2000 87 9.8 30 81 129 2/1/1990 5/31/1999
1991-01-01 00:00:00 KEN 3200 88 9.7 30 81 129 2/1/1990 5/31/1999
1991-01-01 00:00:00 KEN 4910 78 8.8 30 81 129 2/1/1990 5/31/1999
..
..
2010-01-01 00:00:00 PUB 6000 97 9.8 22 76 220 1/1/2010
2011-01-01 00:00:00 PUB 4200 89 9.7 22 76 220 1/1/2010
2012-01-01 00:00:00 PUB 5910 88 8.8 22 76 220 1/1/2010
..
..
..
Could anyone please help me how I can extend my code in R to obtain the desired results.
答案1
得分: 1
假设您的变量Location_ID
来自flights
表的origin
,City
来自airports
表的faa
。如果您同意使用{dplyr},您可以简单地执行left_join()
。
con <- DBI::dbConnect(duckdb::duckdb())
dbplyr::copy_nycflights13(con)
#> 创建表格: airlines
#> 创建表格: airports
#> 创建表格: flights
#> 创建表格: planes
#> 创建表格: weather
flights <- dplyr::tbl(con, "flights")
airports <- dplyr::tbl(con, "airports")
flights |>
dplyr::left_join(airports, by = c("origin" = "faa"))
#> # 源: SQL [?? x 26]
#> # 数据库: DuckDB 0.8.1 [root@Darwin 22.5.0:R 4.3.1/:memory:]
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#> <int> <int> <int> <int> <int> <dbl> <int> <int>
#> 1 2013 2 14 1045 1056 -11 1204 1230
#> 2 2013 2 14 1048 1049 -1 1412 1400
#> 3 2013 2 14 1048 1040 8 1333 1337
#> 4 2013 2 14 1051 1100 -9 1424 1424
#> 5 2013 2 14 1051 1100 -9 1203 1214
#> 6 2013 2 14 1057 1100 -3 1408 1420
#> 7 2013 2 14 1057 1100 -3 1244 1300
#> 8 2013 2 14 1057 1059 -2 1353 1441
#> 9 2013 2 14 1058 1056 2 1211 1220
#> 10 2013 2 14 1058 1100 -2 1337 1338
#> # 更多行
#> # 18个更多的变量: arr_delay <dbl>, carrier <chr>, flight <int>,
#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> # hour <dbl>, minute <dbl>, time_hour <dttm>, name <chr>, lat <dbl>,
#> # lon <dbl>, alt <dbl>, tz <dbl>, dst <chr>, tzone <chr>
DBI::dbDisconnect(con, shutdown = TRUE)
请注意,上述代码是用R语言编写的,它使用{dplyr}和数据库连接来执行左连接操作。如果您有任何其他问题或需要进一步的帮助,请随时提出。
英文:
Suppose your variable Location_ID
is origin
from the flights
table and City
is faa
from the airports
table. If you agreed to use {dplyr}, you could simply do a left_join()
.
con <- DBI::dbConnect(duckdb::duckdb())
dbplyr::copy_nycflights13(con)
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather
flights <- dplyr::tbl(con, "flights")
airports <- dplyr::tbl(con, "airports")
flights |>
dplyr::left_join(airports, by = c("origin" = "faa"))
#> # Source: SQL [?? x 26]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.5.0:R 4.3.1/:memory:]
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#> <int> <int> <int> <int> <int> <dbl> <int> <int>
#> 1 2013 2 14 1045 1056 -11 1204 1230
#> 2 2013 2 14 1048 1049 -1 1412 1400
#> 3 2013 2 14 1048 1040 8 1333 1337
#> 4 2013 2 14 1051 1100 -9 1424 1424
#> 5 2013 2 14 1051 1100 -9 1203 1214
#> 6 2013 2 14 1057 1100 -3 1408 1420
#> 7 2013 2 14 1057 1100 -3 1244 1300
#> 8 2013 2 14 1057 1059 -2 1353 1441
#> 9 2013 2 14 1058 1056 2 1211 1220
#> 10 2013 2 14 1058 1100 -2 1337 1338
#> # ℹ more rows
#> # ℹ 18 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> # hour <dbl>, minute <dbl>, time_hour <dttm>, name <chr>, lat <dbl>,
#> # lon <dbl>, alt <dbl>, tz <dbl>, dst <chr>, tzone <chr>
DBI::dbDisconnect(con, shutdown = TRUE)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论