Insert pandas data frame into Postgres


Question


I have a pandas data frame that I want to insert into the Postgres database of my Django project.

The data frame has 5 columns and the database table has 6 columns. In addition, the data frame columns and the table columns are not in the same order.

Before inserting, do I have to make sure the columns are in the same order in both the data frame and the table? And how should I handle the missing column?

Answer 1

Score: 9


If the data frame's column names match the column names in the database table, you can insert it directly into the table with the DataFrame.to_sql() method, using SQLAlchemy for the connection:

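from django.conf import settings
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

from myapp.models import Bob

db = settings.DATABASES['default']

# Build the URL with URL.create() (SQLAlchemy 1.4+) so that special
# characters in the password, such as "@" or "/", are escaped correctly.
db_connection_url = URL.create(
    "postgresql",
    username=db['USER'],
    password=db['PASSWORD'],
    host=db['HOST'] or None,
    port=int(db['PORT']) if db['PORT'] else None,
    database=db['NAME'],
)

engine = create_engine(db_connection_url)

# df is the existing data frame. Append its rows to the model's table;
# index=False keeps the data frame index from being written as a column.
df.to_sql(Bob._meta.db_table, engine, if_exists='append', index=False, chunksize=10000)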

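Column order does not matter, because to_sql() names the columns in the INSERT statements it generates. The missing column is simply left out, so it will be NULL, or the database will fill in its default value if one is defined at the database level (a default defined only on the Django model is not applied, because the ORM is bypassed). If the column is NOT NULL and has no database default, the insert will fail. Alternatively, add the missing column to the data frame with the required value before calling to_sql().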
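The behaviour described above can be tried out without a Postgres server. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for Postgres and a hypothetical table `bob` with columns `a` to `e`; none of these names come from the original answer. It inserts a data frame whose columns are in a different order and lack `e`, then inserts it again with `e` supplied explicitly.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for Postgres so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bob ("
    " a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER NOT NULL,"
    " d INTEGER NOT NULL, e INTEGER DEFAULT 42)"
)

# Columns are in a different order than the table, and "e" is absent.
df = pd.DataFrame({"d": [4, 40], "c": [3, 30], "b": [2, 20], "a": [1, 10]})

# Column names are matched by name, so the order is irrelevant;
# "e" is omitted from the INSERT and receives the database default.
df.to_sql("bob", conn, if_exists="append", index=False)

# Alternative: supply the missing column explicitly before inserting.
df.assign(e=7).to_sql("bob", conn, if_exists="append", index=False)

rows = conn.execute(
    "SELECT a, b, c, d, e FROM bob ORDER BY e DESC, a"
).fetchall()
print(rows)
```

The first two rows printed carry the database default for `e`, and the last two carry the value set with `assign()`. Against Postgres the same `to_sql()` calls would take the SQLAlchemy engine instead of the SQLite connection.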

Answer 2

Score: 3


Just do an explicit insert.

Suppose your table has its columns in the order A, B, C, D, E, but your data frame has them in the order D, C, B, A (note that there is no column E).

Generate an SQL INSERT that names the columns explicitly (again, without column E):

   insert into <TABLE> (D, C, B, A) values (row_iterator.D, row_iterator.C, ...)

For column E, the best and simplest solution is to define a default value in the database definition.

For example:

CREATE TABLE Bob (
    A int NOT NULL,
    B int NOT NULL,
    C int NOT NULL,
    D int NOT NULL,
    E int DEFAULT 42
);
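One way to generate such an insert from a data frame is sketched below. It assumes an in-memory SQLite database in place of Postgres, and the helper `build_insert` is a hypothetical name introduced only for illustration. SQLite uses `?` as its parameter placeholder; with psycopg2 against Postgres the placeholder would be `%s`.

```python
import sqlite3

import pandas as pd


def build_insert(table, columns, placeholder="?"):
    """Return an INSERT statement that names its columns explicitly.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    cols = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    return f"INSERT INTO {table} ({cols}) VALUES ({marks})"


# In-memory SQLite stands in for Postgres so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bob ("
    " a INT NOT NULL, b INT NOT NULL, c INT NOT NULL,"
    " d INT NOT NULL, e INT DEFAULT 42)"
)

# Data frame columns are ordered D, C, B, A and there is no column E.
df = pd.DataFrame({"d": [4], "c": [3], "b": [2], "a": [1]})

sql = build_insert("bob", list(df.columns))

# Each row is passed as a plain tuple in the data frame's column order,
# which matches the column list named in the statement.
conn.executemany(sql, df.itertuples(index=False, name=None))

result = conn.execute("SELECT a, b, c, d, e FROM bob").fetchall()
print(sql)
print(result)
```

Because the statement lists its columns, each value lands in the right place regardless of the table's own column order, and column `e` falls back to the default from the table definition.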

Hope that helps

Posted by huangapple on 2020-01-06 21:47:40
Link to this article: https://go.coder-hub.com/59613267.html