英文:
Import .csv into local SQL server with pre-made table using Python
问题
我正在制作(或试图制作)一个小脚本,用于编译两个 .csv 文件,去重后上传到我的本地 MS SQL 服务器。我已经创建了表格,并希望保持这样,因为我需要能够通过 SQL 进行服务器调整,这样做要容易得多。我对Python还很陌生,对SQL了解不多,但迄今为止,这个网站已经帮助了我很多。
我已经搞定了 .csv 的输出和连接到数据库的部分(我认为是这样),但是我无法让它提取我的 .csv 并将其输入到数据库中。我尝试了很多这里建议的不同方法,但我认为我是一个新手,无法识别自己在做什么,所以很难确定。我需要将数据库的列放在 SQL 服务器的相同列中。列 'side' 在 SQL 中是主键。所有其他值都不能为空。
根据我尝试的内容,我要么不会出现错误,但没有输入,要么会出现某种错误。使用下面的当前代码,我会被驳回,表示我的所有列都是无效的。
有几次我能够输入不正确的值,或者只是一行数据。那是使用 BULK INSERT 命令完成的,如果我添加更多行,它就停止工作了,所以我采取了不同的方法。
请忽略我导入的多余模块。一旦我弄清楚了这个问题,并使我的代码正常工作,我会删除我不需要的模块。
我尝试过 pymssql,但在将其正确链接到数据库时遇到了问题,我也尝试了只使用 INSERT INTO 的简单游标命令,但也没有成功。
我尝试过这个问题的解决方案:
https://stackoverflow.com/questions/39899088/import-csv-file-into-sql-server-using-python
还有这里:
https://stackoverflow.com/questions/70219015/pandas-csv-to-sql-database
但其中许多都很困难,因为它们包括在 Python 中创建数据库,而不是 SQL 服务器。最终,我想将其编译成一个小的 .exe,供我的同事使用,我宁愿只更改 SQL 服务器,而不是在表格发生变化时每次都重新分发新的应用程序版本。输入的值将保持不变,但计算将会改变,不同的列(围绕这些列)将会增加和减少。
这是一个小的示例代码片段,只显示不正常工作的部分。
from six.moves import urllib
import csv
import pymssql
import d6tstack
# 创建引擎
params = urllib.parse.quote_plus("Driver={ODBC Driver 18 for SQL Server};"
"Server=SERVER;"
"Database=AppDev;"
"Trusted_Connection=yes;"
"TrustServerCertificate=yes"
)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
engine.connect()
# 连接到 SQL 数据库
cnxn = pyodbc.connect("Driver={ODBC Driver 18 for SQL Server};"
"Server=SERVER;"
"Database=AppDev;"
"Trusted_Connection=yes;"
"TrustServerCertificate=yes"
)
cursor = cnxn.cursor()
# 导入文件(进行中)
df = pd.read_csv(r"C:PATHFullOutput.csv", index_col=False)
df.to_sql("Wolverine",engine, if_exists='append',chunksize=300,dtype={'Serial': String(22)})
cnxn.commit()
# 关闭连接
cnxn.close()
以下是我得到的错误:
('42S22', "[42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'index'. (207) (SQLExecDirectW); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'Serial'. (207); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'index'. (207); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'Serial'. (207); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)")
[SQL: INSERT INTO [Wolverine] ([index], [Serial]) VALUES (?, ?), (?, ?)]
[parameters: (0, 'FPNLCB804MRK0223179054', 1, 'FPNLCB804MRK0223179054')]
在我的创建 .csv 的代码中,我明确设置了不创建索引,并且在输出 .csv 中没有可见的索引,所以出现名为 'Index' 的无效列是一个问题。
请告诉我我哪里错了,我确信不止一次,哈哈。谢谢!
英文:
I'm making (or trying to) make a small script that compiles two .csvs, drops duplicates, then uploads itself to my local MS SQL server. I already have the table made, and I'd prefer it that way, as I need to be able to tweak the server and it's much, much easier doing that through SQL. I'm still very new to Python and don't know too much more about SQL but this site has been supremely helpful so far.
I have the .csv output and connecting to the database down (I think) but I can't get it to pull my .csv and input it into the database. I've tried a ton of different ways suggested here, but I think I'm too much of a novice to recognize what I'm doing, so it's difficult to be certain. I need the columns of the database to be put in the same columns in the SQL server. Column 'side' is the primary key in SQL. All other values are NOT NULL.
Depending on what I try, I'll either get no errors, but no input, or I'll get some sort of error. With the current code below, I get kicked back, stating that all of my columns are invalid.
A few times I've been able to input incorrect values, or just a single line. That was with the BULK INSERT command, and if I add more lines, it stops working, so I've taken a different approach.
Please ignore the extraneous modules I've imported. Once I figure this out, and get my code working properly, I'll remove the ones I don't need.
I've tried pymssql but ran into an issue linking it properly to the database, I've tried just a simple cursor command for INSERT INTO with no luck either.
I've tried this from this question here:
https://stackoverflow.com/questions/39899088/import-csv-file-into-sql-server-using-python
and here
https://stackoverflow.com/questions/70219015/pandas-csv-to-sql-database
but many of these are difficult as they include the creation of a database in Python and not SQL server. Eventually I want this compiled into a small .exe that my coworkers can use and I'd rather just be able to change the SQL server instead of having to redistribute a new app version every time there's a change in the table. The values inputted will stay the same, but calculations will change and different columns (surrounding these columns here) will come and go.
Here is a small snippet of example code, showing just the portion that doesn't work correctly.
from six.moves import urllib
import csv
import pymssql
import d6tstack
#Create Engine
params = urllib.parse.quote_plus("Driver={ODBC Driver 18 for SQL Server};"
"Server=SERVER;"
"Database=AppDev;"
"Trusted_Connection=yes;"
"TrustServerCertificate=yes"
)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
engine.connect()
#Connect to SQL Database
cnxn = pyodbc.connect("Driver={ODBC Driver 18 for SQL Server};"
"Server=SERVER;"
"Database=AppDev;"
"Trusted_Connection=yes;"
"TrustServerCertificate=yes"
)
cursor = cnxn.cursor()
#######################################################Importing Files(WIP)################################################################################
df = pd.read_csv(r"C:PATHFullOutput.csv", index_col=False)
df.to_sql("Wolverine",engine, if_exists='append',chunksize=300,dtype={'Serial': String(22),
}
)
cnxn.commit()
#closes connection
cnxn.close()
Here's the error I'm getting:
('42S22', "[42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'index'. (207) (SQLExecDirectW); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'Serial'. (207); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'index'. (207); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'Serial'. (207); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)")
[SQL: INSERT INTO [Wolverine] ([index], [Serial]) VALUES (?, ?), (?, ?)]
[parameters: (0, 'FPNLCB804MRK0223179054', 1, 'FPNLCB804MRK0223179054')]
In my code to create the .csv I specifically have it set to not create an index, and there is not visible index in the output .csv, so the fact that there is an invalid column titled 'Index' is an issue as well.
Please let me know where I've gone wrong, I'm sure it's more than once, haha. Thank you!
答案1
得分: -1
我已经弄清楚了。使用Pandas/SQL Alchemy,你无法使数据类型与MS SQL Server匹配。对我而言有效的方法是删除表,然后只运行Python代码,创建一个新表。如果我想要添加列或主键,可以在通过SQL Alchemy创建表之后进行操作。每次都能完美运行。不要浪费时间纠结于双引号或单引号是否有问题。只需让Python创建表格,然后在SQL中按照你的喜好进行更改。
英文:
I've figured it out. With Pandas/SQL Alchemy, you can't get your datatypes to match up with MS SQL Server. What worked for me is deleting the table, and just running the python code, creating a new table. If I want to add columns or a primary key, I can do so after the table is created through SQL Alchemy. Works like a charm every time. Don't waste time fooling around, wondering if your double or single quotes are the issue. Just get Python to make the table and change it in SQL to your liking afterward.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论