Escape characters in .csv for Azure Synapse seem to disappear?
Question
I have a .csv file that looks like this:

```
"ID", "Name", "Extra Info"
"1", "John", "{\"Event\": \"Click\", \"Button Name\": \"Accept\"}"
"2", "Adam", "{\"Event\": \"Click\", \"Button Name\": \"Accept\"}"
```
I'm trying to load this file using this code in Synapse:
```sql
SELECT
    TOP 2 *
FROM
    OPENROWSET(
        BULK 'https://[MY STORAGE ACCOUNT].dfs.core.windows.net/[FILE PATH]/[...]/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    )
    AS [result]
```
Expecting this result:
| ID | Name | Extra Info |
|---|---|---|
| 1 | John | {"Event": "Click", "Button Name": "Accept"} |
| 2 | Adam | {"Event": "Click", "Button Name": "Accept"} |
But I keep getting this error:
```
Error handling external file: 'Unexpected token 'Event\' at [byte: XXX].
Expecting tokens ',', ' ', or '"'. '.
File/External table name: 'https://[MY STORAGE ACCOUNT].dfs.core.windows.net/[FILE PATH]/[...]/[SPECIFIC FILE NAME].csv'.
```
It looks like it's ignoring the first quote (") and the escape character in the Extra Info column, leading it to think that `Event\` is some special token?
I just don't understand why this happens, or what I can do to fix it.
Answer 1
Score: 0
I think I found the answer based on this post and some of the Azure documentation:
- How Field Quote works: https://stackoverflow.com/questions/73404603/is-my-understanding-of-how-fieldquote-works-correct
- Escaping Quotes in the Azure Documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-single-csv-file#escape-quoting-characters
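It seems that the only valid way to escape quotes is to double them (`""`). The `ESCAPE_CHAR` option of OPENROWSET doesn't help either: according to the OPENROWSET documentation for serverless SQL pool, it is not used to escape the quoting character, which must be escaped with another quoting character.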
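Python's `csv` module is not the Synapse parser, but with `strict=True` and no escape character it follows the same doubled-quote rule, so it is a handy way to check a file locally before uploading it. A minimal sketch using one row from the question and the same row with doubled quotes (the function and constant names are illustrative, not from any Synapse tooling):

```python
import csv
import io

# Row from the question: quotes inside the JSON are escaped with a backslash.
BACKSLASH_ROW = '"1", "John", "{\\"Event\\": \\"Click\\", \\"Button Name\\": \\"Accept\\"}"\n'

# Same row with the quotes inside the JSON doubled instead.
DOUBLED_ROW = '"1", "John", "{""Event"": ""Click"", ""Button Name"": ""Accept""}"\n'


def parse_doubled_quotes_only(text):
    """Parse CSV text where the only way to escape a quote is to double it.

    strict=True makes csv raise csv.Error on malformed quoting instead of
    guessing; skipinitialspace=True skips the space after each comma.
    No escapechar is set, so a backslash is just an ordinary character.
    """
    return list(csv.reader(io.StringIO(text), strict=True, skipinitialspace=True))


if __name__ == "__main__":
    for label, row in (("backslash-escaped", BACKSLASH_ROW), ("doubled", DOUBLED_ROW)):
        try:
            print(label, "->", parse_doubled_quotes_only(row))
        except csv.Error as exc:
            print(label, "-> csv.Error:", exc)
```

The backslash-escaped row raises `csv.Error`: the `"` after `\` is read as the closing quote of the field, so the next character (the `E` of `Event`) is unexpected, much like the token reported in the Synapse error. The doubled-quote row parses to the JSON string with plain quotes.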
This means my .csv should be formatted like this:

```
"ID", "Name", "Extra Info"
"1", "John", "{""Event"": ""Click"", ""Button Name"": ""Accept""}"
"2", "Adam", "{""Event"": ""Click"", ""Button Name"": ""Accept""}"
```
Instead of the original (which uses backslash escapes, `\"`):

```
"ID", "Name", "Extra Info"
"1", "John", "{\"Event\": \"Click\", \"Button Name\": \"Accept\"}"
"2", "Adam", "{\"Event\": \"Click\", \"Button Name\": \"Accept\"}"
```
Unfortunately, I don't see a way around this other than bulk-editing all my .csv files.
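If bulk-editing is the route, the conversion can at least be scripted. A minimal sketch, assuming local UTF-8 copies of the files and assuming the exporting tool used `\` as its only escape character (so `\\` in the file means a literal backslash): each file is read with `\` as the escape character and written back with every field quoted and embedded quotes doubled. `convert_stream` and `convert_folder` are hypothetical helper names; copying files to and from the storage account is not shown.

```python
import csv
import io
from pathlib import Path


def convert_stream(src, dst):
    """Rewrite backslash-escaped CSV from src as doubled-quote CSV into dst.

    Reading: escapechar="\\" takes the character after a backslash literally,
    doublequote=False because the source does not double its quotes, and
    skipinitialspace=True drops the space after each comma.
    Writing: QUOTE_ALL quotes every field and the default dialect doubles
    any quote inside a field ("" instead of \").
    """
    reader = csv.reader(src, escapechar="\\", doublequote=False, skipinitialspace=True)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(reader)


def convert_folder(src_dir, dst_dir):
    """Convert every *.csv file in src_dir and write the result to dst_dir."""
    dst_path = Path(dst_dir)
    dst_path.mkdir(parents=True, exist_ok=True)
    for src_file in Path(src_dir).glob("*.csv"):
        # newline="" lets the csv module handle line endings itself.
        with src_file.open(encoding="utf-8", newline="") as src, \
                (dst_path / src_file.name).open("w", encoding="utf-8", newline="") as dst:
            convert_stream(src, dst)


if __name__ == "__main__":
    sample = io.StringIO('"ID", "Name", "Extra Info"\n"1", "John", "{\\"Event\\": \\"Click\\"}"\n')
    out = io.StringIO()
    convert_stream(sample, out)
    print(out.getvalue(), end="")
```

For example, `convert_folder("raw_csv", "fixed_csv")` (hypothetical folder names) would write converted copies next to the originals. If the exporter did not escape literal backslashes, reading with `escapechar` would also drop backslashes that belong to the JSON (for example in `\n`); in that case a plain text replacement of `\"` with `""` is the closer fit.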