Csv reader misinterprets quotes

Question

When I read my string with csv.reader, the output splits the JSON string into
'{filter":"freeinternet"', 'region:"178307"}"'
but it needs to stay as

"{\"filter\":\"freeinternet\",\"region\":\"178307\"}"

This is what I've tried. I've also tried adding quotechar and escapechar in different combinations, but the results are still incorrect.

import csv
from io import StringIO

s = u"""url,forgeresponsetype,identifiers,metadata,partitionkey,sortkey,expirationdate,lastmodifieddate,redirectkey,siteid,locale,type,update_date
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23"""



f = StringIO(s)

reader = csv.reader(f, delimiter=',', escapechar = '\\')
xd =  [row for row in reader]

Any help is appreciated.

Answer 1

Score: 1

Yes, this is because csv.reader in Python treats the backslash character \ as an escape character, and the JSON string in your CSV data contains backslashes that need to be preserved.

Check this out:

import csv
import json
from io import StringIO

s = u"""url,forgeresponsetype,identifiers,metadata,partitionkey,sortkey,expirationdate,lastmodifieddate,redirectkey,siteid,locale,type,update_date
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23"""

f = StringIO(s)

reader = csv.DictReader(f)
rows = [row for row in reader]

print(json.dumps(rows, indent=2))
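
One thing that may be worth checking alongside the code above: in a regular (non-raw) triple-quoted Python literal, \" is collapsed to " before csv ever sees the text, so the sample string no longer contains any backslashes. The following is a minimal sketch, assuming the real file on disk does contain the backslashes. It uses a raw literal (the r prefix) and a shortened, hypothetical three-column sample instead of the full rows from the question.

```python
import csv
import json
from io import StringIO

# Shortened, hypothetical sample: three columns instead of the question's thirteen.
# The r prefix keeps the backslashes, as they would appear in a file on disk.
raw = r'''url,identifiers,siteid
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",1'''

# Without the r prefix, Python itself drops the backslashes from the literal.
plain = "{\"filter\":\"freeinternet\"}"
assert "\\" not in plain

# escapechar tells the reader that \" is a literal quote inside a quoted field.
reader = csv.reader(StringIO(raw), delimiter=',', quotechar='"', escapechar='\\')
header, row = list(reader)

identifiers = row[1]              # the JSON text, kept together as one field
parsed = json.loads(identifiers)  # decoded into a dict
print(identifiers)
print(parsed["region"])
```

When the data comes from a file rather than a literal, the backslashes are read as they are, so only the escapechar argument should be needed in that case.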

Answer 2

Score: 1

What you can do is manipulate the input string so that the file and the dictionary data use different quote characters.

For example, use ' as the quote character for the input and keep " as the quote character for your JSON/dictionary values.

import csv
from pprint import pprint
from io import StringIO

s = """url,forgeresponsetype,identifiers,metadata,partitionkey,sortkey,expirationdate,lastmodifieddate,redirectkey,siteid,locale,type,update_date
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23"""


f = StringIO(s.replace(',"{', ',\'{').replace('}",', '}\','))

reader = csv.reader(f, delimiter=',', quotechar = '\'')
pprint([row for row in reader])

The call s.replace(',"{', ',\'{').replace('}",', '}\',') replaces the " quote characters that wrap the JSON field with ', and ' is then passed to csv.reader as quotechar.

Note: This works as long as there are no nested dictionaries.
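
The quote-swapping idea above can be combined with json.loads to turn the field into a dictionary. This is only a sketch under stated assumptions: the sample is a shortened, hypothetical three-column version of the question's data, the inner quotes are unescaped (which is what the non-raw literal in the question actually contains), and the JSON field is not the last column of the row.

```python
import csv
import json
from io import StringIO

# Shortened, hypothetical sample with unescaped inner quotes.
s = '''url,identifiers,siteid
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,"{"filter":"freeinternet","region":"178307"}",1'''

# Swap only the quotes that wrap the JSON object for single quotes.
swapped = s.replace(',"{', ",'{").replace('}",', "}',")

# Read with ' as the quote character so the inner " stay literal.
reader = csv.DictReader(StringIO(swapped), quotechar="'")
rows = list(reader)

# Decode the JSON column into a dict for each row.
for r in rows:
    r["identifiers"] = json.loads(r["identifiers"])

print(rows[0]["identifiers"]["filter"])
```

If the JSON field were the last column, the '}",' pattern would not match at the end of the line, so the replacement would need adjusting for that layout.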

Answer 3

Score: 0

Try this:

import csv
from io import StringIO

s = u"""url,forgeresponsetype,identifiers,metadata,partitionkey,sortkey,expirationdate,lastmodifieddate,redirectkey,siteid,locale,type,update_date
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,MOVED_PERMANENTLY,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",,HOTEL_DESTINATION_THEME.494647,1.en_US.filter:freeinternet.region:178307,1696746399,2023-06-18T06:26:40.521Z,,1,en_US,HOTEL_DESTINATION_THEME,21/06/23"""

f = StringIO(s)

reader = csv.reader(f, delimiter=',', quotechar='\'', escapechar='\'')
xd =  [row for row in reader]
print(xd)
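
When comparing reader settings like the ones tried in this thread, a small check can show whether the JSON column survived parsing. The helper below, json_column_survives, is a hypothetical name introduced for illustration, and the sample is a shortened three-column version of the question's data written as a raw literal so that the backslashes are kept.

```python
import csv
import json
from io import StringIO

def json_column_survives(text, column, **reader_kwargs):
    """Return True if every row matches the header width and `column` parses as JSON."""
    reader = csv.reader(StringIO(text), **reader_kwargs)
    header = next(reader)
    idx = header.index(column)
    for row in reader:
        # A split JSON field shows up as extra columns in the row.
        if len(row) != len(header):
            return False
        try:
            json.loads(row[idx])
        except json.JSONDecodeError:
            return False
    return True

# Shortened, hypothetical sample; the r prefix preserves the backslashes.
sample = r'''url,identifiers,siteid
https://www.expedia.com/Seattle-Hotels.d178307.Travel-Guide-Hotels,"{\"filter\":\"freeinternet\",\"region\":\"178307\"}",1'''

# Default settings (no escapechar) versus backslash as the escape character.
ok_default = json_column_survives(sample, "identifiers")
ok_backslash = json_column_survives(sample, "identifiers", quotechar='"', escapechar='\\')
print(ok_default, ok_backslash)
```

The width check is the main signal here: when the reader misreads the quotes, the commas inside the JSON object split the field and the row ends up with more columns than the header.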

huangapple
  • Posted on June 26, 2023, 21:19:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76557085.html