Python: json.dumps with ensure_ascii=False and .encode('utf-8') seems to convert string to bytes

Question

I am generating a Python dictionary as follows:

placedict = {
   "id": geonames.geonames_id,
   "info": json.dumps(jsoninfo),
}

where id is a string and info a valid and readable JSON string:

'{"geonamesurl": "http://geonames.org/310859/kahramanmara\\u015f.html", "searchstring": "Kahramanmara\\u015f", "place": "Kahramanmara\\u015f", "confidence": 1, "typecode": "PPLA", "toponym": "Kahramanmara\\u015f", "geoid": 310859, "continent": "AS", "country": "Turkey", "state": "Kahramanmara\\u015f", "region": "Kahramanmara\\u015f", "lat": "37.5847", "long": "36.92641", "population": 376045, "bbox": {"northeast": [37.66426194452945, 37.02690583904019], "southwest": [37.50514805547055, 36.825904160959816]}, "timezone": "Europe/Istanbul", "wikipedia": "en.wikipedia.org/wiki/Kahramanmara%C5%9F", "hyerlist": ["part-of: Earth GeoID: 6295630 GeoCode: AREA", "part-of: Asia GeoID: 6255147 GeoCode: CONT", "part-of: Turkey GeoID: 298795 GeoCode: PCLI", "part-of: Kahramanmara\\u015f GeoID: 310858 GeoCode: ADM1", "part-of: Kahramanmara\\u015f GeoID: 310859 GeoCode: PPLA"], "childlist": ["Aksu", "Barbaros", "Egemenlik"]}'

But, as you can see, while the jsoninfo variable holds the actual non-ASCII characters, in placedict['info'] they are not written out as readable UTF-8 characters but escaped as \uXXXX sequences.
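For context, the escaping comes from the default ensure_ascii=True setting of json.dumps, and the escaped string is still valid JSON that decodes back to the same text. A minimal sketch, assuming a small hypothetical dictionary in place of the real jsoninfo:

```python
import json

# Hypothetical stand-in for the jsoninfo dictionary from the question.
original = {"place": "Kahramanmaraş"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are written
# as \uXXXX escape sequences, so the result contains only ASCII.
escaped = json.dumps(original)
print(escaped)

# Parsing the escaped string restores the original characters.
decoded = json.loads(escaped)
print(decoded["place"])
```

So the escaped form loses no information; it is only less readable when inspected directly.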
I therefore tried to change the json.dumps line to:

placedict = {
    "id": geonames.geonames_id,
    "info": json.dumps(jsoninfo).encode("utf-8"),
}

or even

placedict = {
    "id": geonames.geonames_id,
    "info": json.dumps(jsoninfo, ensure_ascii=False).encode("utf-8"),
}

hoping this would encode the JSON as desired, but after either of these modifications the 'info' member of the dictionary comes back as b'.........', so I end up with a binary string in MongoDB.

I want to store the dictionary with a readable, UTF-8 encoded JSON string in MongoDB.

Where am I making a mistake?

Answer 1

Score: 2


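You can use json.dumps with ensure_ascii=False on its own, without the .encode("utf-8") call. json.dumps already returns a str; it is the .encode call that turns it into bytes: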

import json
jsoninfo = {"El": "Niño"}
info = json.dumps(jsoninfo, ensure_ascii=False)
print(info)  # {"El": "Niño"}
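To make the difference between the three variants from the question concrete, here is a minimal sketch that prints the type of each result. The jsoninfo contents and the "id" value are hypothetical stand-ins for the real data and for geonames.geonames_id:

```python
import json

# Hypothetical sample data standing in for the question's jsoninfo.
jsoninfo = {"place": "Kahramanmaraş", "geoid": 310859}

# Variant 1: default settings -> str with \uXXXX escapes.
escaped = json.dumps(jsoninfo)

# Variant 2: ensure_ascii=False -> str with the characters kept as-is.
readable = json.dumps(jsoninfo, ensure_ascii=False)

# Variant 3: calling .encode() on a str always produces bytes,
# which is why the value shows up as b'...'.
as_bytes = readable.encode("utf-8")

print(type(escaped).__name__, escaped)
print(type(readable).__name__, readable)
print(type(as_bytes).__name__)

# The dictionary from the question, keeping "info" as a readable str.
# The "id" value here is an assumption for illustration only.
placedict = {"id": "310859", "info": readable}
```

In Python 3 a str is already Unicode text, so there is no need to encode it yourself before passing it on; whatever writes the value out (a file, a socket, a database driver) is normally responsible for the byte encoding.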
