Is it possible to convert a large list of dictionaries into a string, and then back into a list in Python?

Question


I'm using a MySQL database as part of a work project, and I'm writing a Python script for it. When extracting the data from the server (in this case, just 3 records) I use the 'pymysql' library. This library outputs the data as a list of dictionaries: the list is the whole output, and each dictionary inside the list holds one record.

output = [{'Time_Interval': datetime.datetime(2022, 11, 14, 0, 0), 'QueueID': 1018, 'TemplateID': 18, 'LineNum': 9, 'Sentence': ' Tell me a joke.'}, {'Time_Interval': datetime.datetime(2022, 11, 14, 0, 0), 'QueueID': 1018, 'TemplateID': 18, 'LineNum': 10, 'Sentence': '<FONT size="5"><B>\nHigh┬áWorkloads Hit┬áService Levels</FONT></B> \n<BR><FONT size="4"><FONT style="text-transform: capitalize;">\ncalls and average handle time</FONT> Up So Average Service Level Down To 92.2%.</FONT>\n\n'}, {'Time_Interval': datetime.datetime(2022, 11, 14, 0, 0), 'QueueID': 1018, 'TemplateID': 18, 'LineNum': 18, 'Sentence': '<BR><BR>Tell me a joke.'}]

It's a bit overwhelming to look at, but you'll see three 'Sentence' items, each followed by a string. What I want to do is convert the whole 'output' list into a string so I can remove a few characters. In the string belonging to the second 'Sentence' item there are two occurrences of ┬á near the beginning. My plan is to use changed_output.replace('┬á', ' ', 2) to replace them, but I need to convert the list into a string first. If anyone has a better way of replacing characters in a string, I'd love to hear it!

Here's what I've tried so far:

changed_output = str(output)
This one is weird. I used it to store the total number of 'Sentence' items into a variable like so:
number = changed_output.count('Sentence')
and it worked! It returned the number 3, which is correct. However, when I then tried to replace the characters using the replace() call above, it neither replaced those characters nor raised an error.
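The question doesn't show whether the result of replace() was assigned back to a variable, so this is only a guess at the cause: Python strings are immutable, and str.replace returns a new string instead of modifying the original. Calling it without reassigning therefore changes nothing and raises no error. A minimal sketch using a hypothetical sample string:

```python
# Hypothetical sample text containing the mojibake sequence "┬á" twice
changed_output = "High┬áWorkloads Hit┬áService Levels"

# str.replace never changes a string in place; this returned value is discarded
changed_output.replace("┬á", " ", 2)
print(changed_output)  # still contains "┬á"

# Assigning the returned string is what actually applies the change
changed_output = changed_output.replace("┬á", " ", 2)
print(changed_output)
```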

I also tried:

let = list(map(str, output))
changed_output = " ".join(let)

Which didn't replace the characters either. However, this one removed the square brackets at the beginning and end of the list, essentially leaving just a series of dictionaries. Does anyone know how to convert it back into a list afterward?
I'm worried that the quotation marks in <FONT size="5"> and <FONT size="4"> in the second 'Sentence' string might be tripping all of this up. Does anyone know a solution?
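On turning the string back into a list: str(output) is not a reliable round trip. ast.literal_eval only accepts Python literals, and the datetime.datetime(...) calls in the string are function calls, not literals, so it rejects the whole string. A small sketch demonstrating this with a trimmed, hypothetical one-record version of the data:

```python
import ast
import datetime

# A trimmed, hypothetical one-record version of the query output
output = [{'Time_Interval': datetime.datetime(2022, 11, 14, 0, 0),
           'LineNum': 10, 'Sentence': 'High┬áWorkloads'}]

text = str(output)
try:
    # Fails: datetime.datetime(...) is a call expression, which literal_eval refuses
    ast.literal_eval(text)
    round_trip_ok = True
except ValueError:
    round_trip_ok = False

print("literal_eval round trip worked:", round_trip_ok)
```

This is one reason to avoid stringifying the list in the first place and to edit the string values inside each dictionary instead.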

Answer 1

Score: 3


What you intend is possible, but it is the wrong approach twice over.

What you have there is an encoding error: your data was inserted into the database with a cp850 encoding (still used by the Windows "cmd" terminal in some regions). When the garbled text is properly converted back to bytes using "cp850" and those bytes are decoded as "utf-8", you get the character with code point "\xa0", a non-breaking space, which is the correct character in this case.
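The round trip described above can be checked directly in Python. A minimal sketch, assuming the stored text contains exactly the characters "┬á" shown in the question (the helper name is hypothetical):

```python
def undo_cp850_mojibake(text: str) -> str:
    """Re-encode text that was wrongly decoded as cp850, then decode it as UTF-8."""
    # "┬" is byte 0xC2 and "á" is byte 0xA0 in cp850; b"\xc2\xa0" is UTF-8 for U+00A0
    return text.encode("cp850").decode("utf-8")


garbled = "High┬áWorkloads Hit┬áService Levels"
repaired = undo_cp850_mojibake(garbled)

print(repr(repaired))  # 'High\xa0Workloads Hit\xa0Service Levels'
```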

The correct fix would be to change the connection call in pymysql to use the proper encoding, utf-8, rather than cp850 or the default one (pymysql.connect takes a charset argument, e.g. charset="utf8mb4"). Then you can just leave the \xa0 character there: it will behave appropriately in most contexts where text is presented.

If you want to do the wrong thing anyway, you can convert your SQL data to a JSON string, replace the characters, and convert it back.

If this is code inside a company, it will likely have people laughing at you or cursing you years from now, so be warned:

import json
json.loads(json.dumps(output, ensure_ascii=False, default=str).replace("┬á", "\x20"))  # datetimes come back as strings

The major drawback is that if there are other non-ASCII characters anywhere in your data, they won't be fixed: this only handles this one particular character sequence. Fixing the encoding will simply make your data work.
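Two details matter if you do take the JSON route. By default json.dumps escapes non-ASCII characters (ensure_ascii=True), so the literal text "┬á" never appears in the dumped string and the replace matches nothing. And datetime values are not JSON serializable at all, so a default= hook is needed, after which they come back as plain strings. A small sketch with a hypothetical one-record sample:

```python
import datetime
import json

row = {"Time_Interval": datetime.datetime(2022, 11, 14, 0, 0), "Sentence": "High┬áWorkloads"}

# 1. datetime objects make json.dumps raise TypeError unless default= is given
try:
    json.dumps([row])
    datetime_ok = True
except TypeError:
    datetime_ok = False

# 2. With the default ensure_ascii=True, "┬á" is written as \u252c\u00e1 escapes
escaped = json.dumps([row], default=str)
print("┬á" in escaped)  # False, so .replace("┬á", ...) would match nothing

# 3. ensure_ascii=False keeps the characters, so the replace works;
#    default=str turns the datetime into a string that json.loads does not convert back
raw = json.dumps([row], ensure_ascii=False, default=str)
fixed = json.loads(raw.replace("┬á", " "))
print(fixed[0])
```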

And of course, another drawback is the enormous overhead this adds to processing the data: if this is called in an API endpoint, for example, that may not be acceptable.

Answer 2

Score: 0


I would leave the data as a list of dictionaries as that makes it easier to work with in my opinion.

import datetime

output = [
    {'Time_Interval': datetime.datetime(2022, 11, 14, 0, 0), 'QueueID': 1018, 'TemplateID': 18, 'LineNum': 9, 'Sentence': ' Tell me a joke.'},
    {'Time_Interval': datetime.datetime(2022, 11, 14, 0, 0), 'QueueID': 1018, 'TemplateID': 18, 'LineNum': 10, 'Sentence': '<FONT size="5"><B>\nHigh┬áWorkloads Hit┬áService Levels</FONT></B> \n<BR><FONT size="4"><FONT style="text-transform: capitalize;">\ncalls and average handle time</FONT> Up So Average Service Level Down To 92.2%.</FONT>\n\n'},
    {'Time_Interval': datetime.datetime(2022, 11, 14, 0, 0), 'QueueID': 1018, 'TemplateID': 18, 'LineNum': 18, 'Sentence': '<BR><BR>Tell me a joke.'}
]

for row in output:
    # Replace the whole two-character sequence "┬á"; replacing only "á" would leave a stray "┬"
    row["Sentence"] = row["Sentence"].replace("┬á", " ")
    print("=================")
    print(row["Sentence"])
    print("=================")

That should give you back:

=================
 Tell me a joke.
=================
=================
<FONT size="5"><B>
High Workloads Hit Service Levels</FONT></B>
<BR><FONT size="4"><FONT style="text-transform: capitalize;">
calls and average handle time</FONT> Up So Average Service Level Down To 92.2%.</FONT>


=================
=================
<BR><BR>Tell me a joke.
=================

Here, output remains a list of dictionaries.
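If the same garbling can show up in any column, not just 'Sentence', one option that still keeps output a list of dictionaries is to apply the cp850-to-UTF-8 round trip described in the first answer to every string value, leaving values that don't round-trip untouched. This is a sketch under the assumption that the garbling is always UTF-8 bytes misread as cp850; the helper name repair_value is hypothetical:

```python
import datetime


def repair_value(value):
    """Undo UTF-8-read-as-cp850 mojibake in strings; return anything else unchanged."""
    if not isinstance(value, str):
        return value
    try:
        return value.encode("cp850").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in cp850, or not valid UTF-8 after re-encoding: leave as-is
        return value


output = [
    {'Time_Interval': datetime.datetime(2022, 11, 14, 0, 0), 'LineNum': 9, 'Sentence': 'Tell me a joke.'},
    {'Time_Interval': datetime.datetime(2022, 11, 14, 0, 0), 'LineNum': 10, 'Sentence': 'High┬áWorkloads Hit┬áService Levels'},
]

# Rebuild each record with repaired values; the list-of-dicts structure is preserved
output = [{key: repair_value(val) for key, val in row.items()} for row in output]
print(repr(output[1]['Sentence']))  # 'High\xa0Workloads Hit\xa0Service Levels'
```

The result contains the non-breaking space "\xa0" rather than a regular space. Because the round trip can, in rare cases, alter text that was never garbled, it would be worth checking it against real data first.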

huangapple
  • Published 16 June 2023 02:32:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76484570.html