June 8, 2023 03:19:38


Pandas Display Strings as "Literals" in DataFrame Outputs


EDIT:

I should clarify that I want the rich HTML output to show the escaped characters. Having the Series displayed in the correct form is nice, and @Stu Sztukowski provided a helpful way to do that:

df['type:string'].apply(lambda x: repr(x))

However, what I really want to see is something like this:

Problem

I am wondering if there is a way to force pandas to show escape characters explicitly when displaying strings whose dtype is string. Currently, when displaying a Series with dtype object, the escaped whitespace characters are shown explicitly. However, when that same Series is converted to string, the whitespace characters are printed as-is, so they appear as generic whitespace.
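To make the point concrete, here is a minimal sketch (assuming pandas 1.0 or later; the variable names are illustrative) showing that both dtypes store exactly the same characters, so the difference appears to lie only in how each dtype is rendered:

```python
import pandas as pd

string_data = ['foo\nbar', 'foo\tbar', '\nfoo\n\tbarbar', 'bar\rbaz', None]

# Build both variants explicitly so the object column stays object dtype,
# even on pandas versions that infer a dedicated string dtype by default.
as_object = pd.Series(string_data, dtype=object)
as_string = pd.Series(string_data, dtype="string")

# The non-missing values are the same Python strings, real control characters included.
same_values = as_object.iloc[:4].tolist() == as_string.iloc[:4].tolist()
print(same_values)                  # True
print(repr(as_string.iloc[0]))      # 'foo\nbar'

# Only the missing-value marker differs: None vs pd.NA.
print(as_object.iloc[4] is None, as_string.iloc[4] is pd.NA)  # True True
```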

Reproducible example:

Here is some code you can use to reproduce the effect.

import pandas as pd
string_data = ['foo\nbar', 'foo\tbar', '\nfoo\n\tbarbar', 'bar\rbaz', None]
data = {"type:object":string_data, "type:string":string_data}
df = pd.DataFrame(data)
df = df.astype({'type:string':'string'})

Here are the outputs I see when viewing both series on their own:

>>> df['type:object']
0           foo\nbar
1           foo\tbar
2    \nfoo\n\tbarbar
3           bar\rbaz
4               None
Name: type:object, dtype: object

>>> df['type:string']
0         foo
bar
1         foo	bar
2    
foo
	barbar
3         bar
baz
4            <NA>
Name: type:string, dtype: string

I find the modern string dtype (pandas 1.0+) and its use of pd.NA very helpful, so I would prefer to use the type:string column from this example. However, this makes it hard to inspect the strings alongside other data, because their true contents are obscured.
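One possible workaround, sketched here under the assumption that a display-only copy is acceptable: replace each control character with its backslash escape using `Series.str.translate` and a table from Python's `str.maketrans`. Because `.str` methods on a `"string"` Series return a `"string"` Series, the copy should keep the dtype and `pd.NA`. The `ESCAPES` table is a hypothetical name and covers only the characters from the example plus the backslash itself; extend it as needed.

```python
import pandas as pd

# Hypothetical translation table: each control character maps to its
# two-character escape. The backslash is escaped too, so a literal "\n"
# already present in the data cannot be confused with a real newline.
ESCAPES = str.maketrans({
    "\\": "\\\\",
    "\n": "\\n",
    "\t": "\\t",
    "\r": "\\r",
})

s = pd.Series(['foo\nbar', 'foo\tbar', '\nfoo\n\tbarbar', 'bar\rbaz', None],
              dtype="string")

# .str.translate works element-wise and leaves missing values alone,
# so the result is still a "string" Series with <NA> intact.
escaped = s.str.translate(ESCAPES)
print(escaped)
```

Unlike `repr`, this does not wrap each value in quotes, and the original `s` is unchanged.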

The problem only gets worse when looking at the DataFrame in .ipynb file outputs such as this view from VSCode:

As you can see, all of the escaped whitespace characters are shown as the same generic whitespace.

Conclusion

Does anyone know of a way to force pandas to display the strings as it does in the type:object column while keeping the datatype as string?
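Building on the same idea, here is a sketch of a helper (the name `escaped_view` is hypothetical) that returns a copy of a DataFrame with every `"string"` column escaped. Displaying that copy in a notebook goes through pandas' normal HTML rendering, so the escapes should also appear in the rich output; the original `df` is left unchanged. This assumes you only need the escaped form for viewing, not for further processing.

```python
import pandas as pd

# Same hypothetical escape table: control characters -> visible escapes.
ESCAPES = str.maketrans({"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r"})


def escaped_view(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a display copy in which every "string"-dtype column shows escapes."""
    out = frame.copy()
    for col, dtype in frame.dtypes.items():
        if isinstance(dtype, pd.StringDtype):
            out[col] = frame[col].str.translate(ESCAPES)
    return out


string_data = ['foo\nbar', 'foo\tbar', '\nfoo\n\tbarbar', 'bar\rbaz', None]
df = pd.DataFrame({"type:object": pd.Series(string_data, dtype=object),
                   "type:string": pd.Series(string_data, dtype="string")})

view = escaped_view(df)
# In Jupyter/VSCode, ending a cell with `view` renders it as an HTML table;
# to_html() produces comparable markup as a plain string for inspection.
html = view.to_html()
```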

I checked the pandas documentation, thinking this might be fixable via pd.set_option(), but I didn't find anything relevant there.

Answer 1

Score: 1

Apply repr() as a lambda function:

df['type:string'].apply(lambda x: repr(x))
0           'foo\nbar'
1           'foo\tbar'
2    '\nfoo\n\tbarbar'
3           'bar\rbaz'
4                 <NA>
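A small variation on this answer: `repr(pd.NA)` is the string `'<NA>'`, so if `apply` passes the missing value to the lambda, row 4 becomes text rather than a real missing value. A sketch that calls `repr` only on non-missing entries, assuming you want missing values to stay missing (the result still carries Python-style quotes and is meant for viewing only):

```python
import pandas as pd

s = pd.Series(['foo\nbar', 'foo\tbar', '\nfoo\n\tbarbar', 'bar\rbaz', None],
              dtype="string")

# repr() turns real control characters into visible escapes (and adds quotes);
# missing entries are returned untouched instead of being converted to text.
shown = s.apply(lambda x: x if pd.isna(x) else repr(x))
print(shown)
```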
