Pandas Display Strings as "Literals" in DataFrame Outputs
EDIT:
I should specify that I want the rich HTML output to show the escaped characters. It is nice to have the Series in the correct form, and @Stu Sztukowski provided a helpful way to do that:
df['type:string'].apply(lambda x: repr(x))
However, what I really want to see is something like this:
Problem
I am wondering if there is a way to force pandas to display strings with their escape characters shown explicitly when the dtype is string. Currently, when displaying a Series with dtype object, the escaped whitespace characters are shown explicitly. However, when that same Series is converted to string, they are rendered as literal whitespace (that is, the newlines and tabs are actually printed).
Reproducible example:
Here is some code you can use to reproduce the effect.
import pandas as pd
string_data = ['foo\nbar', 'foo\tbar', '\nfoo\n\tbarbar', 'bar\rbaz', None]
data = {"type:object":string_data, "type:string":string_data}
df = pd.DataFrame(data)
df = df.astype({'type:string':'string'})
Here are the outputs I see when viewing both series on their own:
>>> df['type:object']
0 foo\nbar
1 foo\tbar
2 \nfoo\n\tbarbar
3 bar\rbaz
4 None
Name: type:object, dtype: object
>>> df['type:string']
0 foo
bar
1 foo bar
2
foo
barbar
3 bar
baz
4 <NA>
Name: type:string, dtype: string
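As a quick sanity check that only the display differs and not the stored data, the underlying Python values of both columns can be compared directly. This is a minimal sketch built on the example above; the names string_values, object_values and contents_match are introduced here for illustration only.

```python
import pandas as pd

string_data = ['foo\nbar', 'foo\tbar', '\nfoo\n\tbarbar', 'bar\rbaz', None]
df = pd.DataFrame({"type:object": string_data, "type:string": string_data})
df = df.astype({"type:string": "string"})

# Printing a list calls repr() on each element, so the escape
# sequences are visible regardless of the column dtype.
string_values = df["type:string"].tolist()
print(string_values)

# Compare the four non-missing entries of both columns.
object_values = df["type:object"].tolist()
contents_match = string_values[:4] == object_values[:4]
print(contents_match)
```

The last row is left out of the comparison because the missing value is represented differently in the two columns.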
I find the modern pandas>1.0 string dtype and its use of pd.NA very helpful, so I would prefer to keep the type:string column from this example. However, this makes it hard to inspect the strings alongside other data, because their true contents are hidden from me.
The problem only gets worse when looking at the DataFrame in .ipynb file outputs such as this view from VSCode:
As you can see, all of the escaped whitespace characters are shown as the same generic whitespace.
Conclusion
Does anyone know of a way to force pandas to display the strings as it does in the type:object column while keeping the dtype as string?
I checked the pandas documentation, thinking this might be fixable via pd.set_option(), but I didn't find anything there.
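One possible route for the HTML case, independent of pd.set_option(), is the formatters argument of DataFrame.to_html, which takes a one-argument function per column. The sketch below is an assumption about what would be acceptable here, not a built-in pandas display mode; show_escapes is a hypothetical helper name.

```python
import pandas as pd


def show_escapes(value):
    """Render control characters as escape sequences (hypothetical helper)."""
    if isinstance(value, str):
        # repr() turns a newline into backslash-n and so on;
        # [1:-1] drops the surrounding quote characters.
        return repr(value)[1:-1]
    return str(value)


string_data = ['foo\nbar', 'foo\tbar', '\nfoo\n\tbarbar', 'bar\rbaz', None]
df = pd.DataFrame({"type:string": string_data}).astype({"type:string": "string"})

# The formatter only affects the rendered HTML; the column itself
# keeps its string dtype and its original contents.
html = df.to_html(formatters={"type:string": show_escapes})
print(html)
```

In a notebook, the resulting markup could be wrapped in IPython.display.HTML to get a rich table. Note that stripping the quotes with [1:-1] is a simplification: a string containing both quote characters would still show a backslash before the single quotes.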
Answer 1
Score: 1
Apply repr() to each element with a lambda function.
df['type:string'].apply(lambda x: repr(x))
0 'foo\nbar'
1 'foo\tbar'
2 '\nfoo\n\tbarbar'
3 'bar\rbaz'
4 <NA>
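If the surrounding quotes are unwanted, a variation is to build a display-only copy and leave the original column untouched. This is a sketch under the assumption that a separate Series for viewing is acceptable; escape_for_display and shown are names made up for this example.

```python
import pandas as pd


def escape_for_display(value):
    """Return the escaped form of a string; pass other values through."""
    if isinstance(value, str):
        # repr() makes control characters visible; [1:-1] strips the quotes.
        return repr(value)[1:-1]
    return value


string_data = ['foo\nbar', 'foo\tbar', '\nfoo\n\tbarbar', 'bar\rbaz', None]
df = pd.DataFrame({"type:string": string_data}).astype({"type:string": "string"})

# Display-only copy: missing values are passed through unchanged,
# and df["type:string"] itself is not modified.
shown = df["type:string"].map(escape_for_display)
print(shown)
```

Because shown is a separate object, any further processing can keep using the original string column and its pd.NA handling.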