Using the default and specifying the same default value explicitly give different results in pd.wide_to_long

Question

I was reshaping a DataFrame from wide to long format. However, I get different results in two cases that should be identical; see below:

import pandas as pd
import numpy as np

d_test = pd.DataFrame({"id": [1,2,3,5], "q1": [1,4,4,2], "q2": [4,np.nan,9,0]}, index=["a","b","c","d"])

# Gives an empty DataFrame as result
pd.wide_to_long(d_test, stubnames=["q"], suffix=r"\\d+", i="id", j="time")

# This works:
pd.wide_to_long(d_test, stubnames=["q"], i="id", j="time")

Note that both calls should be equivalent: according to the documentation, the default value of the suffix argument is identical to the one I specified explicitly.

Can someone help me in understanding what went wrong here?

Answer 1

Score: 2

You escaped your regex incorrectly: r"\\d+" is a raw string, so both backslashes are kept and the resulting regex matches a literal \ followed by one or more d characters.

Note that the wide_to_long documentation writes the default as '\\d+' (a regular string, in which \\ stands for a single backslash), not r'\\d+'.
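To make the difference concrete, here is a small sketch using only the standard re module. The variable names are just for illustration; it compares the three spellings and shows what each one matches as a regex.

```python
import re

raw_double = r"\\d+"    # raw string: 4 characters (\, \, d, +)
plain_double = "\\d+"   # regular string: 3 characters (\, d, +)
raw_single = r"\d+"     # raw string: 3 characters (\, d, +)

print(len(raw_double), len(plain_double), len(raw_single))  # 4 3 3
print(plain_double == raw_single)                           # True

# \d+ matches one or more digits
print(bool(re.fullmatch(raw_single, "12")))      # True

# \\d+ matches a literal backslash followed by one or more "d" characters
print(bool(re.fullmatch(raw_double, "12")))      # False
print(bool(re.fullmatch(raw_double, "\\dd")))    # True
```

So '\\d+' and r'\d+' are the same three-character pattern, while r'\\d+' is a different, four-character one.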

pd.wide_to_long(d_test, stubnames=['q'], suffix=r'\d+', i='id', j='time')

# or
pd.wide_to_long(d_test, stubnames=['q'], suffix='\\d+', i='id', j='time')

Output:

           q
id time     
1  1     1.0
2  1     4.0
3  1     4.0
5  1     2.0
1  2     4.0
2  2     NaN
3  2     9.0
5  2     0.0
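As for why the wrong suffix gives an empty DataFrame rather than an error: wide_to_long selects the columns to unpivot by matching their names against the stub name followed by the suffix pattern. The helper below, matching_columns, is a hypothetical stand-in that only approximates that selection step (it is not the pandas implementation), but it shows that no column name can match the mis-escaped pattern.

```python
import re

def matching_columns(columns, stub, suffix, sep=""):
    """Hypothetical helper: approximate how columns are selected
    by stub name + separator + suffix regex."""
    pattern = re.compile(rf"^{re.escape(stub)}{re.escape(sep)}{suffix}$")
    return [col for col in columns if pattern.match(col)]

columns = ["id", "q1", "q2"]

# Correct suffix: the digit-suffixed columns are found
print(matching_columns(columns, "q", r"\d+"))    # ['q1', 'q2']

# Mis-escaped suffix: looks for names like "q\d", so nothing matches
print(matching_columns(columns, "q", r"\\d+"))   # []
```

With no matching columns there is nothing to reshape, which is consistent with the empty result in the question.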

huangapple
  • Posted on June 16, 2023, 16:10:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76488193.html