Reshaping a Dataframe with repeating column names

Question

I have data that looks like this:

    dataframe_1:
       week           SITE  LAL        SITE  LAL
    0     1  BARTON CHAPEL  1.1  PENASCAL I    1
    1     2  BARTON CHAPEL  1.1  PENASCAL I    1
    2     3  BARTON CHAPEL  1.1  PENASCAL I    1

I need the final dataframe to look like this:

    dataframe_2:
       week           SITE  LAL
    0     1  BARTON CHAPEL  1.1
    1     2  BARTON CHAPEL  1.1
    2     3  BARTON CHAPEL  1.1
    3     1     PENASCAL I    1
    4     2     PENASCAL I    1
    5     3     PENASCAL I    1

I've tried using 'melt', but I cannot get the desired result. Perhaps I'm using the wrong approach?
Thank you.

Answer 1

Score: 3

If you had unambiguous column names:

       week           SITE  LAL      SITE.1  LAL.1
    0     1  BARTON CHAPEL  1.1  PENASCAL I      1
    1     2  BARTON CHAPEL  1.1  PENASCAL I      1
    2     3  BARTON CHAPEL  1.1  PENASCAL I      1

You could use pandas.lreshape:

    pd.lreshape(df, {'SITE': ['SITE', 'SITE.1'], 'LAL': ['LAL', 'LAL.1']})

Output:

       week           SITE  LAL
    0     1  BARTON CHAPEL  1.1
    1     2  BARTON CHAPEL  1.1
    2     3  BARTON CHAPEL  1.1
    3     1     PENASCAL I  1.0
    4     2     PENASCAL I  1.0
    5     3     PENASCAL I  1.0
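
If the frame arrives with literally repeated headers, the names have to be made unique before pd.lreshape can address them. A minimal sketch, assuming the frame from the question; dedupe_columns is a hypothetical helper (not a pandas function) that imitates the .1 suffix convention used above:

```python
import pandas as pd

# Frame from the question, with literally repeated column names.
df = pd.DataFrame(
    [
        [1, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [2, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [3, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
    ],
    columns=["week", "SITE", "LAL", "SITE", "LAL"],
)


def dedupe_columns(columns):
    """Hypothetical helper: suffix repeated names with .1, .2, ..."""
    seen = {}
    unique = []
    for name in columns:
        count = seen.get(name, 0)
        unique.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return unique


# Columns become: week, SITE, LAL, SITE.1, LAL.1
df.columns = dedupe_columns(df.columns)

# Stack each group of columns into a single target column.
result = pd.lreshape(df, {"SITE": ["SITE", "SITE.1"], "LAL": ["LAL", "LAL.1"]})
print(result)
```

The groups dictionary is written out by hand here; with more repeats it could be built from the deduplicated names instead.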

With your duplicated columns, you can melt + pivot:

    out = (df
           .melt('week')
           # number the repeats of each column name within a week
           .assign(idx=lambda d: d.groupby(['week', 'variable']).cumcount())
           .pivot(index=['idx', 'week'], columns='variable', values='value')
           .reset_index('week')
           .rename_axis(index=None, columns=None)
          )

Output:

       week  LAL           SITE
    0     1  1.1  BARTON CHAPEL
    0     2  1.1  BARTON CHAPEL
    0     3  1.1  BARTON CHAPEL
    1     1    1     PENASCAL I
    1     2    1     PENASCAL I
    1     3    1     PENASCAL I
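
This output differs from the requested layout in a few ways: the columns come out in alphabetical order, the index repeats (0, 0, 0, 1, 1, 1), and because every value passed through one melted column, LAL is likely to be object dtype. A sketch of the cleanup, assuming out looks like the output above; it is rebuilt by hand here so the snippet runs on its own, and tidy is just an illustrative name:

```python
import pandas as pd

# Stand-in for the melt + pivot result shown above (an assumption: rebuilt
# by hand, with LAL kept as object dtype as a melted mixed column would be).
out = pd.DataFrame(
    {
        "week": [1, 2, 3, 1, 2, 3],
        "LAL": pd.Series([1.1, 1.1, 1.1, 1, 1, 1], dtype=object),
        "SITE": ["BARTON CHAPEL"] * 3 + ["PENASCAL I"] * 3,
    }
)
out.index = [0, 0, 0, 1, 1, 1]

tidy = (
    out[["week", "SITE", "LAL"]]  # restore the requested column order
    .reset_index(drop=True)       # replace the repeated index with 0..5
    .assign(LAL=lambda d: pd.to_numeric(d["LAL"]))  # object -> numeric
)
print(tidy)
```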

Answer 2

Score: 1

This is not a very generalizable solution, but it works on your example:

    df.groupby('week').apply(lambda _df: pd.concat((_df.iloc[:, 1:3], _df.iloc[:, 3:5]))).reset_index('week')

It groups by week and then reshapes each group with column selection and concatenation. Finally, reset_index('week') turns the 'week' index level produced by the groupby back into a regular column.
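
A somewhat more general variant of the same idea can skip the groupby entirely. This is only a sketch, assuming the first column is the identifier and the remaining columns repeat in fixed-width blocks (here SITE, LAL); block_width and id_col are illustrative names, not part of the answer above:

```python
import pandas as pd

# Frame from the question, with repeated column names.
df = pd.DataFrame(
    [
        [1, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [2, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [3, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
    ],
    columns=["week", "SITE", "LAL", "SITE", "LAL"],
)

id_col = df.iloc[:, [0]]  # the 'week' column, selected by position
block_width = 2           # assumed: each block is (SITE, LAL)

# Pair the identifier with each positional block, then stack the blocks.
blocks = [
    pd.concat([id_col, df.iloc[:, start:start + block_width]], axis=1)
    for start in range(1, df.shape[1], block_width)
]
result = pd.concat(blocks, ignore_index=True)
print(result)
```

Selecting by position avoids the ambiguity of df['SITE'], which returns both SITE columns when names repeat.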

Answer 3

Score: 0

You can do this by extracting the columns you want from dataframe_1 and concatenating them at the bottom:

    import numpy as np
    import pandas as pd

    # create a dummy dataframe with 6 columns and 10 rows
    df = pd.DataFrame(np.random.randint(0, 100, size=(10, 6)), columns=list('ABCDEF'))
    # copy the slice so renaming its columns does not touch df
    df_tempo = df[['D', 'E', 'F']].copy()

Then rename the columns of the temporary dataframe so they match the columns you are keeping (in your case, ['week', 'SITE', 'LAL']):

    df_tempo.columns = ['A', 'B', 'C']

Then concatenate the temporary dataframe with the subset of the first dataframe you want to keep:

    df = pd.concat([df[['A', 'B', 'C']], df_tempo], axis=0, ignore_index=True)

Hope it helps!

huangapple
  • Posted on June 1, 2023, 22:42:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76383101.html