Python: How to create a unique dataframe from multiple dataframes with the same dimensions

Question

Given an empty dataframe with assigned column names:

import pandas as pd

colnames = ('ACCT', 'CTAT', 'AAAT', 'ATCG')*3
df = pd.DataFrame(columns=colnames)

I want to loop over dataframes that have the structure below (two are shown for demonstration):

sample_df = pd.DataFrame()
sample_df['tetran'] = colnames
sample_df['Frequency'] = (423, 512, 25, 123, 632, 124, 614, 73, 14, 75, 311, 155)
conids = ("cl1_42", "cl1_41", "cl2_31")
rep_conids = [val for val in conids for _ in range(4)]
sample_df['contig_id'] = rep_conids


sample_df_2 = pd.DataFrame()
sample_df_2['tetran'] = colnames
sample_df_2['Frequency'] = (724, 132, 4, 102, 423, 402, 616, 734, 153, 751, 31, 55)
conids_2 = ("se1_51", "se1_21", "se2_53")
rep_conids_2 = [val for val in conids_2 for _ in range(4)]
sample_df_2['contig_id'] = rep_conids_2

The objective is:

  1. Put each 'Frequency' value from every 'sample_df' under the matching 'tetran' column of 'df', with one row per sample_df['contig_id'] value.

There are multiple 'sample_df' dataframes, so this is the desired output:

index ACCT CTAT AAAT ATCG
cl1_42 423 512 25 123
cl1_41 632 124 614 73
cl2_31 14 75 311 155
se1_51 724 132 4 102
se1_21 423 402 616 734
se2_53 153 751 31 55

I know how to do this in R, but I need it done in Python, so I have not included my attempt here because it is written in R.

Thanks for your time.

Answer 1

Score: 1

First, concatenate your dataframes with pd.concat, then pivot the result:

out = (pd.concat([sample_df, sample_df_2])
         .pivot(index='contig_id', columns='tetran', values='Frequency'))
print(out)

# Output
tetran     AAAT  ACCT  ATCG  CTAT
contig_id                        
cl1_41      614   632    73   124
cl1_42       25   423   123   512
cl2_31      311    14   155    75
se1_21      616   423   734   402
se1_51        4   724   102   132
se2_53       31   153    55   751
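
One thing worth knowing before relying on pivot: it requires every (contig_id, tetran) pair to be unique and raises a ValueError otherwise. The sketch below uses a made-up two-row frame (the duplicate value 7 is invented for illustration, not taken from the question) to show the failure, and one possible way around it with pivot_table and an explicit aggfunc. Whether summing is the right way to combine duplicates depends on your data.

```python
import pandas as pd

# Hypothetical data: the same contig/tetranucleotide pair appears twice.
dup = pd.DataFrame({
    "contig_id": ["cl1_42", "cl1_42"],
    "tetran": ["ACCT", "ACCT"],
    "Frequency": [423, 7],
})

# pivot cannot decide which value to keep, so it raises ValueError.
try:
    dup.pivot(index="contig_id", columns="tetran", values="Frequency")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates duplicates instead; here they are summed.
summed = dup.pivot_table(index="contig_id", columns="tetran",
                         values="Frequency", aggfunc="sum")
print(pivot_failed)
print(summed)
```

If the sample dataframes never repeat a contig_id, as in the question, plain pivot is enough.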

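If you don't want the data to be sorted, use pivot_table with sort=False (this parameter requires pandas 1.3 or later). Note that pivot_table aggregates duplicate index/column pairs, by default with the mean: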

out = (pd.concat([sample_df, sample_df_2])
         .pivot_table(index='contig_id', columns='tetran', values='Frequency', sort=False))
print(out)

# Output
tetran     ACCT  CTAT  AAAT  ATCG
contig_id                        
cl1_42      423   512    25   123
cl1_41      632   124   614    73
cl2_31       14    75   311   155
se1_51      724   132     4   102
se1_21      423   402   616   734
se2_53      153   751    31    55
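
Since the question mentions looping over many sample dataframes, here is a minimal sketch of how the same idea could be wrapped for a list of frames. The helper names make_sample and combine_samples are hypothetical, introduced only for this example. Instead of sort=False, it uses pivot followed by reindex to restore the order of first appearance, which is one alternative if your pandas version lacks the sort parameter.

```python
import pandas as pd

TETRANS = ("ACCT", "CTAT", "AAAT", "ATCG")

def make_sample(contig_ids, frequencies):
    """Build one long-format frame shaped like sample_df in the question."""
    return pd.DataFrame({
        "tetran": list(TETRANS) * len(contig_ids),
        "Frequency": list(frequencies),
        "contig_id": [cid for cid in contig_ids for _ in TETRANS],
    })

def combine_samples(frames):
    """Concatenate long-format frames and pivot to one row per contig_id."""
    long_df = pd.concat(frames, ignore_index=True)
    wide = long_df.pivot(index="contig_id", columns="tetran", values="Frequency")
    # Reorder rows and columns by first appearance in the input.
    return wide.reindex(index=long_df["contig_id"].unique(),
                        columns=long_df["tetran"].unique())

frames = [
    make_sample(("cl1_42", "cl1_41", "cl2_31"),
                (423, 512, 25, 123, 632, 124, 614, 73, 14, 75, 311, 155)),
    make_sample(("se1_51", "se1_21", "se2_53"),
                (724, 132, 4, 102, 423, 402, 616, 734, 153, 751, 31, 55)),
]
out = combine_samples(frames)
print(out)
```

Collecting the frames in a list and calling pd.concat once is generally preferable to appending to an empty dataframe inside a loop, because each append copies the accumulated data.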

Useful link: How can I pivot a dataframe?
