Pandas – 表格透视

huangapple go评论54阅读模式
英文:

Pandas - Table Pivot

问题

I have translated the code portion as requested. Here is the translated code:

import pandas as pd

data = {
    'GenoType': [1, 2, 4],
    'Parameter1': [2, 2, 787],
    'First.Second.Third.Car': ['Honda', 'Tesla', 'Jeep'],
    'First.Second.Third.Food': ['Pizza', 'Fries', 'Grapes'],
    'Red.Orange.Yellow.Car': ['Acura', 'BMW', 'Toyota'],
    'Red.Orange.Yellow.Food': ['Burger', 'Potatoes', 'Wings']
}

df = pd.DataFrame(data)

melted_df = df.melt(id_vars=['GenoType', 'Parameter1'], var_name='Trait', value_name='Value')
melted_df[['Trait', 'Column']] = melted_df['Trait'].str.rsplit('.', n=1, expand=True)

pivoted_df = melted_df.pivot_table(index=['GenoType', 'Parameter1', 'Trait'], columns='Column', values='Value', aggfunc='first')
pivoted_df.reset_index(inplace=True)
pivoted_df.columns.name = ''

# Melt the pivoted_df to obtain the final output
final_df = pivoted_df.melt(id_vars=['GenoType', 'Parameter1', 'Trait'], var_name='Column', value_name='Value')
final_df.sort_values(by=['GenoType', 'Parameter1', 'Trait', 'Column'], inplace=True)
final_df.reset_index(drop=True, inplace=True)

expanded_df = final_df.pivot_table(index=['GenoType', 'Parameter1', 'Trait'], columns='Column', values='Value', aggfunc='first')
expanded_df.reset_index(inplace=True)
expanded_df.columns.name = ''
expanded_df = expanded_df.rename_axis(None, axis=1)

print(expanded_df)

Please note that this code is a translation of the provided code and doesn't include any additional information or explanations.

英文:

I currently have this table

Pandas – 表格透视

and want to get this output:
Pandas – 表格透视

The expected logic is i want to strip any columns that have periods in the column name, after the third period. In this case, there are 4, and after stripping the column names we will have 2 unique column names. (Cars and Food)

The additional column "trait" is just the value before the third period. and will be in the expected output.

The expected output will have more rows as it is pivoted. Also assume this case for x column names that should have periods in them.

I am not able to get it yet, but my current code is pasted below.

import pandas as pd

data = {
    'GenoType': [1, 2, 4],
    'Parameter1': [2, 2, 787],
    'First.Second.Third.Car': ['Honda', 'Tesla', 'Jeep'],
    'First.Second.Third.Food': ['Pizza', 'Fries', 'Grapes'],
    'Red.Orange.Yellow.Car': ['Acura', 'BMW', 'Toyota'],
    'Red.Orange.Yellow.Food': ['Burger', 'Potatoes', 'Wings']
}

df = pd.DataFrame(data)

melted_df = df.melt(id_vars=['GenoType', 'Parameter1'], var_name='Trait', value_name='Value')
melted_df[['Trait', 'Column']] = melted_df['Trait'].str.rsplit('.', n=1, expand=True)

pivoted_df = melted_df.pivot_table(index=['GenoType', 'Parameter1', 'Trait'], columns='Column', values='Value', aggfunc='first')
pivoted_df.reset_index(inplace=True)
pivoted_df.columns.name = ''

# Melt the pivoted_df to obtain the final output
final_df = pivoted_df.melt(id_vars=['GenoType', 'Parameter1', 'Trait'], var_name='Column', value_name='Value')
final_df.sort_values(by=['GenoType', 'Parameter1', 'Trait', 'Column'], inplace=True)
final_df.reset_index(drop=True, inplace=True)

expanded_df = final_df.pivot_table(index=['GenoType', 'Parameter1', 'Trait'], columns='Column', values='Value', aggfunc='first')
expanded_df.reset_index(inplace=True)
expanded_df.columns.name = ''
expanded_df = expanded_df.rename_axis(None, axis=1)

print(expanded_df)



</details>


# 答案1
**得分**: 1

# 创建多级索引,以前两列为基础
df1 = df.set_index(['GenoType', 'Parameter1'])

# 过滤食品列
cols = df1.filter(regex=r'\.Food').columns

# 连接并重命名食品列名称
df1 = pd.concat([df1, df1.rename(columns=dict(zip(cols, cols[::-1])))]

# 创建列的多级索引
df1.columns = df1.columns.str.rsplit('.', n=1, expand=True)

# 重塑
df1 = df1.stack(0).sort_index(level=2).reset_index().rename(columns={'level_2':'Trait'})
print (df1)

    GenoType  Parameter1               Trait     Car      Food
0          1           2  First.Second.Third   Honda     Pizza
1          1           2  First.Second.Third   Honda    Burger
2          2           2  First.Second.Third   Tesla     Fries
3          2           2  First.Second.Third   Tesla  Potatoes
4          4         787  First.Second.Third    Jeep    Grapes
5          4         787  First.Second.Third    Jeep     Wings
6          1           2   Red.Orange.Yellow   Acura    Burger
7          1           2   Red.Orange.Yellow   Acura     Pizza
8          2           2   Red.Orange.Yellow     BMW  Potatoes
9          2           2   Red.Orange.Yellow     BMW     Fries
10         4         787   Red.Orange.Yellow  Toyota     Wings
11         4         787   Red.Orange.Yellow  Toyota    Grapes

<details>
<summary>英文:</summary>

Use:

    #create MulitIndex by first 2 columns
    df1 = df.set_index([&#39;GenoType&#39;, &#39;Parameter1&#39;])
    
    #filter Food columns
    cols = df1.filter(regex=r&#39;\.Food&#39;).columns
    
    #join with rename Food columns names
    df1 = pd.concat([df1, df1.rename(columns=dict(zip(cols, cols[::-1])))])
    
    #create MulitIndex in columns
    df1.columns = df1.columns.str.rsplit(&#39;.&#39;, n=1, expand=True)
    
    #reshape
    df1 = df1.stack(0).sort_index(level=2).reset_index().rename(columns={&#39;level_2&#39;:&#39;Trait&#39;})
    print (df1)
    
        GenoType  Parameter1               Trait     Car      Food
    0          1           2  First.Second.Third   Honda     Pizza
    1          1           2  First.Second.Third   Honda    Burger
    2          2           2  First.Second.Third   Tesla     Fries
    3          2           2  First.Second.Third   Tesla  Potatoes
    4          4         787  First.Second.Third    Jeep    Grapes
    5          4         787  First.Second.Third    Jeep     Wings
    6          1           2   Red.Orange.Yellow   Acura    Burger
    7          1           2   Red.Orange.Yellow   Acura     Pizza
    8          2           2   Red.Orange.Yellow     BMW  Potatoes
    9          2           2   Red.Orange.Yellow     BMW     Fries
    10         4         787   Red.Orange.Yellow  Toyota     Wings
    11         4         787   Red.Orange.Yellow  Toyota    Grapes

</details>



# 答案2
**得分**: 1

这是代码部分,不需要翻译:

```python
data = {
    'GenoType': [1, 2, 4],
    'Parameter1': [2, 2, 787],
    'First.Second.Third.Car': ['Honda', 'Tesla', 'Jeep'],
    'First.Second.Third.Food': ['Pizza', 'Fries', 'Grapes'],
    'Red.Orange.Yellow.Car': ['Acura', 'BMW', 'Toyota'],
    'Red.Orange.Yellow.Food': ['Burger', 'Potatoes', 'Wings']
}

df = pd.DataFrame(data)

traits = df.columns.str.extract(r'^((?:[^.]*\.){2}[^.]*)').dropna().drop_duplicates()[0].tolist()
features = df.columns.str.extract(r'^(?:[^.]*\.){3}(.*)$').dropna().drop_duplicates()[0].tolist()

out = df.melt(value_vars=[f'{t}.{features[0]}' for t in traits], value_name=features[0], id_vars=['GenoType', 'Parameter1'], var_name='Trait')
out['Trait'] = out['Trait'].str.extract(r'^((?:[^.]*\.){2}[^.]*)')
for f in features[1:]:
    dff = df.melt(value_vars=[f'{t}.{f}' for t in traits], value_name=f, id_vars=['GenoType', 'Parameter1'], var_name='Trait')
    out = out.merge(dff, on=['GenoType', 'Parameter1'], suffixes=['', '_y'])
    out.drop('Trait_y', axis=1, inplace=True)
out = out.sort_values('Trait', kind='mergesort').reset_index(drop=True)

希望对你有所帮助!如果有其他需要,请随时告诉我。

英文:

You could solve this using an iterative melt and merge. This will work for an arbitrary number of columns:

data = {
    &#39;GenoType&#39;: [1, 2, 4],
    &#39;Parameter1&#39;: [2, 2, 787],
    &#39;First.Second.Third.Car&#39;: [&#39;Honda&#39;, &#39;Tesla&#39;, &#39;Jeep&#39;],
    &#39;First.Second.Third.Food&#39;: [&#39;Pizza&#39;, &#39;Fries&#39;, &#39;Grapes&#39;],
    &#39;Red.Orange.Yellow.Car&#39;: [&#39;Acura&#39;, &#39;BMW&#39;, &#39;Toyota&#39;],
    &#39;Red.Orange.Yellow.Food&#39;: [&#39;Burger&#39;, &#39;Potatoes&#39;, &#39;Wings&#39;]
}

df = pd.DataFrame(data)

traits = df.columns.str.extract(r&#39;^((?:[^.]*\.){2}[^.]*)&#39;).dropna().drop_duplicates()[0].tolist()
features = df.columns.str.extract(r&#39;^(?:[^.]*\.){3}(.*)$&#39;).dropna().drop_duplicates()[0].tolist()

out = df.melt(value_vars=[f&#39;{t}.{features[0]}&#39; for t in traits], value_name=features[0], id_vars=[&#39;GenoType&#39;, &#39;Parameter1&#39;], var_name=&#39;Trait&#39;)
out[&#39;Trait&#39;] = out[&#39;Trait&#39;].str.extract(r&#39;^((?:[^.]*\.){2}[^.]*)&#39;)
for f in features[1:]:
    dff = df.melt(value_vars=[f&#39;{t}.{f}&#39; for t in traits], value_name=f, id_vars=[&#39;GenoType&#39;, &#39;Parameter1&#39;], var_name=&#39;Trait&#39;)
    out = out.merge(dff, on=[&#39;GenoType&#39;, &#39;Parameter1&#39;], suffixes=[&#39;&#39;, &#39;_y&#39;])
    out.drop(&#39;Trait_y&#39;, axis=1, inplace=True)
out = out.sort_values(&#39;Trait&#39;, kind=&#39;mergesort&#39;).reset_index(drop=True)

Output:

    GenoType  Parameter1               Trait     Car      Food
0          1           2  First.Second.Third   Honda     Pizza
1          1           2  First.Second.Third   Honda    Burger
2          2           2  First.Second.Third   Tesla     Fries
3          2           2  First.Second.Third   Tesla  Potatoes
4          4         787  First.Second.Third    Jeep    Grapes
5          4         787  First.Second.Third    Jeep     Wings
6          1           2   Red.Orange.Yellow   Acura     Pizza
7          1           2   Red.Orange.Yellow   Acura    Burger
8          2           2   Red.Orange.Yellow     BMW     Fries
9          2           2   Red.Orange.Yellow     BMW  Potatoes
10         4         787   Red.Orange.Yellow  Toyota    Grapes
11         4         787   Red.Orange.Yellow  Toyota     Wings

huangapple
  • 本文由 发表于 2023年4月17日 12:33:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/76031736.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定