英文:
Pandas - Table Pivot
问题
I have translated the code portion as requested. Here is the translated code:
import pandas as pd
data = {
'GenoType': [1, 2, 4],
'Parameter1': [2, 2, 787],
'First.Second.Third.Car': ['Honda', 'Tesla', 'Jeep'],
'First.Second.Third.Food': ['Pizza', 'Fries', 'Grapes'],
'Red.Orange.Yellow.Car': ['Acura', 'BMW', 'Toyota'],
'Red.Orange.Yellow.Food': ['Burger', 'Potatoes', 'Wings']
}
df = pd.DataFrame(data)
melted_df = df.melt(id_vars=['GenoType', 'Parameter1'], var_name='Trait', value_name='Value')
melted_df[['Trait', 'Column']] = melted_df['Trait'].str.rsplit('.', n=1, expand=True)
pivoted_df = melted_df.pivot_table(index=['GenoType', 'Parameter1', 'Trait'], columns='Column', values='Value', aggfunc='first')
pivoted_df.reset_index(inplace=True)
pivoted_df.columns.name = ''
# Melt the pivoted_df to obtain the final output
final_df = pivoted_df.melt(id_vars=['GenoType', 'Parameter1', 'Trait'], var_name='Column', value_name='Value')
final_df.sort_values(by=['GenoType', 'Parameter1', 'Trait', 'Column'], inplace=True)
final_df.reset_index(drop=True, inplace=True)
expanded_df = final_df.pivot_table(index=['GenoType', 'Parameter1', 'Trait'], columns='Column', values='Value', aggfunc='first')
expanded_df.reset_index(inplace=True)
expanded_df.columns.name = ''
expanded_df = expanded_df.rename_axis(None, axis=1)
print(expanded_df)
Please note that this code is a translation of the provided code and doesn't include any additional information or explanations.
英文:
I currently have this table
The expected logic is i want to strip any columns that have periods in the column name, after the third period. In this case, there are 4, and after stripping the column names we will have 2 unique column names. (Cars and Food)
The additional column "trait" is just the value before the third period. and will be in the expected output.
The expected output will have more rows as it is pivoted. Also assume this case for x column names that should have periods in them.
I am not able to get it yet, but my current code is pasted below.
import pandas as pd
data = {
'GenoType': [1, 2, 4],
'Parameter1': [2, 2, 787],
'First.Second.Third.Car': ['Honda', 'Tesla', 'Jeep'],
'First.Second.Third.Food': ['Pizza', 'Fries', 'Grapes'],
'Red.Orange.Yellow.Car': ['Acura', 'BMW', 'Toyota'],
'Red.Orange.Yellow.Food': ['Burger', 'Potatoes', 'Wings']
}
df = pd.DataFrame(data)
melted_df = df.melt(id_vars=['GenoType', 'Parameter1'], var_name='Trait', value_name='Value')
melted_df[['Trait', 'Column']] = melted_df['Trait'].str.rsplit('.', n=1, expand=True)
pivoted_df = melted_df.pivot_table(index=['GenoType', 'Parameter1', 'Trait'], columns='Column', values='Value', aggfunc='first')
pivoted_df.reset_index(inplace=True)
pivoted_df.columns.name = ''
# Melt the pivoted_df to obtain the final output
final_df = pivoted_df.melt(id_vars=['GenoType', 'Parameter1', 'Trait'], var_name='Column', value_name='Value')
final_df.sort_values(by=['GenoType', 'Parameter1', 'Trait', 'Column'], inplace=True)
final_df.reset_index(drop=True, inplace=True)
expanded_df = final_df.pivot_table(index=['GenoType', 'Parameter1', 'Trait'], columns='Column', values='Value', aggfunc='first')
expanded_df.reset_index(inplace=True)
expanded_df.columns.name = ''
expanded_df = expanded_df.rename_axis(None, axis=1)
print(expanded_df)
</details>
# 答案1
**得分**: 1
# 创建多级索引,以前两列为基础
df1 = df.set_index(['GenoType', 'Parameter1'])
# 过滤食品列
cols = df1.filter(regex=r'\.Food').columns
# 连接并重命名食品列名称
df1 = pd.concat([df1, df1.rename(columns=dict(zip(cols, cols[::-1])))]
# 创建列的多级索引
df1.columns = df1.columns.str.rsplit('.', n=1, expand=True)
# 重塑
df1 = df1.stack(0).sort_index(level=2).reset_index().rename(columns={'level_2':'Trait'})
print (df1)
GenoType Parameter1 Trait Car Food
0 1 2 First.Second.Third Honda Pizza
1 1 2 First.Second.Third Honda Burger
2 2 2 First.Second.Third Tesla Fries
3 2 2 First.Second.Third Tesla Potatoes
4 4 787 First.Second.Third Jeep Grapes
5 4 787 First.Second.Third Jeep Wings
6 1 2 Red.Orange.Yellow Acura Burger
7 1 2 Red.Orange.Yellow Acura Pizza
8 2 2 Red.Orange.Yellow BMW Potatoes
9 2 2 Red.Orange.Yellow BMW Fries
10 4 787 Red.Orange.Yellow Toyota Wings
11 4 787 Red.Orange.Yellow Toyota Grapes
<details>
<summary>英文:</summary>
Use:
#create MulitIndex by first 2 columns
df1 = df.set_index(['GenoType', 'Parameter1'])
#filter Food columns
cols = df1.filter(regex=r'\.Food').columns
#join with rename Food columns names
df1 = pd.concat([df1, df1.rename(columns=dict(zip(cols, cols[::-1])))])
#create MulitIndex in columns
df1.columns = df1.columns.str.rsplit('.', n=1, expand=True)
#reshape
df1 = df1.stack(0).sort_index(level=2).reset_index().rename(columns={'level_2':'Trait'})
print (df1)
GenoType Parameter1 Trait Car Food
0 1 2 First.Second.Third Honda Pizza
1 1 2 First.Second.Third Honda Burger
2 2 2 First.Second.Third Tesla Fries
3 2 2 First.Second.Third Tesla Potatoes
4 4 787 First.Second.Third Jeep Grapes
5 4 787 First.Second.Third Jeep Wings
6 1 2 Red.Orange.Yellow Acura Burger
7 1 2 Red.Orange.Yellow Acura Pizza
8 2 2 Red.Orange.Yellow BMW Potatoes
9 2 2 Red.Orange.Yellow BMW Fries
10 4 787 Red.Orange.Yellow Toyota Wings
11 4 787 Red.Orange.Yellow Toyota Grapes
</details>
# 答案2
**得分**: 1
这是代码部分,不需要翻译:
```python
data = {
'GenoType': [1, 2, 4],
'Parameter1': [2, 2, 787],
'First.Second.Third.Car': ['Honda', 'Tesla', 'Jeep'],
'First.Second.Third.Food': ['Pizza', 'Fries', 'Grapes'],
'Red.Orange.Yellow.Car': ['Acura', 'BMW', 'Toyota'],
'Red.Orange.Yellow.Food': ['Burger', 'Potatoes', 'Wings']
}
df = pd.DataFrame(data)
traits = df.columns.str.extract(r'^((?:[^.]*\.){2}[^.]*)').dropna().drop_duplicates()[0].tolist()
features = df.columns.str.extract(r'^(?:[^.]*\.){3}(.*)$').dropna().drop_duplicates()[0].tolist()
out = df.melt(value_vars=[f'{t}.{features[0]}' for t in traits], value_name=features[0], id_vars=['GenoType', 'Parameter1'], var_name='Trait')
out['Trait'] = out['Trait'].str.extract(r'^((?:[^.]*\.){2}[^.]*)')
for f in features[1:]:
dff = df.melt(value_vars=[f'{t}.{f}' for t in traits], value_name=f, id_vars=['GenoType', 'Parameter1'], var_name='Trait')
out = out.merge(dff, on=['GenoType', 'Parameter1'], suffixes=['', '_y'])
out.drop('Trait_y', axis=1, inplace=True)
out = out.sort_values('Trait', kind='mergesort').reset_index(drop=True)
希望对你有所帮助!如果有其他需要,请随时告诉我。
英文:
You could solve this using an iterative melt
and merge
. This will work for an arbitrary number of columns:
data = {
'GenoType': [1, 2, 4],
'Parameter1': [2, 2, 787],
'First.Second.Third.Car': ['Honda', 'Tesla', 'Jeep'],
'First.Second.Third.Food': ['Pizza', 'Fries', 'Grapes'],
'Red.Orange.Yellow.Car': ['Acura', 'BMW', 'Toyota'],
'Red.Orange.Yellow.Food': ['Burger', 'Potatoes', 'Wings']
}
df = pd.DataFrame(data)
traits = df.columns.str.extract(r'^((?:[^.]*\.){2}[^.]*)').dropna().drop_duplicates()[0].tolist()
features = df.columns.str.extract(r'^(?:[^.]*\.){3}(.*)$').dropna().drop_duplicates()[0].tolist()
out = df.melt(value_vars=[f'{t}.{features[0]}' for t in traits], value_name=features[0], id_vars=['GenoType', 'Parameter1'], var_name='Trait')
out['Trait'] = out['Trait'].str.extract(r'^((?:[^.]*\.){2}[^.]*)')
for f in features[1:]:
dff = df.melt(value_vars=[f'{t}.{f}' for t in traits], value_name=f, id_vars=['GenoType', 'Parameter1'], var_name='Trait')
out = out.merge(dff, on=['GenoType', 'Parameter1'], suffixes=['', '_y'])
out.drop('Trait_y', axis=1, inplace=True)
out = out.sort_values('Trait', kind='mergesort').reset_index(drop=True)
Output:
GenoType Parameter1 Trait Car Food
0 1 2 First.Second.Third Honda Pizza
1 1 2 First.Second.Third Honda Burger
2 2 2 First.Second.Third Tesla Fries
3 2 2 First.Second.Third Tesla Potatoes
4 4 787 First.Second.Third Jeep Grapes
5 4 787 First.Second.Third Jeep Wings
6 1 2 Red.Orange.Yellow Acura Pizza
7 1 2 Red.Orange.Yellow Acura Burger
8 2 2 Red.Orange.Yellow BMW Fries
9 2 2 Red.Orange.Yellow BMW Potatoes
10 4 787 Red.Orange.Yellow Toyota Grapes
11 4 787 Red.Orange.Yellow Toyota Wings
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论