英文:
How to add level 0 columns in multi-index data frame and keep format
问题
我有一个具有多级索引的大型数据框,有许多行和列。我想在数据框的第一级添加一列,同时保持第二级的结构。我的第一个目标是了解如何向这个数据框添加一个新的空列。我的第二个目标是了解如何用一个计算来填充这个新列,例如第一列字段和第二列字段之间的简单减法运算。
原始数据框:
data = [[99,3,12,4,63,55]]
cols = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23'],['Values','Sites']])
df = pd.DataFrame(data, columns = cols)
df.columns.set_levels(['1. FY21','2. FY22','3. FY23'],level=0,inplace=True)
df.columns.set_levels(['Values', 'Sites'], level=1, inplace=True)
期望输出 #1 - 添加一个空的新列:
data_new = [[99,3,12,4,63,55,"blank","blank"]]
cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])
df_new = pd.DataFrame(data_new, columns = cols_new)
期望输出 #2 - 在新列中进行计算:
data_newer = [[99,3,12,4,63,55,87,-1]]
cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])
df_newer = pd.DataFrame(data_newer, columns = cols_new)
英文:
I have a large multi-indexed data frame with many rows and columns. I want to add a column to the data frame at level one, while keeping the level two structure. My first goal is to understand how to add a new blank column to this data frame. My second goal is to understand how to populate that new column with a calculation, in this case a simple subtraction between the first column fields and second column fields.
Original Data frame:
data = [[99,3,12,4,63,55]]
cols = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23'],['Values','Sites']])
df = pd.DataFrame(data, columns = cols)
df.columns.set_levels(['1. FY21','2. FY22','3. FY23'],level=0,inplace=True)
df.columns.set_levels(['Values', 'Sites'], level=1, inplace=True)
Desired Output #1 - Adding just an empty new column
data_new = [[99,3,12,4,63,55,"blank","blank"]]
cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])
df_new = pd.DataFrame(data_new, columns = cols_new)
Desired Output #2 - Doing a calculation in new column
data_newer = [[99,3,12,4,63,55,87,-1]]
cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])
df_newer = pd.DataFrame(data_newer, columns = cols_new)
答案1
得分: 1
你可以从第一级获取唯一值,并将其与原始DataFrame进行join
操作,以创建具有相同结构的新列。然后使用at
方法分配值:
df = df.join(pd.DataFrame(columns=pd.MultiIndex.from_product([["4. New Column"], df.columns.get_level_values(1).unique()])))
df.at[0, ("4. New Column","Values")] = 87
df.at[0, ("4. New Column","Sites")] = -1
>>> df
1. FY21 2. FY22 3. FY23 4. New Column
Sites Values Sites Values Sites Values Sites Values
0 99 3 12 4 63 55 -1 87
英文:
You could get the unique values from level 1 and join
to your original DataFrame to create the new columns with the same structure. Then use at
to assign values:
df = df.join(pd.DataFrame(columns=pd.MultiIndex.from_product([["4. New Column"], df.columns.get_level_values(1).unique()])))
df.at[0, ("4. New Column","Values")] = 87
df.at[0, ("4. New Column","Sites")] = -1
>>> df
1. FY21 2. FY22 3. FY23 4. New Column
Sites Values Sites Values Sites Values Sites Values
0 99 3 12 4 63 55 -1 87
答案2
得分: 1
你可以这样做:
# 获取第一层级的列名
cols = df.columns.get_level_values(0).unique()
# 获取第一列
first_column = df.xs(cols[0], level=0, axis=1)
# 获取第二列
second_column = df.xs(cols[1], level=0, axis=1)
# 进行计算
d = first_column - second_column
# 添加列级别
d = pd.concat({"4. New Column": d}, axis=1)
print(d)
输出结果为:
4. New Column
Sites Values
0 87 -1
然后将该列添加到原始数据框中:
df = pd.concat([df, d], axis=1)
print(df)
输出结果为:
1. FY21 2. FY22 3. FY23 4. New Column
Sites Values Sites Values Sites Values Sites Values
0 99 3 12 4 63 55 87 -1
编辑:向原始数据框添加空白列:
# 创建一个空白数据框
blank = pd.DataFrame(
{("4. New Column", "Sites"): [np.nan], ("4. New Column", "Values"): [np.nan]}
)
# 将空白数据框与原始数据框连接起来
df = pd.concat([df, blank], axis=1)
print(df)
输出结果为:
1. FY21 2. FY22 3. FY23 4. New Column
Sites Values Sites Values Sites Values Sites Values
0 99 3 12 4 63 55 NaN NaN
或者:创建新的列:
df.loc[:, ("4. New Column", "Sites")] = [np.nan] * len(df)
df.loc[:, ("4. New Column", "Values")] = [np.nan] * len(df)
英文:
You can do:
# get unique names for columns at level 0
cols = df.columns.get_level_values(0).unique()
# get first column
first_column = df.xs(cols[0], level=0, axis=1)
# get second column
second_column = df.xs(cols[1], level=0, axis=1)
# do the computation
d = first_column - second_column
# add column level
d = pd.concat({"4. New Column": d}, axis=1)
print(d)
Prints:
4. New Column
Sites Values
0 87 -1
Then add the column to the original dataframe:
df = pd.concat([df, d], axis=1)
print(df)
Prints:
1. FY21 2. FY22 3. FY23 4. New Column
Sites Values Sites Values Sites Values Sites Values
0 99 3 12 4 63 55 87 -1
EDIT: To add blank column to original dataframe:
# create a blank dataframe
blank = pd.DataFrame(
{("4. New Column", "Sites"): [np.nan], ("4. New Column", "Values"): [np.nan]}
)
# concatenate the blank dataframe with original df
df = pd.concat([df, blank], axis=1)
print(df)
Prints:
1. FY21 2. FY22 3. FY23 4. New Column
Sites Values Sites Values Sites Values Sites Values
0 99 3 12 4 63 55 NaN NaN
Or: create new columns:
df.loc[:, ("4. New Column", "Sites")] = [np.nan] * len(df)
df.loc[:, ("4. New Column", "Values")] = [np.nan] * len(df)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论