February 16, 2023, 03:24:11

Column Pair-wise aggregation and reorganization in Pandas

Question

I am importing a csv file into a pandas dataframe such as:

df = pd.DataFrame({
    0: {0: 'ID', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5'},
    1: {0: 'Net Cost', 1: '30', 2: '40', 3: '50', 4: '35', 5: '45'},
    2: {0: 'Charge Description', 1: 'Surcharge A', 2: 'Discount X', 3: 'Discount X', 4: 'Discount X', 5: 'Surcharge A'},
    3: {0: 'Charge Amount', 1: '9.5', 2: '-12.5', 3: '-11.5', 4: '-5.5', 5: '9.5'},
    4: {0: 'Charge Description', 1: 'Discount X', 2: '', 3: '', 4: 'Surcharge B', 5: 'Discount X'},
    5: {0: 'Charge Amount', 1: '-11.5', 2: '', 3: '', 4: '3.5', 5: '-10.5'},
    6: {0: 'Charge Description', 1: 'Discount Y', 2: '', 3: '', 4: '', 5: 'Surcharge B'},
    7: {0: 'Charge Amount', 1: '-3.25', 2: '', 3: '', 4: '', 5: '4.5'},
    8: {0: 'Charge Description', 1: 'Surcharge B', 2: '', 3: '', 4: '', 5: ''},
    9: {0: 'Charge Amount', 1: '2.5', 2: '', 3: '', 4: '', 5: ''},
})

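| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| ID | Net Cost | Charge Description | Charge Amount | Charge Description | Charge Amount | Charge Description | Charge Amount | Charge Description | Charge Amount |
| 1 | 30 | Surcharge A | 9.5 | Discount X | -11.5 | Discount Y | -3.25 | Surcharge B | 2.5 |
| 2 | 40 | Discount X | -12.5 | | | | | | |
| 3 | 50 | Discount X | -11.5 | | | | | | |
| 4 | 35 | Discount X | -5.5 | Surcharge B | 3.5 | | | | |
| 5 | 45 | Surcharge A | 9.5 | Discount X | -10.5 | Surcharge B | 4.5 | | |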

The first row contains the headers. The column names Charge Description and Charge Amount form pairs and appear multiple times.

Desired output is a df with a unique column for each description, with the reorganized columns sorted alphabetically and NaNs showing as 0:

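|   ID |   Net Cost |   Surcharge A |   Surcharge B |   Discount X |   Discount Y |
|-----:|-----------:|--------------:|--------------:|-------------:|-------------:|
|    1 |         30 |           9.5 |           2.5 |        -11.5 |        -3.25 |
|    2 |         40 |             0 |             0 |        -12.5 |            0 |
|    3 |         50 |             0 |             0 |        -11.5 |            0 |
|    4 |         35 |             0 |           3.5 |         -5.5 |            0 |
|    5 |         45 |           9.5 |           4.5 |        -10.5 |            0 |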

This post looks like a good starting point but then I need a column for each Charge Description and only a single row per ID.

Answer 1

Score: 2

I used the file you shared and overwrote the column names with those from the df in your question, to keep the duplicate names (pandas automatically adds suffixes to duplicated column names on import to make them unique):

invoice = pd.read_csv('Downloads/Example Invoice.csv')
invoice.columns = ['ID', 'Net Cost', 'Charge Description', 'Charge Amount',
                   'Charge Description', 'Charge Amount', 
                   'Charge Description', 'Charge Amount', 
                   'Charge Description', 'Charge Amount']
print(invoice)
   ID  Net Cost Charge Description  Charge Amount  ... Charge Description  Charge Amount Charge Description  Charge Amount
0   1        30        Surcharge A            9.5  ...         Discount Y          -3.25        Surcharge B            2.5
1   2        40         Discount X          -12.5  ...                NaN            NaN                NaN            NaN
2   3        50         Discount X          -11.5  ...                NaN            NaN                NaN            NaN
3   4        35         Discount X           -5.5  ...                NaN            NaN                NaN            NaN
4   5        45        Surcharge A            9.5  ...        Surcharge B           4.50                NaN            NaN

The first step is to transform the data to long form with pivot_longer from pyjanitor. Because each Charge Description column is immediately followed by its Charge Amount column, the two can be safely paired and reshaped into two columns. After that, flip back to wide form so that the Surcharge and Discount values become the headers. Thankfully, the index is unique, so a pivot works without extra steps. I used pivot_wider here mainly for convenience; the same can be achieved with pivot plus a few cleanup steps, since pivot_wider uses pd.pivot under the hood.

# pip install pyjanitor
import pandas as pd
import janitor
index = ['ID', 'Net Cost']
arr = ['Charge Description', 'Charge Amount']
(invoice
.pivot_longer(
    index = index, 
    names_to = arr, 
    names_pattern = arr, 
    dropna=True)
.pivot_wider(
    index=index,
    names_from='Charge Description', 
    values_from='Charge Amount')
.fillna(0)
)
   ID  Net Cost  Discount X  Discount Y  Surcharge A  Surcharge B
0   1        30       -11.5       -3.25          9.5          2.5
1   2        40       -12.5        0.00          0.0          0.0
2   3        50       -11.5        0.00          0.0          0.0
3   4        35        -5.5        0.00          0.0          3.5
4   5        45       -10.5        0.00          9.5          4.5
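
For readers who would rather not install pyjanitor, the same long-then-wide reshaping can be sketched in plain pandas. This is only an illustration, not the code behind pivot_longer or pivot_wider: the frame `wide` and its column names `desc_1`, `amt_1`, `desc_2`, `amt_2` are hypothetical stand-ins, chosen to be unique so that each pair can be selected by label.

```python
import pandas as pd

# Hypothetical two-row stand-in for the invoice, with unique column names.
wide = pd.DataFrame({
    "ID": [1, 2],
    "Net Cost": [30, 40],
    "desc_1": ["Surcharge A", "Discount X"],
    "amt_1": [9.5, -12.5],
    "desc_2": ["Discount X", None],
    "amt_2": [-11.5, None],
})

# Long form: stack every (description, amount) pair under common names.
pairs = [("desc_1", "amt_1"), ("desc_2", "amt_2")]
long = pd.concat(
    [
        wide[["ID", "Net Cost", d, a]].set_axis(
            ["ID", "Net Cost", "Charge Description", "Charge Amount"], axis=1
        )
        for d, a in pairs
    ],
    ignore_index=True,
).dropna(subset=["Charge Description"])

# Wide form: one column per description, missing charges filled with 0.
result = (
    long.pivot(index=["ID", "Net Cost"],
               columns="Charge Description",
               values="Charge Amount")
    .fillna(0)
    .rename_axis(columns=None)
    .reset_index()
)
print(result)
```

The pivot step only works because each (ID, Net Cost, description) combination occurs once; with repeated descriptions per ID, pivot raises an error and an aggregating pivot_table would be needed instead.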

Another option: since the column order is consistent, you can drop down to NumPy, reshape the values into a two-column array, keep track of the ID and Net Cost columns (making sure they stay correctly paired), and then pivot to get the final data:

index = ['ID', 'Net Cost']
arr = ['Charge Description', 'Charge Amount']
invoice = invoice.set_index(index)
out = invoice.to_numpy().reshape(-1, 2)
out = pd.DataFrame(out, columns = arr)
# reshape uses C (row-major) order by default, so each row's
# description/amount pairs stay together; repeat every index
# entry 4 times (invoice.columns.size // 2, the pairs per row)
# to correctly pair the index with the new dataframe
out.index = invoice.index.repeat(invoice.columns.size//2)
# get rid of nulls, and flip to wide form
(out
.dropna(how='all')
.set_index('Charge Description', append=True)
.squeeze()
.unstack('Charge Description', fill_value=0)
.rename_axis(columns = None)
.reset_index()
)
   ID  Net Cost Discount X Discount Y Surcharge A Surcharge B
0   1        30      -11.5      -3.25         9.5         2.5
1   2        40      -12.5          0           0           0
2   3        50      -11.5          0           0           0
3   4        35       -5.5          0           0         3.5
4   5        45      -10.5          0         9.5         4.5
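
To see why the row-major reshape keeps each description next to its amount, here is a tiny sketch on a hand-made array. The array `values` and the index `ids` are made-up stand-ins for `invoice.to_numpy()` and `invoice.index`, with two pairs per row instead of four.

```python
import numpy as np
import pandas as pd

# Two rows, two (description, amount) pairs per row.
values = np.array(
    [["Surcharge A", 9.5, "Discount X", -11.5],
     ["Discount X", -12.5, None, None]],
    dtype=object,
)
ids = pd.Index([1, 2], name="ID")

# C order walks each row left to right, so consecutive cells
# (description, amount) end up together in one output row.
pairs = values.reshape(-1, 2)

# Every original row produced 2 pairs, so repeat each ID twice.
pair_ids = ids.repeat(values.shape[1] // 2)

print(pairs)
print(pair_ids.tolist())
```

If the columns were not strictly alternating description and amount, this positional pairing would silently mix up values, which is why the consistent ordering matters.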

The reshaped Discount and Surcharge columns come out with object dtype; you can convert them to numeric afterwards.
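
One possible way to do that conversion is pd.to_numeric, applied column by column. The frame below is a hypothetical two-row version of the result, built with object columns to mimic what the NumPy route produces.

```python
import pandas as pd

# Hypothetical result frame whose charge columns are object dtype.
out = pd.DataFrame({
    "ID": [1, 2],
    "Net Cost": [30, 40],
    "Discount X": pd.Series([-11.5, -12.5], dtype=object),
    "Surcharge A": pd.Series([9.5, 0], dtype=object),
})

# Every column except the identifiers holds charge amounts.
charge_cols = out.columns.difference(["ID", "Net Cost"])

converted = out.copy()
for col in charge_cols:
    converted[col] = pd.to_numeric(converted[col])

print(converted.dtypes)
```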

Answer 2

Score: 0

You can flatten your dataframe first with melt then reshape with pivot_table after cleaning it up:

# 1st pass: promote the header row, then melt to long form
out = (pd.DataFrame(df.iloc[1:].values, columns=df.iloc[0].tolist())
         .melt(['ID', 'Net Cost'], ignore_index=False))
m = out['variable'] == 'Charge Description'
# 2nd pass: put descriptions beside their amounts, drop empty pairs, pivot wide
out = (pd.concat([out[m].reset_index(drop=True).add_prefix('_'),
                  out[~m].reset_index(drop=True)], axis=1)
         .query("_value != ''")
         .pivot_table(index=['ID', 'Net Cost'], columns='_value',
                      values='value', aggfunc='first')
         .rename_axis(columns=None).reset_index().fillna(0))

Output:

>>> out
  ID Net Cost Discount X Discount Y Surcharge A Surcharge B
0  1       30      -11.5      -3.25         9.5         2.5
1  2       40      -12.5          0           0           0
2  3       50      -11.5          0           0           0
3  4       35       -5.5          0           0         3.5
4  5       45      -10.5          0         9.5         4.5
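
A note on `aggfunc='first'`: it keeps one amount per cell, which is fine for the sample data because no ID lists the same description twice. If that could happen in a real invoice (an assumption, since the sample does not show it), a summing aggregation on numeric amounts may be closer to what is wanted. A minimal sketch on made-up long-form data:

```python
import pandas as pd

# Made-up long-form data where ID 1 has "Discount X" twice.
long = pd.DataFrame({
    "ID": [1, 1, 2],
    "Charge Description": ["Discount X", "Discount X", "Discount X"],
    "Charge Amount": [-11.5, -2.0, -12.5],
})

# 'first' silently keeps only the first amount for ID 1.
first = long.pivot_table(index="ID", columns="Charge Description",
                         values="Charge Amount", aggfunc="first")

# 'sum' adds the repeated charges together.
total = long.pivot_table(index="ID", columns="Charge Description",
                         values="Charge Amount", aggfunc="sum")

print(first)
print(total)
```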

Answer 3

Score: 0

You can use pivot_table after concatenating pair-wise:

import pandas as pd
df = pd.DataFrame.from_dict({
    0: {0: 'ID', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5'},
    1: {0: 'Net Cost', 1: '30', 2: '40', 3: '50', 4: '35', 5: '45'},
    2: {0: 'Charge Description', 1: 'Surcharge A', 2: 'Discount X', 3: 'Discount X', 4: 'Discount X', 5: 'Surcharge A'},
    3: {0: 'Charge Amount', 1: '9.5', 2: '-12.5', 3: '-11.5', 4: '-5.5', 5: '9.5'},
    4: {0: 'Charge Description', 1: 'Discount X', 2: '', 3: '', 4: 'Surcharge B', 5: 'Discount X'},
    5: {0: 'Charge Amount', 1: '-11.5', 2: '', 3: '', 4: '3.5', 5: '-10.5'},
    6: {0: 'Charge Description', 1: 'Discount Y', 2: '', 3: '', 4: '', 5: 'Surcharge B'},
    7: {0: 'Charge Amount', 1: '-3.25', 2: '', 3: '', 4: '', 5: '4.5'},
    8: {0: 'Charge Description', 1: 'Surcharge B', 2: '', 3: '', 4: '', 5: ''},
    9: {0: 'Charge Amount', 1: '2.5', 2: '', 3: '', 4: '', 5: ''},
})
# setting first row as header
df.columns = df.iloc[0, :]
df.drop(index=0, inplace=True)
# stack each (description, amount) pair under ID and Net Cost
df = pd.concat([df.iloc[:, [0, 1, i, i + 1]] for i in range(2, len(df.columns), 2)]).replace('', 0)
# the amounts are strings; make them numeric so the default mean aggregation works
df['Charge Amount'] = pd.to_numeric(df['Charge Amount'])
print(df[df['Charge Description'] != 0]
      .pivot_table(columns='Charge Description', values='Charge Amount', index=['ID', 'Net Cost'])
      .fillna(0))

Output:

Charge Description  Discount X  Discount Y  Surcharge A  Surcharge B
ID Net Cost                                                         
1  30                    -11.5       -3.25          9.5          2.5
2  40                    -12.5        0.00          0.0          0.0
3  50                    -11.5        0.00          0.0          0.0
4  35                     -5.5        0.00          0.0          3.5
5  45                    -10.5        0.00          9.5          4.5
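
The pivoted result above still carries the `Charge Description` axis label and keeps ID and Net Cost in the index. To get flat columns like the desired output, the label can be dropped and the index reset. A small sketch, using a hand-built frame `pivoted` as a stand-in for the pivot_table result:

```python
import pandas as pd

# Hand-built stand-in for the pivot_table output (two rows only).
pivoted = pd.DataFrame(
    {"Discount X": [-11.5, -12.5], "Surcharge A": [9.5, 0.0]},
    index=pd.MultiIndex.from_tuples(
        [("1", "30"), ("2", "40")], names=["ID", "Net Cost"]
    ),
)
pivoted.columns.name = "Charge Description"

# Drop the axis label, then turn the index levels back into columns.
flat = pivoted.rename_axis(columns=None).reset_index()
print(flat)
```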

Answer 4

Score: 0

I would use melt to stack the identically named columns, then pivot to create the outcome you want.

# Ensure the first line is now the column names, and then delete the first line.
df.columns = df.iloc[0]
df = df[1:]
# Create two melted df's, and join them on index.
df1 = df.melt(['ID', 'Net Cost'], ['Charge Description']).sort_values(by='ID').reset_index(drop=True)
df2 = df.melt(['ID', 'Net Cost'], ['Charge Amount']).sort_values(by='ID').reset_index(drop=True)
df1['Charge Amount'] = df2['value']
# Clean up a little, rename the added 'value' column from df1.
df1 = df1.drop(columns=[0]).rename(columns={'value': 'Charge Description'})
# Unused pairs are empty strings in the sample df (NaN if read from csv); drop both.
df1 = df1[df1['Charge Description'] != ''].dropna()
# Pivot the data.
df1 = df1.pivot(index=['ID', 'Net Cost'], columns='Charge Description', values='Charge Amount')

Result of df1:

Charge Description Discount X Discount Y Surcharge A Surcharge B
ID Net Cost                                                     
1  30                   -11.5      -3.25         9.5         2.5
2  40                   -12.5        NaN         NaN         NaN
3  50                   -11.5        NaN         NaN         NaN
4  35                    -5.5        NaN         NaN         3.5
5  45                   -10.5        NaN         9.5         4.5

Answer 5

Score: 0

My first thought was to read the data out into a list of dictionaries, one per row (taking both the keys and the values from the data values), then form a new dataframe from that.

For your example, that would make...

[
    {
        'ID': '1',
        'Net Cost': '30',
        'Discount X': '-11.5',
        'Discount Y': '-3.25',
        'Surcharge A': '9.5',
        'Surcharge B': '2.5',
    },
    {
        'ID': '2',
        'Net Cost': '40',
        'Discount X': '-12.5',
    },
    {
        'ID': '3',
        'Net Cost': '50',
        'Discount X': '-11.5',
    },
    {
        'ID': '4',
        'Net Cost': '35',
        'Discount X': '-5.5',
        'Surcharge B': '3.5',
    },
    {
        'ID': '5',
        'Net Cost': '45',
        'Discount X': '-10.5',
        'Surcharge A': '9.5',
        'Surcharge B': '4.5',
    },
]

For the SMALL sample dataset, using comprehensions appears to be quite quick for that...

import pandas as pd
from itertools import chain

rows = [
    {
        name: value
        for name, value in chain(
            [
                ("ID", row[0]),
                ("Net Cost", row[1]),
            ],
            zip(row[2::2], row[3::2])  # pairs of columns: (2,3), (4,5), etc
        )
        if name
    }
    for ix, row in df.iloc[1:].iterrows()  # Skips the row with the column headers
]
df2 = pd.DataFrame(rows).fillna(0)
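
To make the slice-and-zip pairing easier to follow, here it is on a single row written as a plain list. The row contents mirror ID 4 from the sample. Converting the amounts with float is an addition for illustration, not part of the comprehension above.

```python
# One data row: ID, Net Cost, then alternating description/amount cells.
row = ["4", "35", "Discount X", "-5.5", "Surcharge B", "3.5", "", ""]

record = {"ID": row[0], "Net Cost": row[1]}

# row[2::2] yields the descriptions, row[3::2] the matching amounts;
# pairs with an empty description are skipped.
record.update(
    {name: float(amount) for name, amount in zip(row[2::2], row[3::2]) if name}
)

print(record)
```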

Demo (including timings of this and three other answers):

https://trinket.io/python3/555f860855

EDIT:

To sort the column names, add the following...

df2 = df2[['ID', 'Net Cost', *sorted(df2.columns[2:])]]
