Pandas: Convert group into list of jsons without using groupby or apply
Suggested answer
You can achieve the desired result by using groupby together with a custom aggregation function that converts each group's rows into dictionaries. Here's one way to structure it:
import pandas as pd

# Your item_dict and item_df here...

# Define a custom aggregation function to convert a group's rows to dictionaries
def to_dict_records(group):
    return [dict(zip(group.columns, row)) for row in group.itertuples(index=False)]

# Group by the key columns, keep only the value columns, and aggregate each group
value_cols = ["BarCode", "Description", "Category", "Code", "Quantity", "Price"]
result_df = (
    item_df.groupby(["BarCode_x", "Extracted_Code", "Unique_Code_x"])[value_cols]
    .apply(to_dict_records)
    .reset_index(name="Grouped")
)

# Print the result_df
print(result_df)
This keeps the row-to-dictionary conversion in a named helper and iterates each group's rows with itertuples. Note, however, that it still goes through groupby + apply, so on a very large dataframe the speed-up over the question's original one-liner is likely to be modest.
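As a quick sanity check (not part of the original answer), the result can be compared against the question's one-liner on the sample data; a minimal sketch:

# Hypothetical check: both approaches should yield identical rows on the sample data.
expected = (
    item_df.groupby(["BarCode_x", "Extracted_Code", "Unique_Code_x"])[
        ["BarCode", "Description", "Category", "Code", "Quantity", "Price"]
    ]
    .apply(lambda g: g.to_dict("records"))
    .reset_index(name="Grouped")
)
print(result_df.equals(expected))  # should print True if the two results match exactly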
Question
I have an item dataframe such as:
import pandas as pd

item_dict = {
'index': [18, 24, 25, 26, 30, 31, 37, 38, 61, 62, 63, 67, 68, 69],
'BarCode_x': ['12345678ABCD', '12345678IJKL', '12345678IJKL', '12345678IJKL', '12345678EFGH', '12345678EFGH', '67890123IJKL', '67890123IJKL', '67890123ABCD', '67890123ABCD', '67890123ABCD', '67890123EFGH', '67890123EFGH', '67890123EFGH'],
'Extracted_Code': ['12345678', '12345678', '12345678', '12345678', '12345678', '12345678', '67890123', '67890123', '67890123', '67890123', '67890123', '67890123', '67890123', '67890123'],
'Description_x': ['Apples', 'Mangoes', 'Mangoes', 'Mangoes', 'Oranges', 'Oranges', 'Oats', 'Oats', 'Yoghurt', 'Yoghurt', 'Yoghurt', 'Cookies', 'Cookies', 'Cookies'],
'Unique_Code_x': ['EFG', 'LMO', 'LMO', 'LMO', 'JKL', 'JKL', 'OPZ', 'OPZ', 'YQA', 'YQA', 'YQA', 'CDF', 'CDF', 'CDF'],
'Category_x': ['M', 'S', 'S', 'S', 'T', 'T', 'F', 'F', 'M', 'M', 'M', 'M', 'M', 'M'],
'Code_x': [1, 4, 4, 4, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
'Quantity_x': [52, 90, 90, 90, 11, 11, 90, 90, 52, 52, 52, 11, 11, 11],
'Price_x': [15.6, 67.0, 67.0, 67.0, 12.9, 12.9, 67.0, 67.0, 15.6, 15.6, 15.6, 12.9, 12.9, 12.9],
'BarCode': ['12345678AAAA', '12345678AAAA', '12345678BBBB', '12345678CCCC', '12345678AAAA', '12345678BBBB', '67890123XXXX', '67890123YYYY', '67890123XXXX', '67890123YYYY', '67890123ZZZZ', '67890123XXXX', '67890123YYYY', '67890123ZZZZ'],
'Description': ['Fruits', 'Fruits', 'Fruits', 'Fruits', 'Fruits', 'Fruits', 'Snacks', 'Snacks', 'Snacks', 'Snacks', 'Snacks', 'Snacks', 'Snacks', 'Snacks'],
'Unique_Code': ['ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'XYZ', 'XYZ', 'XYZ', 'XYZ', 'XYZ', 'XYZ', 'XYZ', 'XYZ'],
'Category': ['H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H'],
'Code': [0, 0, 2, 3, 0, 2, 0, 2, 0, 2, 3, 0, 2, 3],
'Quantity': [99, 99, 77, 10, 99, 77, 99, 77, 99, 77, 10, 99, 77, 10],
'Price': [12.0, 12.0, 10.5, 11.0, 12.0, 10.5, 12.0, 10.5, 12.0, 10.5, 11.0, 12.0, 10.5, 11.0]
}
item_df = pd.DataFrame(item_dict)
I am trying to group the dataframe based on ['BarCode_x', 'Extracted_Code', 'Unique_Code_x'], convert each group into a list of jsons and store it in a new column Grouped. My desired result is:
BarCode_x Extracted_Code Unique_Code_x Grouped
12345678ABCD 12345678 EFG [{'BarCode': '12345678AAAA', 'Description': 'Fruits', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}]
12345678EFGH 12345678 JKL [{'BarCode': '12345678AAAA', 'Description': 'Fruits', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '12345678BBBB', 'Description': 'Fruits', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}]
12345678IJKL 12345678 LMO [{'BarCode': '12345678AAAA', 'Description': 'Fruits', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '12345678BBBB', 'Description': 'Fruits', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}, {'BarCode': '12345678CCCC', 'Description': 'Fruits', 'Category': 'H', 'Code': 3, 'Quantity': 10, 'Price': 11.0}]
67890123ABCD 67890123 YQA [{'BarCode': '67890123XXXX', 'Description': 'Snacks', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '67890123YYYY', 'Description': 'Snacks', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}, {'BarCode': '67890123ZZZZ', 'Description': 'Snacks', 'Category': 'H', 'Code': 3, 'Quantity': 10, 'Price': 11.0}]
67890123EFGH 67890123 CDF [{'BarCode': '67890123XXXX', 'Description': 'Snacks', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '67890123YYYY', 'Description': 'Snacks', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}, {'BarCode': '67890123ZZZZ', 'Description': 'Snacks', 'Category': 'H', 'Code': 3, 'Quantity': 10, 'Price': 11.0}]
67890123IJKL 67890123 OPZ [{'BarCode': '67890123XXXX', 'Description': 'Snacks', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '67890123YYYY', 'Description': 'Snacks', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}]
This is what I have done:
item_df.groupby(['BarCode_x', 'Extracted_Code', 'Unique_Code_x'])[["BarCode", "Description", "Category", "Code", "Quantity", "Price"]].apply(lambda group: group.to_dict("records")).reset_index(name="Grouped")
The item_df shown above is a small representation of another dataframe that contains over 3 million records. When I apply the above logic using groupby+apply, the process takes 2 hours to complete, which is not feasible. Therefore, is there any way I can achieve the same result in a shorter amount of time using another optimized method instead of groupby+apply?
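To time candidate solutions on something closer to the real data size, one option is to tile the 14-row sample; a minimal sketch, assuming plain repetition of the sample rows is representative enough for benchmarking:

import pandas as pd

# Hypothetical helper (not from the original post): repeat the sample rows until the
# frame has roughly `n` rows, so candidate answers can be timed at a realistic scale.
def scale_up(df, n=3_000_000):
    reps = max(1, n // len(df))
    return pd.concat([df] * reps, ignore_index=True)

big_df = scale_up(item_df)
print(len(big_df))  # roughly 3 million rows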
Answer 1
Score: 1
You can try:
# Build the groups in a single pass over the columns: (a, b, c) is the group key,
# *d collects the remaining per-row values.
out = {}
for a, b, c, *d in zip(
    item_df.BarCode_x,
    item_df.Extracted_Code,
    item_df.Unique_Code_x,
    item_df.BarCode,
    item_df.Description,
    item_df.Category,
    item_df.Code,
    item_df.Quantity,
    item_df.Price,
):
    out.setdefault((a, b, c), []).append(d)

# Turn each group's rows into dictionaries and assemble the final dataframe.
ks = ["BarCode", "Description", "Category", "Code", "Quantity", "Price"]
df = pd.DataFrame(
    ((*k, [dict(zip(ks, l)) for l in v]) for k, v in out.items()),
    columns=["BarCode_x", "Extracted_Code", "Unique_Code_x", "Grouped"],
)
print(df)
Prints:
BarCode_x Extracted_Code Unique_Code_x Grouped
0 12345678ABCD 12345678 EFG [{'BarCode': '12345678AAAA', 'Description': 'Fruits', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}]
1 12345678IJKL 12345678 LMO [{'BarCode': '12345678AAAA', 'Description': 'Fruits', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '12345678BBBB', 'Description': 'Fruits', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}, {'BarCode': '12345678CCCC', 'Description': 'Fruits', 'Category': 'H', 'Code': 3, 'Quantity': 10, 'Price': 11.0}]
2 12345678EFGH 12345678 JKL [{'BarCode': '12345678AAAA', 'Description': 'Fruits', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '12345678BBBB', 'Description': 'Fruits', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}]
3 67890123IJKL 67890123 OPZ [{'BarCode': '67890123XXXX', 'Description': 'Snacks', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '67890123YYYY', 'Description': 'Snacks', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}]
4 67890123ABCD 67890123 YQA [{'BarCode': '67890123XXXX', 'Description': 'Snacks', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '67890123YYYY', 'Description': 'Snacks', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}, {'BarCode': '67890123ZZZZ', 'Description': 'Snacks', 'Category': 'H', 'Code': 3, 'Quantity': 10, 'Price': 11.0}]
5 67890123EFGH 67890123 CDF [{'BarCode': '67890123XXXX', 'Description': 'Snacks', 'Category': 'H', 'Code': 0, 'Quantity': 99, 'Price': 12.0}, {'BarCode': '67890123YYYY', 'Description': 'Snacks', 'Category': 'H', 'Code': 2, 'Quantity': 77, 'Price': 10.5}, {'BarCode': '67890123ZZZZ', 'Description': 'Snacks', 'Category': 'H', 'Code': 3, 'Quantity': 10, 'Price': 11.0}]
Quick benchmark:
from timeit import timeit
def fn1(item_df):
x = (
item_df.groupby(["BarCode_x", "Extracted_Code", "Unique_Code_x"])[
["BarCode", "Description", "Category", "Code", "Quantity", "Price"]
]
.apply(lambda group: group.to_dict("records"))
.reset_index(name="Grouped")
)
return x
def fn2(item_df):
out = {}
for a, b, c, *d in zip(
item_df.BarCode_x,
item_df.Extracted_Code,
item_df.Unique_Code_x,
item_df.BarCode,
item_df.Description,
item_df.Category,
item_df.Code,
item_df.Quantity,
item_df.Price,
):
out.setdefault((a, b, c), []).append(d)
ks = ["BarCode", "Description", "Category", "Code", "Quantity", "Price"]
return pd.DataFrame(
((*k, [dict(zip(ks, l)) for l in v]) for k, v in out.items()),
columns=["BarCode_x", "Extracted_Code", "Unique_Code_x", "Grouped"],
)
t1 = timeit('fn1(x)', 'x=item_df.copy()',number=1000, globals=globals())
t2 = timeit('fn2(x)', 'x=item_df.copy()',number=1000, globals=globals())
print(t1)
print(t2)
Prints on my machine (AMD 5700X, Ubuntu 20.04, Python 3.10.9, Pandas 1.5.3):
2.507308159954846
0.1901476769708097
So it should be ~12x faster.
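One caveat: the Grouped column above contains Python dictionaries, not JSON strings. If actual JSON text is needed (as "list of jsons" in the title may imply), a minimal follow-up sketch, assuming the df built above:

import json

# Hypothetical extra step (not part of the original answer): serialise each group's
# list of dicts into a JSON string. If numpy scalar types end up inside the dicts,
# json.dumps may need a default= handler or a prior conversion to native Python types.
df["Grouped_json"] = [json.dumps(records) for records in df["Grouped"]]
print(df.loc[0, "Grouped_json"])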
Answer 2
Score: 1
Here is an alternative approach (that avoids using GroupBy):
from collections import OrderedDict
use_cols = [
"BarCode_x", "Extracted_Code", "Unique_Code_x", # Groupers
"BarCode", "Description", "Category", "Code", # Aggs (p1)
"Quantity", "Price" # Aggs (p2)
]
def gby_records(df):
data = {}
for r in df[use_cols].to_dict("records"):
key = tuple(r[col] for col in use_cols[:3])
data.setdefault(key, []).append(OrderedDict((k, r[k]) for k in use_cols[3:]))
for k, rs in data.items():
yield {**dict(zip(use_cols[:3], k)), "Grouped": rs}
out = pd.DataFrame(gby_records(item_df))
Output:

|   | BarCode_x    | Extracted_Code | Unique_Code_x | Grouped                                            |
|---|--------------|----------------|---------------|----------------------------------------------------|
| 0 | 12345678ABCD | 12345678       | EFG           | [{'BarCode': '12345678AAAA', 'Description': 'F...  |
| 1 | 12345678IJKL | 12345678       | LMO           | [{'BarCode': '12345678AAAA', 'Description': 'F...  |
| 2 | 12345678EFGH | 12345678       | JKL           | [{'BarCode': '12345678AAAA', 'Description': 'F...  |
| 3 | 67890123IJKL | 67890123       | OPZ           | [{'BarCode': '67890123XXXX', 'Description': 'S...  |
| 4 | 67890123ABCD | 67890123       | YQA           | [{'BarCode': '67890123XXXX', 'Description': 'S...  |
| 5 | 67890123EFGH | 67890123       | CDF           | [{'BarCode': '67890123XXXX', 'Description': 'S...  |
Timings:
🥉 %%timeit #Animeartist
5.84 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
🥈 %%timeit #Timeless
1.28 ms ± 51.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
🥇 %%timeit #Andrej Kesely
612 µs ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)