June 9, 2023 13:57:45

Matching up array data to sub-array data

Question

I have a sub-array itemdata with a length of 6 rows. This data was originally found in the main array but has been reformatted so there is 1 unique product per line.

I have the main array saledata with a length of 4 rows which looks a bit like this:

                id    sub-array
            0   001   [{'type': 'line_items', 'id': '78', 'attributes': {'status': 'allocated', 'quantity': 1, 'various_other_data': 'etc'}}]
            1   002   [{'type': 'line_items', 'id': '80', 'attributes': {'status': 'allocated', 'quantity': 2, 'various_other_data': 'etc'}}]
            2   003   [{'type': 'line_items', 'id': '85', 'attributes': {'status': 'allocated', 'quantity': 1, 'various_other_data': 'etc'}}, {'type': 'line_items', 'id': '86', 'attributes': {'status': 'allocated', 'quantity': 1, 'various_other_data': 'etc'}}]
            3   004   [{'type': 'line_items', 'id': '92', 'attributes': {'status': 'allocated', 'quantity': 2, 'various_other_data': 'etc'}}, {'type': 'line_items', 'id': '93', 'attributes': {'status': 'allocated', 'quantity': 2, 'various_other_data': 'etc'}}]

Then I have the sub-array itemdata (which is basically just the sub-array column after json_normalize):

        type        id   attributes.status   attributes.quantity    attributes.various_other_data
    0   line_item   78   allocated           1                      etc
    0   line_item   80   allocated           2                      etc
    0   line_item   85   allocated           1                      etc
    1   line_item   86   allocated           1                      etc
    0   line_item   92   allocated           2                      etc
    1   line_item   93   allocated           2                      etc

At the moment, I'm treating sub-array as a string (after it's been json normalized for the second dataframe) which allows me to perform this:

for f in itemdata['id']:
    df['sub-array'].str.contains(f)

Which yields the following:

0     True
1    False
2    False
3    False
Name: relationships.line_items.data, dtype: bool
0    False
1     True
2    False
3    False
Name: relationships.line_items.data, dtype: bool
0    False
1    False
2     True
3    False
Name: relationships.line_items.data, dtype: bool
0    False
1    False
2     True
3    False
Name: relationships.line_items.data, dtype: bool
0    False
1    False
2    False
3     True
Name: relationships.line_items.data, dtype: bool
0    False
1    False
2    False
3     True
Name: relationships.line_items.data, dtype: bool

That is all correct! But now I'm trying to match the sub-array rows back to the parent array, i.e. take the index where the result above is True and pick the matching row of the initial array saledata, and I'm struggling to find the right way to do this.

Python doesn't seem to like the approach below (truth value of a Series is ambiguous yada yada yada) and I'm not sure how to proceed.

for f in itemdata['id']:
    if df['sub-array'].str.contains(f) == True:

Any advice greatly appreciated!

Edit:

This is what I'm looking for (note the etc's are off, and I'm unsure if pandas will allow multiple rows to have the same index value; not a huge issue if not):

             id   type         itemdata.id   itemdata.attributes.status   itemdata.attributes.quantity
        0   001   line_items   78            allocated              etc
        1   002   line_items   80            allocated              etc
        2   003   line_items   85            allocated              etc
        2   003   line_items   86            allocated              etc
        3   004   line_items   92            allocated              etc
        3   004   line_items   93            allocated              etc

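Answer 1

Score: 1

You can use DataFrame.join to attach id (or several columns) to the normalized sub-array: explode the column with Series.explode, normalize it, and set the result's index to the index of the exploded rows so the join lines up: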

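import ast

import pandas as pd

# Only needed if the column holds strings rather than parsed lists of dicts
df['sub-array'] = df['sub-array'].apply(ast.literal_eval)

# One row per line item; the parent's index label is repeated for each item
s = df['sub-array'].explode()

cols = ['id']
# Normalize the dicts, restore the exploded index, then join the parent columns on it
df = df[cols].add_suffix('_parent').join(pd.json_normalize(s).set_index(s.index))
print(df)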
  id_parent        type  id attributes.status  attributes.quantity  \
0       001  line_items  78         allocated                    1   
1       002  line_items  80         allocated                    2   
2       003  line_items  85         allocated                    1   
2       003  line_items  86         allocated                    1   
3       004  line_items  92         allocated                    2   
3       004  line_items  93         allocated                    2   

  attributes.various_other_data  
0                           etc  
1                           etc  
2                           etc  
2                           etc  
3                           etc  
3                           etc
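
As a possible alternative to the explode-and-join steps above, pd.json_normalize can unpack the nested list and carry the parent column along in one call. This is only a minimal sketch: it assumes the sub-array column already holds parsed lists of dicts (not strings), and the saledata sample, the flat variable and the parent_ prefix are made up for illustration.

```python
import pandas as pd

# Sample data shaped like the question's saledata (already parsed, not strings).
saledata = pd.DataFrame({
    "id": ["001", "002", "003"],
    "sub-array": [
        [{"type": "line_items", "id": "78", "attributes": {"status": "allocated", "quantity": 1}}],
        [{"type": "line_items", "id": "80", "attributes": {"status": "allocated", "quantity": 2}}],
        [{"type": "line_items", "id": "85", "attributes": {"status": "allocated", "quantity": 1}},
         {"type": "line_items", "id": "86", "attributes": {"status": "allocated", "quantity": 1}}],
    ],
})

# record_path names the list to unpack; meta carries the parent column along.
# meta_prefix avoids a clash between the parent "id" and each item's own "id".
flat = pd.json_normalize(
    saledata.to_dict("records"),
    record_path="sub-array",
    meta=["id"],
    meta_prefix="parent_",
)
print(flat[["parent_id", "id", "attributes.status", "attributes.quantity"]])
```

The result has a fresh 0..n-1 index rather than the repeated parent index, so the parent is identified by the parent_id column instead.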

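If you only need the id column and the item id values are unique, build a helper Series and use Series.map (starting again from the original df, before it was overwritten above):

# s: index = parent id, value = item id; flip it so each item id looks up its parent
s = df.set_index('id')['sub-array'].apply(ast.literal_eval).explode().str.get('id')
parent_by_item = pd.Series(s.index, index=s.to_numpy())
itemdata['id_parent'] = itemdata['id'].map(parent_by_item)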
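
The item-to-parent lookup can also be sketched with a plain dict instead of a helper Series, which may be easier to follow. This assumes each item id appears under exactly one parent and that sub-array already holds parsed lists; the trimmed sample frames and the name parent_by_item are illustrative only.

```python
import pandas as pd

saledata = pd.DataFrame({
    "id": ["001", "002"],
    "sub-array": [
        [{"type": "line_items", "id": "78"}],
        [{"type": "line_items", "id": "85"}, {"type": "line_items", "id": "86"}],
    ],
})
itemdata = pd.DataFrame({"id": ["78", "85", "86"]})

# item id -> parent id; assumes each item id appears under exactly one parent
parent_by_item = {
    item["id"]: parent_id
    for parent_id, items in zip(saledata["id"], saledata["sub-array"])
    for item in items
}

# Series.map accepts a dict; ids missing from the dict would become NaN
itemdata["id_parent"] = itemdata["id"].map(parent_by_item)
print(itemdata)
```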

英文:

You can use DataFrame.join if need append id (or multiple columns) after normalize sub-array with set indices by exploded rows by Series.explode:

import ast

df[&#39;sub-array&#39;] = df[&#39;sub-array&#39;].apply(ast.literal_eval)

s = df[&#39;sub-array&#39;].explode()

cols = [&#39;id&#39;]
df = df[cols].add_suffix(&#39;_parent&#39;).join(pd.json_normalize(s).set_index(s.index))
print (df)
  id_parent        type  id attributes.status  attributes.quantity  \
0       001  line_items  78         allocated                    1   
1       002  line_items  80         allocated                    2   
2       003  line_items  85         allocated                    1   
2       003  line_items  86         allocated                    1   
3       004  line_items  92         allocated                    2   
3       004  line_items  93         allocated                    2   

  attributes.various_other_data  
0                           etc  
1                           etc  
2                           etc  
2                           etc  
3                           etc  
3                           etc

If need processing only id column and id values are unique create helper Series and use Series.map:

s = df.set_index(&#39;id&#39;)[&#39;sub-array&#39;].apply(ast.literal_eval).explode().str.get(&#39;id&#39;)
df[&#39;id_parent&#39;] = df[&#39;id&#39;].map(s)

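Answer 2

Score: 1

You can rebuild the sample data and then look up, for each item id, the rows of the main array that contain it: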

import pandas as pd

# 示例数据
saledata = pd.DataFrame({
    'id': ['001', '002', '003', '004'],
    'sub-array': [[{'type': 'line_items', 'id': '78', 'attributes': {'status': 'allocated', 'quantity': 1, 'various_other_data': 'etc'}}],
                  [{'type': 'line_items', 'id': '80', 'attributes': {'status': 'allocated', 'quantity': 2, 'various_other_data': 'etc'}}],
                  [{'type': 'line_items', 'id': '85', 'attributes': {'status': 'allocated', 'quantity': 1, 'various_other_data': 'etc'}},
                   {'type': 'line_items', 'id': '86', 'attributes': {'status': 'allocated', 'quantity': 1, 'various_other_data': 'etc'}}],
                  [{'type': 'line_items', 'id': '92', 'attributes': {'status': 'allocated', 'quantity': 2, 'various_other_data': 'etc'}},
                   {'type': 'line_items', 'id': '93', 'attributes': {'status': 'allocated', 'quantity': 2, 'various_other_data': 'etc'}}]
                 ]
})

itemdata = pd.DataFrame({
    'type': ['line_item', 'line_item', 'line_item', 'line_item', 'line_item', 'line_item'],
    'id': ['78', '80', '85', '86', '92', '93'],
    'attributes.status': ['allocated', 'allocated', 'allocated', 'allocated', 'allocated', 'allocated'],
    'attributes.quantity': [1, 2, 1, 1, 2, 2],
    'attributes.various_other_data': ['etc', 'etc', 'etc', 'etc', 'etc', 'etc']
})

import numpy as np

# For each item id, find the positions of the saledata rows whose sub-array contains it
item_id2sale_ids = {i_id: np.where(saledata['sub-array'].apply(lambda x: any(item['id'] == i_id for item in x))) for i_id in itemdata['id']}

item_id2sale_ids

Output:

{'78': (array([0], dtype=int32),),
 '80': (array([1], dtype=int32),),
 '85': (array([2], dtype=int32),),
 '86': (array([2], dtype=int32),),
 '92': (array([3], dtype=int32),),
 '93': (array([3], dtype=int32),)}
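
On the error quoted in the question: a comparison such as series == True is itself a Series of booleans, so an if statement cannot reduce it to a single True or False. A rough sketch of the usual way around this is to index the frame with the boolean mask (or collapse it with .any() when a single yes/no is wanted). The helper rows_containing and the trimmed sample data below are hypothetical, and sub-array is assumed to hold parsed lists of dicts.

```python
import pandas as pd

saledata = pd.DataFrame({
    "id": ["001", "002", "003"],
    "sub-array": [
        [{"id": "78"}],
        [{"id": "80"}],
        [{"id": "85"}, {"id": "86"}],
    ],
})


def rows_containing(item_id):
    """Return the saledata rows whose sub-array holds the given item id."""
    mask = saledata["sub-array"].apply(
        lambda items: any(item["id"] == item_id for item in items)
    )
    # `if mask == True:` would raise "truth value of a Series is ambiguous";
    # index with the mask instead (or reduce it with mask.any()).
    return saledata[mask]


# Parent ids found for a few item ids; "99" does not exist in the sample
matches = {
    item_id: rows_containing(item_id)["id"].tolist()
    for item_id in ["78", "86", "99"]
}
print(matches)
```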
