Create a pandas DataFrame using nested dictionaries and a list: dict:{dict:{dict:[list]}}


Question

I have a series of nested dicts with a list as the deepest value.

data = {
   "etherA": {
      "vlanY": {
         "local": ['mac01', 'mac02'],
         "external": ['mac03', 'mac02']
      }
   },
   "etherB": {
      "vlanZ": {
         "local": ['mac06', 'mac09'],
         "external": ['mac01', 'mac02', 'mac03']
      }
   }
} 

To load the dict into a DataFrame, I create an empty DataFrame with the column headers, then loop through the dict and add each row, as a list, to the end of the DataFrame.

import pandas as pd

df = pd.DataFrame.from_dict({
   'interface': [],
   'vlan': [],
   'dyn': [],
   'mac-address': []
})

for a in data:
   for b in data[a]:
      for c in data[a][b]:
         for d in data[a][b][c]:
            df.loc[len(df)] = [a, b, c, d]

Final output:

print(df)

  interface   vlan       dyn mac-address
0    etherA  vlanY     local       mac01
1    etherA  vlanY     local       mac02
2    etherA  vlanY  external       mac03
3    etherA  vlanY  external       mac02
4    etherB  vlanZ     local       mac06
5    etherB  vlanZ     local       mac09
6    etherB  vlanZ  external       mac01
7    etherB  vlanZ  external       mac02
8    etherB  vlanZ  external       mac03

The for loops ultimately do what I need, but is there a pandas method for getting the data from the dict into the DataFrame?

I've read through numerous other posts and tried their answers and suggestions. Most deal with a single nested dictionary, and none deal with a list nested three dictionaries deep. A few of the suggested questions were after what I'm trying to achieve, and the answer there was to loop through and essentially flatten the data before adding it to the DataFrame, so that may be the best course.
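pandas can do the list expansion step itself. A minimal sketch, assuming pandas 0.25 or later (where `Series.explode` was introduced): collapse the three dict levels into tuple keys, which pandas turns into a three-level MultiIndex, then let `explode` give each MAC address its own row. The variable names are illustrative. Building all rows first and constructing the frame once also avoids enlarging the DataFrame one row at a time, which tends to be slow on large inputs.

```python
import pandas as pd

data = {
    "etherA": {"vlanY": {"local": ["mac01", "mac02"], "external": ["mac03", "mac02"]}},
    "etherB": {"vlanZ": {"local": ["mac06", "mac09"], "external": ["mac01", "mac02", "mac03"]}},
}

# Collapse the three dict levels into tuple keys; the MAC lists stay as values.
# A dict with tuple keys becomes a Series with a three-level MultiIndex.
per_group = pd.Series({
    (interface, vlan, dyn): macs
    for interface, vlans in data.items()
    for vlan, dyns in vlans.items()
    for dyn, macs in dyns.items()
})

# explode() gives each list element its own row, repeating the index labels;
# naming the index levels lets reset_index() turn them into columns.
df = (
    per_group.explode()
    .rename_axis(["interface", "vlan", "dyn"])
    .reset_index(name="mac-address")
)
print(df)
```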

Answer 1

Score: 1

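Another way to do this is to build a list of row dicts with a single nested list comprehension and pass it to the DataFrame constructor: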

import pandas as pd

data = {
   "etherA": {
      "vlanY": {
         "local": ['mac01', 'mac02'],
         "external": ['mac03', 'mac02']
      }
   },
   "etherB": {
      "vlanZ": {
         "local": ['mac06', 'mac09'],
         "external": ['mac01', 'mac02', 'mac03']
      }
   }
}

df = pd.DataFrame([
    {'interface': interface, 'vlan': vlan, 'dyn': dyn, 'mac-address': mac}
    for interface, vlan_dict in data.items()
    for vlan, dyn_dict in vlan_dict.items()
    for dyn, mac_list in dyn_dict.items()
    for mac in mac_list
])

which gives

 interface   vlan       dyn mac-address
0    etherA  vlanY     local       mac01
1    etherA  vlanY     local       mac02
2    etherA  vlanY  external       mac03
3    etherA  vlanY  external       mac02
4    etherB  vlanZ     local       mac06
5    etherB  vlanZ     local       mac09
6    etherB  vlanZ  external       mac01
7    etherB  vlanZ  external       mac02
8    etherB  vlanZ  external       mac03
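The comprehension hard-codes exactly three dict levels. If the depth may vary, a small recursive generator can flatten any number of dict levels above the lists. `iter_rows` below is a hypothetical helper, not a pandas function, and it assumes every non-dict value is an iterable of leaf items (the MAC lists here); the `columns` list still has to match the depth.

```python
import pandas as pd

def iter_rows(node, prefix=()):
    """Hypothetical helper: yield one tuple per leaf item.

    Each tuple is the chain of dict keys leading to the item, then the item.
    Assumes inner levels are dicts and anything else is an iterable of leaves.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            # Descend one level, remembering the key on the path.
            yield from iter_rows(child, prefix + (key,))
    else:
        for item in node:
            yield prefix + (item,)

data = {
    "etherA": {"vlanY": {"local": ["mac01", "mac02"], "external": ["mac03", "mac02"]}},
    "etherB": {"vlanZ": {"local": ["mac06", "mac09"], "external": ["mac01", "mac02", "mac03"]}},
}

df = pd.DataFrame(list(iter_rows(data)), columns=["interface", "vlan", "dyn", "mac-address"])
print(df)
```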

Answer 2

Score: 0

I suggest creating a list of tuples first:

L = [(a,b,c,d) for a in data 
               for b in data[a] 
               for c in data[a][b] 
               for d in data[a][b][c]]
df = pd.DataFrame(L, columns=['interface','vlan','dyn','mac-address'])

Or:

L = [(a,b,c,d) for a, vlans in data.items()
               for b, dyns in vlans.items()
               for c, macs in dyns.items()
               for d in macs]
df = pd.DataFrame(L, columns=['interface','vlan','dyn','mac-address'])

print(df)

  interface   vlan       dyn mac-address
0    etherA  vlanY     local       mac01
1    etherA  vlanY     local       mac02
2    etherA  vlanY  external       mac03
3    etherA  vlanY  external       mac02
4    etherB  vlanZ     local       mac06
5    etherB  vlanZ     local       mac09
6    etherB  vlanZ  external       mac01
7    etherB  vlanZ  external       mac02
8    etherB  vlanZ  external       mac03
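`pd.DataFrame.from_records` also accepts an iterator of tuples (its `nrows` parameter is documented for that case), so the same nesting can be written as a generator expression instead of a named list. As far as I know pandas still collects the rows internally, so treat this as a stylistic variant rather than a memory optimization.

```python
import pandas as pd

data = {
    "etherA": {"vlanY": {"local": ["mac01", "mac02"], "external": ["mac03", "mac02"]}},
    "etherB": {"vlanZ": {"local": ["mac06", "mac09"], "external": ["mac01", "mac02", "mac03"]}},
}

# Same nesting as the list comprehension, but lazily produced one tuple at a time.
rows = (
    (interface, vlan, dyn, mac)
    for interface, vlans in data.items()
    for vlan, dyns in vlans.items()
    for dyn, macs in dyns.items()
    for mac in macs
)
df = pd.DataFrame.from_records(rows, columns=["interface", "vlan", "dyn", "mac-address"])
print(df)
```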

Answer 3

Score: 0

First, flatten the nested dictionary with pd.json_normalize; then build a list of lists from the flattened keys and their MAC lists, and turn it into a DataFrame.

import pandas as pd

data = {
    "etherA": {
        "vlanY": {
            "local": ['mac01', 'mac02'],
            "external": ['mac03', 'mac02']
        }
    },
    "etherB": {
        "vlanZ": {
            "local": ['mac06', 'mac09'],
            "external": ['mac01', 'mac02', 'mac03']
        }
    }
}

df = pd.json_normalize(data, sep='_')
flatten_dict = df.to_dict(orient='records')[0]
res = []
for k, v in flatten_dict.items():
    for i in v:
        res.append(k.split("_")+[i])
res_df = pd.DataFrame(res, columns=["interface", "vlan", "dyn", "mac-address"])
print(res_df)
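The loop over the flattened dict can also be handed to pandas. A minimal sketch, assuming pandas 1.0 or later (top-level `pd.json_normalize`, `Series.explode`) and assuming no dict key contains the `_` separator, since the joined key paths are split on it again:

```python
import pandas as pd

data = {
    "etherA": {"vlanY": {"local": ["mac01", "mac02"], "external": ["mac03", "mac02"]}},
    "etherB": {"vlanZ": {"local": ["mac06", "mac09"], "external": ["mac01", "mac02", "mac03"]}},
}

# One-row frame whose columns are joined key paths, e.g. "etherA_vlanY_local";
# taking row 0 gives a Series mapping each path to its MAC list.
flat = pd.json_normalize(data, sep="_").iloc[0]

# One row per MAC address; the key path is repeated in the index.
long = flat.explode()

# Split each key path back into its three parts (assumes no key contains "_").
keys = long.index.to_series().str.split("_", expand=True)
keys.columns = ["interface", "vlan", "dyn"]

# Attach the MAC addresses positionally and replace the repeated index.
res_df = keys.assign(**{"mac-address": long.to_numpy()}).reset_index(drop=True)
print(res_df)
```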

huangapple
  • Posted on 2023-02-27 13:51:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75577131.html