python - pandas: function to replicate the creation of a variable

Question

I am trying to replicate the variable aux_35 because my database has some missing values. Here is a small sample of the dataset:

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

import dateutil.relativedelta as rd
import math
from itertools import groupby
from itertools import repeat
from operator import itemgetter

import warnings
warnings.filterwarnings('ignore')

df = pd.DataFrame({'pdt_050':[[0.683522, 0.26141],
[0.683522, 0.26141],
[0.683522, 0.26141],
[0.726501, 0.373269, 0.159278],
[0.726501, 0.373269, 0.159278],
[0.596246, 0.288327, 0.120612],
[0.353175, 0.314364, 0.159139],
[0.595886, 0.25835],
[0.582035],
[0.726501, 0.373269, 0.159278],
[0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597],
[0.751279, 0.436349, 0.248187, 0.110235]
],
'aux_35': [0.683522, 0.683522,0.683522, 0.726501, 0.726501, 0.596246, 0.159139,0.25835,0.582035, 0.373269, 0.583463,
0.436349
],
'tob': [1, 1,1, 1, 1, 1, 14, 2, 1, 1, 0, 1
]
})

Basically, aux_35 takes its value from pdt_050, and the variable tob decides which element is used. For example, when tob is 1 or 0, aux_35 should be the first element of the array pdt_050; when tob is greater than the number of elements in pdt_050, aux_35 should be the last element of pdt_050, as in the row with index 6 (tob = 14).
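The rule described above can be written out for a single row as a minimal sketch. The helper name `pick_by_tob` is hypothetical (it does not appear in the original post), and the sketch assumes that `tob` is a 1-based position into a non-empty list, which is consistent with the row where tob is 2:

```python
def pick_by_tob(values, tob):
    """Return the element of `values` selected by the 1-based position `tob`.

    Hypothetical helper illustrating the rule in the question; assumes
    `values` is a non-empty list and `tob` is an integer.
    """
    if tob <= 1:
        # A tob of 0 or 1 selects the first element
        return values[0]
    if tob > len(values):
        # A tob beyond the end of the list selects the last element
        return values[-1]
    # Otherwise tob is treated as a 1-based position inside the list
    return values[tob - 1]


print(pick_by_tob([0.683522, 0.26141], 1))              # first element
print(pick_by_tob([0.353175, 0.314364, 0.159139], 14))  # last element
print(pick_by_tob([0.595886, 0.25835], 2))              # second element
```

Spelling the three cases out separately makes it easy to check each one against the sample rows before compressing the logic into a single expression.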

I wrote this function to replicate that process:

def mmonths(df):
    pdo = []
    pdoriginal = df['pdt_050']
    tob_y = df['aux_35'].astype(int)
    for i in range(len(tob_y)):
        tob = tob_y[i]
        try:
            pdo.append(pdoriginal[i][(tob)])
        except:
            pdo.append(pdoriginal[i][0])
            
    return pdo

df['replica']  = mmonths(df)

But the result is not right: the new replica column does not reproduce aux_35. Can you help me, please?

Thanks!

Answer 1

Score: 0

Let's apply a custom indexer function along the column axis (axis=1, i.e. once per row):

def indexer(a, i):
    # Clamp the 1-based position i to [1, len(a)], then shift to a 0-based index
    return a[max(1, min(int(i), len(a))) - 1]

df['aux_35'] = df.apply(lambda s: indexer(s['pdt_050'], s['tob']), axis=1)

Result

                                                      pdt_050  tob    aux_35
0                                         [0.683522, 0.26141]    1  0.683522
1                                         [0.683522, 0.26141]    1  0.683522
2                                         [0.683522, 0.26141]    1  0.683522
3                              [0.726501, 0.373269, 0.159278]    1  0.726501
4                              [0.726501, 0.373269, 0.159278]    1  0.726501
5                              [0.596246, 0.288327, 0.120612]    1  0.596246
6                              [0.353175, 0.314364, 0.159139]   14  0.159139
7                                         [0.595886, 0.25835]    2  0.258350
8                                                  [0.582035]    1  0.582035
9                              [0.726501, 0.373269, 0.159278]    1  0.726501
10  [0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597]    0  0.583463
11                   [0.751279, 0.436349, 0.248187, 0.110235]    1  0.751279
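
For comparison, the loop from the question can be repaired along the same lines. The attempt in the question takes its index from `df['aux_35'].astype(int)`; every aux_35 value in the sample lies between 0 and 1, so the cast truncates each one to 0 and the loop always returns the first element. The sketch below reads the position from tob instead and clamps it the same way `indexer` does. The name `mmonths_fixed` and the three-row sample are illustrative and not part of the original post:

```python
import pandas as pd

# Small illustrative sample built from three rows of the question's data
df = pd.DataFrame({
    'pdt_050': [[0.683522, 0.26141],
                [0.353175, 0.314364, 0.159139],
                [0.595886, 0.25835]],
    'tob': [1, 14, 2],
})


def mmonths_fixed(df):
    """Pick one element per row of 'pdt_050', positioned by 'tob' (1-based)."""
    picked = []
    # Walk both columns together instead of indexing each by row number
    for values, tob in zip(df['pdt_050'], df['tob']):
        # Clamp the 1-based position to the range [1, len(values)]
        position = max(1, min(int(tob), len(values)))
        picked.append(values[position - 1])
    return picked


df['replica'] = mmonths_fixed(df)
print(df['replica'].tolist())  # [0.683522, 0.159139, 0.25835]
```

If tob itself can be missing, `int(tob)` raises a ValueError on NaN, so those rows would need to be handled separately before calling the function.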

huangapple
  • Posted on May 7, 2023, 23:23:35
  • When reposting, please keep this link: https://go.coder-hub.com/76194779.html