python - pandas: function to replicate the creation of a variable
Question
I am trying to replicate the variable `aux_35` because my database has some missing values. Here is a small sample of the dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import dateutil.relativedelta as rd
import math
from itertools import groupby
from itertools import repeat
from operator import itemgetter
import warnings
warnings.filterwarnings('ignore')

df = pd.DataFrame({
    'pdt_050': [
        [0.683522, 0.26141],
        [0.683522, 0.26141],
        [0.683522, 0.26141],
        [0.726501, 0.373269, 0.159278],
        [0.726501, 0.373269, 0.159278],
        [0.596246, 0.288327, 0.120612],
        [0.353175, 0.314364, 0.159139],
        [0.595886, 0.25835],
        [0.582035],
        [0.726501, 0.373269, 0.159278],
        [0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597],
        [0.751279, 0.436349, 0.248187, 0.110235],
    ],
    'aux_35': [0.683522, 0.683522, 0.683522, 0.726501, 0.726501, 0.596246,
               0.159139, 0.25835, 0.582035, 0.373269, 0.583463, 0.436349],
    'tob': [1, 1, 1, 1, 1, 1, 14, 2, 1, 1, 0, 1],
})
Basically, `aux_35` takes its data from `pdt_050` and is assigned a value based on the variable `tob`. For example, when `tob` equals 1 or 0, `aux_35` should be the first element of the array in `pdt_050`; when `tob` is greater than the number of elements in `pdt_050`, `aux_35` should be the last element of `pdt_050`, as in the row at index 6.
I wrote this function to replicate that process:
def mmonths(df):
    pdo = []
    pdoriginal = df['pdt_050']
    tob_y = df['aux_35'].astype(int)
    for i in range(len(tob_y)):
        tob = tob_y[i]
        try:
            pdo.append(pdoriginal[i][(tob)])
        except:
            pdo.append(pdoriginal[i][0])
    return pdo

df['replica'] = mmonths(df)
However, the result is not correct, as the screenshot in the original post shows. Could you please help me?
Thanks!
Answer 1
Score: 0
Let's apply a custom indexer function to each row (`axis=1`):
def indexer(a, i):
    # Clamp the 1-based position i to the range 1..len(a), then convert it to a 0-based index.
    return a[max(1, min(int(i), len(a))) - 1]

df['aux_35'] = df.apply(lambda s: indexer(s['pdt_050'], s['tob']), axis=1)
Result
    pdt_050                                                    tob  aux_35
0   [0.683522, 0.26141]                                        1    0.683522
1   [0.683522, 0.26141]                                        1    0.683522
2   [0.683522, 0.26141]                                        1    0.683522
3   [0.726501, 0.373269, 0.159278]                             1    0.726501
4   [0.726501, 0.373269, 0.159278]                             1    0.726501
5   [0.596246, 0.288327, 0.120612]                             1    0.596246
6   [0.353175, 0.314364, 0.159139]                             14   0.159139
7   [0.595886, 0.25835]                                        2    0.258350
8   [0.582035]                                                 1    0.582035
9   [0.726501, 0.373269, 0.159278]                             1    0.726501
10  [0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597]  0    0.583463
11  [0.751279, 0.436349, 0.248187, 0.110235]                   1    0.751279
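For comparison, the attempt in the question reads `aux_35` rather than `tob` as the position and falls back to the first element on any error, so it cannot return the last element when `tob` runs past the end of the list. The sketch below spells out the clamping step by step on plain lists and checks it against four rows copied from the question's sample. The name `pick_by_tob` and the `rows` list are hypothetical, introduced only for illustration, and the sketch assumes `tob` is a 1-based position, as the question's description suggests.

```python
def pick_by_tob(values, tob):
    """Return the element at 1-based position `tob`, clamped to the list bounds."""
    position = min(int(tob), len(values))  # tob past the end -> last position
    position = max(1, position)            # tob of 0 (or less) -> first position
    return values[position - 1]            # 1-based position -> 0-based index


# (pdt_050, tob, original aux_35) copied from four rows of the question's sample.
rows = [
    ([0.353175, 0.314364, 0.159139], 14, 0.159139),
    ([0.595886, 0.25835], 2, 0.25835),
    ([0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597], 0, 0.583463),
    ([0.726501, 0.373269, 0.159278], 1, 0.373269),
]

# Apply the rule to every row.
replica = [pick_by_tob(values, tob) for values, tob, _ in rows]

# Positions in `rows` where the rule disagrees with the stored aux_35.
mismatch = [i for i, (_, _, original) in enumerate(rows) if replica[i] != original]

print(replica)
print(mismatch)
```

On a DataFrame, the same comprehension can be written over `zip(df['pdt_050'], df['tob'])` and assigned to a new column, which avoids building a Series for each row the way `df.apply(..., axis=1)` does. The last row in `rows` (index 9 in the question's sample) stores 0.373269 for `tob` = 1, which does not follow the stated rule; the sketch only flags such rows and does not explain them.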