May 7, 2023, 23:23:35

python - pandas: function to replicate the creation of a variable

Question

I am trying to replicate the variable aux_35 because I have some missing values in my database. Here is a small sample of the dataset:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import dateutil.relativedelta as rd
import math
from itertools import groupby
from itertools import repeat
from operator import itemgetter

import warnings
warnings.filterwarnings('ignore')


df = pd.DataFrame({
    'pdt_050': [
        [0.683522, 0.26141],
        [0.683522, 0.26141],
        [0.683522, 0.26141],
        [0.726501, 0.373269, 0.159278],
        [0.726501, 0.373269, 0.159278],
        [0.596246, 0.288327, 0.120612],
        [0.353175, 0.314364, 0.159139],
        [0.595886, 0.25835],
        [0.582035],
        [0.726501, 0.373269, 0.159278],
        [0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597],
        [0.751279, 0.436349, 0.248187, 0.110235],
    ],
    'aux_35': [0.683522, 0.683522, 0.683522, 0.726501, 0.726501, 0.596246,
               0.159139, 0.25835, 0.582035, 0.373269, 0.583463, 0.436349],
    'tob': [1, 1, 1, 1, 1, 1, 14, 2, 1, 1, 0, 1],
})

Basically, aux_35 takes its data from pdt_050 and assigns the value based on the variable tob. For example, when tob is equal to 1 or 0, aux_35 should be the first element of the array in pdt_050, and when tob is greater than the number of elements in pdt_050, aux_35 should be equal to the last element of pdt_050, as you can see in the row with index 6 (tob = 14).

I wrote this function to replicate that process:

def mmonths(df):
    pdo = []
    pdoriginal = df['pdt_050']
    tob_y = df['aux_35'].astype(int)
    for i in range(len(tob_y)):
        tob = tob_y[i]
        try:
            pdo.append(pdoriginal[i][(tob)])
        except:
            pdo.append(pdoriginal[i][0])

    return pdo

df['replica'] = mmonths(df)

But the output of this function does not match aux_35. Can you help me, please?

Thanks!

Answer 1

Score: 0

Let's apply a custom indexer function to each row (axis=1):

def indexer(a, i):
    # Treat i as a 1-based position, clamped to the range 1..len(a)
    return a[max(1, min(int(i), len(a))) - 1]

df['aux_35'] = df.apply(lambda s: indexer(s['pdt_050'], s['tob']), axis=1)

Result

                                                      pdt_050  tob    aux_35
0                                         [0.683522, 0.26141]    1  0.683522
1                                         [0.683522, 0.26141]    1  0.683522
2                                         [0.683522, 0.26141]    1  0.683522
3                              [0.726501, 0.373269, 0.159278]    1  0.726501
4                              [0.726501, 0.373269, 0.159278]    1  0.726501
5                              [0.596246, 0.288327, 0.120612]    1  0.596246
6                              [0.353175, 0.314364, 0.159139]   14  0.159139
7                                         [0.595886, 0.25835]    2  0.258350
8                                                  [0.582035]    1  0.582035
9                              [0.726501, 0.373269, 0.159278]    1  0.726501
10  [0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597]    0  0.583463
11                   [0.751279, 0.436349, 0.248187, 0.110235]    1  0.751279
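
For comparison, note that the function in the question reads its index from aux_35 cast to int (not from tob) and uses it as a 0-based index, so it cannot reproduce the rule. The sketch below is one possible variant of the same clamped, 1-based lookup that writes into a separate replica column, so the stored aux_35 is not overwritten and the two can be compared. The helper name pick_by_tob and the five-row subset (rows 6, 7, 9, 10 and 11 of the sample) are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Abbreviated sample: rows 6, 7, 9, 10 and 11 of the question's data.
df = pd.DataFrame({
    'pdt_050': [
        [0.353175, 0.314364, 0.159139],
        [0.595886, 0.25835],
        [0.726501, 0.373269, 0.159278],
        [0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597],
        [0.751279, 0.436349, 0.248187, 0.110235],
    ],
    'aux_35': [0.159139, 0.25835, 0.373269, 0.583463, 0.436349],
    'tob': [14, 2, 1, 0, 1],
})

def pick_by_tob(values, tob):
    """Hypothetical helper: element at 1-based position `tob`, clamped to [1, len(values)]."""
    position = max(1, min(int(tob), len(values)))
    return values[position - 1]

# Write to a separate column so the original aux_35 stays available.
df['replica'] = [pick_by_tob(v, t) for v, t in zip(df['pdt_050'], df['tob'])]

# Rows where the rule and the stored aux_35 disagree.
differs = ~np.isclose(df['replica'], df['aux_35'])
mismatch = df.loc[differs, ['tob', 'aux_35', 'replica']]
print(mismatch)
```

On this subset the comparison flags the two rows where tob is 1 but the stored aux_35 equals the second element of pdt_050 (rows 9 and 11 of the full sample). Whether those rows follow a different rule or are data errors cannot be decided from the sample alone.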
