Why does using the random module in the pandas assign method return the same number?

# Question

Here is a small code snippet that I thought would create two dataframe columns (*randomNumber*, *randomNumber2*), each populated with random integers chosen from a uniform distribution over the range 1 to 100.

```python
import random
import numpy as np
import pandas as pd

n = 1000
df_test = pd.DataFrame(index=np.arange(n))

df_test = df_test.assign(randomNumber=random.randint(1, 100))

df_test['randomNumber2'] = np.random.randint(1, 100, size=n)

print(df_test.shape)
df_test.head()
```

[Output](https://i.stack.imgur.com/gdZH1.png)

Why, when I use the `assign()` method with a function from the `random` module, do all dataframe rows have the same value in the *randomNumber* column?

I understand that the quickest way to populate a dataframe column with random numeric values is the built-in numpy method, but I can't understand why the `assign` method doesn't work here. I thought that in this scenario `random.randint(1, 100)` would be called once for each row of `df_test`.

I also can't find the answer to my question in the official pandas documentation. Could someone explain what's happening and what I'm missing, or point me towards an explanation?

Thanks!


# Answer 1
**Score**: 1

`random.randint` returns a single integer, and when you assign a single integer to a column, that integer is repeated for each row. The call is evaluated once, before `assign` runs, so `assign` only ever receives the resulting number.

`np.random.randint(..., size=n)` returns an array of integers, which becomes the new column's values. Same name, different functionality. But that's the same for everything: `+` on integers adds two integers, while `+` on numpy arrays adds the arrays element by element.

Note that this doesn't have anything to do with `.assign` specifically. Had you written `df_test = df_test.assign(randomNumber = np.random.randint(1, 100, size=n))`, you would have seen a column of random numbers.
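
To make the scalar-versus-array point concrete, here is a minimal sketch. The column names (`scalar_col`, `array_col`, `per_row_col`, `callable_col`) and the small `n = 5` are made up for illustration and are not from the question. It also assumes you want 100 to be a possible value: `random.randint(1, 100)` includes the upper bound, whereas `np.random.randint(1, 100)` excludes it, so the numpy calls below use 101.

```python
import random

import numpy as np
import pandas as pd

n = 5  # small illustrative size (the question used 1000)
df = pd.DataFrame(index=np.arange(n))

# random.randint(1, 100) runs once, right here, and yields one int.
# assign() only receives that int and repeats it in every row.
df = df.assign(scalar_col=random.randint(1, 100))

# An array of length n supplies one value per row.
# np.random.randint excludes the upper bound, hence 101.
df = df.assign(array_col=np.random.randint(1, 101, size=n))

# Calling random.randint once per row also works, just more slowly.
df = df.assign(per_row_col=[random.randint(1, 100) for _ in range(len(df))])

# A callable passed to assign() is called once with the whole DataFrame,
# not once per row, so it must itself return one value per row.
df = df.assign(callable_col=lambda d: np.random.randint(1, 101, size=len(d)))

print(df)
print(df["scalar_col"].nunique())  # 1: the same value in every row
```

Only `scalar_col` is constant; the other three columns get a separate draw per row. If you really need a Python function invoked per row, the list comprehension is probably the simplest way to do it, though the vectorized numpy call is likely faster for large frames.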



huangapple
  • Posted on 2023-02-24 03:43:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75549609.html