Why does using the random module in the pandas `assign` method return the same number?
# Question
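Here is a small code snippet which I thought should create two DataFrame columns (*randomNumber*, *randomNumber2*) populated with random integers chosen from a uniform distribution in the 1 to 100 range.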
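```python
import random
import numpy as np
import pandas as pd

n = 1000
df_test = pd.DataFrame(index=np.arange(n))
# Column filled using the standard-library random module
df_test = df_test.assign(randomNumber = random.randint(1, 100))
# Column filled using NumPy's random module
df_test['randomNumber2'] = np.random.randint(1, 100, size=n)
print(df_test.shape)
print(df_test.head())
```
[Output](https://i.stack.imgur.com/gdZH1.png)
Why, when using the `assign()` method with a function from the `random` module, do all DataFrame rows have the same value in the *randomNumber* column?
I understand that the quickest way to populate a DataFrame column with random numeric values is the built-in NumPy method, but I can't understand why the `assign` method doesn't work here. I thought that in this scenario the `random.randint(1,100)` function would be called for each `df_test` row.
I also can't find the answer in the official pandas documentation. Could someone explain what's happening and what I am missing, or point me towards an explanation?
Thanks!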
# Answer 1
**Score**: 1
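`random.randint` returns a single integer. Python evaluates `random.randint(1,100)` exactly once, before `assign` is even called, so `assign` receives that one integer and, as with any scalar assigned to a column, repeats it for every row.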
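A minimal sketch of what happens: the argument is computed once and `assign` only ever sees one `int`, which it broadcasts to all rows. The seed and the small `n` are arbitrary choices for the illustration.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # arbitrary seed, only to make the run repeatable
n = 5
df = pd.DataFrame(index=np.arange(n))

value = random.randint(1, 100)  # evaluated exactly once, yields a plain int
print(type(value))              # <class 'int'>

# assign receives the already-computed int and broadcasts it to every row
df = df.assign(randomNumber=value)
print(df["randomNumber"].nunique())  # 1: every row holds the same value
```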
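`np.random.randint(1, 100, size=n)`, by contrast, returns an array of `n` integers, which become the new column's values. The two functions share a name but behave differently, and that is common in Python generally: `+` on two integers adds them, while `+` on two NumPy arrays adds them element-wise.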
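The sketch below puts the two functions side by side and mirrors the `+` analogy. One detail the question does not mention: `random.randint(a, b)` includes `b`, while NumPy's `randint(low, high)` excludes `high`, so `np.random.randint(1, 100, size=n)` never produces 100.

```python
import random

import numpy as np

single = random.randint(1, 100)           # one int in [1, 100], both ends inclusive
many = np.random.randint(1, 100, size=5)  # ndarray of 5 ints in [1, 99]; high is exclusive

print(type(single).__name__)  # int
print(type(many).__name__)    # ndarray
print(many.shape)             # (5,)

# The same operator also means different things depending on the operand types
print(2 + 3)                                   # 5: integer addition
print(np.array([1, 2]) + np.array([10, 20]))  # [11 22]: element-wise array addition
```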
Note that this has nothing to do with `.assign` specifically. Had you written `df_test = df_test.assign(randomNumber = np.random.randint(1, 100, size=n))`, you would have seen a column of random numbers.
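A sketch of three ways to get one random value per row, assuming `n = 1000` as in the question. Passing a callable to `assign` does not make pandas call it per row either: the callable is called once with the whole DataFrame and must return a full column. `np.random.default_rng` is NumPy's newer Generator API, swapped in here as an alternative to the legacy `np.random.randint`; its `integers` method also excludes `high` by default, hence the `101`.

```python
import random

import numpy as np
import pandas as pd

n = 1000
rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility
df = pd.DataFrame(index=np.arange(n))

df = df.assign(
    # 1) Pass an array that already holds n values
    a=rng.integers(1, 101, size=n),
    # 2) Pass a callable: pandas calls it ONCE with the DataFrame, not once per row
    b=lambda d: rng.integers(1, 101, size=len(d)),
    # 3) Keep the stdlib random module, but call it n times yourself
    c=[random.randint(1, 100) for _ in range(n)],
)

print(df.nunique())  # each column now holds many distinct values
```

Option 1 is the fastest; option 3 only makes sense if you specifically need the standard-library generator.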