Python | Create a Dictionary with Values as expressions without evaluating or treating it as string
Question
I need to build a dynamic synthetic pandas DataFrame in Python, and I am using Faker and random functions to generate it.
To do so, I am creating a dictionary as the source for the DataFrame. Some columns will hold static values, while the values of other columns are generated by expressions or Faker objects based on user input.
For example:
If a user wants a synthetic DataFrame of 5 rows with 2 columns, the user provides two inputs as arguments:
col1 = 1
col2 = ('1', '2')
The dictionary should then ideally be constructed as:
dict1 = {
    'col1': 1,
    'col2': fake.random_element(elements=('1','2'))
}
The DataFrame can then be constructed by running the expressions within the dictionary in a loop:
from faker import Faker
import pandas as pd

fake = Faker()

def generate_fake_data(numRows, inputDict):
    # Repeat the dictionary once per requested row
    output = [
        inputDict for x in range(numRows)
    ]
    return output

num_records = 5
df_synthetic = pd.DataFrame(generate_fake_data(num_records, dict1))
The problem is that if I try to create the dictionary without letting it evaluate the expression for col2 beforehand, the value ends up bound as a string, like:
dict1 = {
    'col1': 1,
    'col2': "fake.random_element(elements=('1','2'))"
}
How can I create the dictionary without evaluating the expression columns up front, and without treating them as strings?
Answer 1
Score: 4
One possible solution is to create a dictionary where keys are column names and values are functions (lambdas):
import pandas as pd
from random import randint

def generate_fake_data(num_rows, input_dict):
    out = []
    for _ in range(num_rows):
        # Call every column's function again for each row
        out.append({k: v() for k, v in input_dict.items()})
    return pd.DataFrame(out)

num_records = 5
dict1 = {
    'col1': lambda: 1,              # static value
    'col2': lambda: randint(1, 2)   # re-evaluated on every call
}
df_synthetic = generate_fake_data(num_records, dict1)
print(df_synthetic)
Prints (the col2 values are random, so they vary between runs):
   col1  col2
0     1     2
1     1     2
2     1     1
3     1     1
4     1     2
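If the user supplies plain values rather than lambdas, the wrapping can be done automatically. The following is only a minimal sketch under stated assumptions: the helper names to_generator, build_spec and generate_rows are hypothetical, and random.choice from the standard library stands in for Faker's fake.random_element so the example runs without extra packages.

```python
import random
from functools import partial


def to_generator(value):
    """Wrap a user-supplied value in a zero-argument callable (hypothetical helper)."""
    if callable(value):
        # Already a function or lambda: use it unchanged
        return value
    if isinstance(value, (tuple, list)):
        # A collection of allowed values: draw one at random on each call
        return partial(random.choice, list(value))
    # Anything else is treated as a static value
    return lambda: value


def build_spec(user_inputs):
    """Turn {column name: raw value} into {column name: callable}."""
    return {name: to_generator(value) for name, value in user_inputs.items()}


def generate_rows(num_rows, spec):
    """Call every column's generator once per row."""
    return [{name: gen() for name, gen in spec.items()} for _ in range(num_rows)]


spec = build_spec({"col1": 1, "col2": ("1", "2")})
rows = generate_rows(5, spec)
print(rows)
```

The resulting list of dictionaries can be passed to pd.DataFrame(rows), as in the answer above. Nothing is evaluated until generate_rows calls each generator, so every row gets a fresh value for the random columns.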
Answer 2
Score: 2
Building on the answer by @andrej-kesely (https://stackoverflow.com/a/76408359/218663) and your wish to let users enter a more free-form set of valid values, I believe we can leverage the ast module.
import pandas
import random
import ast

test_rows = 10

template = {}
while True:
    key = input("Enter a column name: ")
    if not key:
        # An empty column name ends the input loop
        break

    value = input(f"Enter a list [1,2,3] of valid values for column \"{key}\": ")
    try:
        # Safely parse Python literals such as 1 or [1,2]
        value = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a literal: keep the raw string as the value
        pass

    template[key] = value if isinstance(value, list) else [value]

df_synthetic = pandas.DataFrame([
    {key: random.choice(value) for key, value in template.items()}
    for _ in range(test_rows)
])

print(df_synthetic)
When prompted, enter the following inputs:
col1
1
col2
[1,2]
<return>
You should get back something like this (the col2 values are random):
   col1  col2
0     1     2
1     1     1
2     1     1
3     1     2
4     1     1
5     1     1
6     1     1
7     1     2
8     1     1
9     1     2
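The same parsing idea can be pulled out of the input loop so that it can be exercised without a prompt. This is a sketch only: parse_choices and build_rows are hypothetical names, and accepting tuples as well as lists is an assumption that goes slightly beyond the answer's code.

```python
import ast
import random


def parse_choices(text):
    """Parse user text into a list of candidate values (hypothetical helper)."""
    try:
        # literal_eval only accepts Python literals; it does not run arbitrary code
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a Python literal: keep the raw string as a single choice
        value = text
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def build_rows(raw_inputs, num_rows):
    """Build num_rows dictionaries, picking one candidate per column per row."""
    template = {key: parse_choices(text) for key, text in raw_inputs.items()}
    return [
        {key: random.choice(choices) for key, choices in template.items()}
        for _ in range(num_rows)
    ]


rows = build_rows({"col1": "1", "col2": "[1,2]", "col3": "abc"}, 10)
print(rows)
```

As before, pandas.DataFrame(rows) would turn the result into a DataFrame. Using ast.literal_eval rather than eval keeps user input limited to literals such as numbers, strings, lists and tuples.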