Python | Create a dictionary whose values are expressions, without evaluating them or treating them as strings

Question

I need to create a dynamic synthetic pandas DataFrame in Python, and I am using Faker and the random module to generate it.
To do so, I am building a dictionary as the source for the DataFrame: some columns have static values, while the values of other columns are generated from expressions or Faker calls based on user input.

For example:

If the user wants to create a synthetic DataFrame with 5 rows and 2 columns, they provide two inputs as arguments:

col1 = 1
col2 = ('1', '2')

Ideally, the dictionary would then be constructed as:

dict1 = {
    'col1': 1,
    'col2': fake.random_element(elements=('1', '2'))
}

The DataFrame could then be built by evaluating the expressions in the dictionary in a loop:

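from faker import Faker
import pandas as pd

fake = Faker()

def generate_fake_data(numRows, inputDict):
    # Every row is the same dict object, so any expression in inputDict
    # was evaluated once, when the dict was built, not once per row.
    output = [inputDict for _ in range(numRows)]
    return output

num_records = 5
df_synthetic = pd.DataFrame(generate_fake_data(num_records, dict1))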
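
To see why this produces identical rows: a dict literal evaluates each value once, at the moment the dict is built, and the list comprehension above then repeats references to that single dict. The following is a minimal stdlib-only sketch of that behaviour. It assumes random.choice as a stand-in for the Faker call so that it runs without Faker installed.

```python
import random

# Stand-in (assumption) for fake.random_element(elements=('1', '2')).
# The call runs exactly ONCE, right here, while the dict literal is built.
dict1 = {
    'col1': 1,
    'col2': random.choice(('1', '2')),
}

rows = [dict1 for _ in range(5)]

# Every row is the very same dict object, so col2 cannot vary between rows.
print(all(row is dict1 for row in rows))   # True
print(len({row['col2'] for row in rows}))  # 1
```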

The problem is that if I try to build the dictionary without evaluating the col2 expression up front, the value is stored as a string instead:

dict1 = {
    'col1': 1,
    'col2': "fake.random_element(elements=('1', '2'))"
}

How can I create the dictionary so that the expression columns are neither evaluated up front nor treated as strings?

Answer 1

Score: 4

One possible solution is to create a dictionary whose keys are column names and whose values are zero-argument functions (lambdas), which are then called once per row:

import pandas as pd
from random import randint


def generate_fake_data(num_rows, input_dict):
    out = []
    for _ in range(num_rows):
        out.append({k: v() for k, v in input_dict.items()})
    return pd.DataFrame(out)


num_records = 5

dict1 = {
    'col1': lambda: 1,
    'col2': lambda: randint(1, 2)
}

df_synthetic = generate_fake_data(num_records, dict1)
print(df_synthetic)

Prints (col2 is random, so the output will vary between runs):

   col1  col2
0     1     2
1     1     2
2     1     1
3     1     1
4     1     2
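
The lambdas can also be built from the user's raw inputs, such as col1 = 1 and col2 = ('1', '2') from the question. This sketch assumes a hypothetical rule: a tuple means "pick one element at random for each row," and anything else is a static value. make_generator and generate_rows are hypothetical helper names, and random.choice stands in for Faker. The result is a list of plain row dicts, which can be passed to pd.DataFrame exactly as in the answer above.

```python
import random


def make_generator(spec):
    """Hypothetical helper: turn one user input into a zero-argument callable."""
    if isinstance(spec, tuple):
        # A tuple means "pick one of these values at random, on every call".
        return lambda: random.choice(spec)
    # Anything else is a static value, returned unchanged on every call.
    return lambda: spec


def generate_rows(num_rows, user_inputs):
    """Build num_rows row dicts, calling each column's generator once per row."""
    generators = {col: make_generator(spec) for col, spec in user_inputs.items()}
    return [{col: gen() for col, gen in generators.items()} for _ in range(num_rows)]


rows = generate_rows(5, {'col1': 1, 'col2': ('1', '2')})
for row in rows:
    print(row)
```

Each lambda is created inside make_generator, so it closes over its own spec. If you instead wrote lambda: random.choice(spec) directly inside a loop over the inputs, Python's late binding of closures would make every lambda see the last spec. For the Faker version in the question, random.choice(spec) could be swapped for fake.random_element(elements=spec).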

Answer 2

Score: 2

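Building on the answer by @andrej-kesely (https://stackoverflow.com/a/76408359/218663), and since you want users to be able to enter a more free-form set of valid values, I believe we can use the ast module. ast.literal_eval parses Python literals typed at a prompt, such as 1 or [1,2], without evaluating arbitrary code.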

import pandas
import random
import ast

test_rows = 10
template = {}
while True:
    key = input("Enter a column name: ")
    if not key:
        break

    value = input(f"Enter a list [1,2,3] of valid values for column \"{key}\": ")
    try:
        value = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        pass

    template[key] = value if isinstance(value, list) else [value]

df_synthetic = pandas.DataFrame([
    {key: random.choice(value) for key, value in template.items()}
    for _ in range(test_rows)
])
print(df_synthetic)

Given these inputs at the prompts (an empty column name ends the loop):

col1
1
col2
[1,2]
<return>

You should get output similar to the following (col2 is chosen at random, so yours will differ):

   col1  col2
0     1     2
1     1     1
2     1     1
3     1     2
4     1     1
5     1     1
6     1     1
7     1     2
8     1     1
9     1     2
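
The parsing step inside the loop can be pulled out into a function so it can be tried without typing at prompts. parse_values is a hypothetical name, and its body mirrors the answer's try/except around ast.literal_eval. Text that is not a valid Python literal, such as a bare word, raises ValueError or SyntaxError and is kept as the raw string.

```python
import ast
import random


def parse_values(text):
    """Hypothetical helper mirroring the answer's loop body for one column."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a Python literal (e.g. a bare word): keep the raw string.
        value = text
    # A list is a set of choices; anything else becomes a single fixed choice.
    return value if isinstance(value, list) else [value]


template = {
    'col1': parse_values('1'),      # -> [1]
    'col2': parse_values('[1,2]'),  # -> [1, 2]
    'col3': parse_values('abc'),    # -> ['abc']
}
print(template)

rows = [{key: random.choice(values) for key, values in template.items()} for _ in range(3)]
print(rows)
```

Because only lists are treated as choices, a tuple such as ('1', '2') is wrapped whole and becomes a single value. Changing the check to isinstance(value, (list, tuple)) would treat it as a set of choices instead.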

huangapple
  • Posted on June 6, 2023 at 00:14:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76408258.html