June 6, 2023 00:14:49

Python | Create a Dictionary with Values as expressions without evaluating them or treating them as strings

Question

I need to create a dynamic synthetic pandas DataFrame in Python, and I am using Faker and random functions to generate it.
To do so, I am building a dictionary as the source for the DataFrame, where some columns have static values while other columns' values are generated by expressions or Faker calls based on user input.

For example:

If the user wants a synthetic DataFrame with 5 rows and 2 columns, they provide two inputs as arguments:

col1 = 1
col2 = ('1', '2')

So, ideally, the dictionary would be constructed as:

dict1 = {
    'col1': 1,
    'col2': fake.random_element(elements=('1', '2'))
}

The DataFrame could then be built by evaluating the expressions in the dictionary in a loop:

from faker import Faker
import pandas as pd

fake = Faker()

def generate_fake_data(numRows, inputDict):
    output = [inputDict for _ in range(numRows)]
    return output

num_records = 5
df_synthetic = pd.DataFrame(generate_fake_data(num_records, dict1))

The problem is that if I try to create the dictionary without evaluating the expression for col2 up front, the value gets stored as a string:

dict1 = {
    'col1': 1,
    'col2': "fake.random_element(elements=('1', '2'))"
}

How can I create the dictionary so that the expression columns are neither evaluated up front nor treated as strings?
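To illustrate why the loop in the question cannot produce per-row values: a dict literal evaluates each value expression exactly once, when the dict is created, and the list comprehension then repeats references to that single dict. A minimal sketch using only the standard library (random.choice stands in for the Faker call, an assumption made so it runs without Faker installed):

```python
import random

# The value expression is evaluated once, right here, when the dict is created
dict1 = {
    'col1': 1,
    'col2': random.choice(('1', '2')),
}

# Same pattern as generate_fake_data in the question: repeat the dict num_rows times
rows = [dict1 for _ in range(5)]

# Every row is the very same dict object, so col2 cannot vary between rows
print(all(row is dict1 for row in rows))   # True
print(len({row['col2'] for row in rows}))  # 1
```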

Answer 1

Score: 4

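One possible solution is to create a dictionary whose keys are column names and whose values are zero-argument functions (lambdas). Each function is called once per row, so every row gets a freshly evaluated value: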

import pandas as pd
from random import randint


def generate_fake_data(num_rows, input_dict):
    out = []
    for _ in range(num_rows):
        out.append({k: v() for k, v in input_dict.items()})
    return pd.DataFrame(out)


num_records = 5

dict1 = {
    'col1': lambda: 1,
    'col2': lambda: randint(1, 2)
}

df_synthetic = generate_fake_data(num_records, dict1)
print(df_synthetic)

Prints:

   col1  col2
0     1     2
1     1     2
2     1     1
3     1     1
4     1     2
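Since the question's inputs arrive as plain values (col1 = 1, col2 = ('1', '2')), the dictionary of callables can also be built programmatically. One pitfall when doing that in a comprehension: lambda: v looks up v when it is called, not when it is created, so every column ends up with the last value. The sketch below avoids this by creating each callable inside a helper function, using functools.partial for the random case. The rule "a tuple means pick one value at random per row, anything else is static" and the names to_callable and generate_rows are assumptions for illustration; random.choice stands in for fake.random_element.

```python
import random
from functools import partial

user_inputs = {'col1': 1, 'col2': ('1', '2')}

# Pitfall: every lambda looks up `v` when called, after the comprehension has finished,
# so all columns see the last value, ('1', '2')
broken = {k: (lambda: v) for k, v in user_inputs.items()}


def to_callable(spec):
    """Hypothetical rule: a tuple means 'pick one at random per row'; anything else is static."""
    if isinstance(spec, tuple):
        return partial(random.choice, spec)  # partial stores spec now, not at call time
    return lambda: spec  # safe: spec is this call's own parameter


generators = {k: to_callable(v) for k, v in user_inputs.items()}


def generate_rows(num_rows, gens):
    # Call each generator once per row so every row gets freshly generated values
    return [{k: g() for k, g in gens.items()} for _ in range(num_rows)]


rows = generate_rows(5, generators)
print(broken['col1']())  # ('1', '2'), not 1
print(rows[0]['col1'])   # 1
```

The resulting list of dicts can be passed to pd.DataFrame(rows), exactly as in the answer above.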

Answer 2

Score: 2

Building on the answer by @andrej-kesely here https://stackoverflow.com/a/76408359/218663, and on your wish to let users enter a more free-form set of valid values, I believe we can use the ast module (ast.literal_eval) to parse each input.

import pandas
import random
import ast

test_rows = 10
template = {}
while True:
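    key = input("Enter a column name: ")
    if not key:  # an empty name ends the prompt loop
        break

    value = input(f"Enter a list [1,2,3] of valid values for column \"{key}\": ")
    try:
        # Parse Python literals such as 1 or [1, 2]; literal_eval does not run arbitrary code
        value = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        pass  # not a literal, so keep the raw string

    # A single value becomes a one-element list so random.choice() can pick from it
    template[key] = value if isinstance(value, list) else [value]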

df_synthetic = pandas.DataFrame([
    {key: random.choice(value) for key, value in template.items()}
    for _ in range(test_rows)
])
print(df_synthetic)

Given these inputs at the prompts (the last one is an empty line, i.e. just pressing Return):

col1
1
col2
[1,2]
<return>

You should get back something like this (the values are random):

   col1  col2
0     1     2
1     1     1
2     1     1
3     1     2
4     1     1
5     1     1
6     1     1
7     1     2
8     1     1
9     1     2
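Because the loop above reads from input(), it is awkward to test. A minimal sketch that moves the parsing step into a plain function; parse_values and build_rows are hypothetical names, and, as an assumption beyond the answer, tuples such as the ('1', '2') from the question are accepted as well as lists:

```python
import ast
import random


def parse_values(text):
    """Hypothetical helper: parse one user answer into a list of allowed values."""
    try:
        value = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    except (ValueError, SyntaxError):
        value = text  # not a Python literal, so keep it as a plain string
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]  # a single value becomes a one-element list


def build_rows(template, num_rows):
    # Pick one allowed value per column for each row
    return [{key: random.choice(values) for key, values in template.items()}
            for _ in range(num_rows)]


template = {'col1': parse_values('1'), 'col2': parse_values("('1', '2')")}
print(template)  # {'col1': [1], 'col2': ['1', '2']}
rows = build_rows(template, 10)
```

As in the answer, pandas.DataFrame(rows) then gives the final frame. Note that random.choice raises IndexError for an empty list, so an input of [] would need separate handling.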
