Retrieve the name of an instance of DataFrame, passed as an argument to the function

Question

I want to retrieve the name of a DataFrame instance that I pass as an argument to my function, so that I can use this name while the function runs.
Example in a script:

display(df_on_step_42)

I would like to retrieve the string "df_on_step_42" to use in the execution of the display function (which displays the content of the DataFrame).

As a last resort, I can pass both the DataFrame and its name as arguments:

display(df_on_step_42, "df_on_step_42")

But I would prefer to do without this second argument.

PySpark DataFrames are immutable, so in our data pipeline we cannot systematically attach a name attribute to every new DataFrame derived from another DataFrame.

Answer 1

Score: 0

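You can use the `globals()` dictionary to search for your variable by matching it using `eval`.

As @juanpa.arrivillaga mentions, this is fundamentally bad design, but if you need to, here is one way to do it, inspired by [this old SO answer](https://stackoverflow.com/a/15361037/4755954) for Python 2:

```python
import pandas as pd

df_on_step_42 = pd.DataFrame()

def get_var_name(var):
    # Walk the module-level names and return the first one bound to `var`.
    for k in globals().keys():
        try:
            # `is` compares object identity, not equality.
            if eval(k) is var:
                return k
        except Exception:
            # Skip any global name that cannot be evaluated.
            pass

get_var_name(df_on_step_42)
# 'df_on_step_42'
```

Your display call would then look like:

```python
display(df_on_step_42, get_var_name(df_on_step_42))
```

### Caution

This fails when several names refer to the same object (aliases or views), because they all point to the same memory as the original variable. If the original variable comes first when iterating over the keys of the global dictionary, the function returns the original variable's name.

```python
a = 123
b = a

get_var_name(b)
# 'a'
```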
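To make the aliasing caveat concrete, here is a minimal sketch that returns every name bound to the object instead of only the first one. The helper `get_all_var_names` and the list used as a stand-in for a DataFrame are hypothetical and not part of the original answer; the namespace is passed in explicitly so the lookup does not depend on `eval`.

```python
def get_all_var_names(var, namespace):
    """Return every name in `namespace` bound to the very same object as `var`."""
    # list(...) takes a snapshot so the namespace may change while we scan it.
    return [name for name, value in list(namespace.items()) if value is var]


# Stand-in for a DataFrame (assumption); any Python object behaves the same way.
df_on_step_42 = ["placeholder"]
df_alias = df_on_step_42  # second name for the same object

# At module level, locals() is the same dictionary as globals().
names = get_all_var_names(df_alias, locals())
print(sorted(names))
```

Both names come back, which shows why a lookup by identity cannot tell which name the caller actually wrote.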



# Answer 2
**Score**: 0

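I finally found a solution to my problem using the inspect and re libraries.

I use the following lines, which correspond to a call of the display() function:

```python
import inspect
import re

def display(df):
    # Index 0 is display() itself; index 1 is the frame that called it.
    frame = inspect.getouterframes(inspect.currentframe())[1]
    # code_context holds the source line of the call (None if the source is unavailable).
    line = frame.code_context[0] if frame.code_context else ""
    # Capture the first argument written inside display(...).
    match = re.search(r"display\(\s*(\w+)", line)
    name = match[1] if match else None
    print(name)

# df_on_step_42 is assumed to be a DataFrame defined earlier in the script.
display(df_on_step_42)
```

The inspect library gives me the call context of the function. In that context, the code_context attribute holds the text of the line where the function is called, and the re library lets me isolate the name of the DataFrame passed as the parameter.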

It’s not optimal but it works.

huangapple
  • Published on February 16, 2023 at 04:06:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75464951.html