How to define an empty DataFrame with dynamically typed column names and column types in Julia?

Question

Given column names and column types like these:

col_names = ["A", "B", "C"]
col_types = ["String", "Int64", "Bool"]

I want to create an empty DataFrame like this:

desired_DF = DataFrame(A = String[], B = Int64[], C = Bool[]) # but I cannot spell out every column name and type like this each time

How do I do this?

I am looking for either a code snippet that does this or, if you like the solution I've copied below, an explanation of it.

I've seen a solution here. It works, but I do not understand it, especially the third line, in particular the semicolon at the beginning and the three dots at the end.

col_names = [:A, :B] # needs to be a vector of Symbols
col_types = [Int64, Float64]
# Create a NamedTuple (A=Int64[], ....) by doing
named_tuple = (; zip(col_names, type[] for type in col_types )...)

df = DataFrame(named_tuple) # 0×2 DataFrame

Also, I was hoping that there might be an even more elegant way to do this.

Answer 1

Score: 4

Let us start with the input:

julia> col_names = ["A", "B", "C"]
3-element Vector{String}:
 "A"
 "B"
 "C"

julia> col_types = [String, Int64, Bool]
3-element Vector{DataType}:
 String
 Int64
 Bool

Note the difference from your input: col_types must contain actual types (String, Int64, Bool), not strings naming them. col_names can stay a vector of strings, as you proposed.
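If the type names really do arrive as strings, one way to get from them to types is an explicit lookup table. This is only a sketch: `type_names` and `type_lookup` are names I made up for illustration, and the table has to list every type you expect to see.

```julia
# Assumed input: type names given as strings, as in the question
type_names = ["String", "Int64", "Bool"]

# Hypothetical lookup table; extend it with whatever types you need
type_lookup = Dict("String" => String, "Int64" => Int64,
                   "Bool" => Bool, "Float64" => Float64)

# Map each name to the corresponding type; an unknown name raises a KeyError
col_types = [type_lookup[name] for name in type_names]
```

An explicit table avoids evaluating arbitrary strings as code, at the cost of having to maintain the list of supported types.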

There are many ways to solve your problem. Let me show you the one I consider simplest.

First, create a vector of vectors that will become the columns of your data frame:

julia> [T[] for T in col_types]
3-element Vector{Vector}:
 String[]
 Int64[]
 Bool[]

Now you just need to pass it to the DataFrame constructor: the vector of vectors is the first argument, and the column names are the second:

julia> DataFrame([T[] for T in col_types], col_names)
0×3 DataFrame
 Row │ A       B      C
     │ String  Int64  Bool
─────┴─────────────────────

and you are done.
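If you need this in several places, the two steps can be wrapped in a small function. The following is a sketch, not something DataFrames.jl provides: `empty_dataframe` is a hypothetical helper name, and the length check is my own addition.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): build an empty DataFrame
# whose columns have the given names and element types.
function empty_dataframe(col_names::AbstractVector{<:AbstractString},
                         col_types::AbstractVector)
    length(col_names) == length(col_types) ||
        throw(ArgumentError("col_names and col_types must have the same length"))
    columns = [T[] for T in col_types]   # one empty, typed vector per column
    return DataFrame(columns, col_names)
end

df = empty_dataframe(["A", "B", "C"], [String, Int64, Bool])

# The columns are typed, so a row with matching types can be appended
push!(df, ("x", 1, true))
```

Pushing a row afterwards is a quick way to confirm that each column kept its element type.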

If you do not have column names, you can have them generated automatically by passing :auto as the second argument:

julia> DataFrame([T[] for T in col_types], :auto)
0×3 DataFrame
 Row │ x1      x2     x3
     │ String  Int64  Bool
─────┴─────────────────────

This is a simple way to get what you want.
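Another variant, offered here only as a sketch under the same inputs, is to pair each name with its empty column and pass the resulting vector of pairs to the constructor. The variable name `name_column_pairs` is mine.

```julia
using DataFrames

col_names = ["A", "B", "C"]
col_types = [String, Int64, Bool]

# Broadcast => over both vectors: "A" => String[], "B" => Int64[], "C" => Bool[]
name_column_pairs = col_names .=> [T[] for T in col_types]

# DataFrame accepts a vector of name => column pairs
df = DataFrame(name_column_pairs)
```

Which form reads better is a matter of taste; both end up with the same 0×3 data frame.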


Now let us decompose the approach you mentioned above:

(; zip(col_names, type[] for type in col_types )...)

To understand it you need to know how keyword arguments can be passed to functions. See this:

julia> f(; kwargs...) = kwargs
f (generic function with 1 method)

julia> f(; [(:a, 10), (:b, 20), (:c, 30)]...)
pairs(::NamedTuple) with 3 entries:
  :a => 10
  :b => 20
  :c => 30
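To see the connection to named tuples, here is a small sketch that splats the same kind of collection once into a function call and once into a bare `(; ...)`. The names `kw`, `nt`, and `nt_from_pairs` are just for illustration.

```julia
# Collects whatever keyword arguments it receives
f(; kwargs...) = kwargs

# Splatting (name, value) tuples after the semicolon turns them into keywords
kw = f(; [(:a, 10), (:b, 20)]...)

# The same splat without a function name builds a NamedTuple
nt = (; [(:a, 10), (:b, 20)]...)

# Symbol => value pairs can be splatted the same way
nt_from_pairs = (; [:a => 10, :b => 20]...)
```

In all three cases the names must be Symbols, which is why the snippet in the question uses `[:A, :B]` rather than strings.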

The expression from your question:

(; zip(col_names, type[] for type in col_types )...)

uses exactly this mechanism. Since no function name precedes the parentheses, a NamedTuple is created instead of a function call (this is how Julia syntax works). Note that col_names must hold Symbols here, as in your snippet; strings can be converted with Symbol.(col_names). The zip part just produces the (name, value) tuples, like the ones I passed to my example function above:

julia> collect(zip(Symbol.(col_names), type[] for type in col_types ))
3-element Vector{Tuple{Symbol, Vector}}:
 (:A, String[])
 (:B, Int64[])
 (:C, Bool[])

So the example is the same as passing:

julia> (; [(:A, String[]), (:B, Int64[]), (:C, Bool[])]...)
(A = String[], B = Int64[], C = Bool[])

Which is, given what we have said, the same as passing:

julia> (; :A => String[], :B => Int64[], :C => Bool[])
(A = String[], B = Int64[], C = Bool[])

Which is, in turn, the same as just writing:

julia> (; A = String[], B = Int64[], C = Bool[])
(A = String[], B = Int64[], C = Bool[])

So that is how and why the example you quoted works. However, I believe that the approach I proposed is simpler.
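For completeness, here is a sketch of the NamedTuple route starting from string column names, as in your original input. The conversion with `Symbol.(...)` and the extra parentheses around the generator are my own choices, not part of the snippet you quoted.

```julia
using DataFrames

col_names = ["A", "B", "C"]
col_types = [String, Int64, Bool]

# NamedTuple keys must be Symbols, so convert the strings first;
# the generator yields one empty typed vector per column type
named_tuple = (; zip(Symbol.(col_names), (T[] for T in col_types))...)

# A NamedTuple of vectors is treated as a table of columns
df = DataFrame(named_tuple)
```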

huangapple
  • Published on July 20, 2023, 10:51:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76726383.html