How to define an empty DataFrame with dynamically typed Column Names and Column Types in Julia?
Question
Given column names and column types like these:
col_names = ["A", "B", "C"]
col_types = ["String", "Int64", "Bool"]
I want to create an empty DataFrame like this:
desired_DF = DataFrame(A = String[], B = Int64[], C = Bool[]) # but I cannot spell out every column name and type like this each time
How do I do this?
I am looking for either a code snippet that does this or, if you think the solution I've copied below is a good one, an explanation of how it works.
I've seen a solution here. It works, but I do not understand it, especially the third line, in particular the semicolon at the beginning and the three dots at the end.
col_names = [:A, :B] # needs to be a vector of Symbols
col_types = [Int64, Float64]
# Create a NamedTuple (A=Int64[], ....) by doing
named_tuple = (; zip(col_names, type[] for type in col_types )...)
df = DataFrame(named_tuple) # 0×2 DataFrame
Also, is there perhaps an even more elegant way to do this?
Answer 1
Score: 4
Let us start with the input:
julia> col_names = ["A", "B", "C"]
3-element Vector{String}:
"A"
"B"
"C"
julia> col_types = [String, Int64, Bool]
3-element Vector{DataType}:
String
Int64
Bool
Note the difference: col_types needs to hold types, not strings. col_names is fine the way you proposed it.
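If the type names arrive as strings, as they do in the question, they have to be mapped to actual types first. A minimal sketch of one way to do that, assuming a hand-written lookup table is acceptable (the names `type_lookup` and `type_strings` are illustrative and not part of any library):

```julia
# Hypothetical lookup table: extend it with whatever types you expect to see.
type_lookup = Dict("String" => String, "Int64" => Int64, "Bool" => Bool)

type_strings = ["String", "Int64", "Bool"]  # the strings from the question

# Translate each name into the corresponding type.
# A name that is missing from the table throws a KeyError.
col_types = [type_lookup[s] for s in type_strings]
```

An explicit table avoids evaluating arbitrary strings as code, at the cost of listing the supported types up front.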
Now there are many ways to solve your problem. Let me show you the simplest one in my opinion:
First, create a vector of vectors that will be columns of your data frame:
julia> [T[] for T in col_types]
3-element Vector{Vector}:
String[]
Int64[]
Bool[]
Now you just need to pass it to the DataFrame constructor, with this vector of vectors as the first argument and the column names as the second:
julia> DataFrame([T[] for T in col_types], col_names)
0×3 DataFrame
 Row │ A       B      C
     │ String  Int64  Bool
─────┴─────────────────────
and you are done.
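If this is needed in several places, the two steps can be wrapped in a small function. The sketch below assumes DataFrames.jl is installed; the helper name `empty_df` is made up for illustration and is not part of the package:

```julia
using DataFrames

# Hypothetical helper: build an empty DataFrame from parallel vectors
# of column names and column types.
function empty_df(col_names, col_types)
    length(col_names) == length(col_types) ||
        throw(ArgumentError("col_names and col_types must have the same length"))
    return DataFrame([T[] for T in col_types], col_names)
end

df = empty_df(["A", "B", "C"], [String, Int64, Bool])

# The columns keep their element types, so matching rows can be pushed later.
push!(df, ("x", 1, true))
```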
If you do not have column names, you can generate them automatically by passing :auto as the second argument:
julia> DataFrame([T[] for T in col_types], :auto)
0×3 DataFrame
 Row │ x1      x2     x3
     │ String  Int64  Bool
─────┴─────────────────────
This is a simple way to get what you want.
Now let us decompose the approach you mentioned above:
(; zip(col_names, type[] for type in col_types )...)
To understand it you need to know how keyword arguments can be passed to functions. See this:
julia> f(; kwargs...) = kwargs
f (generic function with 1 method)
julia> f(; [(:a, 10), (:b, 20), (:c, 30)]...)
pairs(::NamedTuple) with 3 entries:
:a => 10
:b => 20
:c => 30
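The same splatting also accepts `Symbol => value` pairs, and the splatted collection can live in a variable. A small sketch reusing the `f` defined above (the variable names `kw` and `result` are illustrative):

```julia
f(; kwargs...) = kwargs

kw = [:a => 10, :b => 20]   # Symbol => value pairs work as well as tuples

result = f(; kw...)          # equivalent to calling f(a = 10, b = 20)

# The collected keyword arguments can be looked up by name.
result[:a]
```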
The example you quoted uses exactly this trick:
(; zip(col_names, type[] for type in col_types )...)
Since no function name precedes the parentheses, a NamedTuple is created instead of a function being called (this is how Julia syntax works). The zip part just produces the (name, value) tuples, like the vector in my example call above. Note that the names must be Symbols here, so the strings are converted first:
julia> collect(zip(Symbol.(col_names), type[] for type in col_types ))
3-element Vector{Tuple{Symbol, Vector}}:
(:A, String[])
(:B, Int64[])
(:C, Bool[])
So the example is the same as passing:
julia> (; [(:A, String[]), (:B, Int64[]), (:C, Bool[])]...)
(A = String[], B = Int64[], C = Bool[])
Which is, given what we have said, the same as passing:
julia> (; :A => String[], :B => Int64[], :C => Bool[])
(A = String[], B = Int64[], C = Bool[])
Which is, in turn, the same as just writing:
julia> (; A = String[], B = Int64[], C = Bool[])
(A = String[], B = Int64[], C = Bool[])
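Putting the pieces together for string column names, here is a short sketch that needs only Base Julia. The variable name `nt` is illustrative, and the conversion with `Symbol.` is there because NamedTuple keys must be Symbols:

```julia
col_names = ["A", "B", "C"]
col_types = [String, Int64, Bool]

# Convert the strings to Symbols, pair each with an empty typed vector,
# and splat the pairs into a NamedTuple.
nt = (; zip(Symbol.(col_names), T[] for T in col_types)...)

keys(nt)   # (:A, :B, :C)
```

Passing `nt` to `DataFrame` then works as in the snippet from the question.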
So this is how and why the example you quoted works. However, I believe that the approach I propose is simpler.