Is it possible to add columns to a pandas dataframe without filling it with any values?

Question

I have a pandas dataframe that is passed from function to function. However, at the moment I do not have any data to populate the rows with.

Furthermore, because of the way the code is structured, the dataframe needs to have certain columns.

Is it possible to add columns to a dataframe without mapping it to any value? I also don't want to map it to 0 or None or any default value. I would just like the empty dataframe with certain columns.

e.g.

...

def _trades(self, trades_df):

    trades_df = trades_df.rename(columns={'timestamp': 'trade_timestamp'})
    trades_df['publication_timestamp'] = trades_df['trade_timestamp']
    trades_df['trade_id'] = trades_df['trade_id'].astype(str)


    # set printable column - this way is empty dataframe safe
    trades_df['printable'] = True

    # No trade_types to map explicitly
    trades_df['trade_type'] = None

    trades_df['implied'] = 0

    return trades_df

As you can see above, the implied column is mapped to 0 and trade_type is mapped to None.

However, I just want to add the columns without mapping them to any default value.

Answer 1

Score: 1

In pandas, the dataframe object is tabular. This means it contains a rectangular collection of values. This rectangle can have zero rows, in which case columns can be added without any values in those columns.

However, if the rectangle has a non-zero number of rows, then each row in a column must have a value. This value can be None (Python's null object), NaN (NumPy's not-a-number value), the empty string, or even an empty Python sequence (tuple or list). But in a dataframe where both axes (rows and columns) have non-zero length, there is no such thing as a cell without any value.
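The two cases above can be sketched as follows. This is a minimal illustration rather than the asker's actual code: the column list is borrowed from the question, and the use of `reindex` to add columns to a frame that already has rows is an assumption about one convenient way to do it.

```python
import pandas as pd

# Column names borrowed from the question, for illustration only.
columns = ["trade_timestamp", "trade_id", "trade_type", "implied"]

# Case 1: zero rows. The columns exist, but no cell holds any value.
empty_df = pd.DataFrame(columns=columns)
print(empty_df.shape)  # (0, 4)

# Case 2: the frame already has rows. New columns must hold something;
# reindex fills the columns it has to create with NaN by default.
trades_df = pd.DataFrame(
    {"trade_timestamp": [1, 2, 3], "trade_id": ["101", "102", "103"]}
)
filled_df = trades_df.reindex(columns=columns)
print(filled_df["trade_type"].isna().all())  # True
```

NaN is still a value, so the second frame has the required columns, but every new cell is marked as missing rather than truly empty.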

The one other thing you can do is initialize the data in a new column using numpy.empty(), which according to the docs will:

> Return a new array of given shape and type, without initializing entries.

Consider this code:

import numpy as np
# One uninitialized float slot per existing row
trades_df['trade_type'] = np.empty([len(trades_df)])
trades_df['implied'] = np.empty([len(trades_df)])

Input:

   trade_timestamp  publication_timestamp trade_id  printable
0                1                      1      101       True
1                2                      2      102       True
2                3                      3      103       True

Output:

   trade_timestamp  publication_timestamp trade_id  printable     trade_type
0                1                      1      101       True  6.953347e-310
1                2                      2      102       True  6.953347e-310
2                3                      3      103       True  6.953347e-310
   trade_timestamp  publication_timestamp trade_id  printable     trade_type        implied
0                1                      1      101       True  6.953347e-310  1.232637e-311
1                2                      2      102       True  6.953347e-310  1.232637e-311
2                3                      3      103       True  6.953347e-310  1.232637e-311

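The above example relies on numpy.empty()'s default dtype, float, but other NumPy scalar types can be used instead. Note that the values shown in the output are arbitrary: they are whatever happened to be in the uninitialized memory, and they will differ from run to run.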
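As a rough sketch of using another dtype: the frame below is a made-up stand-in for trades_df, and choosing `dtype=object` is an assumption about what might suit a column like trade_type. NumPy initializes object arrays to None, so unlike the float case the contents are predictable.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's trades_df.
trades_df = pd.DataFrame({"trade_id": ["101", "102", "103"]})
n = len(trades_df)

# Default dtype (float64): the contents are arbitrary leftover memory.
trades_df["implied"] = np.empty(n)

# Object dtype: NumPy initializes every entry to None.
trades_df["trade_type"] = np.empty(n, dtype=object)

print(trades_df["trade_type"].isna().all())  # True
```

Either way, each cell still holds a value. The float column holds unpredictable numbers, while the object column holds None, which is one of the defaults the question wanted to avoid.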

Answer 2

Score: 0

Yes, of course:

df["C"] = ""
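To make the effect of this assignment concrete, here is a small sketch using hypothetical frames (`df` and `empty` are illustrative, not from the question). On a frame with rows, every row receives the empty string, which is itself a default value. On a frame with no rows, the column is added and nothing is filled in.

```python
import pandas as pd

# Frame with rows: every row of the new column receives "".
df = pd.DataFrame({"A": [1, 2, 3]})
df["C"] = ""
print((df["C"] == "").all())  # True

# Frame without rows: the column is added and stays empty.
empty = pd.DataFrame(columns=["A"])
empty["C"] = ""
print(empty.shape)  # (0, 2)
```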

huangapple
  • Published on February 19, 2023, 18:58:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75499626.html