Is it possible to add columns to a pandas dataframe without filling it with any values?
Question
So I have a pandas dataframe which is being passed from function to function. However, at the moment I do not have any data to populate the rows with.
Furthermore, because of the way the code is structured, the dataframe needs to have certain columns.
Is it possible to add columns to a dataframe without mapping them to any value? I also don't want to map them to `0` or `None` or any other default value. I would just like an empty dataframe with certain columns.
e.g.
```python
...
def _trades(self, trades_df):
    trades_df = trades_df.rename(columns={'timestamp': 'trade_timestamp'})
    trades_df['publication_timestamp'] = trades_df['trade_timestamp']
    trades_df['trade_id'] = trades_df['trade_id'].astype(str)
    # set printable column - this way is empty dataframe safe
    trades_df['printable'] = True
    # No trade_types to map explicitly
    trades_df['trade_type'] = None
    trades_df['implied'] = 0
    return trades_df
```
As you can see above, the `implied` column is mapped to `0` and `trade_type` is mapped to `None`.
However, I just want to add the columns without mapping them to any default value.
Answer 1
Score: 1
In pandas, the dataframe object is tabular. This means it contains a rectangular collection of values. This rectangle can have zero rows, in which case columns can be added without any values in those columns.
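A minimal sketch of the zero-row case, assuming pandas is installed. The column names come from the question; `venue` is a hypothetical extra column used only for illustration. It shows that on a frame with no rows, a column can exist without any cell, and therefore without any value.

```python
import pandas as pd

# Column names taken from the question; adjust to what downstream code expects.
columns = ["trade_timestamp", "publication_timestamp", "trade_id",
           "printable", "trade_type", "implied"]

# A zero-row frame: the columns exist, but there are no cells to hold values.
trades_df = pd.DataFrame(columns=columns)
print(trades_df.shape)  # (0, 6)

# On a zero-row frame, assigning a scalar creates the column without storing
# any value, which is why the question's code is "empty dataframe safe".
trades_df["venue"] = None  # "venue" is a hypothetical extra column
print(trades_df.shape)  # (0, 7)
print(list(trades_df.columns))
```

The scalar on the right-hand side only matters once rows exist; with zero rows it is broadcast over nothing.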
However, if the rectangle has a non-zero number of rows, then each row in a column must have a value. This value can be `None` (Python's null object value), `NaN` (NumPy's not-a-number value), the empty string, or even an empty Python sequence (tuple or list). But in a dataframe where both axes (rows and columns) have non-zero length, there is no such thing as a cell without any value.
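A sketch of the non-zero-row case, assuming a small hypothetical three-row frame in place of real trade data. `DataFrame.reindex(columns=...)` adds the missing columns, and because every new cell must hold something, pandas fills them with its missing-value marker, `NaN`.

```python
import pandas as pd

# Hypothetical three-row frame standing in for real trade data.
trades_df = pd.DataFrame({
    "trade_timestamp": [1, 2, 3],
    "trade_id": ["101", "102", "103"],
})

# Add the required columns; with rows present, each new cell gets NaN.
trades_df = trades_df.reindex(columns=[*trades_df.columns, "trade_type", "implied"])
print(trades_df)

# Every cell in the new columns is flagged as missing.
print(trades_df[["trade_type", "implied"]].isna().all().all())  # True
```

`NaN` is still a value, but it is the one pandas treats as "missing", so methods such as `isna()`, `fillna()` and `dropna()` recognize it.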
The one other thing you can do is to initialize the data in a new column using `numpy.empty()`, which according to the docs will:

> Return a new array of given shape and type, without initializing entries.
Consider this code:
```python
import numpy as np

trades_df['trade_type'] = np.empty([len(trades_df)])
trades_df['implied'] = np.empty([len(trades_df)])
```
Input:

```
   trade_timestamp  publication_timestamp  trade_id  printable
0                1                      1       101       True
1                2                      2       102       True
2                3                      3       103       True
```

Output:

```
   trade_timestamp  publication_timestamp  trade_id  printable     trade_type
0                1                      1       101       True  6.953347e-310
1                2                      2       102       True  6.953347e-310
2                3                      3       103       True  6.953347e-310

   trade_timestamp  publication_timestamp  trade_id  printable     trade_type        implied
0                1                      1       101       True  6.953347e-310  1.232637e-311
1                2                      2       102       True  6.953347e-310  1.232637e-311
2                3                      3       103       True  6.953347e-310  1.232637e-311
```
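The above example uses the default `dtype` of `numpy.empty()`, which is `float` (float64), but it is possible to pass other NumPy scalar types instead. As the output shows, the uninitialized entries are arbitrary leftover values from memory, not empty cells.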
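A sketch of passing other dtypes to `numpy.empty()`, assuming a hypothetical three-row frame; `price` is a hypothetical column name. Numeric dtypes still come back holding arbitrary leftover values, while `dtype=object` arrays are filled with `None` references.

```python
import numpy as np
import pandas as pd

trades_df = pd.DataFrame({"trade_id": ["101", "102", "103"]})
n = len(trades_df)

# np.empty allocates memory without initializing it: the numeric columns
# hold whatever bytes were already there, so their values are arbitrary.
trades_df["implied"] = np.empty(n, dtype=np.int64)
trades_df["price"] = np.empty(n, dtype=np.float32)  # hypothetical column

# With dtype=object, NumPy fills the new array with None references.
trades_df["trade_type"] = np.empty(n, dtype=object)

print(trades_df.dtypes)
```

Because the numeric contents are unpredictable, code that later reads these cells may see garbage; if a recognizable missing marker is preferred, `np.full(n, np.nan)` is one alternative.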
Answer 2
Score: 0
Yes, of course:

```python
df["C"] = ""
```
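A short sketch of what this assignment does, assuming a small hypothetical frame with columns `A` and `B`. With rows present, the empty string is stored in every cell of the new column, so it acts as a default value just like `0` or `None`; only on a zero-row frame does the column end up with no values.

```python
import pandas as pd

# With rows present, "" is stored in every cell of the new column.
df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})
df["C"] = ""
print(df)

# With zero rows, the same assignment adds the column but stores no values.
empty = pd.DataFrame(columns=["A", "B"])
empty["C"] = ""
print(empty.shape)  # (0, 3)
```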