How to create a third column based on the first two columns in a pandas DataFrame?
Question
I have a DataFrame like the one below:
| col-a | col-b |
|---|---|
| abc | 123 |
| def | 456 |
| ghi | 789 |
I have a string template, str = f"https://{val1}.{val2}", and I want to use it to create col-c like this:
| col-a | col-b | col-c |
|---|---|---|
| abc | 123 | https://abc.123 |
| def | 456 | https://def.456 |
| ghi | 789 | https://ghi.789 |
The DataFrame is large, and I would like to use np.where/np.select because I think .apply() will be slow. Even with .apply(), I have not been able to combine the two column values into col-c. Could anyone help?
Answer 1
Score: 1
You can define a function that takes the two values you want to combine and returns the result:
import pandas as pd
# create example dataframe
df = pd.DataFrame({'column_1': ['abc', 'def', 'ghi'], 'column_2': [123, 456, 789]})
# define a function that builds the URL string from two values
def build_url(val1, val2):
    return f"https://{val1}.{val2}"
# apply the function to each row of the dataframe
df['new_column'] = df.apply(lambda row: build_url(row['column_1'], row['column_2']), axis=1)
print(df)
This produces the desired DataFrame:
  column_1  column_2       new_column
0      abc       123  https://abc.123
1      def       456  https://def.456
2      ghi       789  https://ghi.789
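If you want to keep a per-row helper but avoid the overhead of df.apply(axis=1), one possible variation is to loop over the two columns with zip in a list comprehension. This is only a sketch and not part of the original answer; the helper name build_url is chosen for illustration.

```python
import pandas as pd

# Same example data as above.
df = pd.DataFrame({'column_1': ['abc', 'def', 'ghi'], 'column_2': [123, 456, 789]})

# Hypothetical helper name; it builds the URL string from two values.
def build_url(val1, val2):
    return f"https://{val1}.{val2}"

# zip walks both columns in parallel, so no row Series is built per row.
df['new_column'] = [build_url(a, b) for a, b in zip(df['column_1'], df['column_2'])]
print(df)
```

The result should match the apply version; how much faster it is depends on the data and the pandas version, so it is worth timing on your own DataFrame.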
Answer 2
Score: 1
Using your DataFrame as input:
import pandas as pd
df = pd.DataFrame(
    {
        'col-a': ['abc', 'def', 'ghi'],
        'col-b': [123, 456, 789]
    }
)
I timed one version using .apply() and another using string concatenation:
%timeit df['col-c'] = df.apply(lambda row : f"https://{row['col-a']}.{row['col-b']}", axis = 1)
499 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df['col-c'] = "https://" + df['col-a'] + "." + df['col-b'].astype(str)
347 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Edit: I repeated the DataFrame 10,000 times to get 30,000 rows:
df = pd.concat([df] * 10000, axis = 0)
Now .apply() takes
%timeit df['col-c'] = df.apply(lambda row : f"https://{row['col-a']}.{row['col-b']}", axis = 1)
195 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
and string concatenation takes
%timeit df['col-c'] = "https://" + df['col-a'] + "." + df['col-b'].astype(str)
11.3 ms ± 91.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So string concatenation is the faster approach, by roughly 17 times at this size.
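As a sanity check, here is a minimal sketch that confirms the row-wise and vectorized versions build the same column, and shows Series.str.cat as another vectorized spelling. The variable names via_apply, via_concat and via_cat are illustrative only. np.where/np.select choose between values based on a condition, so they are not needed for plain concatenation like this.

```python
import pandas as pd

df = pd.DataFrame({'col-a': ['abc', 'def', 'ghi'], 'col-b': [123, 456, 789]})

# Row-wise version: the lambda is called once per row.
via_apply = df.apply(lambda row: f"https://{row['col-a']}.{row['col-b']}", axis=1)

# Vectorized version: col-b must be cast to str before concatenating.
via_concat = "https://" + df['col-a'] + "." + df['col-b'].astype(str)

# Same idea using Series.str.cat with a separator.
via_cat = "https://" + df['col-a'].str.cat(df['col-b'].astype(str), sep=".")

# All three approaches should agree element by element.
same = via_apply.tolist() == via_concat.tolist() == via_cat.tolist()
print(same)

df['col-c'] = via_concat
print(df)
```

One difference to keep in mind: if col-a contains missing values, the concatenation leaves a missing value in col-c, while the f-string version writes the text "nan" into the URL.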