Strip out currency and assign value
Question
Hi, I have the following dataset, and I want to create a column that indicates the currency of each amount:
df_currency = pd.DataFrame(columns=["Amount", "Currency_Name", "FX"])
df_currency["Amount"] = mcapnum_test
Amount Currency_Name FX
0 $3692391833 NaN NaN
1 $17868370525 NaN NaN
2 $51376239909 NaN NaN
3 $139591325133 NaN NaN
4 $54863164472 NaN NaN
.. ... ... ...
491 14139547170 MYR NaN NaN
492 $2293285351 NaN NaN
493 $10892645287 NaN NaN
494 278539272091 CNY NaN NaN
495 $38316261938 NaN NaN
# I wrote the following for loop, but neither the "Currency_Name" nor the "FX" column is updated the way I want:
USD = "$"
for wordcheck in mcapstr:
    if USD in wordcheck:
        df_currency = df_currency.assign(FX=lambda x: 1)
        df_currency = df_currency.assign(Currency_Name=lambda x: "USD")
    else:
        df_currency = df_currency.assign(FX=lambda x: "TBD")
        df_currency = df_currency.assign(Currency_Name=lambda x: "Other")
# every value in the "Currency_Name" and "FX" columns ends up being "USD" and 1

# but a simple print test suggests that the if/else logic itself works:
USD = "$"
for wordcheck in mcapstr:
    if USD in wordcheck:
        print("USD")
    else:
        print("Other")
Answer 1
Score: 1
You should avoid iterating through a DataFrame row by row. In this case, you can compute a boolean mask that records, for every row at once, whether the Amount contains "$". This vectorized approach is better practice and should be much faster.
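Why the original loop ends up with "USD" and 1 everywhere: when `DataFrame.assign` is given a lambda that returns a single scalar, pandas broadcasts that scalar to every row. Each iteration therefore overwrites the whole column, and only the last iteration survives. A minimal sketch, using a few made-up strings that are assumed to look like the asker's `mcapstr`:

```python
import pandas as pd

# Hypothetical sample data standing in for mcapstr / mcapnum_test
mcapstr = ["14139547170 MYR", "$2293285351", "$38316261938"]
df = pd.DataFrame({"Amount": mcapstr})

for wordcheck in mcapstr:
    if "$" in wordcheck:
        # The lambda returns one value, which pandas assigns to EVERY row
        df = df.assign(Currency_Name=lambda x: "USD")
    else:
        df = df.assign(Currency_Name=lambda x: "Other")

# Only the last iteration's value remains; the last string contains "$",
# so even the MYR row is labelled "USD"
print(df["Currency_Name"].tolist())  # ['USD', 'USD', 'USD']
```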
Overall, it should look something like this:
# create the new columns with default values for non-USD rows
df_currency["Currency_Name"] = "Other"
df_currency["FX"] = "TBD"
# regex=False matches "$" literally ("$" is a regex anchor); na=False treats missing amounts as non-USD
is_usd_mask = df_currency["Amount"].str.contains("$", regex=False, na=False)
df_currency.loc[is_usd_mask, "Currency_Name"] = "USD"
df_currency.loc[is_usd_mask, "FX"] = 1
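A self-contained sketch of the mask approach, using a few amounts copied from the question's output. It assumes `Amount` holds strings, with USD values prefixed by "$" and other currencies suffixed by a three-letter code (as in the "MYR" and "CNY" rows). The last two steps are an optional extension, not part of the original answer: they fill in the currency code for non-USD rows with `Series.str.extract` and strip the amount down to digits.

```python
import pandas as pd

# Sample values copied from the question; the real column comes from mcapnum_test
df_currency = pd.DataFrame(
    {"Amount": ["$3692391833", "14139547170 MYR", "$2293285351", "278539272091 CNY"]}
)

# Defaults for rows that are not in USD
df_currency["Currency_Name"] = "Other"
df_currency["FX"] = "TBD"

# Boolean mask computed for all rows at once; "$" is matched literally
is_usd_mask = df_currency["Amount"].str.contains("$", regex=False, na=False)
df_currency.loc[is_usd_mask, "Currency_Name"] = "USD"
df_currency.loc[is_usd_mask, "FX"] = 1

# Optional extension (assumption: non-USD amounts end with a 3-letter code)
codes = df_currency["Amount"].str.extract(r"([A-Z]{3})$", expand=False)
df_currency.loc[~is_usd_mask, "Currency_Name"] = codes[~is_usd_mask].fillna("Other")

# Optional extension: drop every non-digit character to get a numeric amount
df_currency["Amount_Num"] = (
    df_currency["Amount"].str.replace(r"\D", "", regex=True).astype("int64")
)

print(df_currency)
```

Keeping "Other"/"TBD" as defaults and overwriting only the masked rows means any value that matches neither pattern stays clearly flagged instead of silently inheriting another row's label.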