Strip out currency and assign value

Question

Hi, I have the following dataset and want to create a column that indicates the currency of each amount:

df_currency= pd.DataFrame(columns=["Amount", "Currency_Name", "FX"])
df_currency["Amount"] = mcapnum_test

               Amount Currency_Name   FX
0         $3692391833           NaN  NaN
1        $17868370525           NaN  NaN
2        $51376239909           NaN  NaN
3       $139591325133           NaN  NaN
4        $54863164472           NaN  NaN
..                ...           ...  ...
491   14139547170 MYR           NaN  NaN
492       $2293285351           NaN  NaN
493      $10892645287           NaN  NaN
494  278539272091 CNY           NaN  NaN
495      $38316261938           NaN  NaN

I wrote the following for loop, but neither the "Currency_Name" nor the "FX" column is updated the way I want:

USD = "$"
for wordcheck in mcapstr:
    if USD in wordcheck:
        df_currency = df_currency.assign(FX=lambda x: 1) 
        df_currency = df_currency.assign(Currency_Name=lambda x: "USD")
    else:
        df_currency = df_currency.assign(FX=lambda x: "TBD")
        df_currency = df_currency.assign(Currency_Name=lambda x: "Other")

Every row of "Currency_Name" and "FX" ends up as "USD" and 1.

But when I run a simple print test, the loop itself seems to work:

USD = "$"
for wordcheck in mcapstr:
    if USD in wordcheck:
        print("USD")
    else:
        print("Other")

Answer 1

Score: 1

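You should avoid iterating over a DataFrame row by row. Your loop also does not do what you expect: each assign call sets the whole column to a single value, so after the last iteration every row holds whatever the final element of mcapstr produced (here "USD" and 1). Instead, compute a boolean mask that marks which rows of "Amount" contain "$" and use it to update only those rows. This is better practice and should be much quicker.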
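
To see why the loop in the question leaves every row identical, here is a minimal sketch. The three-row mcapstr list and the df_demo and history names are assumptions made up for illustration; only the assign pattern comes from the question.

```python
import pandas as pd

# Hypothetical three-row sample standing in for the asker's data.
mcapstr = ["$3692391833", "14139547170 MYR", "$2293285351"]
df_demo = pd.DataFrame({"Amount": mcapstr})

history = []
for wordcheck in mcapstr:
    name = "USD" if "$" in wordcheck else "Other"
    # assign() with a scalar fills the WHOLE column, not just the current row.
    df_demo = df_demo.assign(Currency_Name=name)
    # Record what the column looks like after this iteration.
    history.append(df_demo["Currency_Name"].tolist())

for step, column in enumerate(history):
    print(step, column)
```

After the second iteration the whole column reads "Other", and after the third it reads "USD" again, so the final element alone decides the result for every row. That matches what the question reports.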

Overall, it should look something like this:

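# create the new columns with the default values used for non-USD rows
df_currency["Currency_Name"] = "Other"
df_currency["FX"] = "TBD"

# True for every row whose amount contains a literal "$"
# (regex=False because "$" is a special character in regular expressions)
is_usd_mask = df_currency["Amount"].str.contains("$", regex=False)

# overwrite the defaults only on the USD rows
df_currency.loc[is_usd_mask, "Currency_Name"] = "USD"
df_currency.loc[is_usd_mask, "FX"] = 1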
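
Since the title also asks about stripping the currency out, the mask idea can be extended along these lines. This is only a sketch under assumptions: the sample rows (including the "€" row), the Amount_Num column name, and the use of NaN instead of "TBD" for unknown rates are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's "Amount" column.
df_currency = pd.DataFrame(
    {
        "Amount": [
            "$3692391833",
            "14139547170 MYR",
            "278539272091 CNY",
            "$38316261938",
            "€1200000",  # made-up row with no "$" and no trailing code
        ]
    }
)

# True where the amount contains a literal "$".
is_usd_mask = df_currency["Amount"].str.contains("$", regex=False)

# Trailing three-letter code such as "MYR" or "CNY"; NaN when there is none.
code = df_currency["Amount"].str.extract(r"([A-Z]{3})\s*$", expand=False)

# "USD" on the masked rows, the extracted code elsewhere, "Other" as a fallback.
df_currency["Currency_Name"] = code.where(~is_usd_mask, "USD").fillna("Other")

# Assumption: keep FX numeric, 1.0 for USD and NaN where the rate is unknown.
df_currency["FX"] = pd.Series(1.0, index=df_currency.index).where(is_usd_mask)

# Drop every non-digit character and store the amount as an integer.
df_currency["Amount_Num"] = (
    df_currency["Amount"].str.replace(r"\D", "", regex=True).astype("int64")
)

print(df_currency)
```

Keeping FX as a float column with NaN for unknown rates avoids mixing the number 1 with the string "TBD" in one column, which makes later arithmetic such as Amount_Num * FX simpler. If the "TBD" placeholder is preferred, the assignment from the answer above works as well.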

huangapple
  • Published on March 1, 2023, 15:42:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75600771.html