Look up a dictionary within a DataFrame column?

Question


I have a DataFrame like the one below:

df1

|id|   domains| name|
| 1|      life|  xyz|
| 2|      life|  hyu|
| 3|vkontakete|  jik|

And a dictionary:

domain_dict = {
    "tassagency": "TACC",
    "life": "LIFE.ru",
    "ria": "RIA | Taza",
    "nws_ru": "NEWS.ru | Новости",
    "mash": "Mash | Мэш"
}

I want to look up each value in the domains column of df1 among the keys of domain_dict and put the matching dictionary value in a new column of df1. If a domain does not match any key in the dictionary, the new column should keep the original value from domains, as in the third record of the expected output.

I tried the approach below:

df1["vk_dom"] = np.nan

for i in df1.index:
    for i,(k, v) in enumerate(domain_dict.items()):
        if k == (df1['domains'][i]):
            df1["vk_dom"][i] = v
        else:
            df1["vk_dom"][i] = df1['domains'][i]

How can I get the expected output below?

|id|   domains| name|     vk_dom|
| 1|      life|  xyz|    LIFE.ru|
| 2|      life|  hyu|    LIFE.ru|
| 3|vkontakete|  jik| vkontakete|

Answer 1

Score: 1


Use Series.map() to map the domains column to the values in domain_dict, then fill the unmatched rows (which come back as NaN) with the original domains values.

df1["vk_dom"] = df1["domains"].map(domain_dict).fillna(df1["domains"])
print(df1)

   id     domains name      vk_dom
0   1        life  xyz     LIFE.ru
1   2        life  hyu     LIFE.ru
2   3  vkontakete  jik  vkontakete
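
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The construction of df1 is reconstructed from the table in the question, and the `alt` variable is only an illustrative name for an equivalent variant that uses `dict.get` with a default instead of `fillna`.

```python
import pandas as pd

# df1 rebuilt from the table shown in the question
df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "domains": ["life", "life", "vkontakete"],
    "name": ["xyz", "hyu", "jik"],
})

domain_dict = {
    "tassagency": "TACC",
    "life": "LIFE.ru",
    "ria": "RIA | Taza",
    "nws_ru": "NEWS.ru | Новости",
    "mash": "Mash | Мэш",
}

# Series.map returns NaN for values that are not keys of the dict;
# fillna then puts the original domain back in those rows.
df1["vk_dom"] = df1["domains"].map(domain_dict).fillna(df1["domains"])

# Variant (illustrative): fall back to the key itself when it is missing.
alt = df1["domains"].map(lambda d: domain_dict.get(d, d))

print(df1)
```

As a side note, the loop in the question uses the name `i` for both the outer and the inner loop, so the row label is overwritten by the position in the dictionary; a vectorized lookup like the one above avoids that kind of indexing entirely.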

Answer 2

Score: 0

import pandas as pd

domain_dict = {
    "tassagency": "TACC",
    "life": "LIFE.ru",
    "ria": "RIA | Taza",
    "nws_ru": "NEWS.ru | Новости",
    "mash": "Mash | Мэш"
}

# Turn the dictionary into a two-column lookup table
df2 = pd.Series(domain_dict).reset_index()
df2.columns = ['domains', 'vk_dom']

# Left join keeps every row of df1; unmatched domains get NaN in vk_dom
df1 = pd.merge(df1, df2, on=['domains'], how='left')

# Fill the NaN values in vk_dom with the original domains values
df1.vk_dom = df1.vk_dom.combine_first(df1.domains)
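
Since this answer comes without an explanation, the following sketch may help show what each step does. It assumes df1 as reconstructed from the question and uses a shortened dictionary; `merged` and `unmatched` are illustrative names that do not appear in the answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "domains": ["life", "life", "vkontakete"],
    "name": ["xyz", "hyu", "jik"],
})

# Shortened dictionary, for illustration only
domain_dict = {"life": "LIFE.ru", "ria": "RIA | Taza"}

# Dictionary -> two-column lookup table
df2 = pd.Series(domain_dict).reset_index()
df2.columns = ["domains", "vk_dom"]

# Left join: every row of df1 is kept, in its original order
merged = pd.merge(df1, df2, on="domains", how="left")

# Rows whose domain had no key in the dictionary are null at this point
unmatched = merged["vk_dom"].isna().tolist()

# combine_first keeps non-null vk_dom values and takes the rest from domains
merged["vk_dom"] = merged["vk_dom"].combine_first(merged["domains"])

print(unmatched)
print(merged)
```

The intermediate `unmatched` list makes it visible that only the third row needed the fallback.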

Answer 3

Score: 0


You can create a DataFrame from domain_dict, merge it with df1 using a left join, and then fill the NaN values with the values from the domains column:

domain_dict = {
    "tassagency": "TACC",
    "life": "LIFE.ru",
    "ria": "RIA | Taza",
    "nws_ru": "NEWS.ru | Новости",
    "mash": "Mash | Мэш"
}

domain_df = pd.DataFrame(domain_dict.items(), columns=['domains', 'vk_dom'])
result = df1.merge(domain_df, how='left', on='domains')
result.loc[result['vk_dom'].isnull(), 'vk_dom'] = result['domains']

This will give you the following DataFrame named result:

   id     domains name      vk_dom
0   1        life  xyz     LIFE.ru
1   2        life  hyu     LIFE.ru
2   3  vkontakete  jik  vkontakete
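
The three steps above can also be sketched as one runnable script. This is only an illustration under stated assumptions: df1 is rebuilt from the question, the dictionary items are wrapped in `list(...)` as a defensive choice, and the last step uses `fillna` as an alternative to the `.loc` assignment. The comparison against `Series.map` at the end is there to show that the merge route and the map route agree on this data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "domains": ["life", "life", "vkontakete"],
    "name": ["xyz", "hyu", "jik"],
})

domain_dict = {
    "tassagency": "TACC",
    "life": "LIFE.ru",
    "ria": "RIA | Taza",
    "nws_ru": "NEWS.ru | Новости",
    "mash": "Mash | Мэш",
}

# Step 1: lookup table built from the dictionary
domain_df = pd.DataFrame(list(domain_dict.items()), columns=["domains", "vk_dom"])

# Step 2: left join so that unmatched rows of df1 are kept
result = df1.merge(domain_df, how="left", on="domains")

# Step 3: fall back to the original domain where no key matched
result["vk_dom"] = result["vk_dom"].fillna(result["domains"])

# Cross-check against the Series.map approach
via_map = df1["domains"].map(domain_dict).fillna(df1["domains"])
same = result["vk_dom"].tolist() == via_map.tolist()

print(result)
```

Because dictionary keys are unique, the left join cannot duplicate rows of df1, so `result` has the same number of rows as the input.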

huangapple
  • Posted on April 10, 2023 at 21:25:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75977530.html