Keep value in a column based on condition

Question

import pandas as pd
data={'ip.src':['x.x.x.x','y.y.y.y','z.z.z.z'],
'ip.dst':['a.a.a.a','b.b.b.b','c.c.c.c'],
'src_country':['china','US','china'],
'dst_country':['pakistan','china','india']
}

Data=pd.DataFrame(data)

# Keep only the values in ip.src where src_country is 'china'
Data['ip.src'] = Data['ip.src'][Data['src_country'] == 'china']

# Keep only the values in ip.dst where dst_country is 'china'
Data['ip.dst'] = Data['ip.dst'][Data['dst_country'] == 'china']

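# Drop rows where both ip.src and ip.dst are NaN (how='all'; the default
# how='any' would drop every row here, since each row has one NaN)
Data = Data.dropna(subset=['ip.src', 'ip.dst'], how='all')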

# Reset the index
Data = Data.reset_index(drop=True)

Data

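This code keeps the value in the ip.src column only where src_country is 'china', and the value in the ip.dst column only where dst_country is 'china'. It then drops the rows in which both ip.src and ip.dst are NaN and resets the index.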
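The `how` argument decides which rows `dropna` removes, and the default (`how='any'`) would empty this particular frame. A minimal sketch on a hand-built frame shaped like the masked result (the frame `masked` is illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

# Illustrative frame shaped like the masked result: each row keeps exactly one IP.
masked = pd.DataFrame({
    'ip.src': ['x.x.x.x', np.nan, 'z.z.z.z'],
    'ip.dst': [np.nan, 'b.b.b.b', np.nan],
})

# how='any' (the default) drops a row if EITHER subset column is NaN.
dropped_any = masked.dropna(subset=['ip.src', 'ip.dst'])

# how='all' drops a row only if BOTH subset columns are NaN.
dropped_all = masked.dropna(subset=['ip.src', 'ip.dst'], how='all')

print(len(dropped_any))  # 0
print(len(dropped_all))  # 3
```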

English original:

I have a DataFrame:

import pandas as pd
data={'ip.src':['x.x.x.x','y.y.y.y','z.z.z.z'],
'ip.dst':['a.a.a.a','b.b.b.b','c.c.c.c'],
'src_country':['china','US','china'],
'dst_country':['pakistan','china','india']
}

Data=pd.DataFrame(data)

I want to keep only those values in the ip.src and ip.dst columns that correspond to china: if src_country is china, the value in ip.src should be kept, and if dst_country is china, the value in ip.dst should be kept. Is there any way to do this?
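For reference, the requirement pairs each IP column with its own country column. A minimal sketch under that reading, using a hypothetical helper `keep_where_country` (not a pandas API) built on `Series.where`, which keeps values where the condition is True and inserts NaN elsewhere:

```python
import pandas as pd


def keep_where_country(df, pairs, country='china'):
    """Hypothetical helper: for each (ip_col, country_col) pair, keep the IP
    only in rows whose country column equals `country`; other cells become
    NaN. Returns a new DataFrame and leaves `df` untouched."""
    out = df.copy()
    for ip_col, country_col in pairs.items():
        # Series.where keeps values where the condition is True, NaN elsewhere
        out[ip_col] = out[ip_col].where(out[country_col] == country)
    return out


data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']}
Data = pd.DataFrame(data)

result = keep_where_country(Data, {'ip.src': 'src_country', 'ip.dst': 'dst_country'})
print(result)
```

Passing the pairs as a mapping keeps the logic the same if more IP/country column pairs are added later.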

Answer 1

Score: 1

Something like this?

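import numpy as np

# Keep only the rows where either country column is 'china'
Data = Data[(Data['src_country'] == 'china') | (Data['dst_country'] == 'china')].copy()

# Replace every country value that is not 'china' with NaN
Data[['src_country', 'dst_country']] = Data[['src_country', 'dst_country']].applymap(lambda x: np.nan if x != 'china' else x)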

Data
    ip.src   ip.dst src_country dst_country
0  x.x.x.x  a.a.a.a       china         NaN
1  y.y.y.y  b.b.b.b         NaN       china
2  z.z.z.z  c.c.c.c       china         NaN
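This answer blanks out the non-'china' entries in the country columns rather than in the IP columns. Also, `DataFrame.applymap` is deprecated as of pandas 2.1 (renamed `DataFrame.map`). A sketch that reproduces the same output without `applymap`, assuming a vectorized `DataFrame.where` is acceptable, and writes the row filter with `.eq(...).any(axis=1)`:

```python
import pandas as pd

data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']}
Data = pd.DataFrame(data)

countries = ['src_country', 'dst_country']

# Row filter: keep rows where any country column equals 'china'
# (equivalent to the `|` expression in the answer).
Data = Data[Data[countries].eq('china').any(axis=1)].copy()

# Vectorized equivalent of the applymap lambda: keep 'china', NaN elsewhere.
Data[countries] = Data[countries].where(Data[countries].eq('china'))
print(Data)
```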

Answer 2

Score: 1

Use [`DataFrame.loc`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) to modify the ip.src/ip.dst columns:

Data['ip.src'] = Data.loc[Data['src_country'] == 'china', 'ip.src']
Data['ip.dst'] = Data.loc[Data['dst_country'] == 'china', 'ip.dst']

print (Data)
    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india

Or:

m = Data[['src_country','dst_country']] == 'china'
Data[['ip.src', 'ip.dst']] = Data[['ip.src', 'ip.dst']].where(m.to_numpy())
print (Data)
    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india
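In the second variant, the mask's columns are labelled `src_country`/`dst_country` while the target's are `ip.src`/`ip.dst`, and `DataFrame.where` aligns a DataFrame condition on labels; `.to_numpy()` drops those labels so the mask is applied by position. An alternative sketch that relabels the mask with `set_axis` instead, which makes the column pairing explicit:

```python
import pandas as pd

Data = pd.DataFrame({'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
                     'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
                     'src_country': ['china', 'US', 'china'],
                     'dst_country': ['pakistan', 'china', 'india']})

m = Data[['src_country', 'dst_country']] == 'china'

# Give the mask the same column labels as the target, so label-based
# alignment pairs ip.src with src_country and ip.dst with dst_country.
m = m.set_axis(['ip.src', 'ip.dst'], axis=1)

Data[['ip.src', 'ip.dst']] = Data[['ip.src', 'ip.dst']].where(m)
print(Data)
```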

Answer 3

Score: 1

Data['ip.src'] = Data['ip.src'][(Data['src_country'] == 'china')]
Data['ip.dst'] = Data['ip.dst'][(Data['dst_country'] == 'china')]

Output:

ip.src   ip.dst   src_country  dst_country
x.x.x.x  NaN      china        pakistan
NaN      b.b.b.b  US           china
z.z.z.z  NaN      china        india
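This works because assigning a Series to a DataFrame column aligns on the index labels, not on position. A sketch of that single step on the question's data:

```python
import pandas as pd

Data = pd.DataFrame({'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
                     'src_country': ['china', 'US', 'china']})

# Boolean indexing returns only the matching rows (labels 0 and 2 here).
kept = Data['ip.src'][Data['src_country'] == 'china']
print(kept.index.tolist())  # [0, 2]

# Assigning the shorter Series back to the column aligns on index labels:
# the row missing from `kept` (label 1) receives NaN.
Data['ip.src'] = kept
print(Data['ip.src'].tolist())  # ['x.x.x.x', nan, 'z.z.z.z']
```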

Answer 4

Score: 0

Another possible solution:

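import numpy as np

# np.where(cond, a, b) takes a where cond is True, so the IPs go in the
# "True" slot to keep them only where the paired country is 'china'
Data[['ip.src', 'ip.dst']] = np.where(
    Data[['src_country', 'dst_country']].eq('china'),
    Data[['ip.src', 'ip.dst']], np.nan)

Output:

    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india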
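`np.where(cond, a, b)` takes values from `a` where `cond` is True and from `b` elsewhere, so the order of the last two arguments decides whether the 'china' IPs are kept or blanked. A sketch contrasting both orders on the question's data (the names `keep_china`/`drop_china` are illustrative):

```python
import numpy as np
import pandas as pd

Data = pd.DataFrame({'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
                     'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
                     'src_country': ['china', 'US', 'china'],
                     'dst_country': ['pakistan', 'china', 'india']})

# 3x2 boolean array: True where the paired country is 'china'
is_china = Data[['src_country', 'dst_country']].eq('china').to_numpy()
ips = Data[['ip.src', 'ip.dst']].to_numpy()

# IP kept where the country is 'china', NaN elsewhere (what the question asks)
keep_china = np.where(is_china, ips, np.nan)

# Arguments swapped: IP blanked where the country is 'china'
drop_china = np.where(is_china, np.nan, ips)

print(keep_china)
print(drop_china)
```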

huangapple
  • This article was posted on July 3, 2023 at 16:53:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76603252.html