Keep value in a column based on condition

Question

```python
import pandas as pd

data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']}
Data = pd.DataFrame(data)
# Keep only the values in ip.src where src_country is 'china'
Data['ip.src'] = Data['ip.src'][Data['src_country'] == 'china']
# Keep only the values in ip.dst where dst_country is 'china'
Data['ip.dst'] = Data['ip.dst'][Data['dst_country'] == 'china']
# Drop rows where both ip.src and ip.dst are NaN; how='all' is required,
# because the default how='any' drops every row with at least one NaN
Data = Data.dropna(subset=['ip.src', 'ip.dst'], how='all')
# Reset the index
Data = Data.reset_index(drop=True)
print(Data)
```

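This code keeps the values in the `ip.src` column only where `src_country` is `'china'`, and the values in the `ip.dst` column only where `dst_country` is `'china'`. It then drops the rows in which both `ip.src` and `ip.dst` are NaN and resets the index.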
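Which rows survive depends on the `how` argument of `dropna`. A small sketch comparing the default `how='any'` with `how='all'`; the `masked` frame is made-up sample data shaped like the masked result, with an extra all-NaN row added purely for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative masked frame: rows 0-2 keep one IP each, row 3 keeps none.
masked = pd.DataFrame({
    'ip.src': ['x.x.x.x', np.nan, 'z.z.z.z', np.nan],
    'ip.dst': [np.nan, 'b.b.b.b', np.nan, np.nan],
})

# how='any' (the default) drops a row if ANY subset column is NaN,
# which removes every row here.
any_dropped = masked.dropna(subset=['ip.src', 'ip.dst'])

# how='all' drops a row only if ALL subset columns are NaN (just row 3).
all_dropped = masked.dropna(subset=['ip.src', 'ip.dst'], how='all')

print(len(any_dropped))  # 0
print(len(all_dropped))  # 3
```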

I have a DataFrame:

```python
import pandas as pd

data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']}
Data = pd.DataFrame(data)
```

I want to keep only the values in the ip.src and ip.dst columns whose matching country is china: if src_country is china, the value in ip.src should be kept, and if dst_country is china, the value in ip.dst should be kept. Is there any way to do this?
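A minimal sketch of this requirement as a reusable helper, assuming NaN is an acceptable placeholder for the discarded IPs. `keep_where_country` and its `pairs` mapping are hypothetical names for illustration, not part of pandas:

```python
import pandas as pd


def keep_where_country(df, pairs, country):
    """Hypothetical helper: return a copy of df in which each value column
    keeps its value only where its paired country column equals `country`;
    all other cells become NaN. `pairs` maps value column -> country column."""
    out = df.copy()
    for value_col, country_col in pairs.items():
        # Series.where keeps values where the condition is True, NaN elsewhere.
        out[value_col] = out[value_col].where(out[country_col] == country)
    return out


data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']}
Data = pd.DataFrame(data)

result = keep_where_country(
    Data, {'ip.src': 'src_country', 'ip.dst': 'dst_country'}, 'china')
print(result)
```

Because the helper works on a copy, the original `Data` is left unchanged.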

Answer 1

Score: 1

Something like this?

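```python
import numpy as np

# Keep only rows where at least one of the two countries is 'china'
Data = Data[(Data['src_country'] == 'china') | (Data['dst_country'] == 'china')]
# Replace every non-'china' country with NaN
# (DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map)
Data[['src_country', 'dst_country']] = Data[['src_country', 'dst_country']].applymap(lambda x: np.nan if x != 'china' else x)
print(Data)
```

```
    ip.src   ip.dst  src_country  dst_country
0  x.x.x.x  a.a.a.a        china          NaN
1  y.y.y.y  b.b.b.b          NaN        china
2  z.z.z.z  c.c.c.c        china          NaN
```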
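Note that this answer blanks out the country columns and leaves `ip.src`/`ip.dst` untouched. If that is the result you want, a vectorised `DataFrame.where` can stand in for the per-cell lambda. A sketch on the question's sample data; `cols` and `subset` are illustrative names:

```python
import pandas as pd

data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']}
Data = pd.DataFrame(data)

cols = ['src_country', 'dst_country']
# Row filter: keep rows where either country column equals 'china'.
subset = Data[Data[cols].eq('china').any(axis=1)].copy()
# DataFrame.where keeps cells where the condition holds and puts NaN
# elsewhere, the same effect as the lambda without a Python call per cell.
subset[cols] = subset[cols].where(subset[cols].eq('china'))
print(subset)
```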

Answer 2

Score: 1

Use [`DataFrame.loc`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) to modify the `ip.src`/`ip.dst` columns:

```python
Data['ip.src'] = Data.loc[Data['src_country'] == 'china', 'ip.src']
Data['ip.dst'] = Data.loc[Data['dst_country'] == 'china', 'ip.dst']
print(Data)
```

```
    ip.src   ip.dst  src_country  dst_country
0  x.x.x.x      NaN        china     pakistan
1      NaN  b.b.b.b           US        china
2  z.z.z.z      NaN        china        india
```
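This works because of index alignment: the `.loc` selection returns only the matching rows together with their original labels, and assigning that Series back to a column fills the missing labels with NaN. A short sketch on the question's data that shows the intermediate Series:

```python
import pandas as pd

data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']}
Data = pd.DataFrame(data)

# The selection keeps only the rows whose src_country is 'china',
# with their original index labels.
selected = Data.loc[Data['src_country'] == 'china', 'ip.src']
print(selected.index.tolist())  # [0, 2]

# Assigning a Series to a column aligns on the index: label 1 is absent
# from `selected`, so that row of ip.src becomes NaN.
Data['ip.src'] = selected
print(Data['ip.src'].tolist())  # ['x.x.x.x', nan, 'z.z.z.z']
```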

Or:

```python
m = Data[['src_country', 'dst_country']] == 'china'
# to_numpy() drops the mask's column labels, so it is applied by position
Data[['ip.src', 'ip.dst']] = Data[['ip.src', 'ip.dst']].where(m.to_numpy())
print(Data)
```

```
    ip.src   ip.dst  src_country  dst_country
0  x.x.x.x      NaN        china     pakistan
1      NaN  b.b.b.b           US        china
2  z.z.z.z      NaN        china        india
```
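`DataFrame.where` aligns a DataFrame condition on its labels, and the mask's columns (`src_country`, `dst_country`) do not match `ip.src`/`ip.dst`; `to_numpy()` sidesteps that by dropping the labels. If you would rather keep the mask labelled, one alternative sketch renames its columns with `set_axis` so that `src_country` lines up with `ip.src` and `dst_country` with `ip.dst`:

```python
import pandas as pd

data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']}
Data = pd.DataFrame(data)

ip_cols = ['ip.src', 'ip.dst']
country_cols = ['src_country', 'dst_country']

# Boolean mask labelled with the country column names.
m = Data[country_cols] == 'china'

# Relabel the mask with the IP column names so that label alignment
# pairs src_country with ip.src and dst_country with ip.dst.
m_labelled = m.set_axis(ip_cols, axis=1)

# where() keeps cells whose mask is True and fills the rest with NaN.
Data[ip_cols] = Data[ip_cols].where(m_labelled)
print(Data)
```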

Answer 3

Score: 1

```python
Data['ip.src'] = Data['ip.src'][(Data['src_country'] == 'china')]
Data['ip.dst'] = Data['ip.dst'][(Data['dst_country'] == 'china')]
print(Data)
```

Output:

```
    ip.src   ip.dst  src_country  dst_country
0  x.x.x.x      NaN        china     pakistan
1      NaN  b.b.b.b           US        china
2  z.z.z.z      NaN        china        india
```

Answer 4

Score: 0

Another possible solution:

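```python
import numpy as np

# np.where(cond, x, y) takes x where cond is True: keep the IP where the
# matching country is 'china', otherwise NaN
Data[['ip.src', 'ip.dst']] = np.where(
    Data[['src_country', 'dst_country']].eq('china'),
    Data[['ip.src', 'ip.dst']], np.nan)
print(Data)
```

Output:

```
    ip.src   ip.dst  src_country  dst_country
0  x.x.x.x      NaN        china     pakistan
1      NaN  b.b.b.b           US        china
2  z.z.z.z      NaN        china        india
```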
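The argument order of `np.where(cond, x, y)` decides which values survive: it takes from `x` where `cond` is true and from `y` elsewhere. A small sketch with a hand-written mask (`is_china` is an illustrative name that mirrors the question's data) contrasting the two orders, plus the pandas-native `DataFrame.where`, which returns a DataFrame with the original labels:

```python
import numpy as np
import pandas as pd

ips = pd.DataFrame({'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
                    'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c']})
# True where the matching country is 'china' (hand-written for illustration)
is_china = np.array([[True, False],
                     [False, True],
                     [True, False]])

# np.where(cond, x, y): values from x where cond is True, from y elsewhere.
keep = np.where(is_china, ips, np.nan)   # keeps the china IPs
blank = np.where(is_china, np.nan, ips)  # blanks the china IPs instead

# DataFrame.where(cond) acts like np.where(cond, df, np.nan) but keeps
# the DataFrame's index and column labels.
keep_df = ips.where(is_china)
print(keep_df)
```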

huangapple
  • This article was posted on July 3, 2023 at 16:53:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76603252.html