July 3, 2023, 16:53:44

Keep value in a column based on condition

Question

import pandas as pd

data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']
        }
Data = pd.DataFrame(data)
# Keep only the values in ip.src where src_country is 'china'
Data['ip.src'] = Data['ip.src'][Data['src_country'] == 'china']
# Keep only the values in ip.dst where dst_country is 'china'
Data['ip.dst'] = Data['ip.dst'][Data['dst_country'] == 'china']
# Drop rows where both ip.src and ip.dst are NaN
# (how='all' is required; the default how='any' drops a row if either is NaN)
Data = Data.dropna(subset=['ip.src', 'ip.dst'], how='all')
# Reset the index
Data = Data.reset_index(drop=True)
Data

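This code keeps a value in the ip.src column only where src_country is 'china', and a value in the ip.dst column only where dst_country is 'china'. It then drops the rows in which both ip.src and ip.dst are NaN and resets the index.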

I have a DataFrame:

import pandas as pd

data = {'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
        'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
        'src_country': ['china', 'US', 'china'],
        'dst_country': ['pakistan', 'china', 'india']
        }
Data = pd.DataFrame(data)

I want to keep a value in the ip.src and ip.dst columns only when the matching country column is china: if src_country is china, the value in ip.src should be kept, and if dst_country is china, the value in ip.dst should be kept. Is there any way to do this?

Answer 1

Score: 1

Something like this?

import numpy as np

# Keep only the rows where either country is 'china'; copy() avoids a SettingWithCopyWarning
Data = Data[(Data['src_country'] == 'china') | (Data['dst_country'] == 'china')].copy()
# Replace every country value other than 'china' with NaN
# (applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
Data[['src_country', 'dst_country']] = Data[['src_country', 'dst_country']].applymap(lambda x: np.nan if x != 'china' else x)
Data
    ip.src   ip.dst src_country dst_country
0  x.x.x.x  a.a.a.a       china         NaN
1  y.y.y.y  b.b.b.b         NaN       china
2  z.z.z.z  c.c.c.c       china         NaN
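
The same result can probably be reached without a per-element lambda. The following is a minimal sketch, not part of the original answer, that assumes the question's sample data and uses `DataFrame.where` with a boolean mask; the names `country_cols`, `is_china`, and `result` are introduced here only for illustration.

```python
import pandas as pd

Data = pd.DataFrame({
    'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
    'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
    'src_country': ['china', 'US', 'china'],
    'dst_country': ['pakistan', 'china', 'india'],
})

country_cols = ['src_country', 'dst_country']
# True wherever a country cell equals 'china'
is_china = Data[country_cols].eq('china')

# Keep the rows where at least one of the two countries is 'china'
result = Data[is_china.any(axis=1)].copy()
# where() keeps a value where the mask is True and puts NaN elsewhere
result[country_cols] = result[country_cols].where(is_china)
print(result)
```

Because the mask is a DataFrame with the same column labels, `where` aligns it by index and column, so it still works when the row filter removes some rows.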

Answer 2

Score: 1

Use [`DataFrame.loc`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) to modify the `ip.src`/`ip.dst` columns:

Data['ip.src'] = Data.loc[Data['src_country'] == 'china', 'ip.src']
Data['ip.dst'] = Data.loc[Data['dst_country'] == 'china', 'ip.dst']
print(Data)
    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india

Or:

m = Data[['src_country', 'dst_country']] == 'china'
Data[['ip.src', 'ip.dst']] = Data[['ip.src', 'ip.dst']].where(m.to_numpy())
print(Data)
    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india
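
If the same masking is needed in several places, the `where` variant could be wrapped in a small helper. This is only a sketch under stated assumptions: the function name `keep_ips_for_country` and its parameters are hypothetical (they do not come from the answer), and the IP and country column lists are assumed to be given in matching order.

```python
import pandas as pd


def keep_ips_for_country(df, ip_cols, country_cols, country):
    """Return a copy of df in which each IP is kept only if its paired country matches.

    Hypothetical helper: ip_cols[i] is assumed to pair with country_cols[i].
    """
    out = df.copy()
    # Convert the mask to NumPy so it is applied by position; the mask's
    # column labels (country names) differ from the IP column labels.
    mask = out[country_cols].eq(country).to_numpy()
    out[ip_cols] = out[ip_cols].where(mask)
    return out


Data = pd.DataFrame({
    'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
    'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
    'src_country': ['china', 'US', 'china'],
    'dst_country': ['pakistan', 'china', 'india'],
})

result = keep_ips_for_country(
    Data, ['ip.src', 'ip.dst'], ['src_country', 'dst_country'], 'china')
print(result)
```

Returning a copy leaves the original `Data` untouched, which makes it easier to compare the frame before and after masking.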

Answer 3

Score: 1

Data['ip.src'] = Data['ip.src'][Data['src_country'] == 'china']
Data['ip.dst'] = Data['ip.dst'][Data['dst_country'] == 'china']

Output:

    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india
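
The NaN values in this output come from index alignment: boolean indexing returns only the matching rows, and assigning that shorter Series back to the column fills the missing index labels with NaN. The sketch below, which is not part of the original answer, illustrates this on the `ip.src` column alone and compares it with `Series.where`; the variable names `filtered`, `by_alignment`, and `by_where` are introduced here for illustration.

```python
import pandas as pd

Data = pd.DataFrame({
    'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
    'src_country': ['china', 'US', 'china'],
})

# Boolean indexing returns only the rows where the condition holds
filtered = Data['ip.src'][Data['src_country'] == 'china']
print(filtered.index.tolist())

# Assigning the shorter Series back aligns on the index, so the
# label that was filtered out ends up as NaN
by_alignment = Data.copy()
by_alignment['ip.src'] = filtered

# Series.where states the same intent directly, without relying on alignment
by_where = Data.copy()
by_where['ip.src'] = Data['ip.src'].where(Data['src_country'] == 'china')

print(by_alignment['ip.src'])
print(by_where['ip.src'])
```

Both forms should give the same column here; `where` may be easier to read because the NaN fill is explicit in the method's contract.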

Answer 4

Score: 0

Another possible solution, using np.where to keep each IP only where the matching country is 'china':

import numpy as np

# np.where(condition, value_if_true, value_if_false):
# keep the IP where the country is 'china', otherwise use NaN
Data[['ip.src', 'ip.dst']] = np.where(
    Data[['src_country', 'dst_country']].eq('china'),
    Data[['ip.src', 'ip.dst']], np.nan)

Output:

    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india
