June 1, 2023 10:42:03

How to loop over a dataframe and check if there is the same name as another column

Question

I'm one step away from finishing the cleaning of my dataframe, but I ran into a problem when cleaning the "name" column of the dataframe below:

For each row, I want to check whether the value in the "company" column appears in that row's "name" column. If it does, keep the row; if it does not, delete the row.
This may look simple, since the company name is usually the first word of "name", but what if some names put the company name somewhere else in the "name" column (not as the first word)?

I've tried the code below, but it doesn't seem to work:

for x in file["company"]:#Loop company
  for i in file["name"]:#Loop name
    i = i.title().split(" ")#Capitalize each word and split name
    for j in i:
      if j == x:#if the "company name" is equal to "company values"
        pass #ignore
      else:
        file.drop("""THAT ROW""") #remove that row

Answer 1

Score: 1

What you want is to drop the rows whose name value does not contain their company value.

Let's start with a toy dataframe:

import pandas as pd

data = [
    {
        'name': 'Hyundai something',
        'company': 'Hyundai',
    },
    {
        'name': 'Tesla Model Y',
        'company': 'Tesla',
    },
    {
        'name': 'something Hyundai',
        'company': 'Hyundai',
    },
    {
        'name': 'something',
        'company': 'Hyundai',
    },
    {
        'name': 'XYZ Ford Car',
        'company': 'Ford',
    },
]
df = pd.DataFrame(data)

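Next, we use apply with axis=1 to iterate over the rows and return True when the company value is found inside the name value. Note that I added .lower() to both sides to ignore case; adjust this to fit what you need.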

contained = df.apply(
    lambda row: row['company'].lower() in row['name'].lower(), 
    axis=1
)

Finally, you can either filter the dataframe with this condition or drop the rows at the False indices.

df.drop(contained[~contained].index)

The result looks like this:

                name  company
0  Hyundai something  Hyundai
1      Tesla Model Y    Tesla
2  something Hyundai  Hyundai
4       XYZ Ford Car     Ford
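The answer mentions filtering by condition but only shows the drop variant. As a minimal sketch of both options side by side (the variable names `filtered` and `dropped` are my own, not from the answer), using the same toy data:

```python
import pandas as pd

# Same toy data as in the answer, written column-wise for brevity.
df = pd.DataFrame({
    "name": ["Hyundai something", "Tesla Model Y", "something Hyundai",
             "something", "XYZ Ford Car"],
    "company": ["Hyundai", "Tesla", "Hyundai", "Hyundai", "Ford"],
})

# True where the company appears anywhere in the name, ignoring case.
contained = df.apply(
    lambda row: row["company"].lower() in row["name"].lower(),
    axis=1,
)

# Option 1: keep only the rows where the mask is True.
filtered = df[contained]

# Option 2: drop the rows where the mask is False.
dropped = df.drop(contained[~contained].index)

print(filtered.equals(dropped))  # True
```

Both options return a new dataframe and leave `df` untouched, so assign the result back (for example `df = df[contained]`) if you want the change to stick.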

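Answer 2

Score: 1

Use boolean indexing: lowercase both values, split the name column into words, and build the mask in a list comprehension, which performs better than apply. Note that this keeps a row only when the company matches a whole word of the name: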

df = file[[x.lower() in y.lower().split() for x, y in zip(file['company'], file['name'])]]
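To make the difference between the whole-word check used here and a plain substring check concrete, here is a small sketch. The data is hypothetical and chosen only to show a case where the two disagree; `substring_mask` and `word_mask` are illustrative names.

```python
import pandas as pd

# Hypothetical data chosen to show where the two checks disagree.
file = pd.DataFrame({
    "name": ["Ford Focus", "Fordson tractor", "new FORD car"],
    "company": ["Ford", "Ford", "Ford"],
})

pairs = list(zip(file["company"], file["name"]))

# Substring check: the company may appear inside a longer word.
substring_mask = [c.lower() in n.lower() for c, n in pairs]

# Whole-word check: the company must equal one whitespace-separated word.
word_mask = [c.lower() in n.lower().split() for c, n in pairs]

print(substring_mask)  # [True, True, True]
print(word_mask)       # [True, False, True]

# Keep only the rows that pass the whole-word check.
df = file[word_mask]
```

The whole-word check avoids false positives such as "Fordson", but a company name made of several words would never match a single word of the split name, so the substring check may be the better fit for that kind of data.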
