Find column and row of partial string match in entire pandas dataframe

Question

I'm trying to locate the row and the column in a pandas dataframe that partially match a given string.
So far, my approach is based on an iteration over all the columns (and the rows in every column) to return boolean "True" values:

rowindex = []
columnindex = []

i = 0

for i in range(0, len(df.columns)):
    ask = df.iloc[:, i].str.contains('string')
    
    for j in range(0, len(ask)):
        ask2 = np.equal(ask, True) 
        if ask2 == True:
            columnindex.append(i)
            rowindex.append(j)
            
        j + 1
            
    i + 1

The problem is that I always get this error message for the "if ask2 == True:" statement:
"The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

Thank you for your help on this!

Answer 1

Score: 2

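Your original error comes from the fact that numpy.equal() compares element-wise: np.equal(ask, True) returns a whole boolean Series (one value per row) rather than a single boolean, so "if ask2 == True:" has no single truth value to test.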
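
As a minimal sketch of how the loop from the question could be repaired, the version below tests one element at a time with ask.iloc[j] instead of the whole Series. The sample DataFrame is hypothetical (the question does not show its data), and the sketch assumes every column holds strings, since .str only works on string data:

```python
import pandas as pd

# Hypothetical sample data; the question does not show its DataFrame.
df = pd.DataFrame({
    "a": ["no match", "has string here"],
    "b": ["string first", "nothing"],
})

rowindex = []
columnindex = []

for i in range(len(df.columns)):
    # One boolean per row of column i; na=False counts missing values as non-matches.
    ask = df.iloc[:, i].str.contains("string", na=False)

    for j in range(len(ask)):
        # ask.iloc[j] is a single boolean, so it can be used directly in an `if`.
        if ask.iloc[j]:
            columnindex.append(i)
            rowindex.append(j)
```

The "j + 1" and "i + 1" lines from the question are dropped because they compute a value and discard it; range() already advances the loop variables.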

Answer 2

Score: 0

If you want to locate the rows and columns in a pandas DataFrame that partially match a given string, you can apply pandas' vectorized string methods one column at a time instead of looping over every row yourself. This is usually faster and is the idiomatic way to work with DataFrames. Here's an example:

import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['John Doe', 'Jane Smith', 'Mike Johnson'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris']
}

df = pd.DataFrame(data)

# Define the partial string you want to match
partial_string = 'Jo'

# Use the `str.contains` method to check if each element in the DataFrame contains the partial string
matches = df.apply(lambda col: col.astype(str).str.contains(partial_string, case=False))

# Get the row and column indices where the partial string matches
rows, cols = matches.values.nonzero()

# Print the matched rows and columns
for row, col in zip(rows, cols):
    print(f"Match found at Row: {row}, Column: {df.columns[col]}")
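
If the DataFrame has a non-default index, the positions returned by nonzero() are not the same as the row labels. A possible variation, sketched here with a hypothetical helper name (find_partial_matches) and a made-up string index, returns (row label, column label) pairs. It also passes regex=False, on the assumption that the search text should be matched literally rather than as a regular expression:

```python
import numpy as np
import pandas as pd


def find_partial_matches(df, needle):
    """Return (row label, column label) pairs for cells containing `needle`."""
    # Cast everything to str so numeric columns can be searched too.
    # Note that missing values become the text "nan" after this cast.
    mask = df.astype(str).apply(
        lambda col: col.str.contains(needle, case=False, regex=False)
    )
    # Positions of the True cells, in row-major order.
    rows, cols = np.nonzero(mask.to_numpy())
    # Translate positions into the DataFrame's own labels.
    return [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]


# Hypothetical data with a string index to show label-based results.
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Jane Smith", "Mike Johnson"],
        "City": ["New York", "London", "Paris"],
    },
    index=["a", "b", "c"],
)

matches = find_partial_matches(df, "jo")
```

Each pair can be passed straight to df.loc, for example df.loc[matches[0]], to read the matching cell back.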

huangapple
  • Posted on June 29, 2023, 22:13:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76581895.html