Search for substring in entire dataframe and if substring is found print next row to the searched substring

Question

Suppose I have a DataFrame df (read from an Excel sheet) with over 3,000 rows. I want to search df for a substring that occurs more than 50 times, and for every row in which that substring is found, print the single row that comes immediately after it.

I've tried:

import pandas as pd

df = pd.read_excel('sample.xlsx')
substring = "Size of file is:"
df[df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1)]

This returns the rows that contain the searched string, 'Size of file is:'. What I want instead is to print the single row that follows each match, wherever the string is found in the data.

Answer 1

Score: 1

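Use Series.shift with fill_value=False. Shifting the boolean match mask down by one row makes it select the row after each match instead of the matching row itself: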

import numpy as np
import pandas as pd

np.random.seed(2000)
df = pd.DataFrame(np.random.choice(['aa', 'abs', 'abdf', 'abg'], size=(10, 3)))
print(df)
      0     1     2
0  abdf    aa  abdf
1   abs  abdf   abg
2    aa    aa    aa
3  abdf   abg  abdf
4    aa   abg   abg
5  abdf   abg   abs
6  abdf    aa  abdf
7   abg   abg   abs
8   abs    aa   abg
9    aa    aa  abdf

substring = 'abd'
df1 = df[df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1).shift(fill_value=False)]
print(df1)
      0     1     2
1   abs  abdf   abg
2    aa    aa    aa
4    aa   abg   abg
6  abdf    aa  abdf
7   abg   abg   abs
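
To see why the shift works, it can help to look at the mask before and after shifting. The following is a minimal sketch on a small hand-made frame; the column names and cell values are made up for illustration and are not taken from the question's Excel file. It also passes regex=False, on the assumption that the search string should be matched literally rather than as a regular expression.

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet.
df = pd.DataFrame({
    "a": ["Size of file is:", "120 KB", "other", "size of file is: x", "98 KB"],
    "b": ["x", "y", "z", "w", "v"],
})
substring = "Size of file is:"

# True for every row where any cell contains the substring (case-insensitive).
mask = df.apply(
    lambda col: col.astype(str).str.contains(substring, case=False, regex=False)
).any(axis=1)

# Move each True one row down, so it marks the row after a match.
# fill_value=False keeps the mask boolean and leaves the first row unselected.
next_rows = df[mask.shift(fill_value=False)]
print(next_rows)
```

A match on the very last row simply drops out, because there is no following row for the shifted mask to select.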

A similar solution applies str.contains column by column and then reduces the result with any(axis=1):

df1 = df[df.apply(lambda col: col.astype(str).str.contains(substring, case=False)).any(axis=1).shift(fill_value=False)]
print(df1)
      0     1     2
1   abs  abdf   abg
2    aa    aa    aa
4    aa   abg   abg
6  abdf    aa  abdf
7   abg   abg   abs
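
If working with integer positions feels clearer than shifting a mask, the same rows can probably be selected with np.flatnonzero and iloc. This is only a sketch of an alternative, not part of the answer above; the frame and its string index labels are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"text": ["abdf", "aa", "abs", "abdf", "abg"]},
    index=["r0", "r1", "r2", "r3", "r4"],  # hypothetical non-numeric labels
)
substring = "abd"

# Row-level mask: True where any cell contains the substring.
mask = df.apply(
    lambda col: col.astype(str).str.contains(substring, case=False)
).any(axis=1)

# Integer positions of matching rows, then the positions right after them.
pos = np.flatnonzero(mask.to_numpy(dtype=bool)) + 1
pos = pos[pos < len(df)]  # a match on the last row has no next row

next_rows = df.iloc[pos]
print(next_rows)
```

Changing the `+ 1` offset would select a row further down, which the one-step mask shift does not express as directly.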

Answer 2

Score: 0

Use vectorized string methods on the column being searched:

df[df['column_search'].str.contains('substring', na=False).shift(fill_value=False)]
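
When the text is known to live in a single column, the row-wise apply is not needed at all. Here is a small sketch of that case, assuming a hypothetical column named log; na=False is there so that empty cells count as non-matches instead of producing missing values in the mask.

```python
import pandas as pd

# Hypothetical single-column data with one empty cell.
df = pd.DataFrame(
    {"log": ["Size of file is:", "120 KB", None, "Size of file is:", "98 KB"]}
)

# Vectorized literal search on one column; missing cells are treated as False.
mask = df["log"].str.contains("Size of file is:", case=False, regex=False, na=False)

# Shift the mask down one row to pick the row after each match.
next_rows = df[mask.shift(fill_value=False)]
print(next_rows)
```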

huangapple
  • Published on February 10, 2023, 15:17:18
  • Please keep this link when reposting: https://go.coder-hub.com/75407987.html