Search for substring in entire dataframe and if substring is found print next row to the searched substring

Question

Suppose I have a DataFrame df (read from an Excel sheet) with over 3,000 rows. I want to search df for a string that occurs more than 50 times, and for every row in which that search string (a substring) is found, I want to print the single row that comes immediately after it.

I've tried:

    import pandas as pd

    df = pd.read_excel('sample.xlsx')
    substring = "Size of file is:"
    df[df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1)]

This returns the rows that contain the search string 'Size of file is:'. But I want to print the single row that follows each row where the search string is found, across the whole data.

Answer 1

Score: 1

Use Series.shift with fill_value=False:

    import numpy as np
    import pandas as pd

    np.random.seed(2000)
    df = pd.DataFrame(np.random.choice(['aa', 'abs', 'abdf', 'abg'], size=(10, 3)))
    print(df)
          0     1     2
    0  abdf    aa  abdf
    1   abs  abdf   abg
    2    aa    aa    aa
    3  abdf   abg  abdf
    4    aa   abg   abg
    5  abdf   abg   abs
    6  abdf    aa  abdf
    7   abg   abg   abs
    8   abs    aa   abg
    9    aa    aa  abdf

    substring = 'abd'
    # Flag rows where any cell contains the substring, then shift the mask
    # down by one row so that it selects the row after each match.
    df1 = df[df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1).shift(fill_value=False)]
    print(df1)
          0     1     2
    1   abs  abdf   abg
    2    aa    aa    aa
    4    aa   abg   abg
    6  abdf    aa  abdf
    7   abg   abg   abs

Similar solution:

    # Test each column for the substring, then combine the results per row.
    df1 = df[df.apply(lambda col: col.astype(str).str.contains(substring, case=False)).any(axis=1).shift(fill_value=False)]
    print(df1)
          0     1     2
    1   abs  abdf   abg
    2    aa    aa    aa
    4    aa   abg   abg
    6  abdf    aa  abdf
    7   abg   abg   abs
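
To see why shifting the mask gives the "next row", here is a minimal sketch on a small hand-made frame. The sample data and the helper name `rows_after_match` are made up for illustration and are not part of the answer above. The boolean mask marks the rows that contain the substring. Shifting it down by one position marks the rows that follow a match instead.

```python
import pandas as pd


def rows_after_match(df, substring):
    """Return the rows that come directly after a row containing substring.

    Hypothetical helper, for illustration only.
    """
    # True for every row in which at least one cell contains the substring.
    # regex=False treats the substring as literal text, not as a pattern.
    mask = df.apply(
        lambda col: col.astype(str).str.contains(substring, case=False, regex=False)
    ).any(axis=1)
    # Shift the mask down one row, so True now marks the row after a match.
    # fill_value=False keeps the mask boolean and never selects the first row.
    return df[mask.shift(fill_value=False)]


# Made-up sample data standing in for the Excel sheet.
df = pd.DataFrame({
    "a": ["Size of file is:", "120 KB", "other", "Size of file is:", "98 KB"],
    "b": ["x", "y", "z", "x", "y"],
})
result = rows_after_match(df, "Size of file is:")
print(result)
```

If the last row of the frame is a match, there is no following row, so nothing is selected for it. If two consecutive rows both match, the second one is returned as the "next row" of the first.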

Answer 2

Score: 0

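Use vectorized calculations on a single column:

    # 'column_search' is a placeholder for the name of the column to search.
    # na=False treats empty cells as non-matches; the shifted mask selects
    # the row after each match.
    df[df['column_search'].str.contains(substring, na=False).shift(fill_value=False)]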
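
The `iloc` idea can also be made to work with row positions when the text lives in one known column. A minimal sketch, assuming a column called `text` (a made-up name) and sample data that mimics an Excel sheet with an empty cell:

```python
import numpy as np
import pandas as pd

# Made-up sample data; np.nan stands in for an empty Excel cell.
df = pd.DataFrame({
    "text": ["Size of file is:", "120 KB", np.nan, "Size of file is:", "98 KB"],
})

# na=False makes empty cells count as "no match", so the mask stays boolean.
mask = df["text"].str.contains("Size of file is:", regex=False, na=False)

# Positions of the matching rows, moved one step down. Drop any position
# that would fall past the end of the frame (a match in the last row).
next_pos = np.flatnonzero(mask.to_numpy()) + 1
next_pos = next_pos[next_pos < len(df)]

next_rows = df.iloc[next_pos]
print(next_rows)
```

Working with positions rather than index labels keeps this correct even when the frame's index is not a plain 0..n-1 range.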

huangapple
  • Posted on February 10, 2023, 15:17:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75407987.html