Search for a substring in an entire DataFrame and print the row after each match
Question
Suppose I have a DataFrame df (read from an Excel sheet) with over 3,000 rows. I want to search df for a string that occurs more than 50 times, and for each row in which the substring is found, print the single row that immediately follows it.
I've tried:
import pandas as pd
df = pd.read_excel('sample.xlsx')
substring = "Size of file is:"
df[df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1)]
This returns the rows that contain the search string 'Size of file is:'. What I want instead is the row directly after each match, wherever the string occurs in the data.
Answer 1
Score: 1
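Use Series.shift with fill_value=False to move the boolean match mask down by one row, so that it selects the row after each match instead of the match itself:
import numpy as np
import pandas as pd

np.random.seed(2000)
df = pd.DataFrame(np.random.choice(['aa', 'abs', 'abdf', 'abg'], size=(10, 3)))
print(df)
      0     1     2
0  abdf    aa  abdf
1   abs  abdf   abg
2    aa    aa    aa
3  abdf   abg  abdf
4    aa   abg   abg
5  abdf   abg   abs
6  abdf    aa  abdf
7   abg   abg   abs
8   abs    aa   abg
9    aa    aa  abdf
substring = 'abd'
# Row-wise test: True where any cell in the row contains the substring
mask = df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1)
# Shift the mask down one row; fill_value=False keeps it boolean
df1 = df[mask.shift(fill_value=False)]
print(df1)
      0     1     2
1   abs  abdf   abg
2    aa    aa    aa
4    aa   abg   abg
6  abdf    aa  abdf
7   abg   abg   abs
Row 9 also matches, but it is the last row, so there is no following row to return.
Similar solution, testing column by column and then combining with any(axis=1):
mask = df.apply(lambda col: col.astype(str).str.contains(substring, case=False)).any(axis=1)
df1 = df[mask.shift(fill_value=False)]
print(df1)
      0     1     2
1   abs  abdf   abg
2    aa    aa    aa
4    aa   abg   abg
6  abdf    aa  abdf
7   abg   abg   abs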
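To make the effect of the shift easier to follow, here is a minimal sketch on a small hand-written frame instead of random data. The column names `text` and `note` and the cell values are made up for illustration; `regex=False` is an added assumption so that the search string is matched literally rather than as a regular expression.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet
df = pd.DataFrame({
    "text": ["header", "Size of file is:", "120 MB", "other", "size of file is:", "98 MB"],
    "note": ["a", "b", "c", "d", "e", "f"],
})
substring = "Size of file is:"

# True for every row in which any cell contains the substring
# (case-insensitive, literal match)
mask = df.apply(
    lambda col: col.astype(str).str.contains(substring, case=False, regex=False)
).any(axis=1)

# Move each True down by one row; the first row gets False instead of NaN
next_rows = df[mask.shift(fill_value=False)]
print(next_rows)
```

Here the matches are at positions 1 and 4, so the shifted mask selects the rows at positions 2 and 5.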
Answer 2
Score: 0
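Use vectorized string methods on a single column (replace 'column_search' with the name of the column to search):
# na=False treats missing values as non-matches; shift selects the row after each match
df[df['column_search'].str.contains(substring, na=False).shift(fill_value=False)]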
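If positions are preferred over a shifted mask, a position-based variant could look like the sketch below. It is only an illustration under assumptions: the column name `log`, the letter index, and the sample values are hypothetical, and `regex=False` is added so the search string is treated literally.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column data with a non-default index and a missing value
df = pd.DataFrame(
    {"log": ["start", "Size of file is:", "10 KB", np.nan, "Size of file is:"]},
    index=list("abcde"),
)

# Boolean array of matches; na=False counts missing values as non-matches
hits = df["log"].str.contains("Size of file is:", na=False, regex=False).to_numpy()

# Integer positions of the rows directly after each match
pos = np.flatnonzero(hits) + 1
# Drop positions past the end (a match on the last row has no next row)
pos = pos[pos < len(df)]

following = df.iloc[pos]
print(following)
```

Because `iloc` works on positions, this does not depend on the index labels being consecutive integers.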