Search for a substring in an entire DataFrame and, if it is found, print the row after the matching row
Question
Suppose I have a DataFrame df (read from an Excel sheet) with over 3,000 rows. I want to search df for a substring that occurs more than 50 times, and for each row in which the substring is found, I want to print the next row (only one). In other words, it should print the single row that comes immediately after each row containing the searched substring.
I've tried:
import pandas as pd

df = pd.read_excel('sample.xlsx')
substring = "Size of file is:"
# Keep rows where any cell contains the substring (case-insensitive)
df[df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1)]
This returns the rows that contain the searched string 'Size of file is:'. But I want to print the single row that follows each occurrence of the searched string anywhere in the data.
Answer 1
Score: 1
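Use Series.shift with fill_value=False: build a boolean mask that is True for every row containing the substring, then shift it down by one row so that each True lands on the row directly after a match. fill_value=False fills the first row, which has no preceding row, so it is never selected.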
import numpy as np
import pandas as pd

np.random.seed(2000)
df = pd.DataFrame(np.random.choice(['aa', 'abs', 'abdf', 'abg'], size=(10, 3)))
print(df)
0 1 2
0 abdf aa abdf
1 abs abdf abg
2 aa aa aa
3 abdf abg abdf
4 aa abg abg
5 abdf abg abs
6 abdf aa abdf
7 abg abg abs
8 abs aa abg
9 aa aa abdf
substring = 'abd'
# Flag rows where any cell contains the substring, then shift the flags down one row
mask = df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1)
df1 = df[mask.shift(fill_value=False)]
print(df1)
0 1 2
1 abs abdf abg
2 aa aa aa
4 aa abg abg
6 abdf aa abdf
7 abg abg abs
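To make the mechanics concrete, here is a minimal sketch on a tiny made-up DataFrame (the data and the column names a and b are hypothetical) that prints the intermediate mask before and after the shift:

```python
import pandas as pd

# Tiny hypothetical frame: rows 0 and 2 contain the substring "abd"
df = pd.DataFrame({"a": ["abdf", "aa", "abdx", "zz"], "b": ["x", "y", "z", "w"]})

# Step 1: True for every row where any cell contains the substring
mask = df.apply(lambda row: row.astype(str).str.contains("abd", case=False).any(), axis=1)
print(mask.tolist())  # [True, False, True, False]

# Step 2: move each flag down one row; the first row gets False because nothing precedes it
next_mask = mask.shift(fill_value=False)
print(next_mask.tolist())  # [False, True, False, True]

# Step 3: select the rows that directly follow a match (rows 1 and 3)
print(df[next_mask])
```

The shift is positional on the mask, while df[next_mask] aligns the mask to df by index label, so the selected rows keep their original labels.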
A similar solution applies str.contains to each column and then reduces across columns with any(axis=1):
# Column-wise search gives a boolean DataFrame; any(axis=1) collapses it to one flag per row
df1 = df[df.apply(lambda col: col.astype(str).str.contains(substring, case=False)).any(axis=1).shift(fill_value=False)]
print(df1)
0 1 2
1 abs abdf abg
2 aa aa aa
4 aa abg abg
6 abdf aa abdf
7 abg abg abs
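Applied to the question's own search string, the same idea can be wrapped in a small helper. This is a sketch under a few assumptions: the function name rows_after_match is hypothetical, the in-memory DataFrame stands in for pd.read_excel('sample.xlsx'), and regex=False is added so the search string is matched literally (characters such as '.', '(' or '?' would otherwise be read as regex syntax):

```python
import pandas as pd


def rows_after_match(df: pd.DataFrame, substring: str) -> pd.DataFrame:
    """Return the row immediately after each row where any cell contains `substring`.

    Hypothetical helper: matching is literal (regex=False) and case-insensitive.
    """
    # Convert every cell to str so numbers/NaN don't break the .str methods
    as_text = df.astype(str)
    # Column-wise literal search, then reduce across columns for a per-row flag
    hits = as_text.apply(
        lambda col: col.str.contains(substring, case=False, regex=False)
    ).any(axis=1)
    # Shift flags down one row so they mark the *following* row
    return df[hits.shift(fill_value=False)]


# Stand-in for pd.read_excel('sample.xlsx'); the data is made up for illustration
df = pd.DataFrame(
    {"text": ["Size of file is:", "120 KB", "other line", "size of file is:", "4 MB", "end"]}
)
print(rows_after_match(df, "Size of file is:"))
```

Because the mask is shifted, a match in the very last row yields nothing (there is no following row), and a row that both follows a match and contains the substring itself appears only once, like row 6 in the output above.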
Answer 2
Score: 0
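Use vectorized operations on the column to search:
# 'column_search' is a placeholder for the column to search; shift moves each match flag to the next row
df[df['column_search'].astype(str).str.contains('substring').shift(fill_value=False)]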
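If you would rather work with row positions directly (for example, to loop over and print each following row one at a time), one alternative is to take the positions of the matches with NumPy, add one, and select with iloc. This sketch searches a single column; the column name column_search, the index labels, and the data are all made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up data; 'column_search' is a hypothetical column name
df = pd.DataFrame(
    {"column_search": ["Size of file is:", "120 KB", "misc", "Size of file is:", "4 MB"]},
    index=["r0", "r1", "r2", "r3", "r4"],
)

# Boolean mask over a single column (literal, case-insensitive match)
mask = df["column_search"].astype(str).str.contains("size of file is:", case=False, regex=False)

# Positions of matches, shifted by one; drop any that fall past the last row
next_pos = np.flatnonzero(mask.to_numpy()) + 1
next_pos = next_pos[next_pos < len(df)]

# iloc selects by position, so the index labels don't matter here
for _, row in df.iloc[next_pos].iterrows():
    print(row.tolist())
```

The explicit bounds check plays the same role as fill_value=False in the shift-based answers: a match on the last row simply has no following row to print.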