February 10, 2023, 15:17:18

Search for substring in entire dataframe and if substring is found print next row to the searched substring

Question

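Suppose I have a DataFrame df (read from an Excel sheet) with more than 3,000 rows. I want to search df for a string that occurs more than 50 times, and for every row in which the search string (substring) is found, print the single row that immediately follows it.

I tried: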

import pandas as pd

df = pd.read_excel('sample.xlsx')
substring = "Size of file is:"
# Keep the rows in which any cell contains the substring (case-insensitive)
result = df[df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1)]

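This returns the rows that contain the search string 'Size of file is:'. What I want instead is to print the single row that follows each match, wherever the search string occurs in the data.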

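Answer 1

Score: 1

Use Series.shift with fill_value=False to move the boolean match mask down by one row, so that it selects the row after each match instead of the matching row itself: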

import numpy as np
import pandas as pd

np.random.seed(2000)
df = pd.DataFrame(np.random.choice(['aa', 'abs', 'abdf', 'abg'], size=(10, 3)))
print(df)
      0     1     2
0  abdf    aa  abdf
1   abs  abdf   abg
2    aa    aa    aa
3  abdf   abg  abdf
4    aa   abg   abg
5  abdf   abg   abs
6  abdf    aa  abdf
7   abg   abg   abs
8   abs    aa   abg
9    aa    aa  abdf
substring = 'abd'
df1 = df[df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1).shift(fill_value=False)]
print(df1)
      0     1     2
1   abs  abdf   abg
2    aa    aa    aa
4    aa   abg   abg
6  abdf    aa  abdf
7   abg   abg   abs
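A similar solution, which applies str.contains column by column and then reduces the result with any(axis=1):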
df1 = df[df.apply(lambda col: col.astype(str).str.contains(substring, case=False)).any(axis=1).shift(fill_value=False)]
print(df1)
      0     1     2
1   abs  abdf   abg
2    aa    aa    aa
4    aa   abg   abg
6  abdf    aa  abdf
7   abg   abg   abs
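
To tie the answer back to the question, here is a minimal sketch of the same shift idea applied to data shaped like the "Size of file is:" case. The DataFrame contents, the column name "log", and the helper name rows_after_match are made up for illustration; in the question the frame would come from pd.read_excel. It also passes regex=False, on the assumption that the search string should be matched literally.

```python
import pandas as pd

# Hypothetical stand-in for the sheet read with pd.read_excel in the question.
df = pd.DataFrame({
    "log": [
        "Opening report.txt",
        "Size of file is: 10 KB",
        "Checksum OK",
        "Size of file is: 2 MB",
        "Size of file is: 7 MB",
        "Done",
    ]
})
substring = "Size of file is:"


def rows_after_match(frame, text):
    """Return the rows that directly follow a row containing `text` in any column."""
    # regex=False treats `text` as a literal string, so characters such as
    # '(' or '.' in the search string are not interpreted as a pattern.
    hits = frame.apply(
        lambda col: col.astype(str).str.contains(text, case=False, regex=False)
    ).any(axis=1)
    # Shifting the mask down by one row selects the row after each hit.
    return frame[hits.shift(fill_value=False)]


result = rows_after_match(df, substring)
print(result)
```

Note that when two matching rows are adjacent, the second one is returned as the "next row" of the first, and a match on the last row of the frame has no following row to return.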

Answer 2

Score: 0

Use vectorized calculations:

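# 'column_search' is a placeholder for the column to search; the mask is shifted so the row after each match is selected
df[df['column_search'].astype(str).str.contains('substring').shift(fill_value=False)]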
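
If only one column needs to be searched, a positional variant might look like the sketch below. It finds the integer positions of the matching rows with np.flatnonzero, adds one, and selects those positions with iloc. The sample data, the column names, and the search string are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'column_search' is a placeholder column name.
df = pd.DataFrame({
    "column_search": ["alpha", "Size of file is: 1", "beta", "gamma", "Size of file is: 2"],
    "value": [10, 20, 30, 40, 50],
})
substring = "Size of file is:"

# Vectorized, literal, case-insensitive match on a single column.
mask = df["column_search"].astype(str).str.contains(substring, case=False, regex=False)

# Integer positions of the matching rows, then the positions right after them.
match_pos = np.flatnonzero(mask.to_numpy())
next_pos = match_pos + 1
# Drop positions that fall past the end (a match on the last row has no next row).
next_pos = next_pos[next_pos < len(df)]

next_rows = df.iloc[next_pos]
print(next_rows)
```

Because this works on positions rather than index labels, it should behave the same way when the DataFrame does not have the default 0..n-1 index.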
