Question

I am trying to extract SQL table names from queries using regular expressions. For a single query, this can be done with re.findall.

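import re
Query = ["SELECT * FROM   WS_DE_Staging.stage_dual_h_20"]
for xx in Query:
    # (?:...) groups the keywords so \s+ and the capture apply to each of them;
    # \d is included because names such as stage_dual_h_20 contain digits.
    r1 = re.findall(r"(?:FROM|JOIN|from|join)\s+([A-Za-z\d_.\[\]]+)", xx)
    print(r1)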

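In phase 2, however, I have to apply this to a table that holds all the report names and their SQL queries, so I am using pandas to read the CSV file into a DataFrame.

I don't know how to take the next step: applying the re.findall expression above to every row.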

import re
Query = ["SELECT * FROM   WS_JE_Staging.stage_dual_h_20", "Select * from DummyEmployee"]
for xx in Query:
    # Same pattern as above: grouped keywords, digits allowed in the table name.
    r1 = re.findall(r"(?:FROM|JOIN|from|join)\s+([A-Za-z\d_.\[\]]+)", xx)
    print(r1)

import re
import pandas as pd
df = pd.read_csv("SqlQ.csv", delimiter=',')

print(df.index)

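Answer 1

Score: 0

The regex pattern can be simplified with re.IGNORECASE. Extracting the table names with pandas string methods on the DataFrame column is better than iterating over the rows.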

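import pandas as pd
import re
# \b stops matches inside longer words; (?:from|join) covers both keywords.
pattern = r"\b(?:from|join)\s+([\da-z_.\[\]]+)"
df = pd.read_csv(r"d:\temp\SqlQ.csv", delimiter=',')
# str.extract keeps only the first match per row; expand=False returns a Series.
df['table'] = df['sql'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(df)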
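
To try this without the CSV file, here is a minimal sketch that builds a small DataFrame in memory. The column names report and sql and the sample rows are assumptions made for illustration, not the real contents of SqlQ.csv. The pattern adds a word boundary in front of the keywords, which is also an assumption about what the queries look like.

```python
import re
import pandas as pd

# Hypothetical rows standing in for SqlQ.csv; column names are assumptions.
df = pd.DataFrame({
    "report": ["r1", "r2", "r3"],
    "sql": [
        "SELECT * FROM   WS_DE_Staging.stage_dual_h_20",
        "Select * from DummyEmployee e join [dbo].[Dept] d on e.id = d.id",
        "SELECT 1",
    ],
})

# (?:from|join) groups the keywords so \s+ and the capture group follow either one.
pattern = r"\b(?:from|join)\s+([\da-z_.\[\]]+)"

# str.extract returns the first match in each row, or NaN when nothing matches.
df["table"] = df["sql"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(df[["report", "table"]])
```

Note that the second row contains two tables but only the first one is kept, because str.extract stops at the first match.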

Use set() to get the unique list of tables in each row.

df['table'] = df['sql'].str.findall(pattern, flags=re.IGNORECASE)
df['table'] = df['table'].apply(lambda x: list(set(x)))
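
If one row per report and table is more convenient than a list in each cell, a possible variant is to combine str.findall with explode. This is only a sketch: the report and sql column names and the sample queries are assumptions, and rows with a missing or non-matching query are simply dropped.

```python
import re
import pandas as pd

# Hypothetical data; the column names are assumptions, not taken from SqlQ.csv.
df = pd.DataFrame({
    "report": ["r1", "r2", "r3"],
    "sql": [
        "SELECT * FROM a.t1 x JOIN a.t2 y ON x.id = y.id JOIN a.t1 z ON z.id = x.id",
        "Select * from DummyEmployee",
        None,  # a report with no query stored
    ],
})

pattern = r"\b(?:from|join)\s+([\da-z_.\[\]]+)"

tables = (
    # findall returns a list of every captured table name per row (NaN for missing SQL)
    df.assign(table=df["sql"].str.findall(pattern, flags=re.IGNORECASE))
      .explode("table")                              # one row per table name
      .dropna(subset=["table"])                      # drop rows without any table
      .drop_duplicates(subset=["report", "table"])   # unique tables per report
      .loc[:, ["report", "table"]]
      .reset_index(drop=True)
)
print(tables)
```

Unlike list(set(x)), drop_duplicates keeps the tables in the order they first appear in each query.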
