Iterate over rows in a data frame to search by regular expressions
Question
I am trying to extract SQL table names from queries using regular expressions. For a single query, re.findall does the job:
import re
Query = ["SELECT * FROM WS_DE_Staging.stage_dual_h_20"]
for xx in Query:
    # (?:...) groups the keywords so that \s+ and the capture apply to each of them;
    # 0-9 in the character class keeps digits in the table name
    r1 = re.findall(r"(?:FROM|JOIN|from|join)\s+([A-Za-z0-9_.\[\]]+)", xx)
    print(r1)
But now, in phase 2, I have to apply this to a table that holds all the report names and their SQL queries, so I am using pandas to read the CSV and create a data frame.
I don't know how to take the next step and apply the re.findall expression above to every row of the data frame.
import re
Query = ["SELECT * FROM WS_JE_Staging.stage_dual_h_20", "Select * from DummyEmployee"]
for xx in Query:
    r1 = re.findall(r"(?:FROM|JOIN|from|join)\s+([A-Za-z0-9_.\[\]]+)", xx)
    print(r1)

import pandas as pd
df = pd.read_csv("SqlQ.csv", delimiter=',')
print(df.index)
Answer 1
Score: 0
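The regex pattern can be simplified with re.IGNORECASE. Extracting the table names with the pandas string methods on the column is better than iterating over the rows yourself.
import pandas as pd
import re
# Lookbehind for FROM/JOIN, then capture the table name that follows
pattern = r"(?<=from|join)\s+([\da-z_.\[\]]+)"
df = pd.read_csv(r"d:\temp\SqlQ.csv", delimiter=',')
# 'sql' is assumed to be the name of the column holding the queries.
# str.extract keeps only the first match per query; expand=False returns a Series.
df['table'] = df['sql'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(df)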
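To try the extraction without the CSV file, here is a minimal sketch on an in-memory frame. The report names and the column names report and sql are assumptions made for the example; the two queries are the ones from the question.

```python
import re
import pandas as pd

# Hypothetical stand-in for SqlQ.csv: column names "report" and "sql" are assumed.
df = pd.DataFrame({
    "report": ["r1", "r2"],
    "sql": [
        "SELECT * FROM WS_JE_Staging.stage_dual_h_20",
        "Select * from DummyEmployee",
    ],
})

pattern = r"(?<=from|join)\s+([\da-z_.\[\]]+)"

# One capture group plus expand=False gives a Series with the first table per query.
first_table = df["sql"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(first_table.tolist())
```

Because str.extract stops at the first match, a query that joins several tables would report only the one after its first FROM or JOIN.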
To get every table referenced in a query, without duplicates, use str.findall together with set():
df['table'] = df['sql'].str.findall(pattern, flags=re.IGNORECASE)
df['table'] = df['table'].apply(lambda x: list(set(x)))
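As a self-contained illustration of the findall approach, the sketch below again uses a small in-memory frame instead of the CSV. The report names, the queries, the column names report and sql, and the helper unique_in_order are all made up for the example. It swaps set() for dict.fromkeys so the tables keep the order in which they appear in the query, and it guards against rows where the query is missing.

```python
import re
import pandas as pd

# Hypothetical data: one query with a repeated table, one simple query, one missing query.
df = pd.DataFrame({
    "report": ["sales", "staff", "empty"],
    "sql": [
        "SELECT * FROM dbo.Orders o JOIN dbo.Customers c ON o.cid = c.id "
        "JOIN dbo.Orders o2 ON o2.id = o.parent_id",
        "Select * from DummyEmployee",
        None,
    ],
})

pattern = r"(?<=from|join)\s+([\da-z_.\[\]]+)"


def unique_in_order(matches):
    # str.findall leaves a missing value (not a list) for missing queries
    if not isinstance(matches, list):
        return []
    # dict.fromkeys drops duplicates while preserving first-seen order
    return list(dict.fromkeys(matches))


df["tables"] = (
    df["sql"].str.findall(pattern, flags=re.IGNORECASE).apply(unique_in_order)
)

# One row per (report, table); reports without any table are dropped.
per_table = df[["report", "tables"]].explode("tables").dropna(subset=["tables"])
print(df["tables"].tolist())
print(per_table)
```

The exploded frame is convenient if the next step is to group or count by table name; if only the per-report list is needed, the tables column is enough.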