How can I delete rows that have text after a specific substring?
Question
I'm working with a database in which the values in each row normally follow a specific sequence:
- Four-character ID, usually IE01
- System ID, 5 characters
- Main ID, a 5-character string following a prefix: either "I0", "N0" or "Q0"
- Tag number, a 3-5 character string
A typical row would look like "IE01-236613-I073619-F021". The tag here is F021, but there could be a hundred other tags as well.
What I want, though, is to keep only the rows that contain no tag whatsoever.
In the example row, I want to remove the "F021" so that it looks like "IE01-236613-I073619".
Some rows might not follow that structure, but what they all have in common is a main ID prefix followed by 5 characters.
If I could, at a high level:
- Look for a main ID prefix (either "I0", "N0" or "Q0")
- Count 5 characters ahead
- Delete rows that contain anything after that.
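To make the three steps above concrete, here is a minimal plain-Python sketch of the check itself. It assumes the main ID is always introduced by a dash and that rows without a complete main ID should simply be left alone; both are assumptions, and `has_text_after_main_id` is a hypothetical helper name, not something from the original post.

```python
import re

# Assumed pattern: "-" + one of I0/N0/Q0 + exactly 5 more characters.
MAIN_ID = re.compile(r"-(?:I0|N0|Q0).{5}")


def has_text_after_main_id(value: str) -> bool:
    """Return True when anything follows the 5 characters after the prefix."""
    match = MAIN_ID.search(value)
    if match is None:
        # No complete main ID found; treated as "nothing to flag" (an assumption).
        return False
    return match.end() < len(value)


rows = [
    "IE01-236613-I073619-F021",
    "IE01-236613-I073619-",
    "IE01-236613-I073619",
]

# Rows to keep are the ones with nothing after the main ID.
kept = [row for row in rows if not has_text_after_main_id(row)]
print(kept)  # ['IE01-236613-I073619']
```

The same predicate, inverted, would identify the rows to delete.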
What I've tried is a query that returns JUST the main ID (the 5 characters), but that only selects the value, and once I ran it I realised it wasn't what I needed. I can't figure out how to include all the info, right up to where the 5 characters of the main ID end.
SELECT
    CASE
        WHEN t.field1 LIKE '%-I0%' THEN SUBSTRING(t.field1, CHARINDEX('I0', t.field1) + LEN('I0'), 5)
        WHEN t.field1 LIKE '%-N0%' THEN SUBSTRING(t.field1, CHARINDEX('N0', t.field1) + LEN('N0'), 5)
        WHEN t.field1 LIKE '%-Q0%' THEN SUBSTRING(t.field1, CHARINDEX('Q0', t.field1) + LEN('Q0'), 5)
        ELSE NULL
    END AS next_five_chars
FROM table1 t
That just returns, in the above example, 73619. I want the very start of the row, right up to the end of 73619.
I've been racking my brain and can't seem to figure it out. I bet it's something really obvious!
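For the "start of the row up to the end of 73619" part, a rough Python sketch of the trimming logic is below. `trim_to_main_id` is a hypothetical helper, and returning `None` for rows without a main ID is an assumption that mirrors the `ELSE NULL` branch of the query above. In T-SQL terms this would probably correspond to something like `LEFT(t.field1, CHARINDEX('-I0', t.field1) + 7)`, though that expression is untested here.

```python
import re
from typing import Optional

# Lazily capture everything up to and including the main ID:
# "-" + one of I0/N0/Q0 + exactly 5 more characters.
UP_TO_MAIN_ID = re.compile(r"^(.*?-(?:I0|N0|Q0).{5})")


def trim_to_main_id(value: str) -> Optional[str]:
    """Return the value cut off right after the main ID, or None if absent."""
    match = UP_TO_MAIN_ID.match(value)
    return match.group(1) if match else None


print(trim_to_main_id("IE01-236613-I073619-F021"))  # IE01-236613-I073619
print(trim_to_main_id("IE01-236613-I073619"))       # IE01-236613-I073619
```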
Answer 1
Score: 1
You're almost there with your substring query.
- If you add 5 to len('I0'), the slice starts right after the main ID.
- Then, instead of taking only 5 characters (the last parameter of substr), take a large number.
See the example:
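# Assumes data_sdf is an existing DataFrame with a string column c1
from pyspark.sql import functions as func

data_sdf. \
    withColumn('afterchars', func.expr('substr(c1, instr(c1, "I0")+7, 100)')). \
    show(truncate=False)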
# +------------------------+----------+
# |c1 |afterchars|
# +------------------------+----------+
# |IE01-236613-I073619-F021|-F021 |
# |IE01-236613-I073619- |- |
# |IE01-236613-I073619 | |
# +------------------------+----------+
Filter the afterchars column to keep only the rows where its length is zero:
data_sdf. \
    withColumn('afterchars', func.expr('substr(c1, instr(c1, "I0")+7, 100)')). \
    filter(func.length('afterchars') == 0). \
    show(truncate=False)
# +-------------------+----------+
# |c1 |afterchars|
# +-------------------+----------+
# |IE01-236613-I073619| |
# +-------------------+----------+
Equivalent SQL query:
select * from (
    select *, substr(c1, instr(c1, "I0")+7, 100) as afterchars
    from db.tbl)
where length(afterchars) = 0
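The examples above only look for "I0". As a sanity check on the offset arithmetic, and to show one way the same idea could cover "N0" and "Q0", here is a small plain-Python sketch. The `instr` and `substr` functions below are hand-written stand-ins that imitate the 1-based SQL functions; they are not Spark APIs. Searching for the prefix together with its leading dash is an added assumption, which is why the offset becomes 1 + 2 + 5 = 8 instead of 7.

```python
def instr(s: str, sub: str) -> int:
    """1-based position of sub in s, or 0 when absent (imitates SQL instr)."""
    return s.find(sub) + 1


def substr(s: str, pos: int, length: int) -> str:
    """1-based substring for pos >= 1 (imitates SQL substr)."""
    return s[pos - 1 : pos - 1 + length]


PREFIXES = ("I0", "N0", "Q0")


def afterchars(value: str):
    """Return whatever follows the main ID, or None if no prefix is found."""
    for prefix in PREFIXES:
        pos = instr(value, "-" + prefix)
        if pos > 0:
            # Skip the dash, the 2-character prefix and the 5 ID characters.
            return substr(value, pos + 1 + len(prefix) + 5, 100)
    return None


for row in ["IE01-236613-I073619-F021", "IE01-236613-N012345", "IE01-236613"]:
    print(repr(row), "->", repr(afterchars(row)))
```

Rows where `afterchars` returns an empty string are the ones with no tag. Rows where it returns `None` have no main ID at all and would need their own rule.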