How can I delete rows that have text after a specific substring?

Question

I'm faced with a database right now, and the information in the rows normally follows a specific sequence:

  1. Four-character ID - usually IE01

  2. System ID - 5 characters

  3. Main ID - 5 character string following a prefix - either "I0", "N0" or "Q0"

  4. Tag number - 3-5 character string

A typical row would look like "IE01-236613-I073619-F021". The tag is F021, but there could be a hundred other tags also.

What I want, though, is to keep only the rows that contain no tag whatsoever.

In the row example, I want to remove the "F021" so it looks like "IE01-236613-I073619".

Some rows might not follow that structure, but what they all have in common is a main ID prefix with 5 characters.

If I could, at a high level:

  1. Look for a main ID prefix (either "I0", "N0" or "Q0")

  2. Count 5 characters ahead

  3. Delete rows that contain anything after that.
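
The three steps above can be sketched in plain Python before translating them to SQL. This is only an illustration under stated assumptions: the helper name `has_trailing_text` and the sample rows are hypothetical, and the pattern assumes the main ID is always a dash, one of the three prefixes, and exactly five more characters.

```python
import re

# Assumption: the main ID is "-" followed by I0, N0 or Q0 and then exactly
# five more characters; anything after that counts as a tag.
MAIN_ID = re.compile(r"-(?:I0|N0|Q0).{5}")


def has_trailing_text(value: str) -> bool:
    """Return True if anything follows the main ID."""
    match = MAIN_ID.search(value)      # step 1: look for a main ID prefix
    if match is None:
        return False                   # no main ID found: leave the row alone
    # steps 2 and 3: match.end() is the position right after the five
    # characters, so any remaining text means the row should be dropped
    return match.end() < len(value)


rows = [
    "IE01-236613-I073619-F021",
    "IE01-236613-I073619-",
    "IE01-236613-I073619",
]
kept = [row for row in rows if not has_trailing_text(row)]
print(kept)  # ['IE01-236613-I073619']
```

Rows without any main ID are kept here; whether they should instead be removed is a decision the question leaves open.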

What I've tried is to query rows that contain JUST the main ID (with the 5 chars), but that only selects, and once I did it I realised it wasn't what I needed. I can't figure out how to include all the info, right up to where the 5 characters in the main ID end.

SELECT 
  CASE
    WHEN t.field1 LIKE '%-I0%' THEN SUBSTRING(t.field1, CHARINDEX('I0', t.field1) + LEN('I0'), 5)
    WHEN t.field1 LIKE '%-N0%' THEN SUBSTRING(t.field1, CHARINDEX('N0', t.field1) + LEN('N0'), 5)
    WHEN t.field1 LIKE '%-Q0%' THEN SUBSTRING(t.field1, CHARINDEX('Q0', t.field1) + LEN('Q0'), 5)
    ELSE NULL
  END AS next_five_chars
FROM table1 t

That just returns, in the above example, 73619. I want the very start of the row, right up to the end of 73619.
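
One way to get the start of the row up to the end of the main ID is to take a left-hand slice whose length comes from the prefix position. The sketch below runs the idea against an in-memory SQLite database from Python, purely so it is runnable; SQLite is a stand-in, and its `instr` and `substr` play the roles that `CHARINDEX` and `LEFT` would play in the query above. The table contents are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (field1 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [("IE01-236613-I073619-F021",), ("IE01-236613-N012345",), ("no-main-id",)],
)

# instr() is 1-based and points at the dash. Dash + 2-character prefix +
# 5 characters = 8 characters, so the main ID ends at position instr(...) + 7.
query = """
SELECT CASE
    WHEN instr(field1, '-I0') > 0 THEN substr(field1, 1, instr(field1, '-I0') + 7)
    WHEN instr(field1, '-N0') > 0 THEN substr(field1, 1, instr(field1, '-N0') + 7)
    WHEN instr(field1, '-Q0') > 0 THEN substr(field1, 1, instr(field1, '-Q0') + 7)
    ELSE NULL
END AS trimmed_field
FROM table1
ORDER BY rowid
"""
trimmed = [row[0] for row in conn.execute(query)]
print(trimmed)  # ['IE01-236613-I073619', 'IE01-236613-N012345', None]
```

Searching for the prefix together with its leading dash avoids matching the same two letters elsewhere in the value.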

I've been racking my brain and can't seem to figure it out. I bet it's something really obvious!

Answer 1

Score: 1


You're almost there with your substring query.

  1. If you add 5 to len('I0'), the slice starts right after the "Main ID".
  2. Then, instead of taking only 5 characters (the last parameter of substr), take a large number.

See the example:
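
The offset arithmetic in these two steps can be checked in plain Python. This is a rough sketch, not part of the original answer: `after_main_id` is a hypothetical helper, and Python's `str.find` is 0-based and returns -1 when the prefix is missing, whereas `instr` is 1-based.

```python
def after_main_id(value: str, prefix: str = "I0", id_len: int = 5) -> str:
    """Return whatever follows the prefix plus the next id_len characters."""
    pos = value.find(prefix)           # 0-based index, -1 if not found
    if pos == -1:
        return ""                      # assumption: no prefix means no tag
    start = pos + len(prefix) + id_len # step 1: skip the prefix and the 5 chars
    return value[start:]               # step 2: take everything that remains


for row in ["IE01-236613-I073619-F021", "IE01-236613-I073619-", "IE01-236613-I073619"]:
    print(repr(after_main_id(row)))
# '-F021'
# '-'
# ''
```

The explicit not-found branch matters: without it, a value lacking the prefix would be sliced from an unrelated position.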

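from pyspark.sql import functions as func

# data_sdf is a Spark DataFrame with the raw values in column c1
data_sdf. \
    withColumn('afterchars', func.expr('substr(c1, instr(c1, "I0")+7, 100)')). \
    show(truncate=False)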

# +------------------------+----------+
# |c1                      |afterchars|
# +------------------------+----------+
# |IE01-236613-I073619-F021|-F021     |
# |IE01-236613-I073619-    |-         |
# |IE01-236613-I073619     |          |
# +------------------------+----------+

Filter on the afterchars column to keep only the rows where its length is zero:

data_sdf. \
    withColumn('afterchars', func.expr('substr(c1, instr(c1, "I0")+7, 100)')). \
    filter(func.length('afterchars') == 0). \
    show(truncate=False)

# +-------------------+----------+
# |c1                 |afterchars|
# +-------------------+----------+
# |IE01-236613-I073619|          |
# +-------------------+----------+

Equivalent SQL query:

select * from (
    select *, substr(c1, instr(c1, "I0")+7, 100) as afterchars
    from db.tbl)
where length(afterchars) = 0
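
The same length check can be turned into an actual delete that covers all three prefixes. The following is a minimal sketch run against an in-memory SQLite database from Python so that it is self-contained; the table `tbl`, its sample rows, and the use of SQLite are assumptions for illustration, and the function names may need adapting to the target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (c1 TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [
        ("IE01-236613-I073619-F021",),
        ("IE01-236613-I073619-",),
        ("IE01-236613-I073619",),
        ("IE01-236613-Q054321-AB12",),
        ("IE01-236613-N012345",),
    ],
)

# The main ID ends at position instr(...) + 7 (dash, 2-character prefix,
# 5 characters). A longer value has something after it, so delete the row.
conn.execute("""
DELETE FROM tbl
WHERE (instr(c1, '-I0') > 0 AND length(c1) > instr(c1, '-I0') + 7)
   OR (instr(c1, '-N0') > 0 AND length(c1) > instr(c1, '-N0') + 7)
   OR (instr(c1, '-Q0') > 0 AND length(c1) > instr(c1, '-Q0') + 7)
""")

remaining = [row[0] for row in conn.execute("SELECT c1 FROM tbl ORDER BY rowid")]
print(remaining)  # ['IE01-236613-I073619', 'IE01-236613-N012345']
```

Running the WHERE clause in a SELECT first is a cheap way to review which rows would be removed before deleting anything.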

huangapple
  • Posted on June 8, 2023 at 16:02:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76429805.html