Select only a specific part of a STRING column
Question
I have a table with a "Description" column. I need to query that table and return only a specific part of the Description column.
Here are a few examples of what the 'Description' column contains:
|DESCRIPTION|
-------------
|someRandomt|
|it has 0.5g|
|23g is enou|
|otherRandom|
|55g, 0.1g, |
In the SELECT statement, I need to select only the numeric values (float or int) that are followed by the letter 'g'. If a row contains more than one, I need to concatenate them using '/'. For the rows above, the final result should look like this:
|DESCRIPTION|
-------------
|null       |
|0.5g       |
|23g        |
|null       |
|55g/0.1g   |
Answer 1
Score: 1
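Extract the numbers followed by 'g' with `regexp_extract_all` (available in Spark SQL since 3.1), then use `array_join` to combine them into a single string. The decimal point is written as `[.]` so that it matches a literal dot rather than any character. Rows without a match produce an empty string rather than null; wrapping the expression in `nullif(..., '')` is one way to get null instead.
Here is a code example for PySpark:
from pyspark.sql import functions as f
# Collect every "<number>g" match into an array, then join the array with '/'.
df.withColumn('nums', f.array_join(f.expr('regexp_extract_all(DESCRIPTION, "([0-9]+([.][0-9]+)?g)")'), '/')) \
  .show()
+-----------+--------+
|DESCRIPTION|    nums|
+-----------+--------+
|someRandomt|        |
|it has 0.5g|    0.5g|
|23g is enou|     23g|
|otherRandom|        |
|55g, 0.1g, |55g/0.1g|
+-----------+--------+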
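To sanity-check the pattern without a Spark session, the same extract-then-join logic can be sketched in plain Python with the standard `re` module. This is only an illustration, not part of the original answer: `GRAM_PATTERN` and `extract_grams` are hypothetical names, and the sketch assumes the asker wants null (`None` here) when nothing matches.

```python
import re

# Digits, an optional decimal part with a literal dot, then the letter "g".
# Hypothetical name; mirrors the pattern used in the Spark expression.
GRAM_PATTERN = re.compile(r"[0-9]+(?:[.][0-9]+)?g")


def extract_grams(description):
    """Join every '<number>g' match with '/', or return None if there is none."""
    matches = GRAM_PATTERN.findall(description)
    return "/".join(matches) if matches else None


# Sample rows taken from the question.
samples = ["someRandomt", "it has 0.5g", "23g is enou", "otherRandom", "55g, 0.1g, "]
for text in samples:
    print(repr(text), "->", extract_grams(text))
```

The group is non-capturing (`(?:...)`) so that `findall` returns whole matches; with a capturing group it would return only the group contents.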