How to get the min value or a desired value from a string that contains slashes

Question

I have a column whose values contain slashes, as in the examples below. When a string contains only numbers, I need the minimum value; when it mixes numbers and alphanumeric parts, I need only the alphanumeric part. This has to be done in a PySpark DataFrame.

Example input:

  1. 111/112
  2. 113/PAG
  3. 801/802/803/804
  4. 801/62S

Desired output should be:

  1. 111
  2. PAG
  3. 801
  4. 62S

I have tried exploding the DataFrame column, but it doesn't work. Please help me with this.
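
A minimal sketch of one such explode-based attempt (assuming a groupBy back on the ip column):

from pyspark.sql.functions import col, explode, min as min_, split

df = spark.createDataFrame([('111/112',),('113/PAG',),('801/802/803/804',),('801/62S',)],['ip'])
# explode each slash-separated part onto its own row, then take the min per original value
exploded = df.withColumn("part", explode(split(col("ip"), "/")))
exploded.groupBy("ip").agg(min_("part").alias("min_part")).show(truncate=False)
# min() on a string column is lexicographic, so '113/PAG' yields '113' rather than the desired 'PAG'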

Answer 1

Score: 0

Try the array_min function together with the built-in split function.

  • split -> splits the string into an array
  • array_min -> gets the minimum value from the array

Example:

from pyspark.sql.functions import array_min, col, split

df = spark.createDataFrame([('111/112',),('113/PAG',),('801/802',),('801/62S',)],['ip'])
# split on "/" and take the minimum element of the resulting array
df.withColumn("ip", array_min(split(col("ip"), "/"))).show(10, False)
#+---+
#|ip |
#+---+
#|111|
#|113|
#|801|
#|62S|
#+---+

UPDATE: array_min compares strings lexicographically, so for a mixed value such as 113/PAG the approach above returns 113 instead of the desired PAG. The version below keeps only the non-numeric parts when they exist:

from pyspark.sql.functions import *

df = spark.createDataFrame([('111/112',),('113/PAG/PAZ',),('801/802',),('801/62S',)],['ip'])
df.withColumn("temp_ip1", split(col("ip"),"/").cast("array<int>")).\
  withColumn("temp_ip2", split(col("ip"),"/")).\
  withColumn("temp", array_except(col("temp_ip2"),array_except(col("temp_ip1"),array(lit(None))).cast("array<string>"))).\
    withColumn("min_ip", array_min(when(size(col("temp"))>0,col("temp")).otherwise(col("temp_ip2")))).\
      drop(*['temp_ip1','temp_ip2','temp']).\
    show(10,False)

#+-----------+------+
#|ip         |min_ip|
#+-----------+------+
#|111/112    |111   |
#|113/PAG/PAZ|PAG   |
#|801/802    |801   |
#|801/62S    |62S   |
#+-----------+------+
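
An alternative sketch, assuming Spark 3.1+ where pyspark.sql.functions.filter accepts a Python lambda: the same logic can be expressed by filtering out the purely numeric parts.

from pyspark.sql.functions import array_min, col, filter as filter_, size, split, when

df = spark.createDataFrame([('111/112',),('113/PAG/PAZ',),('801/802',),('801/62S',)],['ip'])

parts = split(col("ip"), "/")
# keep only the parts that are not purely numeric
alpha_parts = filter_(parts, lambda p: ~p.rlike("^[0-9]+$"))
# prefer the alphanumeric parts when present, otherwise fall back to all parts
df.withColumn("min_ip",
              when(size(alpha_parts) > 0, array_min(alpha_parts))
              .otherwise(array_min(parts))).show(10, False)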
