PySpark: extracting exactly 4 consecutive numeric digits from a column and returning them in a new column
Question
I am very new to PySpark and have no idea how to do this.
I am trying to extract the 4-digit year from a title column.
Some values in the title column are:
Under Ground2(1990)
Waterword(1995)
Incredible
Skate (1991) board
That girl 2002”
I am trying to get:
1990
1995
1991
2002
This is what I have tried:
import pyspark.sql.functions as F
from pyspark.sql.functions import split
from pyspark.sql.functions import regexp_replace
movies_DF = movies_DF.withColumn('title', regexp_replace(movies_DF.title, r"\(", ""))
movies_DF = movies_DF.withColumn('title', regexp_replace(movies_DF.title, r"\)", ""))
movies_DF = movies_DF.withColumn('yearOfRelease', F.expr('substring(title, -4)'))
My output column has:
1990
1995
board
2002”
dible
Answer 1
Score: 1
Use the regexp_extract function:
from pyspark.sql.functions import regexp_extract, col
df = df.withColumn('Year', regexp_extract(col('Title'), r'\((\d{4})\)$', 1))
df.show()
+-------------------+----+
|              Title|Year|
+-------------------+----+
|Under Ground2(1990)|1990|
|    Waterword(1995)|1995|
+-------------------+----+
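Note that the pattern above requires parentheses and is anchored to the end of the string with `$`, so titles such as "Skate (1991) board" or "That girl 2002”" would produce an empty string. A possible variation, offered here as a sketch rather than as part of the original answer, is to match exactly four digits that are not adjacent to any other digit. The names `YEAR_PATTERN`, `extract_year`, and `add_year_column` are hypothetical, and the PySpark helper assumes a string column named `title`. The plain-Python function only serves to check the pattern against the sample titles from the question.

```python
import re

# Exactly four digits, not preceded or followed by another digit.
# Lookbehind/lookahead are supported by both Python's re and the Java regex
# engine that Spark uses.
YEAR_PATTERN = r"(?<!\d)(\d{4})(?!\d)"


def extract_year(title):
    """Plain-Python check of the pattern; returns '' when nothing matches,
    mirroring regexp_extract's behaviour for a non-matching row."""
    match = re.search(YEAR_PATTERN, title)
    return match.group(1) if match else ""


def add_year_column(movies_DF):
    """Apply the same pattern in PySpark (assumes a string column 'title')."""
    # Imported lazily so the pattern check above runs without a Spark install.
    from pyspark.sql.functions import col, regexp_extract

    return movies_DF.withColumn(
        "yearOfRelease", regexp_extract(col("title"), YEAR_PATTERN, 1)
    )


# Sample titles taken from the question.
titles = [
    "Under Ground2(1990)",
    "Waterword(1995)",
    "Incredible",
    "Skate (1991) board",
    "That girl 2002”",
]
years = [extract_year(t) for t in titles]
print(years)  # ['1990', '1995', '', '1991', '2002']
```

With this approach there is no need to strip the parentheses first. Rows without a four-digit run, such as "Incredible", end up with an empty string in the new column and can be filtered out afterwards if they are not wanted.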