将行按照 Azure Data Factory 中的缩写进行分割。

huangapple go评论66阅读模式
英文:

Split rows by abbreviation in Azure Data Factory

问题

I am trying to split my column 'State' by abbreviating only the first 2 letters and also making it uppercase in the final output. As you can see from the example there is Ohio, Wisconsin, and Illinois at the top. I just want the output to be OH, WI, IL etc..

英文:

I am trying to split my column 'State' by abbreviating only the first 2 letters and also making it uppercase in the final output. As you can see from the example there is Ohio, Wisconsin, and Illinois at the top. I just want the output to be OH, WI, IL etc..

State example

将行按照 Azure Data Factory 中的缩写进行分割。

I have used upper but not sure what else to use to shorten the length of the abbreviation.

答案1

得分: 1

  • 您可以使用 Azure 数据流的 upper()left 函数来实现此要求。以下是我选取的示例数据:

将行按照 Azure Data Factory 中的缩写进行分割。

  • 现在使用如下所示的派生列转换,您可以获得所需的输出:
upper(left(State,2))

将行按照 Azure Data Factory 中的缩写进行分割。

  • 以下是最终数据预览中结果的显示方式:

将行按照 Azure Data Factory 中的缩写进行分割。

注意: 这不考虑州名称的前两个字母相同可能引起问题的情况。例如,北卡罗来纳和北达科他,或田纳西和德克萨斯。如果尚未处理这些情况,应单独处理。

英文:
  • You can use the azure dataflow's upper() and left functions to achieve this requirement. The following is the sample data that I have taken as source:

将行按照 Azure Data Factory 中的缩写进行分割。

  • Now using the derived column transformation as shown below, you can get the required output:
upper(left(State,2))

将行按照 Azure Data Factory 中的缩写进行分割。

  • The following is how the result would appear in the final data preview:

将行按照 Azure Data Factory 中的缩写进行分割。

NOTE: This does not account for states having same first 2 letters and thus causing a problem. For example, North Carolina and North Dakota or Tennessee and Texas. These cases should be handled separately if they are not handled already.

答案2

得分: 0

  1. 在你的管道中添加一个“复制数据”活动来读取源数据。根据你的数据源配置源数据集和关联服务。
  2. 在“复制数据”活动后添加一个“派生列”活动。
  3. 配置“派生列”活动:
    选择源数据集和关联服务。
    在“派生列”选项卡中,点击“+ 添加”按钮添加一个新的派生列。
    为派生列提供一个名称(例如,“Abbreviation”)。
    在“表达式”字段中,使用“split”和“upper”函数来按缩写分割行并将其转换为大写。
    例如,如果你的数据中有一个名为“Description”的列,而缩写是用方括号括起来的(例如,“[Ohio]”),你可以使用以下表达式:
    upper(split(Description, '[')1)
    此表达式将“Description”列按照'['字符分割,提取结果数组的第二个元素(代表缩写),并使用“upper”函数将其转换为大写。
  4. 在“派生列”活动后添加一个“接收器”活动,将数据写入所需的目标。
    根据你的目标配置接收器数据集和关联服务。
英文:
  1. Add a "Copy Data" activity to your pipeline to read the source data.
    Configure the source dataset and linked service according to your data source.

  2. Add a "Derived Column" activity after the "Copy Data" activity.
    3.Configure the "Derived Column" activity:
    Select the source dataset and linked service.
    In the "Derived Column" tab, click on the "+ Add" button to add a new derived
    column.

    Provide a name for the derived column (e.g., "Abbreviation").

    In the "Expression" field, use the "split" and "upper" functions to split the
    rows by the abbreviation and convert it to uppercase.

    For example, if your data has a column named "Description" and the abbreviation
    is enclosed in square brackets (e.g., "[Ohio]"), you can use the following
    expression:

    upper(split(Description, '[')1)

This expression splits the "Description" column by the '[' character, retrieves the second element of the resulting array (which represents the abbreviation), and converts it to uppercase using the "upper" function.
Add a "Sink" activity after the "Derived Column" activity to write the data to the desired destination.
Configure the sink dataset and linked service according to your destination

huangapple
  • 本文由 发表于 2023年6月15日 00:44:21
  • 转载请务必保留本文链接:https://go.coder-hub.com/76475833.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定