2023年2月23日 19:53:37go评论67阅读模式

英文:

BigQuery splitting a row into multiple columns

问题

我有一个存储日志的表，它将所有日志存储在一行中。我已经对其进行了展开操作，一个示例行如下：

TestObserver(2) TestFragment(1) TestView(1) TestNotifications(2) TestActivity(1) 等等。

我想将TestObserver、TestFragment等内容转换为列，将其值(2)、(1)等转换为行。这些测试日志有很多，它们的数量不是固定的。它们之间没有分隔符。我还想在每一行中对每个TestObserver、TestFragment等的值进行求和。

我该如何实现这一目标？

我尝试过修剪和分割，但结果不符合预期。

英文:

I have a table that stores logs into one row. I have unnested it and one example row looks like this:

TestObserver(2) TestFragment(1) TestView(1) TestNotifications(2) TestActivity(1) and so on.

I would like to conveert TestObserver, TestFragment and so on into columns and its values (2), (1), etc into its rows. There are plenty of these Test logs, it is not a fixated number of them. There is no delimiter between them. I would also like to SUM each TestObserver, TestFragment and so on values found in each row.

How can I achieve this?

I tried trimming and splitting but the result is not what it is expected.

答案1

得分: 0

假设在您的日志中没有一致的分隔符（甚至没有空格）分隔术语，我们仍然可以使用 REGEXP_EXTRACT_ALL() 函数和 BigQuery 中的 UNNEST 运算符来获得您想要的结果。

步骤 1. 使用 regex_replace() 将日志中的所有空格移除。

步骤 2. 使用 REGEXP_EXTRACT_ALL() 将每个（术语（数量））标记提取到一个数组中。

步骤 3. 对数组进行 UNNEST 操作，使每行只包含一个（术语（数量））标记。

步骤 4. 使用 regex_extract() 分别提取术语和数量。请注意，我们需要使用负向先行正则表达式(?= 和 ?<=) 来获取我们需要的确切数据，用于术语和数量。

步骤 5. 按术语分组并对数量进行求和以获得结果。

请查看以下示例代码：

WITH log_extracted AS (
  SELECT
    REGEXP_EXTRACT_ALL(
      REGEXP_REPLACE(
        'TestObserver(2) TestFragment(1) TestView(1) TestNotifications(2) TestActivity(1)',
        '_',
        ''
      ),
      '\w+\(\d+\)'
    ) AS log_tokens
)
SELECT
  REGEXP_EXTRACT(log_token, '(\w+)(?=\(\d+\))') AS term,
  SUM(CAST(REGEXP_EXTRACT(log_token, '(?<=\w\()\d+') AS INT64)) AS quantity
FROM log_extracted
CROSS JOIN UNNEST(log_tokens) AS log_token
GROUP BY
  1

示例输出：

term	quantity
TestFragment	1
TestView	1
TestObserver	2
TestNotifications	2
TestActivity	1

英文:

Assume there is no consistent delimiter(not even a space) between terms in your log, we can still get your desired results by using REGEXP_EXTRACT_ALL() function and UNNEST operator in BigQuery.

Step 1. Use regex_replace() to move all spaces in the log

step 2. Use REGEXP_EXTRACT_ALL() to fetch each (term(quantity)) token into an array.

Step 3. UNNEST the array to one (term(quantity)) token per row

Step 4. Use regex_extract() to fetch term and quantity separately. Please, we need to use negative lookaround regex(?= and ?<=) to get the exact data we need for term and quantity.

Step 5. Group by term and sum() on quantity to get the result.

See the following code as example:

WITH log_extracted AS (
  SELECT
    REGEXP_EXTRACT_ALL(
      REGEXP_REPLACE(
        &#39;TestObserver(2) TestFragment(1) TestView(1) TestNotifications(2) TestActivity(1)&#39;,
        &#39;_&#39;,
        &#39;&#39;
      ),
      &#39;\w+\(\d+\)&#39;
    ) AS log_tokens
)
SELECT
  REGEXP_EXTRACT(log_token, &#39;(\w+)(?=\(\d+\))&#39;) AS term,
  SUM(CAST(REGEXP_EXTRACT(log_token, &#39;(?&lt;=\w\()\d+&#39;) AS INT64)) AS quantity
FROM log_extracted
CROSS JOIN UNNEST(log_tokens) AS log_token
GROUP BY
  1

example output:

term	quantity
TestFragment	1
TestView	1
TestObserver	2
TestNotifications	2
TestActivity	1

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

BigQuery将一行拆分为多个列。

问题

答案1

使用与列名相同的变量名 – Clickhouse

SQL ORACLE – 查找连续的日期

什么是构建数据库管理Java应用程序的正确方式？

查询返回的 varbinary 字符串比实际字符串短。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论