Azure Data Factory Copy Data error when exporting from BigQuery
Question
I need to export Google Analytics data from the "events" dataset stored on BigQuery.
I'm having an issue with the Copy Data activity when trying to export the data; the error I'm receiving is the following:
Failure happened on 'Source' side. ErrorCode=UserErrorOdbcOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ERROR [HY000] [Microsoft][BigQuery] (310) Storage API Error: Error thrown in GAvroDataCell:BuildDefaultAvroCell.Invalid BigQueryType specified for copy/move operation. Type is : 0,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [HY000] [Microsoft][BigQuery] (310) Storage API Error: Error thrown in GAvroDataCell:BuildDefaultAvroCell.Invalid BigQueryType specified for copy/move operation. Type is : 0,Source=Microsoft ODBC Driver for Google BigQuery,'
The pipeline is made of a single Copy Data activity and this is the query I'm executing:
@concat('SELECT
event_date,
event_timestamp,
event_name,
user_id,
user_pseudo_id,
TO_JSON(event_params) as event_params,
event_previous_timestamp,
event_value_in_usd,
event_bundle_sequence_id,
event_server_timestamp_offset,
TO_JSON(privacy_info) as privacy_info,
TO_JSON(user_properties) as user_properties,
user_first_touch_timestamp,
TO_JSON(user_ltv) as user_ltv,
TO_JSON(device) as device,
TO_JSON(geo) as geo,
TO_JSON(app_info) as app_info,
TO_JSON(traffic_source) as traffic_source,
TO_JSON(stream_id) as stream_id,
TO_JSON(platform) as platform,
TO_JSON(event_dimensions) as event_dimensions,
TO_JSON(ecommerce) as ecommerce,
TO_JSON(items) as items
FROM `xxxxxxxproject-bigquery.analytics_310106672.events_2023*`'
)
The linked service of type "BigQuery" is working, since "test connection" returns "ok", and the dataset is also fine because "preview data" returns data from the remote dataset.
The "preview data" on the Copy Data activity gives error like this:
I've found that it has something to do with the wildcard "*" in the table name. The Copy Data works when I construct a dynamic query like this:
@concat('SELECT <fields>
FROM `xxxxxxxproject-bigquery.analytics_310106672.events_', '20230520`')
But this one is not working:
@concat('SELECT <fields>
FROM `xxxxxxxproject-bigquery.analytics_310106672.events_', '202305', '*', '`')
I'm investigating on the Google source side.
Answer 1
Score: 0
AFAIK, currently ADF might not support wildcard table selection the way BigQuery queries do, and that is likely the reason for your error.
A workaround for your requirement can be to list all the tables from the source using the below query in a Lookup activity:
SELECT table_name FROM `<project>`.<dataset_id>.INFORMATION_SCHEMA.TABLES;
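As a concrete sketch, with the project and dataset names taken from the question (assumed, not verified), the Lookup query would look like:

SELECT table_name
FROM `xxxxxxxproject-bigquery.analytics_310106672`.INFORMATION_SCHEMA.TABLES;

Make sure "First row only" is unchecked on the Lookup activity so that the full list of tables is returned.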
If you want, you can filter this output array by table-name prefix with the startsWith() function in a Filter activity.
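A minimal sketch of the Filter activity settings, assuming the Lookup activity above is named LookupTables (the activity name and the 202305 prefix are assumptions for illustration):

Items: @activity('LookupTables').output.value
Condition: @startsWith(item().table_name, 'events_202305')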
Then give this output array to a ForEach activity.
Inside the ForEach, use a Copy activity and pass @item().table_name to both the source dataset and the sink dataset (use dataset parameters here).
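If the source uses a query instead of a parameterized table name, it can be built per table in the same style as the original query, for example (project, dataset, and field list taken from the question):

@concat('SELECT <fields>
FROM `xxxxxxxproject-bigquery.analytics_310106672.', item().table_name, '`')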
If you are copying the data to ADLS, copy each table to a temporary file in a staging folder, and after the ForEach use another Copy activity with a wildcard file path in the source. In the sink of that Copy activity, use the merge copy behavior, and the files will be saved into a single file.
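For that final merge step, one possible configuration of the second Copy activity (folder, file-name pattern, and file format are assumptions, not from the original answer):

Source (ADLS dataset): the staging folder, File path type: Wildcard file path, Wildcard file name: events_*.csv
Sink (ADLS dataset): the target folder/file, Copy behavior: Merge files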