Python API call sampling issue
Question
I am trying to programmatically (in Python) recreate the Conversions > Ecommerce > Product Performance report, with product as the primary dimension and campaign as the secondary dimension.
I appear to be hitting a massive sampling issue when pulling the ga:campaign dimension for a client, even when I restrict the date range to a single day. The property I am pulling data for is a G360 account, and regardless of what I set samplingLevel to, the sampling appears to be the same: 'samplesReadCounts': ['999984'], 'samplingSpaceSizes': ['3980975'].
I am able to get unsampled data if product is the only dimension, but the data is sampled with campaign alone or with the two together.
I have tried setting samplingLevel to SMALL, LARGE, and DEFAULT to see whether samplesReadCounts and samplingSpaceSizes would change, but they do not.
return analytics.reports().batchGet(
    body={
        'reportRequests': [
            {
                'viewId': VIEW_ID,
                'dateRanges': [{'startDate': startdt, 'endDate': enddt}],
                'metrics': [
                    {'expression': 'ga:itemRevenue'},
                    {'expression': 'ga:uniquePurchases'},
                    {'expression': 'ga:itemQuantity'},
                    {'expression': 'ga:revenuePerItem'},
                    {'expression': 'ga:itemsPerPurchase'},
                    {'expression': 'ga:productRefundAmount'},
                ],
                'dimensions': [{'name': 'ga:productName'}, {'name': 'ga:campaign'}],
                'samplingLevel': 'LARGE',
            }
        ]
    }
).execute()
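For reference, the sampling figures quoted above can be checked directly on the response. The following is a minimal sketch, assuming the response has the usual Reporting API v4 shape in which each report's data object carries samplesReadCounts and samplingSpaceSizes only when the report was sampled. The helper name sampling_summary is hypothetical and not part of any library.

```python
def sampling_summary(response):
    """Return one dict per report saying whether it was sampled and at what rate.

    Hypothetical helper: assumes `response` is the dict returned by
    reports().batchGet(...).execute().
    """
    summaries = []
    for report in response.get('reports', []):
        data = report.get('data', {})
        # Both fields arrive as lists of strings, one entry per date range.
        read = [int(n) for n in data.get('samplesReadCounts', [])]
        space = [int(n) for n in data.get('samplingSpaceSizes', [])]
        rates = [r / s for r, s in zip(read, space) if s]
        summaries.append({'sampled': bool(read), 'rates': rates})
    return summaries


# The values reported in the question.
example = {
    'reports': [
        {'data': {'samplesReadCounts': ['999984'],
                  'samplingSpaceSizes': ['3980975']}}
    ]
}
summary = sampling_summary(example)
print(summary)
```

With the numbers from the question, the rate works out to roughly a quarter of the sessions in the sampling space.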
Answer 1
Score: 1
Are there a lot of unique campaign values? If so, you may be running into a high-cardinality issue. That would explain why the report is fine with product alone but is sampled once you add the campaign dimension: the number of possible result rows grows to roughly (number of product values × number of campaign values), which is a significant increase.
Unfortunately, in my experience there is not much you can do about this for historical data through API requests. Since you have 360, you can still request the unsampled data manually through the UI. Going forward, you can also reduce the number of campaign values by setting up Filters under your View settings or by creating new Views entirely. However, this only applies to data collected after the change.
You may also want to look into BigQuery, which lets you query unsampled data (at a cost). However, you cannot access historical data that way, only data stored from the point at which you set up the export.
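If you go the BigQuery route, a product-by-campaign query might look something like the sketch below. It assumes the standard GA360 export schema (daily ga_sessions_YYYYMMDD tables with repeated hits and hits.product records, revenue stored multiplied by 10^6, and action_type '6' marking completed purchases). The function name and the table path in the example are hypothetical placeholders, and the totals may not match the UI report exactly.

```python
def build_product_campaign_query(table):
    """Build a Standard SQL query for item revenue and quantity by product and campaign.

    Hypothetical helper: `table` is a fully qualified export table such as
    'my-project.my_dataset.ga_sessions_20200101' (placeholder name).
    """
    return f"""
    SELECT
      product.v2ProductName AS product_name,
      trafficSource.campaign AS campaign,
      SUM(product.productRevenue) / 1e6 AS item_revenue,
      SUM(product.productQuantity) AS item_quantity
    FROM `{table}`,
      UNNEST(hits) AS hit,
      UNNEST(hit.product) AS product
    WHERE hit.eCommerceAction.action_type = '6'
    GROUP BY product_name, campaign
    ORDER BY item_revenue DESC
    """


query = build_product_campaign_query('my-project.my_dataset.ga_sessions_20200101')
print(query)
```

The string can then be passed to whichever BigQuery client you use. Building the query as plain text keeps the sketch independent of any client library.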