Import a dataset from Kaggle to Databricks with Kaggle's API

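Question

The "Unauthorized" error in attempt 1 most likely means the credentials are not configured correctly, and attempt 2 has the same problem. A few things to check:

  1. Make sure your Kaggle username and key (KAGGLE_USERNAME and KAGGLE_KEY) are correct.

  2. Make sure the kaggle.json credentials file is set up in Databricks where your code can find it.

  3. Make sure your Kaggle account has access to the taricov/mobile-wallets-in-egypt-2020 dataset.

  4. If you are using Python on Databricks, make sure the kaggle package is installed in the environment.

If you have checked all of the above and still see the error, you may need to reconfigure the credentials or contact Kaggle support.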
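The first two checks can be done without calling Kaggle at all. Below is a minimal sketch, assuming the token file has the usual kaggle.json shape (a JSON object with "username" and "key" fields); the helper name load_kaggle_token is made up for illustration and is not part of the kaggle package.

```python
import json
from pathlib import Path


def load_kaggle_token(path):
    """Read a kaggle.json-style file and return (username, key).

    Hypothetical helper: raises ValueError if the file is not a JSON
    object or if either field is missing or empty.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object")
    missing = [name for name in ("username", "key") if not data.get(name)]
    if missing:
        raise ValueError(f"{path}: missing or empty field(s): {', '.join(missing)}")
    return data["username"], data["key"]
```

If this raises on the file you uploaded, the "Unauthorized" response is probably a token problem rather than a Databricks problem. If it passes, the file is well-formed and the next thing to check is whether the Kaggle client is actually reading it.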

I'm trying to import a dataset from **Kaggle** into **Databricks** (Community Edition) using the Kaggle API, but it keeps failing and I've lost three days on it. Could a kind soul please help me?

Attempt 1:

!pip install kaggle

import os
import kaggle

os.environ['KAGGLE_USERNAME'] = 'xxxxxx'
os.environ['KAGGLE_KEY'] = 'xxxxx'

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

api.dataset_download_files('taricov/mobile-wallets-in-egypt-2020')

Attempt 1 error:
[screenshot: error from attempt 1]

Attempt 2:

import os
import kaggle

from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api_token_path = '/FileStore/tables/Kaggle_token/kaggle-2.json'
os.environ['KAGGLE_CONFIG_DIR'] = os.path.dirname(api_token_path)

api.dataset_download_files('taricov/mobile-wallets-in-egypt-2020', path='/FileStore/mobile-wallets-in-egypt-2020',unzip=True)

Attempt 2 error:
[screenshot: error from attempt 2]

My kaggle.json credential in Databricks:
[screenshot: Kaggle credential]

I tried two ways of connecting, but either something is missing or my credentials are wrong, because the error is "Reason: Unauthorized".

Answer 1

Score: 1

The main problem is that you're referencing a file on DBFS directly (/FileStore/...), but the Kaggle API knows nothing about that filesystem because it uses Python's local file API. You have two choices:

  • Refer to the DBFS path through the local mount by prepending /dbfs to the file name, e.g. /dbfs/FileStore/tables/Kaggle_token/kaggle-2.json instead of /FileStore/tables/Kaggle_token/kaggle-2.json

  • Use DBUtils commands to copy files to the local disk, or to upload downloaded data to DBFS. For example, to copy the file to the local disk you can use the following (note the file: prefix):

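import os

# Local (driver) directory for the Kaggle config; the client looks for a file named kaggle.json in it.
conf_dir = "/tmp/kaggle-conf"
os.makedirs(conf_dir, exist_ok=True)  # unlike os.mkdir, safe to re-run

# Copy from DBFS to the local disk; "file:" marks the destination as local.
dbutils.fs.cp('/FileStore/tables/Kaggle_token/kaggle-2.json',
  f'file:{conf_dir}/kaggle.json')
os.environ['KAGGLE_CONFIG_DIR'] = conf_dir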
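For the first option, here is a minimal sketch in plain Python of the path mapping and of staging the token under the name kaggle.json. Both function names are hypothetical helpers, not Databricks or Kaggle APIs, and the sketch assumes the /dbfs mount is actually available on your cluster (this may not hold on every cluster type, in which case the dbutils.fs.cp route is the one to use). The chmod is there because, as far as I know, the Kaggle client warns when the token file is readable by other users.

```python
import os
import shutil


def dbfs_to_local_path(path):
    """Map a DBFS path to its /dbfs mount equivalent (hypothetical helper).

    '/FileStore/x' and 'dbfs:/FileStore/x' both become '/dbfs/FileStore/x'.
    Paths already under /dbfs are returned unchanged.
    """
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    if path == "/dbfs" or path.startswith("/dbfs/"):
        return path
    return "/dbfs/" + path.lstrip("/")


def stage_kaggle_token(token_path, conf_dir):
    """Copy a token file to conf_dir/kaggle.json and point Kaggle at it.

    token_path must be readable through the local file API, e.g. a path
    produced by dbfs_to_local_path().
    """
    os.makedirs(conf_dir, exist_ok=True)
    target = os.path.join(conf_dir, "kaggle.json")
    shutil.copyfile(token_path, target)
    os.chmod(target, 0o600)  # owner-only read/write
    os.environ["KAGGLE_CONFIG_DIR"] = conf_dir
    return target
```

With the paths from the question, this would be called as stage_kaggle_token(dbfs_to_local_path('/FileStore/tables/Kaggle_token/kaggle-2.json'), '/tmp/kaggle-conf'). The rename matters as much as the location: a directory containing only kaggle-2.json would not be picked up, since the client looks for kaggle.json.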

Answer 2

Score: 0

Try the following:

```python
import os
import kaggle

from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api_token_path = '/FileStore/tables/Kaggle_token/kaggle-2.json'
os.environ['KAGGLE_CONFIG_DIR'] = os.path.dirname(api_token_path)
api.authenticate()

api.dataset_download_files('taricov/mobile-wallets-in-egypt-2020', path='/FileStore/mobile-wallets-in-egypt-2020', unzip=True)
```

It seems that your first snippet is missing the environment variables, and the second never calls api.authenticate(). Inside that method, read_config_environment is responsible for reading those keys.

[![Authenticate method in the API][1]][1]

[![enter image description here][2]][2]

  [1]: https://i.stack.imgur.com/RDJeC.png
  [2]: https://i.stack.imgur.com/wLFBG.png
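Ordering may matter here as well. In the versions of the kaggle package I'm aware of, the package looks for credentials as soon as it is imported, so setting KAGGLE_USERNAME, KAGGLE_KEY or KAGGLE_CONFIG_DIR after `import kaggle` can be too late. The sketch below sets the variables first and defers the import. Treat the import-time behaviour as an assumption to verify against your installed version; configure_kaggle_env and download_dataset are hypothetical wrapper names, while KaggleApi, authenticate and dataset_download_files are the calls already used in this thread.

```python
import os


def configure_kaggle_env(username, key):
    """Export the two variables used for environment-based Kaggle auth."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


def download_dataset(dataset, dest, username, key):
    """Set credentials first, then import kaggle and download to dest.

    dest should be a local path (not a bare DBFS path), because the
    Kaggle client writes through Python's local file API.
    """
    configure_kaggle_env(username, key)
    # Deferred import: assumes the package reads credentials at import time.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files(dataset, path=dest, unzip=True)
    return dest
```

On Databricks, something like download_dataset('taricov/mobile-wallets-in-egypt-2020', '/tmp/mobile-wallets', username, key) would write to the driver's local disk (the /tmp path is only an example). The files can then be moved to DBFS with dbutils.fs.cp, using a file: prefix on the source and recurse=True for a directory.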

huangapple
  • Posted on June 1, 2023 at 04:55:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76377210.html