Import dataset from Kaggle to Databricks with Kaggle's API
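Question
Attempt 1 fails with "Unauthorized", which most likely means the credentials are not configured correctly. Attempt 2 has the same problem. Check the following:
- Make sure your Kaggle username and key (`KAGGLE_USERNAME` and `KAGGLE_KEY`) are correct.
- Make sure the `kaggle.json` credentials file is configured correctly in Databricks so that your code can find it.
- Make sure your Kaggle account has permission to access the `taricov/mobile-wallets-in-egypt-2020` dataset.
- If you are using Python on Databricks, make sure the `kaggle` package is installed in the environment.
If you have checked all of the above and still have problems, you may need to reconfigure your credentials or contact Kaggle support for more help.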
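The first two checks above can be partly automated. This is a minimal sketch, assuming the client reads either the `KAGGLE_USERNAME`/`KAGGLE_KEY` environment variables or a file named exactly `kaggle.json` inside `KAGGLE_CONFIG_DIR` (falling back to `~/.kaggle`). The helper name `find_kaggle_credentials` is hypothetical and not part of the `kaggle` package, and newer client versions may look in additional places.

```python
import json
import os
from pathlib import Path


def find_kaggle_credentials(environ=None):
    """Report where Kaggle credentials would come from, without contacting Kaggle.

    Returns ("env", None), ("file", path) or (None, None).
    This is a simplified model of the lookup, not the library's own code.
    """
    environ = os.environ if environ is None else environ

    # 1. Environment variables: both must be set and non-empty.
    if environ.get("KAGGLE_USERNAME") and environ.get("KAGGLE_KEY"):
        return "env", None

    # 2. kaggle.json in KAGGLE_CONFIG_DIR, else ~/.kaggle (assumed default).
    config_dir = Path(environ.get("KAGGLE_CONFIG_DIR") or Path.home() / ".kaggle")
    config_file = config_dir / "kaggle.json"
    if config_file.is_file():
        try:
            data = json.loads(config_file.read_text())
        except ValueError:  # not valid JSON (or not decodable text)
            return None, None
        if isinstance(data, dict) and data.get("username") and data.get("key"):
            return "file", str(config_file)

    return None, None
```

Calling `find_kaggle_credentials()` in a notebook cell before `api.authenticate()` shows whether anything usable is visible to local Python code. A result of `(None, None)` for a directory that only contains `kaggle-2.json` would point at the file name rather than at the key itself.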
I'm trying to import a dataset from **Kaggle** to **Databricks** (Community Edition) with the Kaggle API, but it keeps failing and I have lost three days on it. Could a kind soul please help me?
Attempt 1:
!pip install kaggle
import os
import kaggle
os.environ['KAGGLE_USERNAME'] = 'xxxxxx'
os.environ['KAGGLE_KEY'] = 'xxxxx'
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('taricov/mobile-wallets-in-egypt-2020')
Attempt 1 error:
[Screenshot: error from attempt 1]
Attempt 2:
import os
import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api_token_path = '/FileStore/tables/Kaggle_token/kaggle-2.json'
os.environ['KAGGLE_CONFIG_DIR'] = os.path.dirname(api_token_path)
api.dataset_download_files('taricov/mobile-wallets-in-egypt-2020', path='/FileStore/mobile-wallets-in-egypt-2020',unzip=True)
Attempt 2 error:
[Screenshot: error from attempt 2]
My kaggle.json credentials in Databricks:
[Screenshot: kaggle credential]
I tried two ways of connecting, but either something is missing or my credentials are wrong, because the error is "Reason: Unauthorized".
Answer 1
Score: 1
The main problem is that you're referencing a file on DBFS directly (`/FileStore/...`), but the Kaggle API doesn't know anything about that filesystem because it uses Python's local file API. You have two choices:
- Refer to the DBFS path through the local mount by prepending `/dbfs` to file names, for example `/dbfs/FileStore/tables/Kaggle_token/kaggle-2.json` instead of `/FileStore/tables/Kaggle_token/kaggle-2.json`.
- Use DBUtils commands to copy files to the local disk, or to upload downloaded data to DBFS. For example, to copy the file to the local disk you can use the following (note the `file:` prefix):
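import os
# Local (driver) directory where the Kaggle client will look for kaggle.json
conf_dir = "/tmp/kaggle-conf"
os.makedirs(conf_dir, exist_ok=True)  # safe to re-run if the directory already exists
# Copy the token from DBFS to the local disk, renaming it to kaggle.json
dbutils.fs.cp('/FileStore/tables/Kaggle_token/kaggle-2.json',
              f'file:{conf_dir}/kaggle.json')
os.environ['KAGGLE_CONFIG_DIR'] = conf_dir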
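As a sketch of the first option, the token can also be staged with plain Python file calls through the `/dbfs` mount, assuming that mount is available on your cluster (it may not be on every Databricks edition or runtime). The helper names `dbfs_to_local` and `stage_kaggle_token` are hypothetical. The copy renames the file to `kaggle.json`, since setting `KAGGLE_CONFIG_DIR` only names a directory and a file called `kaggle-2.json` inside it would not be picked up.

```python
import os
import shutil


def dbfs_to_local(dbfs_path):
    """Map a DBFS path such as /FileStore/x to its local mount path /dbfs/FileStore/x."""
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    return "/dbfs/" + dbfs_path.lstrip("/")


def stage_kaggle_token(local_token_path, conf_dir):
    """Copy a token file into conf_dir as kaggle.json and point the client at it."""
    os.makedirs(conf_dir, exist_ok=True)
    target = os.path.join(conf_dir, "kaggle.json")
    shutil.copyfile(local_token_path, target)
    os.chmod(target, 0o600)  # keep the key readable by the current user only
    os.environ["KAGGLE_CONFIG_DIR"] = conf_dir
    return target
```

On a cluster this would be called as `stage_kaggle_token(dbfs_to_local('/FileStore/tables/Kaggle_token/kaggle-2.json'), '/tmp/kaggle-conf')`, before the Kaggle client is imported or authenticated.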
Answer 2
Score: 0
Try the following:
```python
import os
import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api_token_path = '/FileStore/tables/Kaggle_token/kaggle-2.json'
os.environ['KAGGLE_CONFIG_DIR'] = os.path.dirname(api_token_path)
api.authenticate()
api.dataset_download_files('taricov/mobile-wallets-in-egypt-2020', path='/FileStore/mobile-wallets-in-egypt-2020', unzip=True)
```
It seems that your first snippet is missing the environment variables, and the second one never calls `api.authenticate()`. Inside that method, the `read_config_environment` method is responsible for reading those keys.
[![Authenticate method in the API][1]][1]
[![enter image description here][2]][2]
[1]: https://i.stack.imgur.com/RDJeC.png
[2]: https://i.stack.imgur.com/wLFBG.png
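The ordering point (credentials first, then `authenticate()`, then the download) can be sketched as a small wrapper. This is only an illustration under stated assumptions: the function name `download_to_local` and the `/tmp/mobile-wallets` directory are hypothetical, and the API object is passed in so the function does not import `kaggle` itself. It downloads to a local directory because the client writes through Python's local file API, not to DBFS.

```python
import os


def download_to_local(api, dataset, local_dir):
    """Authenticate, then download and unzip `dataset` into a local directory.

    `api` is expected to behave like kaggle's KaggleApi, i.e. to provide
    authenticate() and dataset_download_files(dataset, path=..., unzip=...).
    """
    os.makedirs(local_dir, exist_ok=True)
    api.authenticate()  # must run after the credentials are in place
    api.dataset_download_files(dataset, path=local_dir, unzip=True)
    return sorted(os.listdir(local_dir))


# On Databricks (not run here; paths are hypothetical):
# from kaggle.api.kaggle_api_extended import KaggleApi
# files = download_to_local(KaggleApi(),
#                           'taricov/mobile-wallets-in-egypt-2020',
#                           '/tmp/mobile-wallets')
# dbutils.fs.cp('file:/tmp/mobile-wallets',
#               'dbfs:/FileStore/mobile-wallets-in-egypt-2020', recurse=True)
```

In the client versions I'm aware of, importing the `kaggle` package already attempts to authenticate, so it is safest to set `KAGGLE_CONFIG_DIR` or the two environment variables before the import statement as well.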