Problem reading a CSV file whose extension does not appear

Question

I'm trying to open a CSV file from the Spotify database to work with its Sequential Skip Prediction dataset, but the file's columns and rows have problems that I can't figure out how to fix. The most I could do was open the table while skipping the malformed lines, and even then the result is very buggy.

Here is the link:

Link for Spotify Sequential Skip Prediction Challenge

The file I'm trying to access is:

Training_Set_And_Track_Features_Mini (17.2 MB)

This is the code I used, which is as far as I got in opening it:

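import pandas as pd

# Path to the downloaded file (it has no .csv extension)
file_path = '/content/drive/MyDrive/TESTE TCC/training mini/16772e7f-7871-4d42-a44f-5f399f40fd94_training_set_track_features_mini'

# Open it as a CSV, skipping malformed lines
# (on_bad_lines='skip' replaces error_bad_lines=False, which pandas 2.0 removed)
data = pd.read_csv(file_path, encoding='latin1', on_bad_lines='skip')

# Display the data
data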

Even so, the output is still very buggy and the columns come out garbled.

Answer 1

Score: 2

The file is a tar.gz archive, which you need to extract before use.
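Because the download has no extension, one way to confirm what it really is, is to inspect its first bytes rather than its name. The sketch below uses a hypothetical helper, inspect_download, that checks for the gzip signature (0x1F 0x8B) and asks Python's standard tarfile module whether it can open the file as a tar archive. A genuine CSV fails both checks, which is consistent with read_csv producing garbage when fed compressed binary data.

```python
import tarfile


def inspect_download(path: str) -> dict:
    """Guess what a file really is from its content, ignoring its (missing) extension."""
    # gzip streams always begin with the two magic bytes 0x1F 0x8B.
    with open(path, "rb") as f:
        is_gzip = f.read(2) == b"\x1f\x8b"
    # tarfile.is_tarfile tries to open the file as a tar archive,
    # transparently handling gzip, bz2 and xz compression.
    return {"gzip": is_gzip, "tar": tarfile.is_tarfile(path)}


# Hypothetical usage with the path from the question:
# print(inspect_download(file_path))
```

If both values come back True, the file is a gzip-compressed tar archive and needs extracting before pandas can read it.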

To extract it, run the following shell command (substitute your downloaded file's actual name if it differs):

tar -xzf 16772e7f-7871-4d42-a44f-5f399f40fd94_training_set_track_features_mini.tar.tar

This will create a data folder that contains the CSV files.
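If you would rather stay inside Python (for example in a Colab notebook), the extraction step can be sketched with the standard tarfile module instead of the shell. The helper name extract_archive and its default destination are illustrative choices, not part of the original answer; it returns the paths of the extracted .csv files so you can check that tf_mini.csv and log_mini.csv ended up where expected.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, dest_dir: str = ".") -> List[str]:
    """Extract a tar archive (compressed or not) and return the paths of the CSV files in it."""
    dest = Path(dest_dir)
    # "r:*" lets tarfile detect gzip/bz2/xz compression from the content,
    # so the missing or odd file extension does not matter.
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data" rejects unsafe paths.
            tar.extractall(path=dest, filter="data")
        else:
            # Older versions have no filter: only extract archives you trust.
            tar.extractall(path=dest)
    return sorted(
        str(dest / m.name) for m in members if m.isfile() and m.name.endswith(".csv")
    )


# Hypothetical usage, run from the folder containing the download:
# extract_archive("16772e7f-7871-4d42-a44f-5f399f40fd94_training_set_track_features_mini.tar.tar")
```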

Then, from your Python script or notebook:

import pandas as pd

# Track features
df_features = pd.read_csv('data/track_features/tf_mini.csv')

# Training-set log
df_train = pd.read_csv('data/training_set/log_mini.csv')
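As an alternative, pandas can read a member straight out of the archive without extracting it to disk first. This is a sketch using a hypothetical helper, read_csv_from_tar; it assumes the member names end with tf_mini.csv and log_mini.csv as in the paths above, and it matches on the suffix because the exact prefix stored inside the archive (for example data/... versus ./data/...) may vary.

```python
import tarfile

import pandas as pd


def read_csv_from_tar(archive_path: str, suffix: str) -> pd.DataFrame:
    """Read the first CSV member whose name ends with `suffix` directly from the archive."""
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                # extractfile returns a binary file object that read_csv can parse.
                with tar.extractfile(member) as fh:
                    return pd.read_csv(fh)
    raise FileNotFoundError(f"no member ending with {suffix!r} in {archive_path}")


# Hypothetical usage with the downloaded archive:
# df_features = read_csv_from_tar(archive_path, "tf_mini.csv")
# df_train = read_csv_from_tar(archive_path, "log_mini.csv")
```

The trade-off is that the archive is decompressed again on every call, so extracting once is usually simpler if you read the files repeatedly.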

huangapple
  • Posted on June 1, 2023, 19:16:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76381312.html