January 6, 2023 10:38:28

How to build a Pandas Dataframe with a Numpy Array from Imported CSV data with multiple numbers

Question

I'm a little stumped on this one. I've created a proof of concept where I built a Pandas Dataframe with a static Numpy Array of numbers. I got this working fine, but now I'm taking it a step further and importing a CSV file to build this same Dataframe and Numpy Array. Here is the snippet of the file and what I've written. I want to take the second column of 'numbers' and build an array of 6 numbers per line. For example, [[11],[21],[27],[36],[62],[24]], [[14],[18],[36],[49],[67],[18]], etc.

CSV:

date,numbers,multiplier
09/26/2020,11 21 27 36 62 24,3
09/30/2020,14 18 36 49 67 18,2
10/03/2020,18 31 36 43 47 20,2

CODE:

data = pd.read_csv('pbhistory.csv')
data['date'] = pd.to_datetime(data.date, infer_datetime_format=True)
data.sort_values(by='date', ascending=True, inplace=True)
df = pd.DataFrame(data.numbers).to_numpy()
df2 = pd.DataFrame(df, columns=['1', '2', '3', '4', '5', '6'])
print(df2.head())

ERROR:
I'm expecting 6 columns of data from df2 as I thought it was converted to an array properly after importing the 'numbers' column from the CSV, but I get the following:

ValueError: Shape of passed values is (1414, 1), indices imply (1414, 6)

So, I change the code to df2 = pd.DataFrame(df, columns=['1']) and get the following output. The problem is, I need it to be in 6 columns, not 1.

                   1
0  11 21 27 36 62 24
1  14 18 36 49 67 18
2  18 31 36 43 47 20

So, as you can see, I'm only getting one column with all numbers, instead of an array of numbers with 6 columns.

Answer 1

Score: 1

import pandas as pd

data = pd.read_csv('pbhistory.csv')
data['date'] = pd.to_datetime(data.date, infer_datetime_format=True)
data.sort_values(by='date', ascending=True, inplace=True)
# Keep this as a DataFrame: a NumPy array cannot be indexed by column name
df = pd.DataFrame(data.numbers)

Then split the column first:

# expand=True spreads the space-separated values into separate columns (as strings)
df2 = df['numbers'].str.split(' ', expand=True)
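
As a minimal sketch of the split-first idea, assuming the three sample rows from the question stand in for pbhistory.csv and that the dates are always MM/DD/YYYY (the names csv_text, df2 and arr are illustrative only), the numbers column can be split into six integer columns and then converted to a NumPy array:

```python
import io

import pandas as pd

# Sample rows copied from the question; an assumed stand-in for pbhistory.csv.
csv_text = """date,numbers,multiplier
09/26/2020,11 21 27 36 62 24,3
09/30/2020,14 18 36 49 67 18,2
10/03/2020,18 31 36 43 47 20,2
"""

data = pd.read_csv(io.StringIO(csv_text))
# Assumption: every date follows the MM/DD/YYYY pattern seen in the sample.
data["date"] = pd.to_datetime(data["date"], format="%m/%d/%Y")
data = data.sort_values(by="date", ascending=True)

# Split each string on whitespace into separate columns, then cast to int.
df2 = data["numbers"].str.split(expand=True).astype(int)
df2.columns = ["1", "2", "3", "4", "5", "6"]

# Integer array with one row per draw and six columns.
arr = df2.to_numpy()

print(df2.head())
print(arr.shape)
```

The split returns strings, so the astype(int) step is what makes the result numeric before it is handed to NumPy.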

Answer 2

Score: 0

Remember that CSV stands for Comma-Separated Values, i.e., the parser treats everything between two commas as a single field. If you want the numbers separated, you have to put commas between them; otherwise you'll have to parse the longer text of six space-separated values yourself and rebuild the DataFrame.
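
The "parse it yourself and rebuild" route could look roughly like the sketch below. It assumes the sample rows from the question and exactly six numbers per row; the names csv_text, parsed, numbers, rebuilt and nested are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question; an assumed stand-in for pbhistory.csv.
csv_text = """date,numbers,multiplier
09/26/2020,11 21 27 36 62 24,3
09/30/2020,14 18 36 49 67 18,2
10/03/2020,18 31 36 43 47 20,2
"""

data = pd.read_csv(io.StringIO(csv_text))

# Parse each space-separated field by hand into a list of ints.
parsed = [[int(token) for token in field.split()] for field in data["numbers"]]
numbers = np.array(parsed)  # shape (rows, 6), assuming six numbers per row

# Rebuild a DataFrame with one column per number, keeping the original index.
rebuilt = pd.DataFrame(
    numbers, columns=["1", "2", "3", "4", "5", "6"], index=data.index
)

# Optional: the nested [[11], [21], ...] layout from the question
# corresponds to an array of shape (rows, 6, 1).
nested = numbers.reshape(-1, 6, 1)

print(rebuilt)
print(nested.shape)
```

If a row ever held fewer or more than six values, np.array would no longer produce a regular two-dimensional array, so that case would need to be validated before the reshape.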
