Make a 3D NumPy array using a for loop in Python
Question
I have training data with 2 dimensions (200 results of 4 features).
I tested 100 different applications with 10 repetitions each, which produced 1000 CSV files.
I want to stack the results of each CSV file for machine learning, but I don't know how.
Each of my CSV files looks like the one below (test1.csv converted to a NumPy array):
[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]]
I tried the Python code below:
import glob
import os

import numpy as np

path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))
cnt = 0
for f in csv_files:
    cnt += 1
    seperator = '_'
    # application name is the part of the filename before the first '_'
    app = os.path.basename(f).split(seperator, 1)[0]
    if cnt == 1:
        a = np.array(preprocess(f))
        b = np.array(app)
    else:
        a = np.vstack((a, np.array(preprocess(f))))
        b = np.append(b, app)
print(a)
print(b)
The preprocess function returns the df.to_numpy() result for each CSV file.
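For reference, a plausible sketch of such a helper (the actual preprocess is not shown in the question), assuming it simply loads each CSV with pandas:

import pandas as pd

def preprocess(csv_path):
    # Hypothetical implementation: read the CSV and return the underlying array,
    # matching the statement that preprocess returns the df.to_numpy() result.
    df = pd.read_csv(csv_path)
    return df.to_numpy()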
My expectation was the following, with a of shape (1000, 200, 4):
[[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]],
[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]],
...
[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]]]
However, I'm getting this instead, with a of shape (200000, 4):
[[0 'crc32_pclmul' 445 0]
[0 'crc32_pclmul' 270 4096]
[0 'crc32_pclmul' 234 8192]
...
[249 'intel_pmt' 272 4096]
[249 'intel_pmt' 224 8192]
[249 'intel_pmt' 268 12288]]
I want to access each CSV file's results via a[0] through a[999], with each sub-array of shape (200, 4).
How can I solve this? I'm quite lost.
Answer 1
Score: 0
Make a new list (outside of the loop) and append each item to that new list after reading.
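A minimal sketch of this suggestion, reusing csv_files, os, np, and the asker's preprocess helper from the question:

results = []                       # new list created outside the loop
labels = []
for f in csv_files:
    results.append(preprocess(f))  # append each file's (200, 4) array after reading it
    labels.append(os.path.basename(f).split('_', 1)[0])

a = np.array(results)              # -> (1000, 200, 4) if every file has 200 rows
b = np.array(labels)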
Answer 2
Score: 0
You have to change from vstack to stack:
la = []
lb = []
for f in csv_files:
    cnt += 1
    seperator = '_'
    app = os.path.basename(f).split(seperator, 1)[0]
    la.append(preprocess(f))
    lb.append(app)
a = np.stack(la, axis=0)
b = np.array(lb)
vstack can only stack along the existing row axis, while stack can stack along a new axis.
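A minimal sketch of the shape difference, using zero arrays as stand-ins for the per-file (200, 4) arrays:

import numpy as np

x = np.zeros((200, 4))
y = np.zeros((200, 4))

print(np.vstack((x, y)).shape)         # (400, 4)    -- rows are concatenated
print(np.stack((x, y), axis=0).shape)  # (2, 200, 4) -- a new leading axis is added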
Answer 3
Score: 0
Well, yes, that is what vstack (and append) do: they merge arrays along the same axis (the rows axis).
a1 = np.arange(10).reshape(2, 5)
# [[0, 1, 2, 3, 4],
#  [5, 6, 7, 8, 9]]
a2 = np.arange(10, 20).reshape(2, 5)
# [[10, 11, 12, 13, 14],
#  [15, 16, 17, 18, 19]]
np.vstack((a1, a2))
# [[ 0,  1,  2,  3,  4],
#  [ 5,  6,  7,  8,  9],
#  [10, 11, 12, 13, 14],
#  [15, 16, 17, 18, 19]]
b1 = np.arange(5)
b2 = np.arange(5, 10)
np.append(b1, b2)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
If you expect (from those examples) to append along a new axis, then you need to add that axis yourself, or use the more flexible stack.
np.vstack(([a1], [a2]))
# array([[[ 0,  1,  2,  3,  4],
#         [ 5,  6,  7,  8,  9]],
#
#        [[10, 11, 12, 13, 14],
#         [15, 16, 17, 18, 19]]])
Or, in the 1D case, use vstack instead of append:
np.vstack((b1, b2))
# array([[0, 1, 2, 3, 4],
#        [5, 6, 7, 8, 9]])
But more importantly, you shouldn't be doing this inside a loop in the first place. Each of those functions (stack, vstack, append) creates a brand-new array on every call.
It would probably be more efficient to append all your np.array(preprocess(f)) and np.array(app) results to plain Python lists, and call stack and vstack only once you have read them all.
Or, even better, append preprocess(f) and app directly to Python lists, and call np.array on the whole thing only after the loop. So, something like:
la = []
lb = []
for f in csv_files:
    cnt += 1
    seperator = '_'
    app = os.path.basename(f).split(seperator, 1)[0]
    la.append(preprocess(f))
    lb.append(app)
a = np.array(la)
b = np.array(lb)
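A quick sanity check of the resulting shapes (assuming 1000 CSV files with 200 rows and 4 columns each, as described in the question):

print(a.shape)     # expected: (1000, 200, 4)
print(a[0].shape)  # expected: (200, 4) -- one CSV file's results
print(b.shape)     # expected: (1000,)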