Pandas pd.DataFrame from loop-data

Question

I am new to Python. I have some data that I get from a loop. cat can have between two and n elements, and TorF will always have len(cat)*5 or len(cat)*4 elements. My goal is to create a pd.DataFrame from two lists like these:

cat = ['a', 'b'] 
TorF = [True, True, True, False, False, True, False, False, True, True]

I think my current solution is kind of clunky with the int((len(TorF)/len(cat))):

import pandas as pd
data = [[c, *TorF[i:i+int((len(TorF)/len(cat)))]] for i, c in enumerate(cat)]
df = pd.DataFrame(data)

Is there a simpler way to do it?

My desired output is

   0     1     2      3      4      5
0  a  True  True   False  False   True
1  b  True  False  True   False   True

Answer 1

Score: 2

Getting the ratio of the two shapes is a good strategy.

I would however use sliding_window_view:

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

cat = ['a', 'b']
TorF = [True, True, True, False, False, True, False, False, True, True]

df = pd.DataFrame(swv(TorF, len(TorF)//len(cat))[:len(cat)],
                  index=cat).reset_index()

Or:

view = swv(TorF, len(TorF)//len(cat))[:len(cat)]
df = pd.DataFrame(np.hstack([np.array(cat)[:,None], view]))

Output:

   0     1     2      3      4      5
0  a  True  True   True  False  False
1  b  True  True  False  False   True
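
As a quick check, the sketch below compares the windows returned by sliding_window_view with the slices taken in the question's list comprehension. It assumes the same two lists and NumPy 1.20 or later; the names width, windows and from_question are introduced here only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

cat = ['a', 'b']
TorF = [True, True, True, False, False, True, False, False, True, True]

width = len(TorF) // len(cat)  # 5 values per category

# Every window of length `width`; each window starts one element after the previous one
windows = sliding_window_view(np.array(TorF), width)[:len(cat)]

# The slices used in the question also start at i = 0, 1, ... (not at 0, 5, ...)
from_question = [TorF[i:i + width] for i in range(len(cat))]

print(windows.tolist() == from_question)
```

Both approaches select overlapping slices, which is why the output above differs from the desired output in the question.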

Edit: ambiguous output

Your provided code and the shown expected output clearly conflict. Using an unambiguous input (TorF = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), your code gives:

   0  1  2  3  4  5
0  a  0  1  2  3  4
1  b  1  2  3  4  5

While your expected output seems to be:

   0  1  2  3  4  5
0  a  0  2  4  6  8
1  b  1  3  5  7  9

In that case, you just need to reshape:

df = pd.DataFrame(np.reshape(np.r_[cat, TorF], (len(cat), -1), order='F'))

# or
df = pd.DataFrame(np.hstack([list(map(list, cat)), np.reshape(TorF, (len(cat), -1), order='F')]))

Output:

   0     1      2      3      4     5
0  a  True   True  False  False  True
1  b  True  False   True  False  True
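
Note that both reshape variants place the labels and the flags in a single NumPy array, so the flags end up stored as the strings 'True' and 'False'. If real booleans are wanted, one option (a sketch, not part of the original answer) is to reshape only TorF and add the labels with pandas afterwards. The column name 'cat' is an assumption made for this example.

```python
import numpy as np
import pandas as pd

cat = ['a', 'b']
TorF = [True, True, True, False, False, True, False, False, True, True]

# Column-major ('F') fill sends element 0 to row 'a', element 1 to row 'b', and so on
values = np.reshape(TorF, (len(cat), -1), order='F')

df = pd.DataFrame(values)
df.insert(0, 'cat', cat)  # label column goes first; the flag columns keep dtype bool

print(df)
```

The trade-off is that the first column is then named 'cat' instead of 0, so the column labels differ slightly from the desired output.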

Answer 2

Score: 2

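You could form a NumPy array from the two lists, reshape it, transpose it, and build the DataFrame from the result:

import numpy as np
import pandas as pd

cat = ['a', 'b']
TorF = [True, True, True, False, False, True, False, False, True, True]
# Labels first, then the flags; NumPy converts everything to strings here
combined = np.array(cat + TorF)
# One row per group of len(cat) items, then transpose so each category is a row
df = pd.DataFrame(combined.reshape(-1, len(cat)).T)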

giving:

   0     1      2      3      4     5
0  a  True   True  False  False  True
1  b  True  False   True  False  True
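
The reshape-and-transpose idea can be wrapped in a small helper that works for any number of categories and fails early when the lengths do not match. This is only a sketch: the function name interleaved_to_frame and the error message are hypothetical, and dtype=object is an assumption made here so that the flags stay booleans instead of becoming strings.

```python
import numpy as np
import pandas as pd


def interleaved_to_frame(cat, values):
    """Build one row per category from values interleaved as a, b, a, b, ..."""
    if len(values) % len(cat) != 0:
        raise ValueError("len(values) must be a multiple of len(cat)")
    # dtype=object keeps strings and booleans as they are (no conversion to str)
    combined = np.array(list(cat) + list(values), dtype=object)
    # One row per group of len(cat) items, then transpose so categories become rows
    return pd.DataFrame(combined.reshape(-1, len(cat)).T)


cat = ['a', 'b']
TorF = [True, True, True, False, False, True, False, False, True, True]
df = interleaved_to_frame(cat, TorF)
print(df)
```

With object dtype every column of the result is of dtype object, which is fine for display but less efficient than native bool columns.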

Answer 3

Score: 1

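# Pair each flag with its category, number the groups, then pivot categories into rows
df = pd.DataFrame(
    pd.DataFrame({'cat': cat * (len(TorF) // len(cat)), 'TorF': TorF})
    .assign(col1=lambda dd: dd.index // len(cat))
    .set_index(['col1', 'cat'])
    .unstack().T
    .reset_index(level=1).to_numpy()
)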

Output:

   0     1      2      3      4     5
0  a  True   True  False  False  True
1  b  True  False   True  False  True
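
For comparison, the same interleaved layout can also be produced without NumPy by slicing with a step of len(cat). This is a sketch under the assumption that TorF is ordered a, b, a, b, ...; as the first answer points out, the question leaves that open.

```python
import pandas as pd

cat = ['a', 'b']
TorF = [True, True, True, False, False, True, False, False, True, True]

# TorF[i::len(cat)] takes every len(cat)-th element, starting at position i
data = [[c, *TorF[i::len(cat)]] for i, c in enumerate(cat)]
df = pd.DataFrame(data)

print(df)
```

Because the rows are built as plain Python lists, the flag columns keep their bool dtype and no length ratio has to be computed.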

huangapple
  • Published on July 13, 2023, 20:12:13
  • Please keep this link when reposting: https://go.coder-hub.com/76679241.html