使用pandas对长数据集进行宽幅处理[英] Wide to long dataset using pandas

本文是小编为大家收集整理的关于使用pandas对长数据集进行宽幅处理的处理方法,想解了使用pandas对长数据集进行宽幅处理的问题怎么解决?使用pandas对长数据集进行宽幅处理问题的解决办法?那么可以参考本文帮助大家快速定位并解决问题。

问题描述

有很多类似标题的问题,但我无法解决我的数据集遇到的问题.

数据集:

ID   Country Type Region Gender IA01_Raw  IA01_Class1  IA01_Class2 IA02_Raw IA02_Class1 IA02_Class2 QA_Include QA_Comments

SC1  France  A    Europe Male   4         8            1            J         4            1           yes       N/A
SC2  France  A    Europe Female 2         7            2            Q         6            4           yes       N/A
SC3  France  B    Europe Male   3         7            2            K         8            2           yes       N/A
SC4  France  A    Europe Male   4         8            2            A         2            1           yes       N/A
SC5  France  B    Europe Male   1         7            1            F         1            3           yes       N/A
ID6  France  A    Europe Male   2         8            1            R         3            7           yes       N/A
ID7  France  B    Europe Male   2         8            1            Q         4            6           yes       N/A
UC8  France  B    Europe Male   4         8            2            P         4            2           yes       N/A

所需输出:

ID   Country Type Region Gender IA Raw Class1 Class2 QA_Include QA_Comments

SC1  France  A    Europe Male   01 K   8      1      yes        N/A
SC1  France  A    Europe Male   01 L   8      1      yes       N/A
SC1  France  A    Europe Male   01 P   8      1      yes       N/A
SC1  France  A    Europe Male   02 Q   8      1      yes       N/A
SC1  France  A    Europe Male   02 R   8      1      yes       N/A
SC1  France  A    Europe Male   02 T   8      1      yes       N/A
SC1  France  A    Europe Male   03 G   8      1      yes       N/A
SC1  France  A    Europe Male   03 R   8      1      yes       N/A
SC1  France  A    Europe Male   03 G   8      1      yes       N/A
SC1  France  A    Europe Male   04 K   8      1      yes       N/A
SC1  France  A    Europe Male   04 A   8      1      yes       N/A
SC1  France  A    Europe Male   04 P   8      1      yes       N/A
SC1  France  A    Europe Male   05 R   8      1      yes       N/A
....
<小时>

在数据集中,我的列名称为 IA[X]_NAME 其中 X = 1..9 和 NAME = Raw, Class1 和 Class2.

我要做的只是转置这些列,使其看起来像必需输出中显示的表格,即 IA 将显示 X 值,就像这样 raw 和 classes 将显示它们的透视值.

所以为了实现它,我将列切片为:

idVars = list(excel_df_final.columns[0:40]) + list(excel_df_final.columns[472:527]) #These contain columns like ID, Country, Type etc
valueVars = excel_df_final.columns[41:472].tolist() #All the IA_ columns

我不知道这一步是否必要,但这给了我完美的列切片,但是当我把它放在 melt 中时,它不能正常工作.我几乎尝试了其他问题中可用的所有方法.

pd.melt(excel_df_final, id_vars=idVars,value_vars=valueVars)

我也试过这个:

excel_df_final.set_index(idVars)[41:472].unstack()

但是没有用,这里是从宽到长的实现,它也没有用:

pd.wide_to_long(excel_df_final, stubnames = ['IA', 'Raw', 'Class1', 'Class2'], i=idVars, j=valueVars)

我得到的从宽到长的错误是:

<块引用>

ValueError: 操作数不能与形状一起广播 (95,)(431,)

因为我的数据集实际上有 526 列,所以我将它们分成两个列表,其中一个包含 95 个列名,它们将是 i,其余 431 个是我需要在如示例数据集中所示的行.

推荐答案

这将帮助您入门.本质是使用set_index,列转换为MultiIndex,然后stack.可能存在更好的解决方案,但我会这样做,因为它是您输出的一个简单步骤.

# Set the index with columns that we don't want to "transpose"
df2 = df.set_index([
   'ID', 'Country', 'Type', 'Region', 'Gender', 'QA_Include', 'QA_Comments'])
# Convert headers to MultiIndex -- this is so we can melt IA values
df2.columns = pd.MultiIndex.from_tuples(map(tuple, df2.columns.str.split('_')))
# Call stack to replicate data, then reset the index
out =  df2.stack(level=0).reset_index().rename({'level_7': 'IA'}, axis=1)

out

     ID Country Type  Region  Gender QA_Include  QA_Comments    IA  Class1  Class2 Raw
0   SC1  France    A  Europe    Male        yes          NaN  IA01       8       1   4
1   SC1  France    A  Europe    Male        yes          NaN  IA02       4       1   J
2   SC2  France    A  Europe  Female        yes          NaN  IA01       7       2   2
3   SC2  France    A  Europe  Female        yes          NaN  IA02       6       4   Q
4   SC3  France    B  Europe    Male        yes          NaN  IA01       7       2   3
5   SC3  France    B  Europe    Male        yes          NaN  IA02       8       2   K
6   SC4  France    A  Europe    Male        yes          NaN  IA01       8       2   4
7   SC4  France    A  Europe    Male        yes          NaN  IA02       2       1   A
8   SC5  France    B  Europe    Male        yes          NaN  IA01       7       1   1
9   SC5  France    B  Europe    Male        yes          NaN  IA02       1       3   F
10  ID6  France    A  Europe    Male        yes          NaN  IA01       8       1   2
11  ID6  France    A  Europe    Male        yes          NaN  IA02       3       7   R
12  ID7  France    B  Europe    Male        yes          NaN  IA01       8       1   2
13  ID7  France    B  Europe    Male        yes          NaN  IA02       4       6   Q
14  UC8  France    B  Europe    Male        yes          NaN  IA01       8       2   4
15  UC8  France    B  Europe    Male        yes          NaN  IA02       4       2   P

本文地址:https://www.itbaoku.cn/post/1728163.html