在pandas行中一个零序列前后的事件百分比[英] Percentage of events before and after a sequence of zeros in pandas rows

本文是小编为大家收集整理的关于在pandas行中一个零序列前后的事件百分比的处理方法,想解了在pandas行中一个零序列前后的事件百分比的问题怎么解决?在pandas行中一个零序列前后的事件百分比问题的解决办法?那么可以参考本文帮助大家快速定位并解决问题。

问题描述

我有一个如下的数据框:

        ID      0   1   2   3   4   5   6   7   8   ... 81  82  83  84  85  86  87  88  89  90  total
-----------------------------------------------------------------------------------------------------
0       A       2   21  0   18  3   0   0   0   2   ... 0   0   0   0   0   0   0   0   0   0    156
1       B       0   20  12  2   0   8   14  23  0   ... 0   0   0   0   0   0   0   0   0   0    231
2       C       0   38  19  3   1   3   3   7   1   ... 0   0   0   0   0   0   0   0   0   0     78
3       D       3   0   0   1   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0      5

我想知道在每行中出现第一个长度为 n 的零序列之前和之后的事件百分比(单元格中的数字).这个问题始于此处发现的另一个问题:pandas 数据帧中特定列之后给定大小的第一个零序列的长度,我正在尝试修改代码以执行我需要的操作,但我不断收到错误而且似乎找不到正确的方法.这是我尝试过的:

def func(row, n):
    """Returns the number of events before the 
    first sequence of 0s of length n is found
    """

    idx = np.arange(0, 91)

    a = row[idx]
    b = (a != 0).cumsum()
    c = b[a == 0]
    d = c.groupby(c).count()

    #in case there is no sequence of 0s with length n
    try:
        e = c[c >= d.index[d >= n][0]]
        f = str(e.index[0])
    except IndexError:
        e = [90]
        f = str(e[0])

    idx_sliced = np.arange(0, int(f)+1)
    a = row[idx_sliced]

    if (int(f) + n > 90):
        perc_before = 100
    else:
        perc_before = a.cumsum().tail(1).values[0]/row['total']

    return perc_before

按原样,我得到的错误是:

---> perc_before = a.cumsum().tail(1).values[0]/row['total']
TypeError: ('must be str, not int', 'occurred at index 0')

最后,我会将此函数应用于数据帧并返回一个新列,其中包含每行中第一个 n 0 序列之前的事件百分比,如下所示:

        ID      0   1   2   3   4   5   6   7   8   ... 81  82  83  84  85  86  87  88  89  90  total  %_before
---------------------------------------------------------------------------------------------------------------
0       A       2   21  0   18  3   0   0   0   2   ... 0   0   0   0   0   0   0   0   0   0    156   43
1       B       0   20  12  2   0   8   14  23  0   ... 0   0   0   0   0   0   0   0   0   0    231   21
2       C       0   38  19  3   1   3   3   7   1   ... 0   0   0   0   0   0   0   0   0   0     78   90
3       D       3   0   0   1   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0      5   100

如果尝试解决此问题,您可以使用此示例输入进行测试:

a = pd.Series([1,1,13,0,0,0,4,0,0,0,0,0,12,1,1])
b = pd.Series([1,1,13,0,0,0,4,12,1,12,3,0,0,5,1])
c = pd.Series([1,1,13,0,0,0,4,12,2,0,5,0,5,1,1])
d = pd.Series([1,1,13,0,0,0,4,12,1,12,4,50,0,0,1])
e = pd.Series([1,1,13,0,0,0,4,12,0,0,0,54,0,1,1])

df = pd.DataFrame({'0':a, '1':b, '2':c, '3':d, '4':e})
df = df.transpose()

推荐答案

试试这个:

def percent_before(row, n, ncols):
    """Return the percentage of activities happen before
    the first sequence of at least `n` consecutive 0s
    """
    start_index, i, size = 0, 0, 0
    for i in range(ncols):
        if row[i] == 0:
            # increase the size of the island
            size += 1
        elif size >= n:
            # found the island we want
            break
        else:
            # start a new island
            # row[start_index] is always non-zero
            start_index = i
            size = 0

    if size < n:
        # didn't find the island we want
        return 1
    else:
        # get the sum of activities that happen
        # before the island
        idx = np.arange(0, start_index + 1).astype(str)
        return row.loc[idx].sum() / row['total']

df['percent_before'] = df.apply(percent_before, n=3, ncols=15, axis=1)

结果:

   0  1   2  3  4  5  6   7  8   9  10  11  12  13  14  total  percent_before
0  1  1  13  0  0  0  4   0  0   0   0   0  12   1   1     33        0.454545
1  1  1  13  0  0  0  4  12  1  12   3   0   0   5   1     53        0.283019
2  1  1  13  0  0  0  4  12  2   0   5   0   5   1   1     45        0.333333
3  1  1  13  0  0  0  4  12  1  12   4  50   0   0   1     99        0.151515
4  1  1  13  0  0  0  4  12  0   0   0  54   0   1   1     87        0.172414

对于全帧,使用 ncols=91 调用 apply.

本文地址:https://www.itbaoku.cn/post/1728157.html