Need to apply different formulas based on the row number in the dataframe


Question

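I am trying to compute a kind of moving average in a dataframe. The formula changes depending on the row number being calculated. The actual case is described below.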

Edit 2:

Below is the actual data I am working with:
          Date     Open     High      Low    Close
0   01-01-2018  1763.95  1763.95  1725.00  1731.35
1   02-01-2018  1736.20  1745.80  1725.00  1743.20
2   03-01-2018  1741.10  1780.00  1740.10  1774.60
3   04-01-2018  1779.95  1808.00  1770.00  1801.35
4   05-01-2018  1801.10  1820.40  1795.60  1809.95
5   08-01-2018  1816.00  1827.95  1800.00  1825.00
6   09-01-2018  1823.00  1835.00  1793.90  1812.05
7   10-01-2018  1812.05  1823.00  1801.40  1816.55
8   11-01-2018  1825.00  1825.05  1798.55  1802.10
9   12-01-2018  1805.00  1820.00  1794.00  1804.95
10  15-01-2018  1809.90  1834.45  1792.45  1830.00
11  16-01-2018  1835.00  1857.45  1826.10  1850.25
12  17-01-2018  1850.00  1852.45  1826.20  1840.50
13  18-01-2018  1840.50  1852.00  1823.50  1839.00
14  19-01-2018  1828.25  1836.35  1811.00  1829.50
15  22-01-2018  1816.50  1832.55  1805.50  1827.20
16  23-01-2018  1825.00  1825.00  1782.25  1790.15
17  24-01-2018  1787.80  1792.70  1732.15  1737.50
18  25-01-2018  1739.90  1753.40  1720.00  1726.40
19  29-01-2018  1735.15  1754.95  1729.80  1738.70

The code snippet I am using is as follows:

from datetime import date
from nsepy import get_history
import csv
import pandas as pd
import numpy as np
import requests
from datetime import timedelta
import datetime as dt
import pandas_datareader.data as web
import io

df = pd.read_csv('ACC.CSV')

idx = df.reset_index().index

df['Change'] = df['Close'].diff()
df['Advance'] = np.where(df.Change > 0, df.Change,0)
df['Decline'] = np.where(df.Change < 0, df.Change*-1, 0)
conditions = [idx < 14, idx == 14, idx > 14]
values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)
df['Avg_Loss'] = (df.Decline.rolling(14).sum())/14
df['RS'] = df.Avg_Gain / df.Avg_Loss
df['RSI'] = np.where(df['Avg_Loss'] == 0, 100, 100-(100/(1+df.RS)))
df.drop(['Change', 'Advance', 'Decline', 'Avg_Gain', 'Avg_Loss', 'RS'],     axis=1)

print(df.head(20))

Below is the error I am getting:

Traceback (most recent call last):
  File "C:/Users/Lenovo/Desktop/Python/0.Chart Patterns/Z.Sample Code.py", line 20, in <module>
    values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Avg_Gain'

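Edit 3: Below is the expected output, followed by the formulas. The original df consists of the columns Date, Open, High, Low and Close.

[Image: expected output]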

Advance and Decline:
•   If the difference between the current and previous Close is positive, then Advance = difference and Decline = 0
•   If the difference between the current and previous Close is negative, then Advance = 0 and Decline = -1 * difference
Avg_Gain:
•   If index < 13, then Avg_Gain = 0
•   If index = 13, then Avg_Gain = average of Advance over 14 periods
•   If index > 13, then Avg_Gain = (Avg_Gain(previous-row) * 13 + Advance(current-row)) / 14
Avg_Loss:
•   If index < 13, then Avg_Loss = 0
•   If index = 13, then Avg_Loss = average of Decline over 14 periods
•   If index > 13, then Avg_Loss = (Avg_Loss(previous-row) * 13 + Decline(current-row)) / 14
RS:
•   If index < 13, then RS = 0
•   If index >= 13, then RS = Avg_Gain / Avg_Loss
RSI = 100 - (100 / (1 + RS))
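
The Advance and Decline rules can be sketched in pandas as follows. This is only a minimal illustration on a made-up `close` series (hypothetical values), and it uses `Series.clip` rather than the `np.where` calls in the question's code. The first row has no previous value, so it is filled with 0 here by assumption.

```python
import pandas as pd

# Hypothetical closing prices, for illustration only.
close = pd.Series([100.0, 102.0, 101.0, 104.5, 103.0])

change = close.diff()                         # current minus previous; NaN on the first row
advance = change.clip(lower=0).fillna(0)      # positive moves, otherwise 0
decline = (-change).clip(lower=0).fillna(0)   # size of negative moves, otherwise 0

print(pd.DataFrame({"Close": close, "Advance": advance, "Decline": decline}))
```

On any given row at most one of the two columns is non-zero, which is what the Avg_Gain and Avg_Loss formulas rely on.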

I hope this helps.

Recommended answer

The error occurs because you use df.Avg_Gain while creating df.Avg_Gain:

values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)
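
The failure can be reproduced in isolation. In the minimal sketch below (a toy frame with made-up values, not the question's data), the right-hand side is evaluated before the assignment takes place, so the column does not exist yet when it is read:

```python
import pandas as pd

df = pd.DataFrame({"Advance": [1.0, 0.0, 2.0]})  # toy data
error_message = None

try:
    # The expression on the right is evaluated first, and at that
    # point df has no 'Avg_Gain' column to read from.
    df["Avg_Gain"] = (df.Avg_Gain.shift(1) * 13 + df.Advance) / 14
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```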

I changed that part of the code to the following:

up = df.Advance.rolling(14).sum()/14
values = [0, up, (up.shift(1) * 13 + df.Advance)/14]

Output (idx >= 14):

    Date        Open    High    Low     Close   RSI
14  2018-01-19  1828.25 1836.35 1811.00 1829.50 75.237850
15  2018-01-22  1816.50 1832.55 1805.50 1827.20 72.920021
16  2018-01-23  1825.00 1825.00 1782.25 1790.15 58.793750
17  2018-01-24  1787.80 1792.70 1732.15 1737.50 40.573938
18  2018-01-25  1739.90 1753.40 1720.00 1726.40 31.900045
19  2018-01-29  1735.15 1754.95 1729.80 1738.70 33.197678

There should be a better way to do this, though. If I find one, I will update the answer with it. Let me know whether this data is correct.
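
One caveat with this version, sketched below on made-up numbers with a window of 3 instead of 14: `up.shift(1)` is the previous row's simple rolling mean, not the previous row's smoothed value. Under that reading, the result matches the recursive definition only on the first row after the seed and drifts apart afterwards.

```python
import pandas as pd

p = 3                                            # small window for readability (the question uses 14)
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # made-up "Advance" values

roll = x.rolling(p).mean()                       # simple moving average

# Vectorised attempt: uses the previous row's rolling mean.
vectorised = (roll.shift(1) * (p - 1) + x) / p

# Recursive definition: uses the previous row's own result.
recursive = pd.Series(0.0, index=x.index)
recursive.loc[p - 1] = roll.loc[p - 1]           # seed with the first full-window average
for i in range(p, len(x)):
    recursive.loc[i] = (recursive.loc[i - 1] * (p - 1) + x.loc[i]) / p

print(pd.DataFrame({"vectorised": vectorised, "recursive": recursive}))
```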

Update: You also need to correct the calculation of Avg_Loss:
down = df.Decline.rolling(14).sum()/14
down_values = [0, down, (down.shift(1) * 13 + df.Decline)/14]
df['Avg_Loss'] = np.select(conditions, down_values)

Reference (link truncated): .php?id=chart_school:technical_indicators:relative_strength_index_rsi#calculation

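Update 2: After the expected data was provided, the only way I could get this to work was with a loop. I am not sure whether it can be done otherwise; there may be some pandas feature I am not aware of.

First, compute Avg_Gain and Avg_Loss as before. You only need to change the values slightly: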

conditions = [idx<13, idx==13, idx>13]
up = df.Advance.rolling(14).sum()/14
values = [0, up, 0]
df['Avg_Gain'] = np.select(conditions, values)

down = df.Decline.rolling(14).sum()/14
down_values = [0, down, 0]
df['Avg_Loss'] = np.select(conditions, down_values)

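I changed your conditions to split at index 13, because that is what I see in the expected output.

After running this code, fill in the remaining values of Avg_Gain and Avg_Loss from the previous row's Avg_Gain and Avg_Loss: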

p=14
for i in range(p, len(df)):
    df.at[i, 'Avg_Gain'] = ((df.loc[i-1, 'Avg_Gain'] * (p-1)) + df.loc[i, 'Advance']) / p
    df.at[i, 'Avg_Loss'] = ((df.loc[i-1, 'Avg_Loss'] * (p-1)) + df.loc[i, 'Decline']) / p

Output:

df[13:][['Date','Avg_Gain', 'Avg_Loss', 'RS', 'RSI']]

    Date        Avg_Gain    Avg_Loss    RS          RSI
13  2018-01-18  10.450000   2.760714    3.785252    79.102460
14  2018-01-19  9.703571    3.242092    2.992997    74.956155
15  2018-01-22  9.010459    3.174800    2.838119    73.945571
16  2018-01-23  8.366855    5.594457    1.495562    59.928860
17  2018-01-24  7.769222    8.955567    0.867530    46.453335
18  2018-01-25  7.214278    9.108741    0.792017    44.196960
19  2018-01-29  7.577544    8.458116    0.895890    47.254330
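
For reference, the seeding step and the loop can be combined into one self-contained sketch. The helper name `smoothed_average` is hypothetical (it does not come from the answer), and the Close values are copied from the question's sample data. RS and RSI are computed with the question's formulas; rows before index 13 come out as NaN here (0/0) rather than 0.

```python
import numpy as np
import pandas as pd

# Close prices copied from the question's sample data (rows 0-19).
close = pd.Series([
    1731.35, 1743.20, 1774.60, 1801.35, 1809.95, 1825.00, 1812.05,
    1816.55, 1802.10, 1804.95, 1830.00, 1850.25, 1840.50, 1839.00,
    1829.50, 1827.20, 1790.15, 1737.50, 1726.40, 1738.70,
])


def smoothed_average(values, p=14):
    """Hypothetical helper: 0 before row p-1, the plain mean of the first
    p rows at row p-1, then (previous * (p-1) + current) / p."""
    out = np.zeros(len(values))
    if len(values) >= p:
        out[p - 1] = values[:p].mean()
        for i in range(p, len(values)):
            out[i] = (out[i - 1] * (p - 1) + values[i]) / p
    return out


change = close.diff()
advance = np.where(change > 0, change, 0.0)    # NaN on row 0 compares False, so it becomes 0
decline = np.where(change < 0, -change, 0.0)

result = pd.DataFrame({
    "Avg_Gain": smoothed_average(advance),
    "Avg_Loss": smoothed_average(decline),
})
result["RS"] = result.Avg_Gain / result.Avg_Loss
result["RSI"] = 100 - 100 / (1 + result.RS)
print(result[13:])
```

A plain Python loop over a NumPy array is usually fast enough for daily price data, and it keeps the recurrence easy to read.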

Other answer

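You can use ewm (pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html); the documentation there has more explanation. ewm can be used to compute y(row) = α * x(row) + (1 - α) * y(row - 1), where, for example, y is the column Avg_Gain, x is the corresponding input column (Advance), and α is the weight given to x(row). In the code below, adjust=False selects this recursive form and α = 1/14.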

# define the number for the window
win_n = 14
# create a dataframe df_avg with the non null value of the two columns 
# Advance and Decline such as the first is the mean of the 14 first values and the rest as normal
df_avg = (pd.DataFrame({'Avg_Gain': np.append(df.Advance[:win_n].mean(), df.Advance[win_n:]), 
                        'Avg_Loss': np.append(df.Decline[:win_n].mean(), df.Decline[win_n:])},   
                        df.index[win_n-1:])
               .ewm(adjust=False, alpha=1./win_n).mean()) # what you need to calculate with your formula

# create the two other columns RS and RSI
df_avg['RS'] = df_avg.Avg_Gain / df_avg.Avg_Loss
df_avg['RSI'] = 100.-(100./(1. + df_avg['RS']))

and df_avg looks like:

     Avg_Gain  Avg_Loss        RS        RSI
13  10.450000  2.760714  3.785252  79.102460
14   9.703571  3.242092  2.992997  74.956155
15   9.010459  3.174800  2.838119  73.945571
16   8.366855  5.594457  1.495562  59.928860
17   7.769222  8.955567  0.867530  46.453335
18   7.214278  9.108741  0.792017  44.196960
19   7.577544  8.458116  0.895890  47.254330
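
To see why `ewm(adjust=False, alpha=1/win_n)` reproduces the recursion, the small check below compares it with an explicit loop. It is a sketch on made-up values with a window of 3 instead of 14; the names `seeded`, `via_ewm` and `via_loop` are illustrative only.

```python
import numpy as np
import pandas as pd

win_n = 3                                        # small window for readability
x = pd.Series([2.0, 0.0, 4.0, 1.0, 0.0, 3.0])    # made-up "Advance" values

# Seed with the mean of the first win_n values, then append the remaining raw values.
seeded = pd.Series(np.append(x[:win_n].mean(), x[win_n:]),
                   index=x.index[win_n - 1:])

# With adjust=False: y[0] = seeded[0], y[t] = alpha * seeded[t] + (1 - alpha) * y[t-1]
via_ewm = seeded.ewm(alpha=1.0 / win_n, adjust=False).mean()

# The same recursion written as an explicit loop.
via_loop = [seeded.iloc[0]]
for value in seeded.iloc[1:]:
    via_loop.append((via_loop[-1] * (win_n - 1) + value) / win_n)

print(pd.DataFrame({"ewm": via_ewm, "loop": via_loop}))
```

With alpha = 1/win_n, the weight on the previous result is (win_n - 1)/win_n, which is exactly the (previous * 13 + current) / 14 rule for a window of 14.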

You can join it to the original data and use fillna to fill the missing values with 0:

df = df.join(df_avg).fillna(0)
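
As a small illustration of what the join does, the sketch below uses two toy frames (`prices` and `stats`, both hypothetical) where the second one only covers the later rows:

```python
import pandas as pd

# Toy frames: `prices` has every row, `stats` only has rows from index 2 on.
prices = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0]})
stats = pd.DataFrame({"RSI": [55.0, 60.0]}, index=[2, 3])

# join aligns on the index; rows missing from `stats` become NaN,
# and fillna(0) then replaces every NaN in the combined frame with 0.
combined = prices.join(stats).fillna(0)
print(combined)
```

Note that `fillna(0)` applies to the whole frame, so any NaN already present in the original columns would also become 0.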
