2016-09-15 67 views
2

Count how many times in a row the result of a sum is positive (or negative)

I have a dataframe with finance data (33023 rows, here is the link to the data: https://mab.to/Ssy3TelRs); df.open is the opening price of the title and df.close is the closing price.

I have been trying to see how many times in a row the title closed with a gain and how many times in a row it closed with a loss.

The result that I'm looking for should tell me that the title was positive 2 days in a row x times, 3 days in a row y times, 4 days in a row z times and so forth.

I started with a for loop:

for x in range(1, df.close.count()):
    y = df.close[x] - df.open[x]

and then an unsuccessful series of if statements...

Thank you for your help.

CronosVirus00

EDITS:

>>> df.head(7) 
     data ora  open  max  min close Unnamed: 6 
0 20160801 0 1.11781 1.11781 1.11772 1.11773   0 
1 20160801 100 1.11774 1.11779 1.11773 1.11777   0 
2 20160801 200 1.11779 1.11800 1.11779 1.11795   0 
3 20160801 300 1.11794 1.11801 1.11771 1.11771   0 
4 20160801 400 1.11766 1.11772 1.11763 1.11772   0 
5 20160801 500 1.11774 1.11798 1.11774 1.11796   0 
6 20160801 600 1.11796 1.11796 1.11783 1.11783   0 

Ifs:

for x in range(1,df.close.count()):
    y = df.close[x]-df.open[x]
    if y > 0 :
        green += 1
        y = df.close[x+1] - df.close[x+1]
        twotimes += 1
        if y > 0 :
            green += 1
            y = df.close[x+2] - df.close[x+2]
            threetimes += 1
            if y > 0 :
                green += 1
                y = df.close[x+3] - df.close[x+3]
                fourtimes += 1

FINAL SOLUTION

Thank you all! In the end I did this:

df['test'] = df.close - df.open >0 
green = df.test #days that it was positive 

def gg(z):
    tot = green.count()
    giorni = range(1, z+1) # days in a row i wanna check
    for x in giorni:
        y = (green.rolling(x).sum() > x-1).sum()
        print(x, " ", y, " ", round((y/tot)*100, 1), "%")

gg(5) 
1 14850 45.0 % 
2 6647 20.1 % 
3 2980 9.0 % 
4 1346 4.1 % 
5 607 1.8 % 
+1

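Please include your unsuccessful if statements. Also, Python depends on indentation, so please make sure the code is formatted here exactly as it is *in your code*. – dckuehn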

+0

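Do you want the number of days that have at least n positive days in a row up to and including themselves, or the number of consecutive runs with at least 'n' positive days? – jotasi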

+0

Can you also provide the desired data set / DF? – MaxU

Answers

2

If I understood your question correctly, you can do it this way:

In [76]: df.groupby((df.close.diff() < 0).cumsum()).cumcount() 
Out[76]: 
0 0 
1 1 
2 2 
3 0 
4 1 
5 2 
6 0 
7 0 
dtype: int64 

The result that I'm looking for should tell me that the title was positive 2 days in a row x times, 3 days in a row y times, 4 days in a row z times and so forth.

In [114]: df.groupby((df.close.diff() < 0).cumsum()).cumcount().value_counts().to_frame('count') 
Out[114]: 
    count 
0  4 
2  2 
1  2 

Data set:

In [78]: df 
Out[78]: 
     data ora  open  max  min close 
0 20160801 0 1.11781 1.11781 1.11772 1.11773 
1 20160801 100 1.11774 1.11779 1.11773 1.11777 
2 20160801 200 1.11779 1.11800 1.11779 1.11795 
3 20160801 300 1.11794 1.11801 1.11771 1.11771 
4 20160801 400 1.11766 1.11772 1.11763 1.11772 
5 20160801 500 1.11774 1.11798 1.11774 1.11796 
6 20160801 600 1.11796 1.11796 1.11783 1.11783 
7 20160801 700 1.11783 1.11799 1.11783 1.11780 

In [80]: df.close.diff() 
Out[80]: 
0  NaN 
1 0.00004 
2 0.00018 
3 -0.00024 
4 0.00001 
5 0.00024 
6 -0.00013 
7 -0.00003 
Name: close, dtype: float64 
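
A note on the snippet above: it groups on close-to-close changes (df.close.diff()), while the question defines a gain as close minus open within the same row. The following is only a sketch of how run lengths could be counted under the question's definition; the names gain, run_id, run_lengths and streak_counts are made up for illustration, and the data is the 8-row sample shown above.

```python
import pandas as pd

# The 8-row sample from this answer (only the columns that are needed).
df = pd.DataFrame({
    "open":  [1.11781, 1.11774, 1.11779, 1.11794, 1.11766, 1.11774, 1.11796, 1.11783],
    "close": [1.11773, 1.11777, 1.11795, 1.11771, 1.11772, 1.11796, 1.11783, 1.11780],
})

# Assumption: a "gain" is a row whose close is above its own open.
gain = df["close"] > df["open"]

# A new run starts whenever the flag differs from the previous row,
# so the cumulative sum of those changes labels each run.
run_id = (gain != gain.shift()).cumsum()

# Keep only the gain rows and measure the length of each run.
run_lengths = gain[gain].groupby(run_id[gain]).size()

# How many runs there are of each length (index = run length).
streak_counts = run_lengths.value_counts().sort_index()
print(streak_counts)
```

On this sample there are two separate runs of two gains each, so the result has a single entry: length 2, seen 2 times. For losing runs, the same steps should work with the comparison flipped.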
+0

It works! Thanks – CronosVirus00

2

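It sounds like what you want to do is:

  • Compute the difference of the two series (open & close), e.g. diff = df.open - df.close
  • Apply a condition to the result to get a boolean series, diff > 0
  • Pass the resulting boolean series to the dataframe to get the subset of the dataframe where the condition is true, df[diff > 0]
  • Find all consecutive subsequences by applying a column-wise function, to identify and count them

I need to board a plane, but I will provide a sample of what the last step looks like when I can.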
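
One possible reading of the four steps above, as a rough sketch rather than the answerer's promised sample: the toy prices are invented, the difference is taken as close - open so that positive values are gains (as in the question), and the last step uses itertools.groupby from the standard library instead of a column-wise pandas function.

```python
from collections import Counter
from itertools import groupby

import pandas as pd

# Invented toy data, for illustration only.
df = pd.DataFrame({
    "open":  [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "close": [1.1, 1.2, 0.9, 1.3, 1.1, 1.4, 0.8],
})

# Step 1: difference of the two series (positive means a gain here).
diff = df["close"] - df["open"]

# Step 2: boolean series marking the gain rows.
is_gain = diff > 0

# Step 3: subset of the dataframe where the condition holds.
gain_days = df[is_gain]

# Step 4: split the flags into consecutive runs and count
# how often each run length occurs among the gain runs.
streaks = Counter(
    sum(1 for _ in group)
    for flag, group in groupby(is_gain.tolist())
    if flag
)
print(streaks)
```

Here the flags are True, True, False, True, True, True, False, so the counter holds one run of length 2 and one run of length 3.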

+1

Your first 3 points are spot on! Now I need to figure out how to do your fourth suggestion. I'll keep you updated – CronosVirus00

2

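If I understood you correctly, you want the number of days that have at least n positive days in a row up to and including themselves.

Similarly to what @Thang suggested, you can use rolling: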

import pandas as pd 
import numpy as np 

df = pd.DataFrame(np.random.rand(10, 2), columns=["open", "close"]) 
# This just sets up random test data, for example: 
#  open  close 
# 0 0.997986 0.594789 
# 1 0.052712 0.401275 
# 2 0.895179 0.842259 
# 3 0.747268 0.919169 
# 4 0.113408 0.253440 
# 5 0.199062 0.399003 
# 6 0.436424 0.514781 
# 7 0.180154 0.235816 
# 8 0.750042 0.558278 
# 9 0.840404 0.139869 

positiveDays = df["close"]-df["open"] > 0 
# This will give you a series that is True for positive days: 
# 0 False 
# 1  True 
# 2 False 
# 3  True 
# 4  True 
# 5  True 
# 6  True 
# 7  True 
# 8 False 
# 9 False 
# dtype: bool 

daysToCheck = 3 
positiveDays.rolling(daysToCheck).sum()>daysToCheck-1 

This now gives you a series that indicates, for each day, whether it has been positive for daysToCheck days in a row:

0 False 
1 False 
2 False 
3 False 
4 False 
5  True 
6  True 
7  True 
8 False 
9 False 
dtype: bool 

You can then use (positiveDays.rolling(daysToCheck).sum()>daysToCheck-1).sum() to get the number of days (3 in this example) that satisfy this, which is what you want, as far as I understand.
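
To check several streak lengths at once, the same rolling test can be wrapped in a small helper. This is a minimal sketch: count_streak_days is a made-up name, the flags are the positiveDays values from the example above, and the cast to int is just a cautious choice so the rolling sum works on plain numbers.

```python
import pandas as pd

# The positiveDays values from the example above.
positive_days = pd.Series(
    [False, True, False, True, True, True, True, True, False, False]
)

def count_streak_days(flags, max_days):
    """For each n in 1..max_days, count the days that close a window
    of n consecutive positive days."""
    as_int = flags.astype(int)
    return {
        # Incomplete windows at the start sum to NaN and compare as False.
        n: int((as_int.rolling(n).sum() >= n).sum())
        for n in range(1, max_days + 1)
    }

counts = count_streak_days(positive_days, 5)
print(counts)
```

For this series the counts are 6, 4, 3, 2 and 1 for n = 1 to 5. Note that windows overlap: the single five-day run contributes four times to n = 2, so these numbers count days that end an n-day streak, not separate streaks.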

+0

I'm updating pandas now, because rolling is available from version 0.18 (I have 0.17). I'll let you know whether it works. – CronosVirus00

+0

Works! Thanks – CronosVirus00

0

This should work:

import pandas as pd 
import numpy as np 
test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close']) 

test['gain?'] = (test['open']-test['close'] < 0) 
test['cumulative'] = 0 

for i in test.index[1:]:
    if test['gain?'][i]:
        test['cumulative'][i] = test['cumulative'][i-1] + 1
        test['cumulative'][i-1] = 0

results = test['cumulative'].value_counts() 

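Ignore the result in the '0' row. If you want, for example, both days of a two-day run to also count as a one-day run, it can be modified without too much trouble.

Edit: without the warning -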

import pandas as pd 
import numpy as np 

test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close']) 
test['gain?'] = (test['open']-test['close'] < 0) 
test['cumulative'] = 0 

for i in test.index[1:]:
    if test['gain?'][i]:
        test.loc[i,'cumulative'] = test.loc[i-1,'cumulative'] + 1
        test.loc[i-1,'cumulative'] = 0

results = test['cumulative'].value_counts() 
+0

It gave me this error: test['cumulative'][i] = test['cumulative'][i-1] + 1 SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame – CronosVirus00

+0

I don't /think/ the warning makes any difference? But I've edited the answer to remove it. –

+0

Yes, you are right, it works. Thanks – CronosVirus00