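Data culling problem in Python
Question 1: I have several IDs, and for each ID I want to find the minimum of the Item vs. Value curve. Basically, I want to filter the data so that each ID keeps only the rows up to its minimum Value and drops everything after it.
Question 2: Can I then extrapolate by fitting a curve to the truncated data in Python?
Please help me find a fast solution, since I have a big data set; a numpy solution would be good.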
ID Item Value
30702556 40 1
30702556 41 1
30702556 42 1
30702556 43 1
30702556 44 1.000408
30702556 45 1.006702067
30702556 46 1
30702556 47 1
30702556 48 1
30702556 49 1.000157628
30702556 50 1.001172713
30702556 51 1.009517935
30702556 52 1
30702556 53 1.000502562
30702556 54 1.001030023
30702556 55 1
30702556 56 1.000444755
30702556 57 1.000199956
30702556 58 1
30702556 59 1
30702556 60 1.00032533
30702556 61 0.996561721
30702556 62 0.994058276
30702556 63 0.994029863
30702556 64 0.995741839
30702556 65 0.996079035
30702556 66 0.992283214
30702556 67 0.992360022
30702556 68 0.991403573
30702556 69 0.989097475
30702556 70 0.989217641
30702556 71 0.988622481
30702556 72 0.987000163
30702556 73 0.984607074
30702556 74 0.983260544
30702556 75 0.983233331
30702556 76 0.976835524
30702556 77 0.976070994
30702556 78 0.975937075
30702556 79 0.968117537
30702556 80 0.967753864
30702556 81 0.963275228
30702556 82 0.960392687
30702556 83 0.953357783
30702556 84 0.941583499
30702556 85 0.937935151
30702556 86 0.92811891
30702556 87 0.924914786
30702556 88 0.912813207
30702556 89 0.892052451
30702556 90 0.875778411
30702556 91 0.876931504
30702556 92 0.847877617
30702556 93 0.834768706
30702556 94 0.841510584
30702556 95 0.798555032
30702556 96 0.781663978
30702556 97 0.731056793
30702556 98 0.71332851
30702556 99 0.808900212
30702556 100 0.822300396
30702556 101 0.920676291
30702556 102 0.911704187
30702556 103 1
30702556 104 1
30702556 105 1
30702556 106 1
30702556 107 1
30702556 108 1
30702556 109 1
30702556 110 1
30702556 111 1
30702556 112 1
30702556 113 1
30702556 114 1
30702556 115 1
30702556 116 1
30702556 117 1
30702556 118 1
30702556 119 1
30703716 40 1
30703716 41 1
30703716 42 1
30703716 43 1
30703716 44 1.000408
30703716 45 1.006702067
30703716 46 1
30703716 47 1
30703716 48 1
30703716 49 1.000157628
30703716 50 1.001172713
30703716 51 1.009517935
30703716 52 1
30703716 53 1.000502562
30703716 54 1.001030023
30703716 55 1
30703716 56 1.000444755
30703716 57 1.000199956
30703716 58 1
30703716 59 1
30703716 60 1.00032533
30703716 61 0.996561721
30703716 62 0.994058276
30703716 63 0.994029863
30703716 64 0.995741839
30703716 65 0.996079035
30703716 66 0.992283214
30703716 67 0.992360022
30703716 68 0.991403573
30703716 69 0.989097475
30703716 70 0.989217641
30703716 71 0.988622481
30703716 72 0.987000163
30703716 73 0.984607074
30703716 74 0.983260544
30703716 75 0.983233331
30703716 76 0.976835524
30703716 77 0.976070994
30703716 78 0.975937075
30703716 79 0.968117537
30703716 80 0.967753864
30703716 81 0.963275228
30703716 82 0.960392687
30703716 83 0.953357783
30703716 84 0.941583499
30703716 85 0.937935151
30703716 86 0.92811891
30703716 87 0.924914786
30703716 88 0.912813207
30703716 89 0.892052451
30703716 90 0.875778411
30703716 91 0.876931504
30703716 92 0.847877617
30703716 93 0.834768706
30703716 94 0.841510584
30703716 95 0.798555032
30703716 96 0.781663978
30703716 97 0.731056793
30703716 98 0.71332851
30703716 99 0.808900212
30703716 100 0.822300396
30703716 101 0.920676291
30703716 102 0.911704187
30703716 103 1
30703716 104 1
30703716 105 1
30703716 106 1
30703716 107 1
30703716 108 1
30703716 109 1
30703716 110 1
30703716 111 1
30703716 112 1
30703716 113 1
30703716 114 1
30703716 115 1
30703716 116 1
30703716 117 1
30703716 118 1
30703716 119 1
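So, what is the expected output for the given sample? – Divakar
The expected output should cut the data after the row 30702556 98 0.71332851, and this must be done for every ID. – BigDataScientist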
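
One possible vectorized way to do the cut described in the comment above is sketched below. It is only a sketch under stated assumptions: the rows of each ID are contiguous and already ordered by Item (as in the sample), and if the minimum Value occurs more than once within an ID, the first occurrence is taken as the cut point. The function name `keep_until_minimum` is hypothetical and chosen for illustration.

```python
import numpy as np

def keep_until_minimum(ids, values):
    """Return a boolean mask that keeps, for each ID, the rows up to and
    including the first occurrence of that ID's minimum value.

    Assumption: rows belonging to one ID are contiguous and sorted by Item.
    """
    ids = np.asarray(ids)
    values = np.asarray(values, dtype=float)

    # Index where each contiguous ID block starts.
    starts = np.flatnonzero(np.r_[True, ids[1:] != ids[:-1]])
    # Number of rows in each block.
    lengths = np.diff(np.r_[starts, ids.size])

    # Minimum value per block, broadcast back to one entry per row.
    row_min = np.repeat(np.minimum.reduceat(values, starts), lengths)
    is_min = (values == row_min).astype(np.intp)

    # Count of minima strictly before each row (over the whole array) ...
    seen_before = np.cumsum(is_min) - is_min
    # ... minus the count at the start of the row's block, so the count
    # restarts at zero for every ID.
    seen_before -= np.repeat(seen_before[starts], lengths)

    # Keep rows for which no minimum has been seen yet within the block.
    return seen_before == 0

# Small made-up example (not the data from the question).
demo_ids = np.array([1, 1, 1, 1, 2, 2, 2])
demo_values = np.array([3.0, 1.0, 2.0, 1.0, 5.0, 4.0, 6.0])
demo_mask = keep_until_minimum(demo_ids, demo_values)
# demo_mask -> [True, True, False, False, True, True, False]
```

For the sample in the question, this mask would keep Items 40 through 98 for each ID. If the data is held in a pandas DataFrame `df` with the columns shown above, the mask could be applied as `df[keep_until_minimum(df["ID"].to_numpy(), df["Value"].to_numpy())]`. Everything runs as whole-array operations without a Python loop over IDs, which is the reason for choosing this approach on a large data set.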