2014-12-06

Weka: Is there a good way to handle (many) numeric attributes when classifying a nominal value?

I have a lot of numeric values, and in the end I want to predict the result. The result can take the nominal values '0', '1' or 'x'.

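What I want to know is how I can get the best results. Can some classifiers handle numeric attributes better than others? Sometimes the classifier seems to focus on a less interesting attribute...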

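Also, at the moment h. means the home team and a. means the away team. Would it be better to split this up and add a location attribute, @attribute location {'H','A'}, so that each match is seen from either team's side? A 0 would then become a 1 and vice versa.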
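To make the proposed layout concrete, here is a minimal Java sketch (not from the original post) that turns one match row into two rows, one from each team's side, appends the location value and flips the result. It assumes that '0' and '1' each stand for a win by one particular side and that 'x' is a draw; the class and method names are hypothetical.

```java
/**
 * Sketch of the proposed layout: one match row becomes two rows,
 * one per team, each with a location value ("H" or "A").
 * ASSUMPTION: "0" and "1" each mean a win for one specific side, "x" is a draw.
 */
public class MirrorMatch {

    /** Flips the result when the row is seen from the other team's side. */
    static String flipResult(String result) {
        switch (result) {
            case "0": return "1";
            case "1": return "0";
            default:  return result; // a draw ("x") stays a draw
        }
    }

    /** home and away hold the h.* and a.* values in the same attribute order. */
    static String[][] mirror(String[] home, String[] away, String result) {
        String[] fromHome = concat(home, away, "H", result);
        String[] fromAway = concat(away, home, "A", flipResult(result));
        return new String[][] { fromHome, fromAway };
    }

    /** Builds team values, then opponent values, then location, then result. */
    private static String[] concat(String[] team, String[] opponent,
                                   String location, String result) {
        String[] row = new String[team.length + opponent.length + 2];
        System.arraycopy(team, 0, row, 0, team.length);
        System.arraycopy(opponent, 0, row, team.length, opponent.length);
        row[row.length - 2] = location;
        row[row.length - 1] = result;
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened rows: only teamSize and teamRatingAVG.
        String[] home = { "11.0", "1563.0" };
        String[] away = { "11.0", "1588.9" };
        for (String[] row : mirror(home, away, "1")) {
            System.out.println(String.join(",", row));
        }
    }
}
```

One caveat with this layout: both mirrored copies of a match should land in the same cross-validation fold, otherwise a model can be tested on the mirror image of a row it was trained on.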

@relation estimation 
@attribute h.teamSize numeric 
@attribute h.lineUpTeamFormation {'5-2-0-3-1' ... '6-2-0-4-1'} 
@attribute h.teamRatingAVG numeric 
@attribute h.teamRatingHighest numeric 
@attribute h.teamRatingLowest numeric 
@attribute h.teamRatingMed numeric 
@attribute h.teamRatingMedRating numeric 
@attribute h.lineUpTeamRating.att numeric 
@attribute h.lineUpTeamRating.attMid numeric 
@attribute h.lineUpTeamRating.mid numeric 
@attribute h.lineUpTeamRating.defMid numeric 
@attribute h.lineUpTeamRating.def numeric 
@attribute h.lineUpTeamRatingAVG.att numeric 
@attribute h.lineUpTeamRatingAVG.attMid numeric 
@attribute h.lineUpTeamRatingAVG.mid numeric 
@attribute h.lineUpTeamRatingAVG.defMid numeric 
@attribute h.lineUpTeamRatingAVG.def numeric 
@attribute h.lineUpTeamRatingHighest.att numeric 
@attribute h.lineUpTeamRatingHighest.attMid numeric 
@attribute h.lineUpTeamRatingHighest.mid numeric 
@attribute h.lineUpTeamRatingHighest.defMid numeric 
@attribute h.lineUpTeamRatingHighest.def numeric 
@attribute h.lineUpTeamRatingLowest.att numeric 
@attribute h.lineUpTeamRatingLowest.attMid numeric 
@attribute h.lineUpTeamRatingLowest.mid numeric 
@attribute h.lineUpTeamRatingLowest.defMid numeric 
@attribute h.lineUpTeamRatingLowest.def numeric 
@attribute a.teamSize numeric 
@attribute a.lineUpTeamFormation {'5-2-0-3-1' ... '6-2-0-4-1'} 
@attribute a.teamRatingAVG numeric 
@attribute a.teamRatingHighest numeric 
@attribute a.teamRatingLowest numeric 
@attribute a.teamRatingMed numeric 
@attribute a.teamRatingMedRating numeric 
@attribute a.lineUpTeamRating.att numeric 
@attribute a.lineUpTeamRating.attMid numeric 
@attribute a.lineUpTeamRating.mid numeric 
@attribute a.lineUpTeamRating.defMid numeric 
@attribute a.lineUpTeamRating.def numeric 
@attribute a.lineUpTeamRatingAVG.att numeric 
@attribute a.lineUpTeamRatingAVG.attMid numeric 
@attribute a.lineUpTeamRatingAVG.mid numeric 
@attribute a.lineUpTeamRatingAVG.defMid numeric 
@attribute a.lineUpTeamRatingAVG.def numeric 
@attribute a.lineUpTeamRatingHighest.att numeric 
@attribute a.lineUpTeamRatingHighest.attMid numeric 
@attribute a.lineUpTeamRatingHighest.mid numeric 
@attribute a.lineUpTeamRatingHighest.defMid numeric 
@attribute a.lineUpTeamRatingHighest.def numeric 
@attribute a.lineUpTeamRatingLowest.att numeric 
@attribute a.lineUpTeamRatingLowest.attMid numeric 
@attribute a.lineUpTeamRatingLowest.mid numeric 
@attribute a.lineUpTeamRatingLowest.defMid numeric 
@attribute a.lineUpTeamRatingLowest.def numeric 
@attribute result {'0','1','x'} 
@data 
11.0,"4-1-1-4-1",1563.0046902930617,1716.018383910481,1493.642106150469,1542.5395864396032,1604.830245030475,1594.8952627985404,6230.782838756112,1552.485746007047,1716.018383910481,6098.869361751494,1594.8952627985404,1557.695709689028,1552.485746007047,1716.018383910481,1524.7173404378734,1594.8952627985404,1617.8284702417561,1552.485746007047,1716.018383910481,1542.4611979096933,1594.8952627985404,1493.642106150469,1552.485746007047,1716.018383910481,1510.4250125761928,11.0,"5-1-1-2-2",1588.961662996073,1747.6289170494754,1508.4062919834894,1565.5233,1628.0176045164824,3459.80148294728,3079.552081457912,1542.4682316024448,1576.1754548839763,7820.5810420651915,1729.90074147364,1539.776040728956,1542.4682316024448,1576.1754548839763,1564.1162084130383,1747.6289170494754,1549.4953619285486,1542.4682316024448,1576.1754548839763,1613.8600439857894,1712.1725658978046,1530.0567195293636,1542.4682316024448,1576.1754548839763,1508.4062919834894,"x" 
11.0,"4-2-2-2-1",1475.8094913912312,1502.0682887709222,1444.990021885439,1483.7603435487183,1473.5291553281807,1490.639636207262,2978.5093856157946,2950.4346148352724,2892.2037554297044,5922.117013215507,1490.639636207262,1489.2546928078973,1475.2173074176362,1446.1018777148522,1480.5292533038767,1490.639636207262,1492.9037337533382,1502.0682887709222,1447.2137335442653,1496.2886114276891,1490.639636207262,1485.6056518624566,1448.3663260643502,1444.990021885439,1460.927921231502,11.0,"4-1-2-2-2",1484.7390000692892,1512.2300048742143,1453.444107111614,1486.4669707831615,1482.837055992914,3013.771836727523,2964.5776806684476,2961.501146916992,1453.444107111614,5938.834229337606,1506.8859183637614,1482.2888403342238,1480.750573458496,1453.444107111614,1484.7085573344016,1512.2300048742143,1501.9409533482967,1493.2838448180084,1453.444107111614,1502.7776443004382,1501.5418318533088,1462.6367273201508,1468.2173020989835,1453.444107111614,1464.7837448131381,"1" 
11.0,"6-0-1-2-2",1445.77970697302,1506.5657818615387,1393.7116666209088,1430.4622334716257,1450.1387242412238,2937.7942649521,3010.9183806060323,1402.8170557672368,0.0,8552.047075377852,1468.89713247605,1505.4591903030162,1402.8170557672368,NaN,1425.341179229642,1483.5459383871223,1506.5657818615387,1402.8170557672368,-1.0,1465.0738948215799,1454.248326564978,1504.3525987444937,1402.8170557672368,2.147483647E9,1393.7116666209088,11.0,"4-2-2-2-1",1430.4629022453128,1474.4893525633652,1404.2919287564614,1426.6619540429597,1439.3906406599133,1404.2919287564614,2864.6817220202643,2906.4018234232753,2831.550186683904,5728.166263814535,1404.2919287564614,1432.3408610101321,1453.2009117116377,1415.775093341952,1432.0415659536338,1404.2919287564614,1452.1579439472125,1474.4893525633652,1426.6619540429597,1458.4115214984754,1404.2919287564614,1412.5237780730517,1431.9124708599102,1404.8882326409444,1413.8219682802633,"x" 
11.0,"6-1-1-2-1",1455.2875865157116,1533.8148260877508,1408.8080092768812,1454.6219157957269,1471.311417682316,1440.5588774260157,2975.472084744947,1454.6219157957269,1489.241573073469,8648.269000632668,1440.5588774260157,1487.7360423724735,1454.6219157957269,1489.241573073469,1441.3781667721114,1440.5588774260157,1533.8148260877508,1454.6219157957269,1489.241573073469,1475.4245410744663,1440.5588774260157,1441.6572586571963,1454.6219157957269,1489.241573073469,1408.8080092768812,11.0,"7-1-1-1-1",1478.6812699237746,1573.5345947486803,1376.2807543215677,1487.4841795952277,1474.907674535124,1573.5345947486803,1438.3659332206364,1510.946520366525,1376.2807543215677,10366.36616650411,1573.5345947486803,1438.3659332206364,1510.946520366525,1376.2807543215677,1480.90945235773,1573.5345947486803,1438.3659332206364,1510.946520366525,1376.2807543215677,1501.6224047599273,1573.5345947486803,1438.3659332206364,1510.946520366525,1376.2807543215677,1421.1718685458247,"0" 
... 

I hope someone with experience can give me some advice, such as:

  • a good way to handle numeric data
  • a good way to handle a large number of attributes
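On handling the numeric data: in the third data row, whose formation '6-0-1-2-2' contains a 0, the .defMid columns hold 0.0, NaN, -1.0 and 2.147483647E9 (which equals Java's Integer.MAX_VALUE). These look like the leftovers of an empty position group (a sum over nothing, 0/0, and untouched max/min start values) rather than real ratings. ARFF marks a missing value with '?', so a minimal Java sketch for rewriting such tokens in one data line could look like this. Treating exactly these three values as placeholders is an assumption to check against the code that generated the file, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Rewrites suspected placeholder values in one ARFF data line as "?" (missing). */
public class CleanSentinels {

    /** Returns "?" for a suspected placeholder, otherwise the trimmed token. */
    static String clean(String token) {
        String t = token.trim();
        if (t.startsWith("\"") || t.startsWith("'")) {
            return t; // quoted nominal values such as "4-1-1-4-1" or "x" pass through
        }
        double v;
        try {
            v = Double.parseDouble(t); // also accepts the literal "NaN"
        } catch (NumberFormatException e) {
            return t; // anything unexpected is left untouched
        }
        // ASSUMPTION: NaN, -1.0 and Integer.MAX_VALUE never occur as real ratings here.
        if (Double.isNaN(v) || v == -1.0 || v == Integer.MAX_VALUE) {
            return "?";
        }
        return t;
    }

    /** Splits on commas (formations contain none) and cleans every token. */
    static String cleanLine(String line) {
        return Arrays.stream(line.split(","))
                .map(CleanSentinels::clean)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Excerpt of the third data row: h.lineUpTeamRatingHighest.mid, .defMid, .def
        System.out.println(cleanLine("1402.8170557672368,-1.0,1465.0738948215799"));
    }
}
```

The 0.0 sum is left alone on purpose, since a sum over zero players is arguably a real value; whether it should also become '?' is a modelling choice.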

(I know there is no such thing as the best way, but I would already be happy with a good way :)

Kind regards


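There may be a classifier that suits your task, but there is no "best" classifier for everything; it really depends on the data and the type of data. You will probably have to try a few classifiers, but I would give SVMs a shot. As for 'h' and 'a', I don't think you can split it. – Sentry 2014-12-06 20:08:01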
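If you do give SVMs a shot, the scale of the attributes matters: teamSize is 11.0, typical ratings are around 1,400 to 1,700, while summed columns such as lineUpTeamRating.def reach roughly 5,700 to 10,400 in the rows shown. Margin- and gradient-based learners such as SVMs and neural networks are sensitive to such differences, whereas a tree learner like J48 only compares each attribute against thresholds. Weka provides Normalize and Standardize filters in weka.filters.unsupervised.attribute for this; the plain-Java sketch below only illustrates the z-score idea on one column and is not Weka's implementation (the class name is hypothetical).

```java
import java.util.Arrays;

/** Z-score standardisation of one numeric column: mean 0, standard deviation 1. */
public class StandardizeColumn {

    /** Expects a non-empty column; uses the population standard deviation. */
    static double[] standardize(double[] column) {
        double mean = 0;
        for (double v : column) mean += v;
        mean /= column.length;

        double var = 0;
        for (double v : column) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / column.length);

        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // A constant column (sd == 0) carries no information; map it to 0.
            out[i] = sd == 0 ? 0 : (column[i] - mean) / sd;
        }
        return out;
    }

    public static void main(String[] args) {
        // h.teamRatingAVG from the four data rows shown in the question.
        double[] teamRatingAVG = {
            1563.0046902930617, 1475.8094913912312, 1445.77970697302, 1455.2875865157116
        };
        System.out.println(Arrays.toString(standardize(teamRatingAVG)));
    }
}
```

In a real evaluation the mean and standard deviation should come from the training data only and be reused on the test data; wrapping the filter and classifier in Weka's FilteredClassifier takes care of that.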

Answer


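Trial and error is probably the best way to find out which classifier is "best". It really depends on many factors, such as the layout and preprocessing of the data, the amount of data, and how well the problem suits the classifier.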
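Trial and error is easier to interpret when every candidate is scored on the same cross-validation folds and compared with a trivial baseline. Weka's ZeroR, which always predicts the most frequent class, plays that role; the self-contained Java sketch below re-implements only that baseline idea with seeded folds, and its class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Cross-validated accuracy of a "predict the majority class" baseline. */
public class MajorityBaseline {

    /** Assigns each instance index to one of k folds after a seeded shuffle. */
    static int[] assignFolds(int n, int k, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, new Random(seed));
        int[] fold = new int[n];
        for (int pos = 0; pos < n; pos++) {
            fold[order.get(pos)] = pos % k;
        }
        return fold;
    }

    /** Requires 2 <= k <= labels.length so every fold has training and test data. */
    static double cvAccuracy(String[] labels, int k, long seed) {
        int[] fold = assignFolds(labels.length, k, seed);
        int correct = 0;
        for (int f = 0; f < k; f++) {
            // Count class frequencies in the training part of this fold.
            Map<String, Integer> counts = new HashMap<>();
            for (int i = 0; i < labels.length; i++) {
                if (fold[i] != f) counts.merge(labels[i], 1, Integer::sum);
            }
            String majority = Collections.max(counts.entrySet(),
                    Map.Entry.comparingByValue()).getKey();
            // Score the held-out instances of this fold.
            for (int i = 0; i < labels.length; i++) {
                if (fold[i] == f && labels[i].equals(majority)) correct++;
            }
        }
        return (double) correct / labels.length;
    }

    public static void main(String[] args) {
        // Hypothetical labels; in practice, read the result column of the ARFF file.
        String[] results = {"1", "x", "1", "0", "1", "x", "0", "1", "1", "x"};
        System.out.printf("majority baseline, 5-fold: %.2f%n", cvAccuracy(results, 5, 1L));
    }
}
```

If J48, an SVM or a neural network does not clearly beat this number on the same folds, that is a hint the current attributes carry little information about the result.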

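At a quick glance, you could try J48, a neural network or an SVM. The only part you might need to change is the formation attribute (perhaps split it into 5 attributes?). Apart from that, many classifiers can predict a nominal output from the numeric information you provide.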
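Splitting the formation could look like the following Java sketch, which turns a value such as "4-1-1-4-1" into five numeric values so that similar formations share values instead of being unrelated nominal labels. The attribute names h.formation1 to h.formation5 and the class name are hypothetical. In the rows shown, the five parts always add up to teamSize (11), so one of them is redundant once teamSize is present.

```java
import java.util.Arrays;

/** Splits a formation like "4-1-1-4-1" into five numeric attribute values. */
public class SplitFormation {

    /** Accepts the value with or without surrounding quotes. */
    static int[] parse(String formation) {
        String s = formation.trim().replace("\"", "").replace("'", "");
        String[] parts = s.split("-");
        if (parts.length != 5) {
            throw new IllegalArgumentException("expected 5 parts: " + formation);
        }
        int[] counts = new int[5];
        for (int i = 0; i < 5; i++) {
            counts[i] = Integer.parseInt(parts[i]);
        }
        return counts;
    }

    /** ARFF header lines for the new attributes of one team prefix ("h" or "a"). */
    static String header(String prefix) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 5; i++) {
            sb.append("@attribute ").append(prefix)
              .append(".formation").append(i).append(" numeric\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(header("h"));
        System.out.println(Arrays.toString(parse("\"4-1-1-4-1\"")));
    }
}
```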

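As for home vs. away, the current layout looks fine, and it is probably better to leave out the extra attribute. These kinds of problems usually favour the home team, but you already know which team is at home and which is not, so the extra attribute shouldn't really add anything to the model.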

Play around with what's available and see how you get on. The results may surprise you!
