
On Wed, Oct 12, 2011 at 11:59:30AM -0700, Alia wrote:
> --------------------------------------------------------------------
> -- Testing Area
> --------------------------------------------------------------------
> outlook s
>     | s == "sunny"    = 1
>     | s == "overcast" = 2
>     | s == "rain"     = 3
>
> temp :: (Real a, Fractional n) => a -> n
> temp i = (realToFrac i) / (realToFrac 100)
>
> humidity :: (Real a, Fractional n) => a -> n
> humidity i = (realToFrac i) / (realToFrac 100)
>
> windy x
>     | x == False = 0
>     | x == True  = 1
>
> -- attributes
> a1 = Discrete outlook
> a2 = Continuous temp
> a3 = Continuous humidity
> a4 = Discrete windy
>
> outlookData  = ["sunny","sunny","overcast","rain","rain","rain","overcast","sunny","sunny","rain","sunny","overcast","overcast","rain"]
> tempData     = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
> humidityData = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
> windyData    = [False, True, False, False, False, True, True, False, False, False, True, True, False, True]
> outcomes     = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]
>
> d1 = zip outlookData outcomes
> d2 = zip tempData outcomes
> d3 = zip humidityData outcomes
> d4 = zip windyData outcomes
>
> t1 = id3 [a1] d1
> t2 = id3 [a2] d2
> t3 = id3 [a3] d3
> t4 = id3 [a4] d4
>
> --t5 = id3 [a1,a2,a3,a4] [d1,d2,d3,d4] -- doesn't work because you can't mix strings and numbers in a list
> --

This also doesn't work because [d1,d2,d3,d4] isn't the right type,
even if you could mix strings and numbers in a list: d1, d2, etc. are
each lists of pairs, so [d1,d2,d3,d4] is a list of lists of pairs.

I think what you really want is to combine all the data for each
observation into a single structure. Something like this:

import Data.List (zipWith4)  -- zipWith4 lives in Data.List

data Item = Item String Double Double Bool

outlook (Item "sunny" _ _ _)    = 1
outlook (Item "overcast" _ _ _) = 2
outlook (Item "rain" _ _ _)     = 3

temp (Item _ i _ _) = (realToFrac i) / (realToFrac 100)

humidity (Item _ _ i _) = (realToFrac i) / (realToFrac 100)

windy (Item _ _ _ False) = 0
windy (Item _ _ _ True)  = 1

-- attributes
a1 = Discrete outlook
a2 = Continuous temp
a3 = Continuous humidity
a4 = Discrete windy

outlookData  = ["sunny","sunny","overcast","rain","rain","rain","overcast","sunny","sunny","rain","sunny","overcast","overcast","rain"]
tempData     = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
humidityData = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
windyData    = [False, True, False, False, False, True, True, False, False, False, True, True, False, True]
outcomes     = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]

d = zip (zipWith4 Item outlookData tempData humidityData windyData) outcomes

t1 = id3 [a1] d
t2 = id3 [a2] d
t3 = id3 [a3] d
t4 = id3 [a4] d
t5 = id3 [a1,a2,a3,a4] d

Now t5 works just fine.

-Brent
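
For anyone who wants to try the single-structure idea without the id3
module at hand, here is a rough, self-contained sketch. Note that the
Attribute type and the attrValue helper below are stand-ins invented
for illustration (the real definitions are not shown in this thread and
may well differ), and only three of the fourteen observations are used.
The point is just that once every attribute is a function on the same
Item type, all four fit in one list.

```haskell
import Data.List (zipWith4)

-- One observation: outlook, temperature, humidity, windy.
data Item = Item String Double Double Bool
  deriving (Show, Eq)

-- ASSUMPTION: a stand-in for the attribute type used with id3 in this
-- thread. The real definition is not shown and may differ.
data Attribute a
  = Discrete   (a -> Int)
  | Continuous (a -> Double)

outlook :: Item -> Int
outlook (Item "sunny"    _ _ _) = 1
outlook (Item "overcast" _ _ _) = 2
outlook (Item "rain"     _ _ _) = 3
outlook (Item other      _ _ _) = error ("unknown outlook: " ++ other)

-- Scale the raw percentages/degrees into fractions of 100.
temp, humidity :: Item -> Double
temp     (Item _ t _ _) = t / 100
humidity (Item _ _ h _) = h / 100

windy :: Item -> Int
windy (Item _ _ _ w) = if w then 1 else 0

-- All four attributes share the type Attribute Item, so they can
-- live in the same list.
attributes :: [Attribute Item]
attributes =
  [Discrete outlook, Continuous temp, Continuous humidity, Discrete windy]

-- ASSUMPTION: hypothetical helper that evaluates an attribute on an
-- item and returns the result as a Double.
attrValue :: Attribute a -> a -> Double
attrValue (Discrete f)   x = fromIntegral (f x)
attrValue (Continuous f) x = f x

-- The first, third and fourth observations from the data above.
items :: [Item]
items = zipWith4 Item
  ["sunny", "overcast", "rain"]
  [85, 83, 70]
  [85, 78, 96]
  [False, False, False]

-- Each observation paired with its outcome, as in d above.
observations :: [(Item, Int)]
observations = zip items [0, 1, 1]
```

In GHCi, map (\a -> attrValue a (head items)) attributes evaluates all
four attributes on the first observation, which is essentially what a
call taking [a1,a2,a3,a4] and a single list d needs to be able to do.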