2017-11-02

Requirements #4, #6 in HW3.

Requirement #4 says: "4. Models should be trained using stochastic gradient descent, but no momentum. The mini-batch size should be a configurable constant set at the top of your program." So I built a train function that loops for the configured number of epochs:
          - in each loop I shuffle the training data
                 - split it into mini-batches
                 - for each batch -> train the network
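The steps above can be sketched roughly as follows. This is only a minimal illustration, not the assignment's required design: it assumes a tiny logistic-regression model in place of the real network, and the names `train`, `predict`, `BATCH_SIZE`, and the learning rate `lr` are hypothetical.

```python
import math
import random

BATCH_SIZE = 4  # configurable mini-batch size, set at the top as requirement #4 asks


def sigmoid(z):
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, epochs, lr=0.5, batch_size=BATCH_SIZE, seed=0):
    """Plain mini-batch SGD (no momentum) on a logistic-regression stand-in.

    data is a list of (features, label) pairs with label in {0, 1}.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    data = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)  # reshuffle once per epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]  # one mini-batch
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for x, y in batch:
                err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            # Vanilla SGD step: no velocity term is carried between batches
            w = [wi - lr * g / len(batch) for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b


def predict(model, x):
    # Threshold the predicted probability at 0.5
    w, b = model
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0
```

Each update uses only the averaged gradient of the current mini-batch, which is what "no momentum" amounts to in this sketch.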
And then #6 says: "6. If the mode is either 5fold or test your program should output the overall accuracy over the test data, the four rates from the confusion matrix, the total runtime for training, and the total runtime for testing."
5fold must also use stochastic gradient descent.
So even though the training data is split into 5 folds,
each 4/5 of the data used for training also has to be split into mini-batches before training, and the training process runs for the full number of epochs.
In other words, I reuse the train function from #4 for each 4/5 of the training data.
Is my understanding correct for 5fold? If yes, is the performance of 5fold a big concern? I tested with only 100 records and it took 4+ minutes. It ran forever when I tested with 5000 records.
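For concreteness, the 5fold flow described above might look like the sketch below. The helper `five_fold` and its `train_fn` / `predict_fn` parameters are hypothetical names; `train_fn` stands for whatever train function was written for #4, called once per fold on the other 4/5 of the data. The sketch returns raw confusion-matrix counts, and the four rates would be derived from them however the assignment defines them.

```python
import random
import time


def five_fold(data, train_fn, predict_fn, k=5, seed=0):
    """Hypothetical helper: k-fold cross-validation around an existing train function.

    data is a list of (features, label) pairs with label in {0, 1}.
    Returns (accuracy, confusion counts, total train time, total test time).
    """
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)                        # shuffle once before splitting
    folds = [data[i::k] for i in range(k)]   # k disjoint, nearly equal folds
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    train_time = 0.0
    test_time = 0.0
    for i in range(k):
        held_out = folds[i]
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]

        t0 = time.perf_counter()
        model = train_fn(training)           # all epochs and mini-batches on 4/5 of the data
        train_time += time.perf_counter() - t0

        t0 = time.perf_counter()
        for x, y in held_out:
            p = predict_fn(model, x)
            # "t"/"f": was the prediction right; "p"/"n": which class was predicted
            key = ("t" if p == y else "f") + ("p" if p == 1 else "n")
            counts[key] += 1
        test_time += time.perf_counter() - t0
    accuracy = (counts["tp"] + counts["tn"]) / sum(counts.values())
    return accuracy, counts, train_time, test_time
```

Because the train function runs k times, each on 4/5 of the records, total training work in this sketch is about four times that of a single run over the full training set with the same epoch count.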
(Edited: 2017-11-02)

-- Requirements #4, #6 in HW3
I think your understanding sounds correct. It might be slow, but maybe not that slow.
Best, Chris