To start with, it is highly unlikely that you would proceed to run your system live if #3 is true (unless you all know something that I don't!)
If #2 is true, then your system has failed when verified on the "out-sample" data.
If I run the system and reset it every day for a week and the results stay similar both in-sample and out-sample, I then shut the system down, reset it, let it run for another week, and check whether the results are still similar. If they are, I repeat this a few times, and if the results keep holding up, I can be confident that I have a system that produces the right types of algos. After that I reset the system once more, run it on the FULL data set (all the points I have), and move to live. [I'm at this point with my current system btw, I'm just trying to find the right broker that I can use]
Out-sample is the future. It is worthless for many kinds of systems, but for the kind I run it's priceless: I don't have to wait two years to prove my system works, I can do it in a second.
Does this make any sense?
ps. the funny thing is that with the correct framework it doesn't matter what kinds of rules I add to the mix. If I find a new type of rule (I'm testing one today), I just add it to the mix, run the framework, and see whether it behaves the same way or better (more variability between algos); if the results come faster or are more widely distributed, I'm quite happy. I don't see any reason to test any kind of system manually; I'd much rather let each system/rule type be tested by my framework, which can find all kinds of different statistics that I would never be able to find by hand.
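The "add a rule type and compare" idea can be illustrated with a toy sketch. The rule names, the scoring model, and the use of standard deviation as the "variability between algos" metric are all my assumptions, not the author's implementation.

```python
import random
import statistics

def run_framework(rule_types, seed=0, n_algos=200):
    # Hypothetical framework run: each generated algo combines the
    # available rule types, and each rule type contributes independent
    # variation to that algo's score.
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) for _ in rule_types)
            for _ in range(n_algos)]

baseline = run_framework(["trend", "breakout"])
extended = run_framework(["trend", "breakout", "new_rule"])

# Compare the spread of results: a wider spread after adding the new
# rule type is the "more variability between algos" mentioned above.
print(statistics.stdev(baseline), statistics.stdev(extended))
```

The design choice worth noting is that the framework, not the person, does the comparison: the same pipeline and the same summary statistics are rerun with and without the new rule type, so any change in behavior is attributable to the rule rather than to manual testing choices.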