implement some code to reduce the amounts of calls to setup vertex format, in d3d9 it gives no noticeable speedup, in opengl it still does not work right. thanks to neobrain for the idea