[ARM] Change all floating point loadstores to fastmem implementations except lfs since all floating point accesses tend to be to RAM space. lfs tends to get used to write quickly to the gatherpipe and other places, look at the JIT64 implementation to see how to make it quicker.