Merge pull request #2837 from Sonicadvance1/aarch64_faster_nonpaired

[AArch64] Optimize cases when an FPR is only used for non-paired ops.