Add new ioctls for per context timestamps.
Timestamp functions (read/write/wait) are now context
specific instead of using only the global timestamp.
Per-context timestamps are a requirement for priority-based
queueing.
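A minimal sketch of the idea, not the driver code: each context
keeps its own timestamp next to the global one, and read/write/wait
take a context id. All names here (sketch_ts_state,
SKETCH_CONTEXT_GLOBAL, SKETCH_MAX_CONTEXTS, the sketch_ts_*
helpers) are hypothetical, and "wait" is reduced to a non-blocking
"has the timestamp been reached" check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and limits, chosen only for this sketch. */
#define SKETCH_MAX_CONTEXTS   4u
#define SKETCH_CONTEXT_GLOBAL 0xFFFFFFFFu

struct sketch_ts_state {
    uint32_t global_ts;                    /* device-wide timestamp */
    uint32_t ctx_ts[SKETCH_MAX_CONTEXTS];  /* one timestamp per context */
};

/* Return the timestamp slot for a context id, or NULL if the id is invalid. */
static uint32_t *sketch_ts_slot(struct sketch_ts_state *s, uint32_t ctx)
{
    if (ctx == SKETCH_CONTEXT_GLOBAL)
        return &s->global_ts;
    if (ctx < SKETCH_MAX_CONTEXTS)
        return &s->ctx_ts[ctx];
    return NULL;
}

/* Write a timestamp for one context; returns 0 on success, -1 on a bad id. */
static int sketch_ts_write(struct sketch_ts_state *s, uint32_t ctx, uint32_t ts)
{
    uint32_t *slot = sketch_ts_slot(s, ctx);

    if (slot == NULL)
        return -1;
    *slot = ts;
    return 0;
}

/* Read the timestamp of one context; returns 0 on success, -1 on a bad id. */
static int sketch_ts_read(struct sketch_ts_state *s, uint32_t ctx, uint32_t *out)
{
    uint32_t *slot = sketch_ts_slot(s, ctx);

    if (slot == NULL)
        return -1;
    *out = *slot;
    return 0;
}

/*
 * Non-blocking stand-in for "wait": true once the context has reached
 * the target. The unsigned difference keeps the comparison correct
 * across 32-bit counter wraparound.
 */
static bool sketch_ts_reached(struct sketch_ts_state *s, uint32_t ctx,
                              uint32_t target)
{
    uint32_t now;

    if (sketch_ts_read(s, ctx, &now) != 0)
        return false;
    return (uint32_t)(now - target) < 0x80000000u;
}
```

With separate slots, a scheduler can check progress on one context
without being affected by work retired on another, which is what
priority-based queueing relies on.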
Set the correct GMEM and istore sizes for A320 on APQ8064.
More GMEM is better: the code works with 256K, but it
performs better with 512K. For the instruction store, the
size matters during GPU snapshot and postmortem dump. Also,
the size of each instruction is different on A3XX, so remove
the hard-coded constants and add a GPU-specific size
variable.
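A minimal sketch of why a GPU-specific size variable helps, not the
driver code: the dump length is computed from per-GPU fields
instead of a compile-time constant. The struct, its field names and
the units (istore size counted in instructions, instruction size
counted in 32-bit words) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-GPU description; names and units are assumptions. */
struct sketch_gpu_info {
    size_t gmem_size;    /* GMEM size in bytes */
    size_t istore_size;  /* instruction store capacity, in instructions */
    size_t inst_size;    /* 32-bit words per instruction, GPU specific */
};

/*
 * Number of bytes a snapshot or postmortem dump would need to cover the
 * whole instruction store, derived from the per-GPU values rather than
 * from a hard-coded instruction width.
 */
static size_t sketch_istore_bytes(const struct sketch_gpu_info *gpu)
{
    return gpu->istore_size * gpu->inst_size * sizeof(uint32_t);
}
```

Supporting a GPU family with a different instruction width then
means filling in one more table entry, and the dump code stays
unchanged.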