Agreed that maintaining architecture-specific code is sub-optimal, but I am unsure why you consider it hacky. By its very nature, every bit of userland code manipulates the cache, and only the user process understands its business logic well enough to fully optimize cache usage. My application software-decodes or software-renders into the framebuffer well before the frame is displayed, and the cache flush of the sub-frame happens immediately before display, so the flush is often a no-op because natural cache eviction has already taken place in the interim.
Oof, nasty hacks to be doing cache manipulations directly from userspace.
Having said that, the dmabuf sync ioctls are across the entire buffer, so are slightly heavier weight than your partial flushes.
I did not realize the DMA_BUF_IOCTL_SYNC interface flushes the entire allocation, but obviously it must, since no offset/length is passed. That is not optimal when all framebuffer pages are pre-allocated as a single buffer and then sub-allocated by the app. If writing something from scratch, allocating each framebuffer page as its own buffer via DMA_HEAP_IOCTL_ALLOC probably makes sense, so that DMA_BUF_IOCTL_SYNC only has to cover that one page.
Statistics: Posted by Vraz — Wed Oct 02, 2024 10:32 pm