From Android platform fundamentals to pure-Kotlin fisheye inverse perspective mapping — a complete engineering walkthrough.
Welcome. This covers the full stack — from what Android is, through AAOS, to how a pure-geometry bird's-eye view is computed on embedded Android hardware.
Android is an open-source operating system based on the Linux kernel, developed by Google. It provides a complete software stack: kernel, middleware, runtime, and key applications. Released in 2008, it now powers over 3 billion active devices worldwide — phones, tablets, TVs, watches, cars, and embedded boards.
Open Source — AOSP is publicly available and forkable by any manufacturer
Multi-form factor — phones, tablets, TVs, watches, cars, embedded boards
APK packaging — one binary runs on any compatible Android device
Kotlin-first since 2019 — Google's preferred language for Android
Android 15 (API 35) is the current stable release. This project targets API 23 (Android 6.0) for broad compatibility including the Jetson Nano running LineageOS.
Android is not just a phone OS — it's a full software platform. The Jetson Nano running LineageOS proves this: same APK, same ART runtime, just different hardware underneath.
The most important file in any Android app. Declares the app to the OS: which activities exist, what permissions are needed, which hardware features are required or optional, and which intent filters make the app launchable. Without a valid manifest the OS will not install the APK.
An Activity represents one screen. It has a lifecycle managed by the OS: onCreate → onStart → onResume → onPause → onStop → onDestroy. Our app has one Activity that owns the entire UI and coordinates all computation.
Static files bundled inside the APK. Accessed via context.assets.open(). Our fisheye camera PNG images and the ground-truth BEV image live here as bev_images/{cam}/0.png.
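A minimal sketch of the asset load, following the path layout above (the helper name is illustrative):

```kotlin
import android.content.Context
import android.graphics.Bitmap
import android.graphics.BitmapFactory

// Opens one fisheye frame bundled in the APK and decodes it to a Bitmap.
fun loadCameraFrame(context: Context, cam: String): Bitmap =
    context.assets.open("bev_images/$cam/0.png").use { stream ->
        BitmapFactory.decodeStream(stream)
            ?: error("could not decode asset for camera $cam")
    }
```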
Compose replaces the old XML layout system. UI is declared as Kotlin functions annotated @Composable. State variables using mutableStateOf() automatically trigger UI redraws when changed. We use TV Material3 which adds focus-ring navigation for remote control.
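The pattern in one small sketch (composable name and status string are illustrative):

```kotlin
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.setValue
import androidx.tv.material3.Text

@Composable
fun StatusLine() {
    // remember{} keeps the state across recompositions;
    // mutableStateOf() makes writes observable by Compose.
    var statusText by remember { mutableStateOf("Computing BEV…") }
    // Redraws automatically whenever statusText is reassigned;
    // no manual invalidation call is needed.
    Text(text = statusText)
}
```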
Gradle manages compilation, dependency downloading, and APK packaging. build.gradle.kts declares the SDK version, Compose and TV Material3 library versions, and signing config. The libs.versions.toml version catalog centralises all dependency versions.
Dispatchers.Default runs CPU-heavy work on a background thread pool. Dispatchers.Main runs UI updates on the main thread. withContext() switches between them safely. BEV computation (3–5 seconds) must run on Default — without this Android shows an ANR dialog after 5 seconds.
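A sketch of that pattern, with the heavy work and the UI update passed in as lambdas (both hypothetical stand-ins for the real BevProcessor call and Compose state write):

```kotlin
import android.graphics.Bitmap
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

suspend fun computeAndShow(
    computeBev: () -> Bitmap,   // 3–5 s of pure CPU work
    show: (Bitmap) -> Unit      // UI update, e.g. writing Compose state
) {
    // Runs on the background CPU pool; the main thread stays responsive.
    val bitmap = withContext(Dispatchers.Default) { computeBev() }
    // withContext resumes on the caller's dispatcher (Main), so this is safe.
    show(bitmap)
}
```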
Every Android app has this skeleton. The threading model is the most critical Android-specific engineering decision in this project.
Android Automotive OS is a version of Android built directly into the vehicle's infotainment system — not a phone projection like Android Auto. It runs permanently on the car's own ECU hardware whether or not any phone is connected. First production vehicle: Polestar 2 (2020).
Vehicle HAL (VHAL) — direct access to CAN bus, vehicle speed, gear, door status, hundreds of vehicle properties
CarService API — controls HVAC zones, audio focus, driver monitoring, instrument cluster
EVS (Exterior View System) — low-latency hardware camera access for surround-view, bypassing Camera2 API
Always-on architecture — boots with ignition, no user unlock, persistent services
Multi-display — native support for cluster + IVI + rear seat as separate zones
Functional Safety — ISO 26262 ASIL integration hooks
Our BEV app targets Android TV on Jetson Nano — a stepping stone. The same Kotlin code, Compose UI, and BevProcessor pipeline could run on an AAOS head unit with two main changes: replace asset-loaded PNGs with live EVS camera feeds, and replace TV Material3 navigation with CarUI library patterns. The BevProcessor geometry is fully platform-agnostic.
The most important AAOS difference for our project is EVS — instead of loading images from assets, we would get direct hardware camera frames at low latency. That single change transforms this prototype into a production system.
| Aspect | Standard Android | AAOS | Android TV (our target) |
|---|---|---|---|
| Hardware | Phone / Tablet | In-vehicle ECU / SoC | TV / Embedded board (Jetson Nano) |
| Input method | Touchscreen, gesture | Rotary knob, touch, voice | Remote control D-pad focus navigation |
| Boot trigger | Power button | Ignition / CAN bus event | Power supply connected |
| Camera access | Camera2 API | VHAL + EVS (Exterior View System) | Camera2 API or assets (our case) |
| UI framework | Compose / Views | Compose + CarUI library | Compose + TV Material3 |
| Multi-display | Limited | Native — cluster + IVI + rear | Single display |
| Safety standard | None | ISO 26262 ASIL hooks | None |
| App distribution | Play Store | Play Store for Cars (restricted) | Play Store for TV |
| Always-on | No | Yes — with ignition | Depends on power supply |
Bottom line: All three share the same Linux kernel, ART runtime, Kotlin compiler, and Gradle build system. Differences are in hardware HALs, UI navigation conventions, and safety requirements. Code written for Android TV can be ported to AAOS by changing only the camera input source and UI navigation patterns — the BevProcessor is already platform-agnostic.
The Camera access row is the most important. On Android TV we load static PNGs. On AAOS we use EVS. That single change transforms this into a production vehicle system.
A driver has four cameras around the vehicle but cannot perceive the surrounding environment as a unified top-down map. Each camera shows only its own distorted fisheye perspective. Stitching them into a coherent bird's-eye view requires solving the inverse perspective problem for wide-angle fisheye lenses — without a depth sensor and in real time on embedded hardware.
For every pixel in a 500×500 top-down output canvas, compute its real-world ground position in metres, project through each of the 4 fisheye camera models using the Mei projection equation, sample the colour, and blend all contributions weighted by viewing angle and distance. Runs entirely on Jetson Nano CPU in ~3 seconds with no GPU, no network, no AI.
Forward-looking Binocular Surround-view Semantic Estimation and Mapping. Unity-simulated parking lot scenes with 4 calibrated fisheye cameras and overhead ground-truth BEV images. Provides calibration files: intrinsics.yml + extrinsics.txt. Paper: arXiv:2303.03651.
The problem is fundamental geometry — how do you unwarp four fisheye perspectives into one overhead view? This is called AVM (Around View Monitor) in the automotive industry and is standard in premium vehicles.
Method 1: pure geometric IPM (our choice). Strengths:
No training data — only calibration files
Deterministic — identical input = identical output
Fully explainable — every pixel traceable to a formula
Runs on CPU — no GPU required
Tiny APK — no model weights
Method 1 weaknesses:
Oblique-angle blur on side lanes — physics limit
Flat-ground assumption — 3D objects warp
Car body projects into BEV image
Method 2: end-to-end neural network. Strengths:
Sharp side-lane reconstruction from learned priors
Can model 3D objects with learned depth cues
Can produce semantic labels — lane, vehicle, road
State-of-the-art visual quality on benchmarks
Method 2 weaknesses:
Requires GPU — TensorRT or ONNX Runtime
Needs thousands of labelled training images
Black box — hard to debug unexpected outputs
Large APK — model weights add 50–200 MB
Domain gap — may fail on unseen environments
Method 3: hybrid, IPM backbone plus ML refinement. Strengths:
IPM provides geometric accuracy as backbone
Smaller network — only corrects IPM artifacts
Interpretable: geometry debuggable, ML refines residuals
Best tradeoff between quality and complexity
Method 3 weaknesses:
More complex — two stages to maintain
Still needs some training data
Still needs GPU for refinement pass
We chose Method 1: the Jetson Nano CPU cannot run a neural network in real time, and we wanted a fully explainable system. Method 3 is the natural next step when GPU compute becomes available.
| Board | NVIDIA Jetson Nano |
| OS | LineageOS / Android TV |
| CPU | ARM Cortex-A57 quad-core @ 1.43 GHz |
| RAM | 4 GB LPDDR4 |
| GPU | 128-core Maxwell (not used by this app) |
| APK target SDK | API 23 (Android 6.0+) |
| Compute time | ~3 seconds (CPU only) |
Origin = centre of rear axle. Axes in the table below follow the dataset's Unity convention: X=right, Y=up, Z=forward. rz=180° cameras are physically mounted upside-down (rolled 180° about the optical axis) and are flipped 180° in code before processing.
| Camera | X (m) | Y (m) | Z (m) | rz handling |
|---|---|---|---|---|
| Front | 0.000 | 0.406 | 3.873 | flip 180° |
| Left | -1.024 | 0.800 | 2.053 | no flip |
| Rear | 0.132 | 0.744 | -1.001 | flip 180° |
| Right | 1.015 | 0.801 | 2.040 | no flip |
Measured through physical calibration using a checkerboard pattern. Specific to this camera model.
| ξ (xi) | 1.7634 | Mei fisheye parameter |
| fx | 331.0 px | Horizontal focal length |
| fy | 331.0 px | Vertical focal length |
| cx | 256.0 px | Principal point X |
| cy | 256.0 px | Principal point Y |
| Image size | 512 × 512 px | After scaling |
Each camera's horizontal facing direction. Used to build the 3×3 rotation matrix and compute angular blend weights.
| Camera | Yaw (°) | Yaw (rad) | Direction |
|---|---|---|---|
| Front | 0° | 0 | Straight ahead |
| Right | 90° | π/2 | Rightward |
| Rear | 180° | π | Backward |
| Left | 270° | 3π/2 | Leftward |
The rz=180° physical mounting is the most surprising hardware detail. The front and rear cameras are literally mounted upside-down. Without the bitmap rotation correction, their images appear inverted in the BEV output.
| Output size | 500 × 500 px |
| Scale | 0.02 m/px = 2 cm per pixel |
| Total coverage | 10 m × 10 m ground area |
| Origin pixel | (col=250, row=250) |
| Forward range (X) | −5 m to +5 m |
| Lateral range (Y) | −5 m to +5 m |
In the output image, UP = forward (ahead of the car). Row 0 is farthest forward; row 499 is farthest backward. This matches the natural driving perspective — what's ahead is at the top.
The coordinate system is the foundation of everything. The origin at (250,250) represents the real-world rear axle — the physical reference point for all camera extrinsics.
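The grid constants above translate into a two-line mapping. A sketch, assuming right = larger column for lateral position (the slides only fix the forward/row convention):

```kotlin
const val SCALE = 0.02f   // metres per pixel
const val ORIGIN = 250    // origin pixel: col = row = 250 (rear axle)

// Ground position (metres) of BEV pixel (col, row).
fun pixelToWorld(col: Int, row: Int): Pair<Float, Float> {
    val forward = (ORIGIN - row) * SCALE   // row 0   -> +5 m ahead
    val lateral = (col - ORIGIN) * SCALE   // col 499 -> ~+5 m right (assumed sign)
    return forward to lateral
}
```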
Forward mapping (rejected): for each camera pixel (u,v), find where its ray lands on the ground. Problem: many fisheye pixels point at the sky, other vehicles, or buildings — not the ground at all. Resolving where each ray actually terminates requires depth estimation or LiDAR. Far more complex.
Inverse mapping (our approach): for each output BEV pixel, compute its real-world ground position, then ask each camera: which of your pixels shows this ground location? Guaranteed to only sample ground-level content. No depth sensor needed — we assume a flat ground at height zero, which is exact for the road surface.
Every BEV pixel maps to ground height zero. Exact for road surface, parking lines, markings. Fails for 3D objects — other cars, walls, pedestrians — which appear stretched because their actual height is non-zero. This is the fundamental physical limitation of all camera-only IPM systems.
The inversion is the entire trick. We never ask "where does this camera pixel land?" We ask "which camera pixel shows this ground location?" The flat ground assumption is the price for not needing depth sensors.
Converts a direction vector from world coordinates into the camera's own local coordinate frame. After this, xc is rightward in the camera frame, yc is upward, and zc is depth along the optical axis. If zc is negative the ground point is behind the camera — skip it.
Row 1 (X axis) — camera right direction, perpendicular to optical axis in the horizontal plane
Row 2 (Y axis) — camera up direction, computed as cross product of Z×X
Row 3 (Z axis) — optical axis direction the camera faces = (cos(yaw), sin(yaw), 0)
cos(0°)=1, sin(0°)=0. Z-axis=(1,0,0) points straight forward. X-axis=(0,1,0) points right. For a ground point 5m ahead: after transform xc=0 (no lateral), yc=0 (no vertical), zc=5 (directly in front at positive depth).
The matrix is pre-built once per camera before the main loop. Building it 250,000 times inside the loop would be pure waste. The 9 elements are extracted to local variables before the pixel loop for CPU register efficiency.
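A sketch of the transform in the yaw frame these slides describe (world X = forward, Y = right, Z = up; converting the Unity-convention extrinsics into this frame is assumed to happen at load time). Here (dx, dy, dz) is the ground point with the camera position already subtracted:

```kotlin
import kotlin.math.cos
import kotlin.math.sin

// Applies the yaw-only world->camera rotation described above.
// Row 3 (optical axis)  = (cos yaw, sin yaw, 0)
// Row 1 (camera right)  = (-sin yaw, cos yaw, 0)
// Row 2 (camera up)     = Z × X = (0, 0, 1)
fun worldToCamera(yawRad: Double, dx: Double, dy: Double, dz: Double): DoubleArray {
    val xc = -sin(yawRad) * dx + cos(yawRad) * dy  // rightward in camera frame
    val yc = dz                                    // upward in camera frame
    val zc = cos(yawRad) * dx + sin(yawRad) * dy   // depth along optical axis
    return doubleArrayOf(xc, yc, zc)               // caller skips the point if zc <= 0
}
```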
Standard pinhole: u = fx×(xc/zc)+cx. Fails above ~160° FOV because zc approaches zero for near-horizontal rays (division by near-zero → extreme distortion) and becomes negative for rays beyond 90° (point appears behind camera even though the lens physically sees it). Our fisheye cameras exceed 180° FOV.
Project the 3D point onto a unit sphere. Then shift the projection centre inward along the optical axis by ξ (xi). This single parameter controls how fisheye the projection is. ξ=0 recovers standard pinhole exactly. As ξ increases, wider angles compress into the image, allowing 200°+ FOV.
| ξ = 0.0 | Standard pinhole — fails above ~160° FOV |
| ξ = 0.5 | Mild wide-angle lens |
| ξ = 1.0 | Fisheye ~180° field of view |
| ξ = 1.76 | Our cameras — extreme fisheye, ~200° FOV |
Dividing xc by r3d places the 3D point on the surface of a unit sphere. The result is the x-direction cosine — always between −1 and +1. Dividing by mzXi applies the Mei perspective division. The larger ξ is, the more mzXi stays away from zero even for wide angles, which is what enables the fisheye to see beyond 90°.
When ξ=0: mzXi = zc/r3d = cosine of the angle from the optical axis. This recovers pinhole exactly. The ξ shift is the sole mathematical difference between fisheye and pinhole projection.
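Putting the two steps together, a sketch with this camera's intrinsics as defaults (the vertical sign is an assumption; it depends on whether yc points up or down in the camera frame):

```kotlin
import kotlin.math.sqrt

// Mei projection: unit-sphere normalisation, xi shift, perspective division.
fun meiProject(
    xc: Double, yc: Double, zc: Double,
    xi: Double = 1.7634,
    fx: Double = 331.0, fy: Double = 331.0,
    cx: Double = 256.0, cy: Double = 256.0
): Pair<Double, Double> {
    val r3d = sqrt(xc * xc + yc * yc + zc * zc)  // distance to the 3D point
    val mzXi = zc / r3d + xi                     // = cos(angle) + xi; xi = 0 recovers pinhole
    val u = fx * (xc / r3d) / mzXi + cx
    val v = fy * (-yc / r3d) / mzXi + cy         // minus: image v grows downward (assumed)
    return u to v
}
```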
Multiple cameras can see the same ground point. All contributions are blended. Each camera's weight is proportional to how directly it looks at the point and inversely proportional to the distance from the car origin. This gives smooth transitions between camera zones with no hard seams.
The projected coordinate (u,v) is almost never a whole number. Without interpolation the output has a coarse blocky texture. Bilinear blends the four integer-coordinate neighbours weighted by how close the point is to each corner.
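A sketch of the sampling step (red channel only; green and blue are identical; assumes (u, v) lies at least one pixel inside the image):

```kotlin
import android.graphics.Bitmap
import android.graphics.Color

fun sampleRedBilinear(bmp: Bitmap, u: Float, v: Float): Float {
    val x0 = u.toInt(); val y0 = v.toInt()   // top-left integer neighbour
    val fx = u - x0;    val fy = v - y0      // fractional offsets in [0, 1)
    val r00 = Color.red(bmp.getPixel(x0,     y0    )).toFloat()
    val r10 = Color.red(bmp.getPixel(x0 + 1, y0    )).toFloat()
    val r01 = Color.red(bmp.getPixel(x0,     y0 + 1)).toFloat()
    val r11 = Color.red(bmp.getPixel(x0 + 1, y0 + 1)).toFloat()
    // Blend the four neighbours, weighted by closeness to each corner.
    return (r00 * (1 - fx) + r10 * fx) * (1 - fy) +
           (r01 * (1 - fx) + r11 * fx) * fy
}
```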
The cosA ≤ 0 check runs BEFORE the matrix multiply and Mei projection — eliminating ~50% of iterations before any expensive maths. This single early exit is the biggest single performance optimisation in the whole pipeline.
The exponent in the cosA^1.5 weighting is a tuneable parameter. A lower exponent gives smoother blending but a blurrier result; a higher one gives sharper camera-zone boundaries but more visible transition lines. 1.5 was found empirically to give the best result on this dataset.
For each output pixel, all 4 cameras may contribute. Their colour contributions are summed with individual weights into parallel float arrays. After the complete loop, dividing by total weight gives the final weighted-average colour. Two-pass approach guarantees mathematically correct blend regardless of camera count.
We cannot compute the final average during the inner loop because when processing the first camera for a pixel we don't yet know the total weight from all 4 cameras. Accumulate first, normalise second. Same principle used in HDR tone mapping, alpha compositing, and radiosity rendering.
Four parallel FloatArrays — bevR, bevG, bevB, totalW — each 250,000 elements (500×500). Flat arrays have sequential memory layout with excellent CPU cache locality: reading the next element is one pointer increment. A 2D array adds a level of indirection, and a Map adds hashing and boxing, on every one of the 1,000,000 inner-loop iterations.
Pixels directly under the car body are too close for any camera to see the ground through the chassis. Their totalW stays near zero. They remain black in the output bitmap and are covered by the green car box overlay drawn on top.
The two-pass weighted mean is standard in signal processing: accumulate numerator (weighted sum) and denominator (total weight) separately, then divide once. coerceIn(0,255) clamps floating point rounding errors.
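The accumulate-then-normalise structure in sketch form (the weight shown combines cosA^1.5 and inverse distance as the blending slides describe; array names follow the slide above):

```kotlin
const val N = 500 * 500
val bevR = FloatArray(N); val bevG = FloatArray(N)
val bevB = FloatArray(N); val totalW = FloatArray(N)

// Pass 1, inside the per-camera pixel loop: accumulate weighted colour.
// weight ~ cosA.pow(1.5f) / distanceFromOrigin, computed by the caller.
fun accumulate(idx: Int, r: Float, g: Float, b: Float, weight: Float) {
    bevR[idx] += r * weight
    bevG[idx] += g * weight
    bevB[idx] += b * weight
    totalW[idx] += weight
}

// Pass 2, once after all four cameras: normalise to the weighted mean.
fun finalRed(idx: Int): Int =
    if (totalW[idx] > 1e-6f)
        (bevR[idx] / totalW[idx]).toInt().coerceIn(0, 255)
    else 0   // no camera sees this pixel (under the car): stays black
```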
Converts world metres back to BEV pixel position — the exact inverse of the main loop formula.
| Car box | ±2m forward/rear, ±1m lateral + CAR_Y_OFFSET. Dark fill hides projection artifacts. |
| Arrow | Shaft from rear(−1.2m) to front(+1.5m). Tip UP = forward. Wings spread left+right downward = ∧ arrowhead. |
| 2m ring | Cyan, radius = 2/SCALE = 100px, centred on (250,250). |
| 4m ring | Blue, radius = 4/SCALE = 200px, centred on (250,250). |
The rear camera is mounted at X=+0.132m (slightly right of centre). Combined with the fisheye projection geometry, the car body appears shifted rightward in the rendered BEV texture. CAR_Y_OFFSET shifts the overlay box to visually match. It is tuned by visual comparison with an overhead reference image and would need re-tuning for a different vehicle.
The original code had both arrowhead wings going to bx−10 — both going LEFT, making a sideways ⟩ shape. Fix: left wing = (bx−10, by+10), right wing = (bx+10, by+10). Both go downward (by+10 = larger row = backward in image), spreading left and right to form a correct ∧ arrowhead pointing toward the front of the car.
toPx(0f, 0f) = column 250, row 250 — the pixel coordinate of the real-world origin (rear axle). Rings represent physical 2m and 4m radius circles around the vehicle's rear axle, giving the driver a calibrated spatial reference in the display.
toPx() is the exact algebraic inverse of the main loop formula. Forward direction maps to smaller row numbers because row 0 is the top of the image = farthest forward. This is why the arrowhead at the front appears at the top of the green box.
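toPx() as implied by the grid constants, using the same assumed lateral sign as the earlier pixelToWorld sketch:

```kotlin
// World metres -> BEV pixel: exact algebraic inverse of pixelToWorld().
fun toPx(forward: Float, lateral: Float): Pair<Int, Int> {
    val col = (250 + lateral / 0.02f).toInt()   // right -> larger column (assumed)
    val row = (250 - forward / 0.02f).toInt()   // forward -> smaller row (top of image)
    return col to row
}
// toPx(0f, 0f) == (250, 250), the rear-axle origin;
// a 2 m ring spans 2 / 0.02 = 100 px, matching the overlay table.
```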
Declared with mutableStateOf() inside remember{}. Any change automatically triggers a Compose recomposition — affected UI parts redraw themselves with no manual invalidation.
| bevBitmap | Computed BEV result bitmap or null |
| gtBitmap | Cropped ground truth bitmap or null |
| showingGt | Toggle — which image to display |
| isLoading | Controls progress bar and button disabled state |
| statusText | Status line shown under the title |
| timingText | Elapsed compute time in milliseconds |
Column — vertical layout fills screen with dark background (#111111)
Box with weight(1f) — image display expands to fill all remaining vertical space between title and buttons
Image(ContentScale.Fit) — scales the 500×500 BEV bitmap to fill the box preserving its square aspect ratio
BevButton (TV Material3) — focusedContainerColor=green makes the selected button clearly visible under D-pad navigation
LinearProgressIndicator — shown during BEV computation to indicate background activity
LaunchedEffect(Unit) — auto-triggers computeBev() on first launch so BEV appears immediately
Writes PNG to /sdcard/Documents/BevSnapshots/ using FileOutputStream + bitmap.compress(PNG, quality=100). Creates directory if absent. Lossless PNG preserves all pixel data. Full file path shown in statusText after successful save.
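A sketch of the save path (filename illustrative; note that the quality argument is ignored for lossless PNG):

```kotlin
import android.graphics.Bitmap
import java.io.File
import java.io.FileOutputStream

fun saveSnapshot(bitmap: Bitmap): File {
    val dir = File("/sdcard/Documents/BevSnapshots").apply { mkdirs() }
    val file = File(dir, "bev_${System.currentTimeMillis()}.png")
    FileOutputStream(file).use { out ->
        bitmap.compress(Bitmap.CompressFormat.PNG, 100, out)
    }
    return file   // full path is then shown in statusText
}
```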
Threading is the most critical Android-specific engineering decision. Without withContext(Dispatchers.Default), the 3–5 second computation blocks the main thread and Android shows an ANR dialog after 5 seconds.
| Compute time | ~3 seconds (Jetson Nano CPU only) |
| Total iterations | 500×500×4 = 1,000,000 |
| Working memory | 4 arrays × 250K × 4B ≈ 4 MB |
| Output bitmap | 500×500 ARGB ≈ 1 MB |
| Input bitmaps | 4 × 512×512 ARGB ≈ 4 MB |
| Total peak memory | ~10 MB — well within Jetson RAM |
Road texture directly ahead and behind — geometrically accurate
Lane markings in front and rear zones — sharp and correctly positioned
Distance rings calibrated to real metres within 2cm precision
Camera seams blend smoothly with no visible hard transition lines
Side lane blur: Lanes 4–6m lateral are seen at ~10° above horizontal. 1px fisheye error = ~50cm ground error. Not fixable without AI or depth sensors — this is a physics constraint.
3D object warping: Other cars, walls, pedestrians appear stretched because IPM assumes a flat ground at height zero. Their actual height above ground is projected as if they were flat.
Car body artifact: Side cameras project the ego vehicle's own body into the BEV. Mitigated by the dark car box overlay. Fundamental to all camera-only IPM.
Multi-thread with Kotlin coroutines — parallel per-camera passes → target <1 second
Real-time camera feed via Android Camera2 API or AAOS EVS
F2BEV neural network refinement pass for side-lane artifact correction
Port to AAOS with EVS camera access for in-vehicle production deployment
The side lane blur is not a bug — it's physics. At 6m lateral, the camera sees ground at less than 10° elevation. One pixel maps to ~50cm of ground. No software tuning fixes this without depth information.
No server. No neural network at runtime. Just camera calibration, rotation matrices, and the Mei fisheye model — running on commodity Android hardware.
Hold this slide for 5 seconds. The GitHub link is in the video description. The docs/ folder contains the full algorithm flowchart, animated walkthrough, and this slide deck.