Skip to content

Android testing — types, trade-offs, and the SessionClick plan

Android development has several distinct test types, each with different speed/fidelity/cost profiles. This article catalogues what exists, what each type is for, and — concretely — what SessionClick uses today and what it deliberately defers.

The test pyramid

A healthy test suite has many fast unit tests at the base and few slow end-to-end tests at the top. Inverting this is a classic pathology: suites take hours and still miss logic bugs.

flowchart TB
    subgraph E2E["End-to-end — slow, flaky, high confidence"]
        E["Maestro / Appium<br/>full user flows on a real device"]
    end
    subgraph UI["UI tests — medium speed, medium confidence"]
        CU["Compose UI tests<br/>createComposeRule()"]
        ES["Espresso<br/>View-based (legacy)"]
        SC["Screenshot tests<br/>Paparazzi / Shot"]
    end
    subgraph INT["Integration — on device or emulator"]
        IT["Instrumentation tests<br/>androidInstrumentedTest<br/>real Android runtime"]
        RO["Robolectric<br/>JVM + fake Android runtime"]
    end
    subgraph UNIT["Unit — fast, cheap, numerous"]
        CT["commonTest (KMP)<br/>kotlin.test, runs on every target"]
        AU["androidUnitTest<br/>plain JVM + JUnit / kotlin.test"]
    end

    UNIT --> INT --> UI --> E2E

Unit tests (JVM-local)

Location in a plain Android project: src/test/. In a KMP project: src/commonTest/ for platform-agnostic logic, src/androidUnitTest/ for Android-only JVM tests.

  • Runtime: plain JVM on the developer's machine. No emulator, no Android runtime.
  • Speed: milliseconds per test.
  • Tools: kotlin.test (multiplatform), JUnit 4 / 5 (Android-only), Kotest, Turbine for Flow and coroutines, MockK or hand-rolled fakes for test doubles.
  • Good for: pure functions, domain logic, data transformations, validation, migration, state machines.
  • Bad for: anything touching Context, real Room databases, the Android SDK, or the Compose runtime.

Instrumentation tests

Location: src/androidTest/ or src/androidInstrumentedTest/ in KMP.

  • Runtime: real Android runtime on a device or emulator.
  • Speed: seconds per test, plus a ~30-second emulator boot cost per run.
  • Tools: AndroidX Test, ActivityScenario, ServiceTestRule, Room's in-memory database helper.
  • Good for: Room DAOs, SharedPreferences, Service lifecycle, WorkManager — anything that genuinely needs the SDK.
  • Bad for: pure logic (wasteful — use unit tests).

Robolectric

Robolectric runs on the JVM but provides a fake Android runtime — fake Context, Resources, Activity lifecycle. Faster than the emulator (seconds, not minutes of boot) but the fake sometimes diverges from real Android, masking bugs or producing false positives.

Use case: needs Android SDK access without emulator boot cost, and the fidelity trade-off is acceptable. Modern Android projects tend to avoid Robolectric for anything safety-critical and use real instrumentation tests instead.

Compose UI tests

Location: src/androidInstrumentedTest/ — needs a device or emulator.

Compose UI testing uses createComposeRule() plus matchers like onNodeWithText, onNodeWithTag, and actions like performClick, assertIsDisplayed. Tests render a single composable or screen in isolation and verify rendering and interaction.

  • Cost: much faster than Espresso but still instrumented. Maintenance burden is real — UIs change often.
  • Good for: critical flows (authentication, payment, checkout).
  • Bad for: everything. Many small teams skip these entirely and rely on screenshot tests or manual QA.

Espresso

Espresso is the classic Android UI testing framework for View-based UI. Largely superseded by Compose UI tests for greenfield projects. Mentioned here for context — encountered in older codebases.

Screenshot tests

Paparazzi (runs on JVM, no emulator) or Shot. Renders a composable, saves a PNG, diffs against a reference PNG committed to the repo.

  • Catches: unintentional visual changes (font weight, spacing, colours).
  • Trade-off: deliberate UI changes require regenerating screenshots. Noisy on a fast-moving UI.
  • Sweet spot: after the design stabilizes, for a handful of key screens.

End-to-end tests

Maestro (YAML flows, simple, widely adopted in 2025–2026) or Appium (more powerful, more complex). Scripts a full user journey on a real device.

  • High cost: slow runs, flakiness when animations or timing vary, device farm required at scale.
  • High value for release-gate smoke tests and reproducing customer-reported bugs.

KMP specifics

In a Kotlin Multiplatform project, tests live in source sets that mirror the production source sets:

Source set Runs on Typical content
commonTest every target (JVM, iOS, etc.) Pure logic with zero platform dependencies. Highest-ROI test location.
androidUnitTest Android JVM Android-only logic that doesn't need the SDK (rare if commonTest is used well).
androidInstrumentedTest Android device / emulator Real Android SDK usage: Room, Service, etc.
iosTest iOS simulator iOS-specific logic.

Every test in commonTest runs automatically on every platform the project targets. Write once, cover Android and iOS simultaneously. This is the reason to aggressively keep domain logic in shared/commonMain — the tests follow for free.

The SessionClick plan

What is in use

Heavy investment in commonTest:

  • MigrationTest.kt — v1 → v2 schema migration (six cases: dedup across playlists, case/whitespace normalization, BPM-as-identity, special-item passthrough, empty input).
  • SessionStateTest.kt — 14 tests covering selectedIndex arithmetic across removeItem / moveItem / restoreItem, cascade delete from the song pool, cross-playlist propagation via pool references, bulk-operation notification semantics.
  • FakeAudioEngine — a test double implementing the AudioEngine interface. Available for verifying ViewModel ↔ engine interaction without real audio hardware.

These run on Android and iOS targets. See Gradle test commands below.

What is deliberately skipped

Test type Reason
androidInstrumentedTest After the SessionState extraction, SessionViewModel is a thin Compose adapter with no domain logic of its own. MetronomeService is small and stable — easier to verify by running the app. Revisit if the Service grows.
Compose UI tests High maintenance cost for a UI still changing weekly. Reconsider before 1.0 release for critical flows (Start / Stop, BPM display).
Screenshot tests UI still shifting — screenshot regeneration would be constant noise. Revisit once the design stabilizes, around beta.
Maestro / E2E Setup cost not justified at this stage. A smoke test for "add song → start → stop" becomes worthwhile before 1.0 release.
Audio timing Frame-accurate scheduling cannot be unit-tested — the clock is real hardware. Verified empirically on device: ø 0.15 ms / max 0.26 ms jitter at 120 BPM.

Pyramid, SessionClick-shaped

flowchart TB
    CT["commonTest<br/><b>heavy investment</b><br/>migration, state, persistence, rules"]
    AU["androidUnitTest<br/><b>minimal</b><br/>nothing Android-only remains with logic<br/>after kotlinx.serialization lands"]
    IT["androidInstrumentedTest<br/><b>skipped for MVP</b>"]
    UI["Compose UI tests<br/><b>skipped for MVP</b>"]
    E2E["Maestro E2E<br/><b>skipped for MVP</b>"]
    MAN["Manual device testing<br/><b>primary UI verification method</b>"]

    CT --> AU --> IT --> UI --> E2E
    CT -.covers most logic risk.-> MAN

Solid = actively used. Dashed = deliberately deferred. Manual device testing is a first-class verification method for a small solo project with rapid UI churn.

Running the tests

All commands run from the project root (/Users/kekiel/AndroidStudioProjects/SessionClick).

./gradlew :shared:allTests
Runs commonTest on every configured target (JVM, iOS simulator, etc.).

./gradlew :shared:testDebugUnitTest
./gradlew :composeApp:testDebugUnitTest

./gradlew :composeApp:connectedAndroidTest
Requires a device or emulator running.

Reports are written to build/reports/tests/ as HTML.

When to revisit this plan

Trigger points for adding more test types:

  • Before 1.0 release — add a Maestro smoke test covering the golden path (open app → add song → start metronome → stop → close). Catches release-blocking regressions.
  • After a production bug — write the test that would have caught it, regardless of which layer it belongs to. The pyramid grows organically through reality, not speculation.
  • When a collaborator joins — UI tests become more valuable when the UI isn't in one developer's head.
  • Before a large Compose refactor — snapshot-test the affected screens first so visual changes are detectable.