Last quarter, our app's cold start time was sitting at around 1,800ms on a mid-range device. Users were complaining. Play Console vitals had us in the "slow" bucket. Something had to change.
After three weeks of profiling, identifying bottlenecks, and shipping fixes, we brought it down to 720ms — a 60% reduction with no features removed and no architecture changes. Here's exactly what we did.
Step 1 — Profile Before You Guess
The worst thing you can do is assume you know the bottleneck. I've seen engineers spend days optimizing a RecyclerView adapter when the real problem was a content provider blocking the main thread on startup.
I used two tools:
- Android Studio's App Startup profiler: gives you a breakdown of time spent in Application, content providers, and first Activity
- Macrobenchmark: measures real cold start times with StartupMode.COLD and reports percentile timing
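import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startup() = benchmarkRule.measureRepeated(
        packageName = "com.bytevikas.app",
        metrics = listOf(StartupTimingMetric()),
        iterations = 10,
        // COLD kills the process before each iteration
        startupMode = StartupMode.COLD
    ) {
        pressHome()
        // Launches the default activity and waits for the first frame
        startActivityAndWait()
    }
}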
Running this revealed two major culprits that together consumed 900ms of our startup budget.
Bottleneck 1 — A Content Provider Nobody Knew About
Firebase and several other SDKs register ContentProvider classes that auto-initialize on app start. Each one runs on the main thread before your Application.onCreate() even fires. We had six of them.
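The fix is the App Startup library: components that implement its Initializer interface share a single content provider, and you decide which of them run eagerly and which are deferred. SDKs that ship their own provider need that provider removed from the merged manifest before you can initialize them this way.
<!-- AndroidManifest.xml (requires xmlns:tools on the root <manifest> element) -->
<provider
    android:name="androidx.startup.InitializationProvider"
    android:authorities="${applicationId}.androidx-startup"
    android:exported="false"
    tools:node="merge">
    <!-- android:name must be a class implementing androidx.startup.Initializer.
         The class name below is a placeholder. Entries listed here run eagerly
         at startup; leave an initializer out to call it on demand instead. -->
    <meta-data
        android:name="com.bytevikas.app.startup.FirebaseInitializer"
        android:value="androidx.startup" />
</provider>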
For Firebase specifically, we deferred it until after the first screen painted. This alone saved us 380ms.
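A rough sketch of what deferring Firebase through App Startup can look like. FirebaseInitializer and initFirebaseDeferred are hypothetical names, not taken from our codebase. The sketch also assumes that Firebase's own auto-init provider has been removed from the merged manifest and that the initializer is not registered for eager discovery under InitializationProvider.

```kotlin
import android.content.Context
import androidx.startup.AppInitializer
import androidx.startup.Initializer
import com.google.firebase.FirebaseApp

// Hypothetical initializer; the class name is illustrative only.
class FirebaseInitializer : Initializer<FirebaseApp> {
    override fun create(context: Context): FirebaseApp =
        // initializeApp returns null when no default options are found in resources.
        checkNotNull(FirebaseApp.initializeApp(context)) {
            "Default FirebaseOptions not found; check the google-services setup"
        }

    // No other initializers have to run before this one.
    override fun dependencies(): List<Class<out Initializer<*>>> = emptyList()
}

// Hypothetical helper: call it once the first screen is on display.
// initializeComponent runs create() at most once and returns the cached result afterwards.
fun initFirebaseDeferred(context: Context): FirebaseApp =
    AppInitializer.getInstance(context)
        .initializeComponent(FirebaseInitializer::class.java)
```

One way to trigger it is to post a call to initFirebaseDeferred(applicationContext) on the Activity's decor view after setContent, so it runs after the work already queued for the first layout pass. Treat that as an approximation of "after first paint", not a guarantee.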
Bottleneck 2 — The DI Graph Was Too Eager
We were using Hilt and had our entire dependency graph — including network clients, database instances, and repositories — initialized in @Singleton scope during app startup. None of this was needed on the splash screen.
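The fix: move to lazy initialization. Hilt supports this natively via Dagger's Lazy<T> (dagger.Lazy, not Kotlin's Lazy):
import androidx.lifecycle.ViewModel
import dagger.Lazy
import dagger.hilt.android.lifecycle.HiltViewModel
import javax.inject.Inject

@HiltViewModel
class HomeViewModel @Inject constructor(
    // Eager (what we had before): the instance is built when HomeViewModel is injected
    private val userRepository: UserRepository,
    // Lazy (after): the instance is built only on the first get() call
    private val analyticsService: Lazy<AnalyticsService>,
    private val syncManager: Lazy<SyncManager>
) : ViewModel()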
Rule of thumb: anything that isn't needed on the first frame the user sees should be Lazy<T>. Sync managers, analytics, crash reporters, notification handlers — all deferred.
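To make the effect concrete, here is a small standalone sketch that runs without Hilt. It uses Kotlin's built-in lazy { } as a stand-in for dagger.Lazy (which exposes get() instead of value), and the classes are simplified, hypothetical versions for illustration only. The counter shows that the expensive object is not built until something actually touches it.

```kotlin
// Hypothetical stand-in for an expensive dependency; counts how often it is constructed.
class AnalyticsService {
    init { constructed++ }
    fun track(event: String): String = "tracked:$event"
    companion object { var constructed = 0 }
}

// Takes a Kotlin Lazy where Hilt code would take dagger.Lazy<AnalyticsService>.
class LazyDemoViewModel(private val analytics: Lazy<AnalyticsService>) {
    // First-frame work never touches analytics, so nothing is constructed here.
    fun onFirstFrame(): String = "home"

    // The first access to .value constructs the service; later calls reuse it.
    fun onButtonTap(): String = analytics.value.track("tap")
}

fun main() {
    val vm = LazyDemoViewModel(lazy { AnalyticsService() })
    vm.onFirstFrame()
    println(AnalyticsService.constructed) // 0: still not built after the first frame
    vm.onButtonTap()
    vm.onButtonTap()
    println(AnalyticsService.constructed) // 1: built once, on first use
}
```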
Bottleneck 3 — No Baseline Profile
AOT compilation on Android is profile-guided. Without a Baseline Profile, ART has no profile to work from on a fresh install, so code starts out interpreted and is only JIT-compiled once methods become hot. The first launches are the slowest as a result.
Adding a Baseline Profile tells the system to AOT-compile the critical code paths during app install:
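import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.filters.LargeTest
import androidx.test.uiautomator.By
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
@LargeTest
class BaselineProfileGenerator {
    @get:Rule
    val rule = BaselineProfileRule()

    @Test
    fun generate() {
        rule.collect(packageName = "com.bytevikas.app") {
            pressHome()
            startActivityAndWait()
            // Walk through critical user flows
            device.findObject(By.text("Home")).click()
            device.waitForIdle()
        }
    }
}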
Pair this with the ProfileInstaller library (androidx.profileinstaller) in your app so the profile bundled in the APK can be installed on the device even when the store did not apply it at install time. This gave us another ~140ms reduction on supported devices (Android 9+).
Minor Wins That Added Up
- Removed a StrictMode call that was accidentally left in non-debug builds
- Replaced synchronous SharedPreferences reads on startup with DataStore (async)
- Moved a network status check (with a 200ms timeout) to post-render
- Cut splash screen drawable complexity — a complex animated vector was taking 80ms to inflate
What I'd Do Differently Next Time
Add startup timing to CI from day one. We should have caught the content provider issue months earlier — it was added by a dependency update and nobody noticed because we had no automated startup regression tests. A Macrobenchmark run on every PR would have caught it immediately.
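As a sketch of what such a gate could look like, the snippet below takes startup samples (for example, the per-iteration timings a Macrobenchmark run produces), computes the median, and flags a regression when it exceeds a stored baseline by more than a tolerance. The function names, the 10% tolerance, and the sample values are all assumptions for illustration, not something we actually shipped.

```kotlin
// Median of startup samples in milliseconds; requires at least one sample.
fun medianMs(samples: List<Double>): Double {
    require(samples.isNotEmpty()) { "need at least one sample" }
    val sorted = samples.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}

// Hypothetical gate: true when the median exceeds the baseline by more than `tolerance`.
fun isStartupRegression(
    samples: List<Double>,
    baselineMs: Double,
    tolerance: Double = 0.10
): Boolean = medianMs(samples) > baselineMs * (1.0 + tolerance)

fun main() {
    // Illustrative numbers only, not measurements from this post.
    val samples = listOf(700.0, 730.0, 715.0, 820.0, 705.0)
    val regressed = isStartupRegression(samples, baselineMs = 720.0)
    println("median=${medianMs(samples)} regressed=$regressed")
    // Throwing makes the process exit non-zero, which fails the CI step.
    if (regressed) error("Cold start regressed beyond tolerance")
}
```

Using the median rather than the mean keeps one noisy iteration (like the 820 above) from failing the build on its own.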
Also: profile on a real mid-range device, not the emulator or a flagship. Pixel 8 Pro numbers will lie to you. Test on a device in the P50 tier — that's where your actual users are.