Ilya: So, today’s webinar is all about identifying mobile performance issues with Nimble, a HeadSpin company. And just to get started, we’re going to have an introduction by one of the founders of the Nimble technology and then we’re going to jump into the details of measuring performance within mobile apps and how to do that at build time. We’re going to talk about the product capabilities and go full live demo so you can see for yourself. Then we’re going to cover some customer case studies. And so, without any further ado, I’ll pass this over to Junfeng Yang. Are you with us?
Junfeng: Excellent. Hello everyone. I’m Junfeng Yang, Chief Scientist at HeadSpin. Prior to that, I was the Cofounder and CEO of NimbleDroid, which was acquired by HeadSpin. Before that, I was a professor at Columbia University, worked at Microsoft, and got my PhD from Stanford University. For all of these 20 or so years, my passion has always been building better tools to help engineers build software faster and improve its quality, so that’s basically my career passion.
Ilya asked me to talk a little bit about the background of our product and how it came into existence. Around 2013, I had a bunch of students who graduated, got into the job market, and started working in mobile engineering. They came back and complained that the mobile tooling ecosystem was so broken: there are so many different devices out there, the frameworks are still pretty young and fast-evolving, and there just aren’t good tools to help them figure out performance bottlenecks and improve the user experience.
So, my then-PhD student Younghoon and I started working on some very cool technology to automatically diagnose performance issues using very deep system techniques along with machine learning and AI techniques, and the resulting system gained a lot of attention in the mobile developer community. That’s when we thought about creating a product out of the technology so that we could benefit millions of mobile developers. That’s how the technology became the product, NimbleDroid, and then the company. That’s the background. So yeah, Ilya, anything else that you wanted me to discuss?
Ilya: Thank you Junfeng, that was perfect as an introduction on where this came from. Thank you very much sir.
Junfeng is a Columbia professor, so he’s in New York City. Thank you for joining us.
So, let’s talk about the actual product as it exists today. So specifically, let’s start with performance. So, we’re talking about performance of mobile apps. They could be native apps. They could be hybrid apps. So, we’re basically talking about Android and iOS.
And why is performance so critical for your business? Now, there’s not a ton of studies out there, but some of the key analyses done by folks like Amazon and Google – they basically agree that any time your users experience some delay in using the app, they’re unhappy.
Amazon found that every 100 milliseconds of added latency cost them about 1% in sales.
Google found that an extra 500 milliseconds of latency caused roughly a 20% drop in traffic. And from our customers, anecdotally, we’ve heard that anywhere from 250 to 500 milliseconds is significant enough that users will perceive it.
So, if a user of your app is used to doing a certain function, you know, like, think of a shopping app where you’re going to search for a shirt, okay – or think of a finance app where I’m going to check my investment portfolio.
If I’m used to that screen appearing in a certain amount of time, and a new release of the app suddenly takes about 500 milliseconds longer, that’s enough to be noticeable. And obviously, the worse the delay, the less happy your customers are.
So why is that? Mobile performance is obviously complex, and as Junfeng said, he and his students looked at some of the reasons: the ecosystem being complex, lots of different devices, and so on.
Here at Nimble, as a product company, we’ve looked at what our customers are telling us, and what we found is that the three biggest reasons for this complexity in understanding performance are:
- Developers are frequently making code changes. So by the time you realize there’s a performance impact, many commits have already happened, and it’s very hard to track the cause down.
- The second problem is testing for performance regressions. A human tester doing manual testing won’t necessarily notice that an app got 300 milliseconds slower. But three subsequent releases, each with a 300 millisecond slowdown, add up to nearly a second of slowdown, which clearly is perceivable by your users.
- And the third problem isn’t unique to performance; it’s a general fact of modern software development: you’re using a lot of third-party code, SDKs and libraries developed by open-source developers or even by other teams in your organization. So there’s poor visibility into how those SDKs and third-party libraries are impacting the overall app performance.
And we actually have examples of each of these three things to show you. The other thing we struggled with a lot is tools. There are lots of tools on the market; they all do different things, and each is better suited to certain jobs than others. APM tools, for example, are great: they give you a lot of visibility into what your real users are doing. But when it comes to detecting issues, that’s too late in the cycle; the app is already out and real users are encountering those issues. So, to use the common term these days: shift left.
If you want to shift left on performance, you can’t wait until your users are experiencing these issues. The other thing we’ve already touched on is accurately measuring performance. When you’re talking about a few hundred milliseconds of difference, that’s very hard to measure reliably, because if you open the same app a couple of times on whatever device you happen to have, it won’t take the same amount of time each run.
So, distinguishing a real performance regression from a random glitch or a network problem is kind of a big deal. And we can show you what that looks like.
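To make that concrete, here’s a minimal sketch – not Nimble’s implementation, just an illustration of the statistics involved – of why a single timing sample is untrustworthy and why the median of several runs is more stable. The `measureColdStartMs` function is a hypothetical stand-in for a real launch-and-time harness:

```kotlin
// Illustrative sketch only: a single timing sample is noisy, so profile
// the same build several times and report the median.
// measureColdStartMs() is a hypothetical stand-in for a real harness.
fun measureColdStartMs(): Double {
    // Pretend measurement: a "true" 1130 ms cold start plus random noise
    // from scheduling, JIT, network, etc.
    return 1130.0 + (Math.random() - 0.5) * 200.0
}

fun medianOfRuns(runs: Int, measure: () -> Double): Double {
    val samples = List(runs) { measure() }.sorted()
    val mid = samples.size / 2
    return if (samples.size % 2 == 1) samples[mid]
    else (samples[mid - 1] + samples[mid]) / 2.0
}

fun main() {
    val single = measureColdStartMs() // any one run can be off by ~100 ms
    val median = medianOfRuns(5, ::measureColdStartMs)
    println("single run: %.0f ms, median of 5: %.0f ms".format(single, median))
}
```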
And then of course Android and iOS have very different tooling and very different ecosystems, each with its own challenges. So there’s that element of it, and there are significant costs in trying to do this yourself: getting devices optimized specifically for this kind of performance testing, and getting people trained to look at performance issues.
One of the things we encounter commonly with our customers is that many organizations do not have a dedicated performance team, although some do. And it’s ultimately a question of people, time, engineering hours, and dollars: how you can easily test your apps for performance, and why you should.
So, the solution that NimbleApp put together is meant to provide visibility into, and control over, the performance of a given mobile app during the development stage. We basically integrate with CI so we can continuously monitor every build and quickly alert when there is a regression, as soon as some problematic code is introduced. That makes it a lot easier to go back and understand which commit caused the problem.
And then the other thing I’ll show you today is our product’s fine-grained diagnostics, which really help developers pinpoint exactly which method or which part of their code is causing the slowdown, so they know where to look or where to start improving things, and they can understand what kind of performance impact a fix will have.
And so, the idea again is to shift left and to be able to identify issues earlier in the cycle. And that’s what we’re talking about today.
Now typically, our users use our product in a few different ways:
- One is that a developer may want to check their latest build against performance numbers, and they can do that on demand. The typical use case, though, is CI integration: every time there’s a pull request and a new build gets generated, it gets profiled, and we can detect regressions in performance.
- Also, some of our customers use it as a release criterion. They set an actual budget of what performance numbers are acceptable for certain typical UI interactions, and then they can check each release against those numbers (see the sketch after this list).
- And then finally, in production when you’re talking about a customer reported issue, you can actually rerun it through Nimble and get some results to look at.
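To illustrate the budget-as-release-criterion idea from the list above – this is a hedged sketch with invented scenario names and numbers, not Nimble’s actual API – a CI step could simply fail the build whenever a measured scenario exceeds its agreed budget:

```kotlin
// Sketch of a performance budget gate. Names and numbers are invented.
data class ScenarioResult(val name: String, val measuredMs: Long, val budgetMs: Long)

fun checkBudgets(results: List<ScenarioResult>): Boolean {
    var ok = true
    for (r in results) {
        if (r.measuredMs > r.budgetMs) {
            println("FAIL ${r.name}: ${r.measuredMs} ms > budget of ${r.budgetMs} ms")
            ok = false
        } else {
            println("PASS ${r.name}: ${r.measuredMs} ms within budget of ${r.budgetMs} ms")
        }
    }
    return ok
}

fun main() {
    val results = listOf(
        ScenarioResult("cold_startup", 2510, 2000), // over budget: gate fails
        ScenarioResult("add_to_cart", 410, 500),
    )
    if (!checkBudgets(results)) kotlin.system.exitProcess(1) // non-zero exit fails CI
}
```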
But as I said, the typical integration we offer is a very easy and seamless integration into the CI workflow. You can see logos here of some popular CI systems, but we actually integrate with any CI system; we haven’t met one yet that we cannot integrate with. It’s a very simple process, and as I explain the performance metrics further, you’ll see for yourselves. So, that’s what we’re going to talk about today.
So, I’m going to actually switch to a live demo and go through some of the scenarios so that you can see for yourself what our product looks like.
And again, for those folks who have joined us since the beginning, just a reminder, if you have a question, please use the questions tab on the right hand side – you can actually put your question in. And what we’re going to do is address them at the end of the Webinar. Please go ahead and do that if you have any questions.
Okay. So, let’s talk about a few specific things that we’re going to look at today. So, identifying slowdowns in third party code is very important. It’s one of the three main challenges that we found.
And here’s an example from a real customer: this is actually from Aaron’s personal blog; Aaron is an engineer at a company that is one of our customers. What he’s calling out is that reading a resources stream could take up to two seconds inside this initializer. And he points out a very simple fix: make the adapter lazily initialized, which saves anywhere from one to two seconds of startup time.
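The post’s actual code isn’t reproduced here, but the general pattern is easy to sketch in Kotlin: defer the expensive object until first use instead of constructing it on the startup path. `ExpensiveAdapter` and `FeedScreen` are hypothetical names, not the customer’s code:

```kotlin
// Sketch of the lazy-initialization fix, with hypothetical names.
class ExpensiveAdapter {
    init {
        // Imagine this reads resources and parses data, taking 1-2 seconds.
    }
    fun render(item: String) = println("rendering $item")
}

class FeedScreen {
    // Eager version pays the cost at construction time, i.e. during startup:
    // private val adapter = ExpensiveAdapter()

    // Lazy version defers the cost until the adapter is first touched:
    private val adapter by lazy { ExpensiveAdapter() }

    fun showFirstItem() = adapter.render("first item") // cost paid here, off the startup path
}
```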
So let me actually show you what Nimble does and how it shows these kinds of issues so you can see for yourself. So, this is the first thing we’re going to do.
So, as I mentioned, Nimble will go ahead and integrate with your CI system, and every time there is an upload of the app, Nimble is going to analyze that app.
So, the first app that I’m showing you is a shopping app. One of the main things we look at – honestly, one of the first things any organization that starts caring about performance looks at – is the cold startup number. Cold startup runs literally from when you launch the app until the app’s UI is usable by your users. Each of these dots represents a specific build, taken most likely from their release branch.
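Nimble measures cold startup from the outside on real devices, but purely for orientation, here is a minimal app-side sketch of what “the UI is usable” means in code, using Android’s real `Activity.reportFullyDrawn()` hook; the `loadInitialData` callback is a hypothetical placeholder:

```kotlin
import android.app.Activity
import android.os.Bundle

// Sketch: cold startup runs from process launch until the UI is usable.
// Activity.reportFullyDrawn() lets an app mark that moment explicitly
// (it shows up in logcat as "Fully drawn").
class MainActivity : Activity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // setContentView(R.layout.main) // inflate the real UI here
        loadInitialData {
            reportFullyDrawn() // first meaningful content is now on screen
        }
    }

    private fun loadInitialData(onReady: () -> Unit) {
        // Hypothetical async load; call back once content is visible.
        onReady()
    }
}
```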
And you can see the results in terms of how long it took for the app to become usable. There are other scenarios over here, like adding an item to the shopping cart or fetching some recommendations; these are custom test scenarios which either we at Nimble or our customers themselves can build. We’ll talk in a bit more detail later about the test frameworks we support.
But the idea, again, is that every time there’s a new build, we get these performance numbers. What that allows an organization to do is, as soon as there’s a performance regression, connect it to the previous build, which makes it very simple for developers to understand exactly what is causing that regression.
In fact, if we expand this out to, say, all 22 uploads and reload the dashboard, you’ll see that the cold startup numbers stay fairly similar, build after build. They’re not exactly the same as code changes come and go, but the regressions or fixes are very minimal.
And then suddenly cold startup jumps to 2.5 seconds and then even 2.6 seconds. So each of these two data points would have generated an alert to the team. And when the developer looks at this, it makes it very simple to understand what exactly regressed because I can simply click on this and say, compare this upload with the current.
And by the way, you can compare any two arbitrary points, but the typical use case is comparing the current or the next build – the latest build – with the previous builds. And so that’s what we see here. So, the app didn’t change too much. If you look at the file size or the method count, they’re very similar. But the timing of the cold startup definitely regressed and went from 1.13 seconds to 2.5 seconds.
And so, as I keep saying, Nimble makes it very simple to see why the code regressed. And there are two ways we do that.
1) One is we actually identify specific reasons for slowdowns. So, in this case we’re looking at three different things.
- One is anything that’s on the UI thread, which is going to actually block or hang the UI.
- Then we’re also able to find hung wait methods, which basically means a method on the UI thread is waiting for some background thread to finish.
- And then the last thing is any method that runs in the background, but for longer than 100 milliseconds. (All three patterns are sketched in code right after this list.)
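Here is that sketch – invented names, kotlinx-coroutines assumed, and purely an illustration of the three patterns, not Nimble’s detection code:

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

fun parseHugeCatalog() = Thread.sleep(800) // stand-in for heavy work

// 1) Heavy work directly on the UI thread: blocks rendering outright.
fun onCreateBlocking() {
    parseHugeCatalog() // UI frozen for ~800 ms
}

// 2) "Hung wait": the UI thread blocks waiting on a background thread.
fun onCreateHungWait() = runBlocking {
    val job = async(Dispatchers.Default) { parseHugeCatalog() }
    job.await() // UI thread is still stuck until the background work finishes
}

// 3) A background method running longer than 100 ms: doesn't freeze the
//    UI, but delays whatever depends on its result.
fun prefetchInBackground(scope: CoroutineScope) =
    scope.launch(Dispatchers.Default) { parseHugeCatalog() }
```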
So again, remember that the left side is the fast build and the right side is where things got slow. It’s very clear that the splash activity’s onCreate exists in both cases, and the timing is about the same: 254 versus 253 milliseconds.
But there’s a new MainActivity onCreate which was added here, and it’s very long. Specifically, if you look at the call stack, the MainActivity onCreate method calls the MainActivity initialize method, which itself takes 2,100 milliseconds.
2) And if you want more information than that, we can pull up the full call stack for the entire application, like this. Again, what you see is that the UI thread on the left, for example, was very quick: 312 milliseconds. I can just zoom in and show everything that happens on the UI thread. Here, on the right, it’s almost 2.5 seconds, and you can see that the method that takes the longest to run is the BigInteger add method.
So, what this boils down to is a developer made a code change. They submitted their build, we realized that there’s a performance problem, and by looking at the previous build versus this build, it’s very clear that this new method which was added is in fact the method that is causing the problem.
And so now the developer can go back and change this method, get rid of it, or rewrite the code, whatever the appropriate remediation is, but they understand that this method, with this timing information, is the biggest problem here. There are other methods called by the onCreate method, but they’re very quick, so they’re not causing any problems.
We can also pull up a timeline view – not really necessary here. Plus of course this is a demo, so not a real customer example.
But the timeline view basically shows us the threads and it also shows us when certain threads are launched, and if the UI thread is waiting on something to finish, it shows that as well.
So here the UI thread is actually waiting for a couple of items from a background thread to finish, and this is causing some delays as well. To summarize, the typical use case for Nimble is that you tie it into CI; every time there’s a new build, it gets profiled for cold startup as well as any custom tests, and as soon as there is a performance regression, that’s communicated to the team.
Now, to further explain what’s happening here: one of the main things we’ve discovered is that if you run the same app a couple of times on a couple of different devices, you will get very different performance readings.
So, another thing that Nimble offers, and it’s really the foundation of our platform, is that these dots, these numbers, are very reliable run over run. And they’re so reliable for a couple of reasons.
1) One is that we have actual physical devices in a farm, which are used to install the app, run it, and profile it, and those devices are all exactly the same. A typical device farm may have lots of different devices, which is useful for answering the question of how your app is doing on this device versus that device.
But for this kind of build-over-build performance testing to make sense, you have to have exactly the same conditions for this build as for that build. Otherwise the numbers won’t line up on the trend graph.
And so Nimble provides that by having a device farm comprised of exactly the same devices, configured in exactly the same way, sitting on the same network pipe in the same location. So when you submit a new build and the numbers are different, as they are in this case, there are only two possibilities:
- Either you have network traffic that took a different amount of time
- or you’ve made a change in the code and we can show you that.
And in fact, if you look at the timing information in the call stack, it becomes very easy to tell whether a slowdown is network related or CPU related, because what we’re showing here is CPU timing: how long each method takes to execute. If those timings were exactly the same, the issue would be on the network side. If they’re drastically different, then clearly the issue is in the code itself.
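For orientation, that CPU-time-versus-wall-time distinction can be sketched with two real Android APIs, `SystemClock.elapsedRealtimeNanos()` and `Debug.threadCpuTimeNanos()`; the `profile` helper itself is hypothetical:

```kotlin
import android.os.Debug
import android.os.SystemClock

// Sketch: if wall time grows but CPU time doesn't, the thread was waiting
// (e.g. on the network); if both grow, the code itself is doing more work.
inline fun <T> profile(label: String, block: () -> T): T {
    val wallStart = SystemClock.elapsedRealtimeNanos()
    val cpuStart = Debug.threadCpuTimeNanos() // CPU time of the current thread
    val result = block()
    val wallMs = (SystemClock.elapsedRealtimeNanos() - wallStart) / 1_000_000
    val cpuMs = (Debug.threadCpuTimeNanos() - cpuStart) / 1_000_000
    println("$label: wall=${wallMs} ms, cpu=${cpuMs} ms")
    return result
}
```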
So another thing that Nimble provides is the same functionality for iOS. Now, as anybody who’s worked on iOS knows, Apple throws some curveballs when it comes to third-party tooling and running various things, and we’ve basically dealt with all of that. So we also have a device farm of iOS devices, and we can do exactly the same things on iOS that we do on Android.
So, as you can see here, it’s the same shopping app, but in an iOS environment, and I just want you to see that it works exactly the same. Every build comes in and gets profiled. We can certainly do a “compare this upload with current” so you can see the difference. I actually compared this upload with current, sorry about that.
So you see that in this case it’s an IPA being run; there are some differences in this one, and the timing went down. And again, you can pull up a call stack and see the differences side by side. So it’s very easy to see this on iOS as well as Android.
Recently, we’ve had some requests for web apps, so this is a demo we do on Kohl’s. All of these runs are basically pulling up the Kohl’s website inside a mobile browser, and again, we can profile it for timing information and show when there are differences. And when you look at the details, you can see that we’re also identifying slowdowns.
In the case of a web app, the slowdowns are in JavaScript functions; in the case of Android, they’re in Java code; and in the case of iOS, they’re in either Swift or Objective-C.
But basically, by looking at the call stack and the method timing, whatever the language of each platform, we can identify these slowdowns. And more importantly, we can profile this reliably, so the numbers are comparable from one run to another.
Another thing that commonly happens is that a customer wants to work on improving a particular app. In that case, they can look at the results we provide not as a build-over-build dashboard, but as a single, standalone set of results. So I’d like to show you what that looks like.
This is an app I pulled from the app store: not the latest, but one of the recent versions of an app called Runtastic. We grabbed the APK from the Google Play Store and analyzed it.
One of the things you can see here is a login flow that takes 4.6 seconds. If I wanted to improve that time, the natural question is: what’s making it slow, and where should I focus my efforts?
What Nimble does, as you can see here, is start measuring when the user clicks the login button and stop when the actual logged-in screen of the app is fully populated. This is driven by an automation test, which in this case we wrote. There are 10 slowdowns, and we can look at the details and see the actual methods. These first ones are on the UI thread which, if you recall, are the ones actually capable of hanging the UI.
These others are background methods. So looking at this, we can say: if you want to start improving this particular app, and this particular flow specifically, this is what you’d work on. This is the path taken through the code.
If you wanted to start improving this app’s login performance, this is where you would focus. Here’s the full call stack, so you can see for yourself what’s involved in making these calls.
For example, here we’re pulling in some data for maps, and this is on the UI thread and takes almost a hundred milliseconds. So if we can offload that to a background thread, that would be great, and it would yield a performance improvement here.
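Moving a read like that off the UI thread is a standard pattern. Here’s a hedged sketch using coroutines; `RouteActivity`, `loadMapData`, and `render` are invented names, and the androidx `lifecycleScope` extension is assumed to be on the classpath:

```kotlin
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.withContext

class RouteActivity : AppCompatActivity() {
    // Before (hypothetical): loadMapData() ran directly on the UI thread,
    // costing ~100 ms of jank.
    private fun showRoutesBlocking() {
        val data = loadMapData() // blocks the UI thread
        render(data)
    }

    // After: do the read on the I/O dispatcher, then hop back to render.
    private fun showRoutes() {
        lifecycleScope.launch {
            val data = withContext(Dispatchers.IO) { loadMapData() }
            render(data) // back on the main thread
        }
    }

    private fun loadMapData(): List<String> = listOf("route-1", "route-2") // stand-in
    private fun render(data: List<String>) { /* update views */ }
}
```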
But there are some other interesting things we can see here.
We talked a little bit about third-party code. The thing about third-party code, whether it’s open source or, in a large organization, built by a different team, is that traditionally it’s hard to tell whether the app’s performance problems come from “my code” or “somebody else’s code.”
In this case, you can see how easy Nimble makes it. This is Gson, an open-source library for parsing JSON. And in this scenario, which, if you recall, takes about four and a half seconds, almost a full second is actually spent parsing JSON with the Gson library. That’s kind of strange; you would expect this to be very quick.
So again, if we go back to the original slowdowns, we can trace it down that way. Looking at some of these methods, you can actually see where that Gson time is being spent.
Here’s a great example of that, and I’ll just go to the beginning of it up here. This is a background thread, running in the background, so it’s not directly going to freeze the UI thread, but obviously it has to finish processing for the app to be useful.
And this is what it’s doing: you can see that Gson is being called, and the reason it’s so slow, relatively speaking, is that Gson uses reflection, and on Android, reflection is actually incredibly slow.
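For a sense of what that means in code: by default, Gson builds an adapter for each new type via reflection, while a hand-written `TypeAdapter` sidesteps reflection entirely. A minimal sketch, with an invented `User` type:

```kotlin
import com.google.gson.Gson
import com.google.gson.GsonBuilder
import com.google.gson.TypeAdapter
import com.google.gson.stream.JsonReader
import com.google.gson.stream.JsonWriter

data class User(val name: String = "", val age: Int = 0)

// Default Gson inspects User's fields reflectively, which is slow on Android:
val reflective: User = Gson().fromJson("""{"name":"Ada","age":36}""", User::class.java)

// A hand-written TypeAdapter avoids reflection entirely:
object UserAdapter : TypeAdapter<User>() {
    override fun write(out: JsonWriter, value: User) {
        out.beginObject()
            .name("name").value(value.name)
            .name("age").value(value.age.toLong())
            .endObject()
    }

    override fun read(reader: JsonReader): User {
        var name = ""
        var age = 0
        reader.beginObject()
        while (reader.hasNext()) when (reader.nextName()) {
            "name" -> name = reader.nextString()
            "age" -> age = reader.nextInt()
            else -> reader.skipValue()
        }
        reader.endObject()
        return User(name, age)
    }
}

val fast: Gson = GsonBuilder()
    .registerTypeAdapter(User::class.java, UserAdapter)
    .create()
```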
In fact, we have a couple things further in this webinar that explain that a little bit better. We also provide you some links that you can look at.
But in general, the Gson third-party library is causing some slowdowns here, and I can drill down into the code and see that it’s due to this reflection. But the thing I find most interesting is the name here: com.newrelic.agent. So not only are we identifying a problem in a third-party library, Gson, but the really interesting part is that it isn’t even introduced by the app’s own functionality; it’s there because they bundle New Relic, and New Relic is creating some of these slowdowns.
I mentioned this earlier, New Relic is an APM tool. They provide a lot of useful information. So it may be the case that they’re willing to accept that slow down because it provides so much useful information and that’s great.
But what you can now provide to New Relic is a breakdown, and you can actually say, “Hey, we love your product, but you’re causing us some problems. Is there anything you can do to speed this up?”
And New Relic would actually be wise to swap out Gson, because there are other libraries for parsing JSON, and in the call stack it’s very clear that this happens a lot.
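One such alternative, Moshi, comes up again in the case studies below. With Moshi’s compile-time code generation, no reflection runs on-device; a minimal hedged sketch, with an invented `Account` type and the `moshi-kotlin-codegen` annotation processor assumed:

```kotlin
import com.squareup.moshi.JsonClass
import com.squareup.moshi.Moshi

// The adapter for Account is generated at compile time, so parsing does
// not use reflection at runtime.
@JsonClass(generateAdapter = true)
data class Account(val name: String, val age: Int)

fun parseAccount(json: String): Account? =
    Moshi.Builder().build()
        .adapter(Account::class.java)
        .fromJson(json)
```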
So here’s a background thread with a little over one second of runtime. Ultimately, if you look at this, it opens some web server requests, which call the New Relic agent, which in turn calls this toJson method, which is 300 milliseconds here. And then guess what: here’s another New Relic call at 358 milliseconds, and yet another over here for 200 more. So you see how it all adds up. Even though this is a background thread, it’s responsible for almost the full second of CPU time.
So that’s the kind of thing you can get from Nimble. Just to summarize: you plug it into CI, or you can profile a one-off build. We talked about both of these scenarios, and what you get is this level of detail, so you can either work on improving the performance of a certain task or track it build over build and make sure it does not regress.
In fact, a lot of our customers will talk about creating a baseline and then just making sure that nothing regresses or gets slower from that baseline. So there’s a couple of different ways of using this. And, that’s kind of the live demo.
I’m going to go back into the slides and cover the customer case studies and such. And again, if you have any questions or if you want to look at anything else, by all means, please put the questions into the question panel.
This is an example of identifying a third party problem in Moshi Kotlin and some explanation there.
This is the Gson example that I already showed you.
I have another case study here, from one of our customers who did not agree to share their name publicly, so I had to redact some of it. But this is a real-world example where we’re looking at three different versions of a very popular iOS app.
It basically has this huge spike, which was later fixed, so this is the interesting one. And again, if we compare the before and after, it’s pretty clear that on the left-hand side the hot methods are 300 milliseconds and 280 milliseconds, while on the right-hand side there’s some database access going on, and it’s just very slow.
This is the kind of thing you can see with Nimble. This is an iOS app, and this is what the full call stack looks like. And again, you can see that under UIApplicationMain, the time went from 2,000 milliseconds to 7,000 – from two seconds to seven seconds. That’s a five-second change, and believe me, five seconds is very perceivable to a user. I use this app; I’m sure a lot of you do too.
This is the sort of thing we can identify with Nimble, and this is how it makes it possible for a developer to look at it and say: okay, I get it, I need to focus on changing this method. Ultimately, one of these two methods is going to be my biggest contributor. And that’s what happens.
So that’s one real example. But we have some case studies here.
So Flipkart is a great customer of ours; they’ve been using us for over two years now, exactly the way I described. It’s built into CI, they analyze every build, and whenever we detect a regression for them, they use Nimble to analyze the issue. You can see they fixed this one within two or three builds, improving it and making sure it goes back to where it should be.
Flipkart is a big supporter because we deliver a lot of value to them. And of course, Flipkart has many customers; in a shopping app, when performance suffers, people tend to abandon their shopping carts, get frustrated, and go use something else. So that’s a big thing for Flipkart.
Here’s another [case study] – they didn’t let us use their name, but this is a very large company with lots of different teams.
What we found is that with this many engineers making code changes this frequently, the sooner you can find a problem, the better. If you think about the engineering time necessary to go back and understand why performance changed – the more commits you have to wade through and the more individual code changes you have to analyze – the harder it’s going to be.
Being able to look at this on a build-by-build level, where each change is pretty small, is much better than only being able to compare version to version.
This is a great case study: the New York Times did this test with us. They wanted to improve their cold startup. Google recommends two seconds or less for cold startup, and as you can see here, theirs was hovering around 4.3 seconds – actually, it was 5.6 seconds initially – and then they started working with us to improve it.
So there is a link here – I’ll give you the URL at the end of the Webinar and we’ll obviously provide that with our follow-up, but they have a whole blog where they explained exactly what kinds of issues they found. This actually happened almost two years ago, but it’s such a great case study.
For me, as a customer-facing engineer, what’s interesting is that even though this happened two years ago, I go to customers today and see the same kinds of things. Just a spoiler: Gson and reflection were a big reason for the slow startup time, which they fixed.
This blog post explains exactly why, and how they fixed it. As you just saw with the recent Runtastic version, people are still hitting those problems on Android today. So this is not something that went away; even though this particular case study is two years old, it’s still entirely valid based on what I see in the field. We’ll share the link at the end of the webinar and in our follow-up. I recommend you take a look if you do any Android development.
The other case study, a little bit newer and also a great blog post, comes from our friends at Pinterest. They have a performance team that uses us, and we’re a big part of providing them a platform to run Android and iOS tests and identify performance issues early.
They have this nice blog post where they explain some of the things they’ve seen with us and what kinds of issues they’ve been able to find and fix. And again, we’ll share the links; here they are: insert URLs here.
These are the URLs for the three things I showed in the webinar. I do recommend you take a look at them; we’ll share these links with you along with the recording.
At this point, just to summarize, Nimble enables our customers to accurately monitor and profile every critical user flow of Android and iOS apps, and now even web apps. We integrate pretty seamlessly into your existing workflow, which means it’s a set and forget kind of thing. You put it into CI and then it just works.
FAQs
Q: What are the key parameters to consider during performance testing?
Some of the critical elements of performance testing are as follows (a rough sampling sketch follows the list):
- Memory usage
- Network Bandwidth
- Network queue and latency
- CPU interrupts
- CPU usage
- Thread counts
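For orientation, a few of these can be sampled directly from inside a running Android app using real `Runtime` and `android.os.Debug` APIs; this is a rough sketch, and real profilers measure far more, far more precisely:

```kotlin
import android.os.Debug

// Rough sketch: sample a few of the parameters above from inside the app.
fun snapshotBasicMetrics(): Map<String, Long> {
    val rt = Runtime.getRuntime()
    return mapOf(
        "jvmHeapUsedBytes" to (rt.totalMemory() - rt.freeMemory()), // memory usage
        "nativeHeapBytes" to Debug.getNativeHeapAllocatedSize(),    // native memory
        "threadCount" to Thread.activeCount().toLong(),             // thread counts
        "threadCpuNanos" to Debug.threadCpuTimeNanos(),             // CPU usage (this thread)
    )
}
```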
Q: What is Performance Regression Testing?
Performance regression testing determines how an application performs compared to previous versions. It establishes a performance baseline and ensures that subsequent changes to the application do not introduce new performance issues.
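A minimal sketch of that core idea – compare a new measurement against a stored baseline with an agreed tolerance (the 10% figure here is an invented example):

```kotlin
// Sketch of the heart of performance regression testing: flag anything
// more than `tolerance` slower than the recorded baseline.
data class Baseline(val scenario: String, val medianMs: Double)

fun isRegression(baseline: Baseline, newMedianMs: Double, tolerance: Double = 0.10): Boolean =
    newMedianMs > baseline.medianMs * (1 + tolerance)

fun main() {
    val coldStart = Baseline("cold_startup", 1130.0)
    println(isRegression(coldStart, 1180.0)) // false: within 10% of baseline
    println(isRegression(coldStart, 2500.0)) // true: clear regression
}
```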
Q: How is client-side performance testing different from traditional performance testing?
Performance testing has traditionally focused on the response from the server, with tests like load tests, stress tests, and spike tests. While extremely useful, traditional performance testing does not cover how an application behaves in the web browser or as a mobile app.
Client-side performance testing complements traditional performance testing to provide an end-to-end performance picture. It covers the effects of web and mobile components – JavaScript, images, videos, CSS, battery, location features, and networks – on application performance.