Saturday, November 26, 2016

Unable to find native library / findLibrary returned null

Ever since I started doing Android development in 2012, I've been getting crash reports showing that my game failed to load the shared object library that contains my native code.

This could manifest itself as a:

Unable to load native library: /data/data/com.sample.teapot/lib/libTeapotNativeActivity.so
or sometimes a findLibrary returned null:
E/AndroidRuntime( 7344): java.lang.UnsatisfiedLinkError:
  Couldn't load /data/data/com.sample.teapot/lib/libTeapotNativeActivity.so: findLibrary returned null

It turns out there is a trick to make Android give up more information about it. What you do is load the library yourself in onCreate() of your native activity class. For this you use the System.load() call (and not the System.loadLibrary() call that the Stack Overflow answer suggests). Unfortunately, you need to know where your test device has actually placed the .so file, but if you do something like this:

public class TeapotNativeActivity extends NativeActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        System.load("/data/data/com.sample.teapot/lib/libTeapotNativeActivity.so");
        ...
You will get a proper error in the logs:
W/native-activity( 7426): onCreate
D/dalvikvm( 7426): Trying to load lib /data/data/com.sample.teapot/lib/libTeapotNativeActivity.so 0x410d97a8
W/dalvikvm( 7426): threadid=1: thread exiting with uncaught exception (group=0x40a741f8)
E/AndroidRuntime( 7426): FATAL EXCEPTION: main
E/AndroidRuntime( 7426): java.lang.UnsatisfiedLinkError: Cannot load library: link_image[1936]:
  81 could not load needed library 'libGLESv3.so' for 'libTeapotNativeActivity.so' (load_library[1091]: Library 'libGLESv3.so' not found)

Hooray! Now we know which dependency is missing. Note that in this specific example, it is of course an issue of not specifying the ES3 requirement in the manifest. But this example is for illustration purposes only.

Originally, I was convinced that these errors were a case of not finding the library, because Android decided to append a "-1" or "-2" to the library name. But this was a red herring. The library is there all right. The thing is, it has dependencies that are not met, and are not reported when it fails loading.

By the way, to check if your library is really there, you could issue a command like:

$ adb shell run-as com.sample.teapot /system/bin/sh -c "ls\ -al\ /data/data/com.sample.teapot/lib/"
-rwxr-xr-x system   system     424232 2016-12-26 08:53 libTeapotNativeActivity.so

Thursday, November 24, 2016

Running a fast Android Emulator on Linux with kvm.

You can test your Android apps on an Android Virtual Device instead of on actual mobile hardware. If you emulate an ARM Android device, the performance will be low though. For faster emulation, it is better to emulate an x86 Android device.
Step 1
Make sure your Intel CPU is capable of virtualization, and go into your BIOS settings to make sure the virtualization feature of your CPU is enabled. You can also check from your command prompt by installing the cpu-checker package and then running: sudo kvm-ok on the command line. You should see the report 'KVM acceleration can be used.'
Step 2
Install the correct parts of the Android SDK. Run 'android' on the command line and, for your desired target (I use API level 19), install the Intel x86 Atom System Image.
Step 3
Start the AVD tool. On the command line, execute 'android avd' to open the tool.
Step 4
Select Create Device. Make sure to select Target: API level 19, and set the field CPU/ABI: Intel Atom (x86) as well. If you are using Google APIs, like Google Play Services, you should select the ABI: Google APIs Intel Atom (x86) instead. Don't forget to tick the Use Host GPU checkbox.

Step 5
Launch the virtual device by selecting 'Start' on the AVD tool.
Step 6
See if adb finds your virtual device with: 'adb devices' on the command line. Note the serial number for your device.
Step 7
Install your apk file using: adb -s emulator-5554 install bin/yourapp.apk
Enjoy the super fast emulation performance!
KNOWN ISSUES
If my app requests an OpenGL ES3 context, it fails:
E/EGL_emulation( 2463): tid 2476: eglCreateContext(919): error 0x3005 (EGL_BAD_CONFIG)

Final PRO-tip: Targeting API level 19 is your best bet, because targeting higher API levels can cause crashes on older devices. To fix this, Google actually recommends shipping multiple APKs, no less! Read more on this Stack Overflow thread.

Sunday, November 20, 2016

Shallow Depth of Field in Little Crane.

A year ago or so, I added shallow depth of field rendering to my render engine for Little Crane 2. Because the details were getting vague, I had to refresh my memory. To do this, I've charted the flow of my render steps. I call them steps, not passes, because most of them involve rendering just four vertices for a full-screen quad. The actual passes are a shadow-map generation pass over the models (not shown) followed by the standard render pass (step 0) that renders into an off-screen buffer.


Steps 1, 2, 3 and 4 each render into their own off-screen buffer, and the final step 5 renders to the screen/window.

In step 1, the input image is copied, but in the alpha channel a Circle-of-Confusion (CoC) size is stored. To calculate this size, the depth of the pixel is compared to the focal distance. This CoC size is a signed value which controls the blurriness. The value is 0.0 when the pixel is perfectly in focus. It is -1 when the pixel is in the foreground and requires maximum blur. When it is +1, the pixel is also maximally blurred, but this time because it is in the far background. Intermediate values are for moderately blurred pixels.
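As a rough illustration of the signed value described above, here is a minimal C sketch. The function name coc_from_depth, the blur_range parameter and the linear ramp are all assumptions made for this example; the actual shader may map depth to blur quite differently.

```c
#include <assert.h>

/* Hypothetical helper: maps a pixel depth to a signed CoC value in [-1, +1].
 * 0 means in focus, negative means foreground, positive means background.
 * blur_range is an assumed tuning parameter: the distance from the focal
 * plane at which the blur reaches its maximum. */
float coc_from_depth(float depth, float focal_distance, float blur_range)
{
    assert(blur_range > 0.0f);
    float coc = (depth - focal_distance) / blur_range;
    if (coc < -1.0f) coc = -1.0f;   /* maximally blurred foreground */
    if (coc >  1.0f) coc =  1.0f;   /* maximally blurred background */
    return coc;
}
```

With a focal distance of 10 and a blur range of 5, a pixel at depth 12.5 would get a CoC of +0.5 under this mapping.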

In step 2, we simply downsample the input to half resolution (this increases blur sizes and saves render cycles). Not shown is that the result also contains the downsampled alpha channel with CoC values.

In step 3 and 4 we blur the input with 7 taps. First in a horizontal direction, and later in a vertical direction. This smears out the pixels over a 7x7 (49) pixel area with only 14 samples. The blur algorithm requires some thought though: the weights of the samples need to be adapted to the situation. This is because foreground pixels can be blurred over pixels that are in focus. But background pixels cannot be blurred over the in-focus pixels, because the blurred background is occluded by the objects in focus.
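To make the 14-versus-49 sample count concrete, here is a minimal CPU-side sketch in C of a separable 7-tap blur. It is a simplification, not the shader from the game: it assumes uniform weights, a single channel and clamped edges, and it deliberately leaves out the adaptive foreground/background weighting described above. The name blur_pass is made up for this example.

```c
#include <assert.h>

#define TAPS 7

/* One 1D blur pass over a w*h single-channel image.
 * (dx, dy) = (1, 0) blurs horizontally, (0, 1) blurs vertically.
 * Assumptions: uniform weights, edges clamped to the nearest pixel. */
void blur_pass(const float *src, float *dst, int w, int h, int dx, int dy)
{
    assert(src != dst);   /* the pass cannot run in place */
    for (int y = 0; y < h; ++y)
    {
        for (int x = 0; x < w; ++x)
        {
            float sum = 0.0f;
            for (int t = -(TAPS / 2); t <= TAPS / 2; ++t)
            {
                int sx = x + t * dx;
                int sy = y + t * dy;
                if (sx < 0) sx = 0;
                if (sx >= w) sx = w - 1;
                if (sy < 0) sy = 0;
                if (sy >= h) sy = h - 1;
                sum += src[sy * w + sx];
            }
            dst[y * w + x] = sum / TAPS;
        }
    }
}
```

Calling blur_pass once with (1, 0) and then once with (0, 1) on the intermediate result spreads a single bright pixel over a 7x7 square, while each output pixel costs only 7 + 7 reads.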

In step 5, we simply mix the blurred and the sharp versions of the image based on the circle-of-confusion value. If the pixel is in focus, no blurred pixels get mixed in. If the pixel is in the far back or in the far front, only the blurred pixel makes it to the screen, and the value of the sharp pixel is mixed out.
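The mix can be sketched per colour channel as follows. This C version assumes a plain linear blend driven by the magnitude of the CoC value; the name mix_dof is hypothetical and the real shader may use a different curve.

```c
#include <assert.h>

/* Blends one colour channel of the sharp and the blurred image.
 * coc is the signed circle-of-confusion value in [-1, +1]; only its
 * magnitude is used here. Assumption: a plain linear blend. */
float mix_dof(float sharp, float blurred, float coc)
{
    float t = coc < 0.0f ? -coc : coc;   /* 0 = in focus, 1 = fully blurred */
    assert(t <= 1.0f);
    return sharp + (blurred - sharp) * t;
}
```

A CoC of 0 returns the sharp value untouched, and a CoC of -1 or +1 returns only the blurred value.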

Wednesday, November 2, 2016

Cache Friendly

In my upcoming title Children of Orc, the largest computational load comes from the GOAP action planner that the NPCs use. I found that simple plans would be found in a millisecond, but larger plans would really tax the CPU. Especially if no plan exists to reach the goal state, a lot of cycles would be consumed: typically a few seconds on a mid-range Core i5 CPU. This means it would be prohibitively expensive to do on a mobile device, if I were to port the game to iOS or Android.

I profiled it with Linux command line tools, as well as with Xcode's Instruments tool. Both showed that the cycles were spent in a linear search for the lowest cost node. The search would iterate through up to 32K nodes to find the lowest 'f' value in an array of these structures:


struct astarnode
{
        worldstate_t ws;                //!< The state of the world at this node.
        int g;                          //!< The cost so far.
        int h;                          //!< The heuristic for remaining cost (don't overestimate!)
        int f;                          //!< g+h combined.
        const char* actionname;         //!< How did we get to this node?
        worldstate_t parentws;          //!< Where did we come from?
};

Examining the assembly output of the linear search, I could not find anything wrong with the compiler's code. The only bothersome issue was that all the 'f' values that were being compared were lying 160 bytes apart.

To give the CPU cache an easier time, I decided to store the data as a structure of arrays instead. So now the search is performed on an array of ints, tightly packed, and no longer on an array of structures.
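A minimal sketch of what the search over the packed array could look like is given here. The function name find_lowest_f is made up for this example, and the other fields (ws, g, h, actionname, parentws) are assumed to live in parallel arrays indexed by the same node number. Assuming 4-byte ints and 64-byte cache lines, sixteen 'f' values share one cache line, where the 160-byte stride put each 'f' value on a cache line of its own.

```c
#include <assert.h>

/* Returns the index of the lowest value in a tightly packed array of
 * 'f' costs, or -1 if the array is empty. Ties go to the lowest index.
 * The returned index is then used to look up the node's other fields
 * in their own (assumed) parallel arrays. */
int find_lowest_f(const int *f, int count)
{
    assert(count >= 0);
    int best = -1;
    for (int i = 0; i < count; ++i)
    {
        if (best < 0 || f[i] < f[best])
            best = i;
    }
    return best;
}
```

The loop itself is the same linear scan as before; only the memory layout it walks over has changed.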

The result of this little exercise? A speed-up of 2.7×, which pleased me very much. If more speed-ups are required in the future, I could possibly store the values in a priority queue instead. But for now, this will do nicely.

Note that in my GOAP implementation I search with A* and the most common operations are testing for presence in CLOSED and OPEN sets. These have already been accelerated with hash sets, shifting the bottleneck of the search to the finding of the lowest cost node.