API Misuse

Ensuring that library APIs are correctly used

When developers use Application Programming Interfaces (APIs), they often make mistakes that can lead to bugs, system crashes, or security vulnerabilities. We refer to such mistakes as misuses. One example of a misuse is forgetting to call close() after opening a FileInputStream and reading from it.
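A minimal sketch of the close() misuse mentioned above, next to one common way to avoid it. The class and method names (CloseMisuseSketch, firstByteLeaky, firstByteSafe, demo) are hypothetical and exist only for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CloseMisuseSketch {

    // Misuse: the stream is never closed explicitly, so the underlying
    // file handle may stay open longer than intended.
    static int firstByteLeaky(Path file) throws IOException {
        FileInputStream in = new FileInputStream(file.toFile());
        return in.read(); // no in.close() on any path
    }

    // Correct usage: try-with-resources calls close() even if read() throws.
    static int firstByteSafe(Path file) throws IOException {
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            return in.read();
        }
    }

    // Writes one byte to a temporary file and reads it back with the safe variant.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("misuse-demo", ".bin");
            try {
                Files.write(tmp, new byte[] {65});
                return firstByteSafe(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Both variants return the same value, which is part of why this kind of misuse is easy to overlook: the difference only shows up as leaked file handles.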

We study various types of API misuse.

API Misuse of Data-centric Python Libraries

Data-centric Python libraries, such as pandas and matplotlib, often deal with diverse data structures, intricate processing workflows, and a multitude of parameters, which makes them inherently harder to use correctly. Detecting problems in the usage of these libraries is challenging, not only because of the dynamic nature of Python but also because some misuses depend on the data being processed. In this line of work, we investigate how API misuse manifests in these data-centric libraries and how we can design effective detection strategies to help developers use them correctly.

General Java API Misuse

We created MUBench, a benchmark of existing Java API misuses against which misuse detectors can be evaluated. We systematically compared existing Java API-misuse detectors and identified their weaknesses. This allowed us to design a new API-misuse detector, MuDetect, that can achieve higher recall and precision. MuDetect mines API usage rules that involve method calls and preconditions; these usage rules are then used to find misuses in target projects. MuDetect uses a graph representation called an API Usage Graph (AUG) to capture different aspects of a method call, such as the parameters a method requires, the types of those parameters, the order in which method calls are invoked, the exceptions that method calls throw, and the objects that method calls return.
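To make the idea of a graph-based usage rule more concrete, here is a heavily simplified sketch. It is not MuDetect's implementation: the Edge record, the edge labels ("definition", "receiver", "condition"), and the set-difference check are all assumptions made for illustration, whereas a real detector would match subgraphs rather than compare edge sets. The sketch requires Java 16 or later for records.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class AugSketch {

    // A labeled, directed edge: source --label--> target (hypothetical model).
    record Edge(String source, String label, String target) {}

    // Returns the pattern edges that do not occur in the target usage,
    // preserving the order in which the pattern lists them.
    static Set<Edge> missingEdges(List<Edge> pattern, List<Edge> usage) {
        Set<Edge> missing = new LinkedHashSet<>(pattern);
        missing.removeAll(usage);
        return missing;
    }

    static Set<Edge> demo() {
        // Usage rule: next() should be guarded by a hasNext() check.
        List<Edge> pattern = List.of(
            new Edge("Collection.iterator()", "definition", "Iterator"),
            new Edge("Iterator", "receiver", "Iterator.hasNext()"),
            new Edge("Iterator", "receiver", "Iterator.next()"),
            new Edge("Iterator.hasNext()", "condition", "Iterator.next()"));

        // Target code calls next() without ever calling hasNext().
        List<Edge> usage = List.of(
            new Edge("Collection.iterator()", "definition", "Iterator"),
            new Edge("Iterator", "receiver", "Iterator.next()"));

        return missingEdges(pattern, usage);
    }

    public static void main(String[] args) {
        // Each missing edge points at a part of the rule the usage violates.
        demo().forEach(System.out::println);
    }
}
```

In this toy model, the two edges involving Iterator.hasNext() are reported as missing, which corresponds to a missing precondition check before the call to next().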

Annotation Misuse in Java

While MuDetect focuses on method calls, there are other categories of API misuses as well, such as misuses that involve annotations. We built a human-in-the-loop approach that focuses on producing accurate Java annotation usage rules. For ease of use, these usage rules are packaged into a Maven plugin that can be used to catch bugs (similar to SpotBugs). Our tool is a complete pipeline that provides an easy way to mine and validate usage rules and to generate a misuse detector from the confirmed rules.
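As an illustration of what an annotation usage rule can look like, the sketch below checks one hypothetical rule: a field annotated with @ConfigValue must also be annotated with @Injected. Both annotations, the bean classes, and the reflection-based check are assumptions made for this example; they are not part of our tool or of any real library, and a build-time detector would typically analyze code statically rather than through runtime reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class AnnotationRuleSketch {

    // Hypothetical annotations, retained at runtime so reflection can see them.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Injected {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ConfigValue {
        String name();
    }

    // Follows the rule: both annotations are present.
    static class GoodBean {
        @Injected
        @ConfigValue(name = "server.port")
        int port;
    }

    // Violates the rule: @ConfigValue without @Injected.
    static class BadBean {
        @ConfigValue(name = "server.port")
        int port;
    }

    // Returns "Class.field" for every field that breaks the rule.
    static List<String> violations(Class<?> type) {
        List<String> result = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            boolean configured = field.isAnnotationPresent(ConfigValue.class);
            boolean injected = field.isAnnotationPresent(Injected.class);
            if (configured && !injected) {
                result.add(type.getSimpleName() + "." + field.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(violations(GoodBean.class));
        System.out.println(violations(BadBean.class));
    }
}
```

The code in BadBean compiles without warnings, which is why this kind of misuse usually needs a dedicated checker to be caught before runtime.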

Java Cryptography Misuse

By analyzing StackOverflow posts and GitHub repositories and by conducting two surveys with a total of 48 application developers, we collected the problems developers face with current cryptography APIs and their suggestions for improvement. Among our findings, developers have trouble choosing the correct algorithm and want higher-level, task-based abstractions. To address these issues, we looked closer at the cryptography domain and found a wide variety of cryptographic components and algorithms (e.g., ciphers, digests, signatures), each with its own variability. For example, a cipher can be symmetric or asymmetric. If it is symmetric, it can operate on blocks or streams. Additionally, there are different modes of operation (e.g., ECB vs. CBC) as well as different padding schemes. To deal with this huge variability space, we model cryptographic components using concepts from feature modeling. However, such components have many attributes, and some cryptography solutions use multiple components at the same time. We therefore need modeling notations beyond those offered by basic feature modeling.
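The following sketch illustrates why the choice of mode of operation matters, using the standard javax.crypto API. It encrypts two identical plaintext blocks with AES in ECB and in CBC mode and compares the first two ciphertext blocks. The class and helper names are hypothetical, and the code is a demonstration only: it discards the CBC initialization vector, which a real application would need to keep for decryption.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

class CipherModeSketch {

    // True if the first two 16-byte ciphertext blocks are identical.
    static boolean firstTwoBlocksEqual(byte[] ciphertext) {
        return Arrays.equals(
            Arrays.copyOfRange(ciphertext, 0, 16),
            Arrays.copyOfRange(ciphertext, 16, 32));
    }

    static byte[] encrypt(String transformation, SecretKey key, byte[] plaintext)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(transformation);
        if (transformation.contains("/CBC/")) {
            // CBC needs a fresh random IV for every encryption.
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        } else {
            cipher.init(Cipher.ENCRYPT_MODE, key);
        }
        return cipher.doFinal(plaintext);
    }

    // Returns {ECB blocks equal?, CBC blocks equal?} for two identical plaintext blocks.
    static boolean[] demo() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey key = generator.generateKey();
            byte[] plaintext = new byte[32]; // two identical all-zero blocks

            boolean ecb = firstTwoBlocksEqual(encrypt("AES/ECB/PKCS5Padding", key, plaintext));
            boolean cbc = firstTwoBlocksEqual(encrypt("AES/CBC/PKCS5Padding", key, plaintext));
            return new boolean[] {ecb, cbc};
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

With ECB, identical plaintext blocks produce identical ciphertext blocks, so patterns in the data remain visible. With CBC and a random IV, the blocks differ except with negligible probability. The two transformation strings differ in only a few characters, which shows how easily a developer can pick an unsuitable configuration.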

CogniCrypt was built on the insights derived from these studies.

Related Resources

Related Publications

2024

  1. ESEM
    An Empirical Study of API Misuses of Data-Centric Libraries
    Akalanka Galappaththi, Sarah Nadi, and Christoph Treude
    In Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM ’24), 2024

2023

  1. SecDev
    Securing Your Crypto-API Usage Through Tool Support - A Usability Study
    Stefan Krüger, Michael Reif, Anna-Katharina Wickert, Sarah Nadi, Karim Ali, Eric Bodden, Yasemin Acar, Mira Mezini, and Sascha Fahl
    In IEEE Secure Development Conference (SecDev), 2023

2022

  1. CASCON
    A Human-in-the-loop Approach to Generate Annotation Usage Rules: A Case Study with MicroProfile
    Mansur Gulami, Ajay Kumar Jha, Sarah Nadi, Karim Ali, Emily Jiang, and Yee-Kang Chang
    In Annual International Conference on Computer Science and Software Engineering (CASCON ’22), 2022
  2. ICSME
    Mining Annotation Usage Rules: A Case Study with MicroProfile
    Batyr Nuryyev, Ajay Kumar Jha, Sarah Nadi, Yee-Kang Chang, Emily Jiang, and Vijay Sundaresan
    In Proceedings of the 38th IEEE International Conference on Software Maintenance and Evolution – Industry Track, 2022

2019

  1. MSR
    Investigating Next-Steps in Static API-Misuse Detection
    Sven Amann, Hoan Nguyen, Sarah Nadi, Tien Nguyen, and Mira Mezini
    In Proceedings of the 16th International Conference on Mining Software Repositories (MSR ’19), 2019

2018

  1. TSE
    A Systematic Evaluation of Static API-Misuse Detectors
    Sven Amann, Hoan A. Nguyen, Sarah Nadi, Tien N. Nguyen, and Mira Mezini
    IEEE Transactions on Software Engineering, 2018

2017

  1. ASE
    CogniCrypt: Supporting Developers in using Cryptography
    Stefan Krüger, Sarah Nadi, Michael Reif, Karim Ali, Mira Mezini, Eric Bodden, Florian Göpfert, Felix Günther, Christian Weinert, Daniel Demmler, and Ram Kamath
    In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE ’17) – Tool Demo Track, 2017

2016

  1. ICSE
    "Jumping Through Hoops": Why do Java Developers Struggle with Cryptography APIs?
    Sarah Nadi, Stefan Krüger, Mira Mezini, and Eric Bodden
    In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16), 2016
  2. VaMoS
    Variability Modeling of Cryptographic Components (Clafer Experience Report)
    Sarah Nadi and Stefan Krüger
    2016
  3. MSR
    MUBench: A Benchmark for API-Misuse Detectors
    Sven Amann, Sarah Nadi, Hoan A. Nguyen, Tien N. Nguyen, and Mira Mezini
    In Proceedings of the 13th International Conference on Mining Software Repositories – Data Showcase Track (MSR ’16), 2016

2015

  1. ONWARD
    Towards Secure Integration of Cryptographic Software
    Steven Arzt, Sarah Nadi, Karim Ali, Eric Bodden, Sebastian Erdweg, and Mira Mezini
    In Proceedings of the SIGPLAN Symposium on New Ideas in Programming and Reflections on Software at SPLASH (ONWARD ’15), 2015
    (Acceptance Rate: 17/37 = 35%)