Software Dependencies

Dated Nov 14, 2020; last modified on Sun, 06 Feb 2022

Dependency Management

Golang introduced a new library referencing mode to overcome limitations of the old one. Both library modes are still supported, but they are incompatible, and mixing them causes dependency management (DM) issues such as reference inconsistencies and build failures. Wang et al. [7] did an empirical study that resulted in HERO, an automated technique to detect DM issues and suggest fixes. Applied to 19k Golang projects, HERO achieved a 98.5% detection rate on a DM issue benchmark, and found 2,422 new DM issues in 2,356 Golang projects. They reported 280 issues, and almost all of the fixes have adopted HERO’s fixing suggestions.

Prior to Golang 1.11, libraries were supported by the GOPATH mode, which always fetched a library’s latest version, so a project could not pin a specific version. To work around this, Golang devs used third-party tools like Dep and Glide. Golang 1.11 introduced Go Modules, which are more flexible and allow multiple versions of a library to coexist in a Golang project. Of the Golang projects surveyed, 64.1% were still using GOPATH.

I dig how Golang modules (fetched via go get) are specified. For instance, golang.org/x/text@v0.3.2 exists at the golang.org/x/text URL. Pretty sweet. Explicitly downloading a package from a URL emphasizes that we’re choosing to trust whoever is in control of the website.
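To make the path@version shape concrete, here is a minimal sketch that splits such a reference and derives the URL it points at. ModuleRef, parse and homepage are hypothetical names of my own, not part of any Go tooling, and the sketch assumes the module path can be prefixed with https:// to get a browsable address.

```java
import java.net.URI;

// Hypothetical helper, not part of any Go tooling: splits a "path@version"
// module reference of the form passed to `go get`.
class ModuleRef {
    final String path;
    final String version; // empty when the reference has no "@version" suffix

    ModuleRef(String path, String version) {
        this.path = path;
        this.version = version;
    }

    static ModuleRef parse(String ref) {
        int at = ref.lastIndexOf('@');
        if (at < 0) {
            return new ModuleRef(ref, "");
        }
        return new ModuleRef(ref.substring(0, at), ref.substring(at + 1));
    }

    // Assumption: the module path doubles as host + path reachable over HTTPS.
    URI homepage() {
        return URI.create("https://" + path);
    }

    public static void main(String[] args) {
        ModuleRef ref = ModuleRef.parse("golang.org/x/text@v0.3.2");
        System.out.println(ref.path + " " + ref.version + " " + ref.homepage());
    }
}
```

The point of the sketch is only that the identifier and the place we trust are the same string.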

Bloated Dependencies

A bloated dependency is one which is packaged in the application binary, but is not needed to run the application.

There are two levels to this: (1) a source file declares a dependency on foo but never actually uses foo, and (2) the application as a whole never uses foo. An optimal de-bloating solution would first address (1) and then tackle (2).

Some languages may have better tooling than others when it comes to automatically de-bloating their dependencies.

Soto-Valero et al. [4] propose DepClean, a tool for de-bloating Java/Maven dependency trees. Of 9,639 Java artifacts, which include a total of 723k dependency relationships, they found that 2.7% of directly declared dependencies, 15.4% of inherited dependencies, and 57% of transitive dependencies are bloated. In principle, it’s feasible to reduce the total number of dependencies of the studied artifacts to \(1/4\) of their current count.

Languages have their de-facto (or rather most popular) package managers. Table compiled from Wikipedia’s list of software package management systems [5].

Language               Popular Package Manager(s)
Java                   Apache Maven, Apache Ivy
Python                 Pip, Conda, EasyInstall
Node.js, JavaScript    NPM, Yarn
Ruby                   RubyGems, Bundler
.NET, Xamarin          NuGet
C++                    CMake
Go                     Go
Rust                   Cargo
Lisp                   Quicklisp
Swift, Objective-C     CocoaPods

Java developers using Maven declare their dependencies in a POM file. Given an application and its POM file, DepClean collects the dependencies declared in the POM file and their transitive dependencies, then analyzes the byte-code of the artifact and all its dependencies to determine the presence of bloated dependencies.
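The core comparison can be sketched as a set difference: everything in the resolved dependency tree that no byte-code reference reaches is bloated. This is a minimal sketch under my own assumptions, not DepClean's actual implementation: BloatSketch, Origin and findBloated are hypothetical names, the dependency coordinates are made up, and the set of dependencies used by the byte-code is taken as given rather than computed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not DepClean's real API.
class BloatSketch {
    // The three categories named earlier: directly declared, inherited, transitive.
    enum Origin { DIRECT, INHERITED, TRANSITIVE }

    // Returns every dependency in the resolved tree that the byte-code never references.
    static Map<String, Origin> findBloated(Map<String, Origin> resolvedTree,
                                           Set<String> usedByBytecode) {
        Map<String, Origin> bloated = new LinkedHashMap<>();
        for (Map.Entry<String, Origin> e : resolvedTree.entrySet()) {
            if (!usedByBytecode.contains(e.getKey())) {
                bloated.put(e.getKey(), e.getValue());
            }
        }
        return bloated;
    }

    public static void main(String[] args) {
        // Made-up coordinates, for illustration only.
        Map<String, Origin> tree = new LinkedHashMap<>();
        tree.put("com.example:json", Origin.DIRECT);
        tree.put("com.example:logging", Origin.INHERITED);
        tree.put("com.example:xml", Origin.TRANSITIVE);

        // Assumed to come from a byte-code analysis of the artifact.
        Set<String> used = new HashSet<>(Arrays.asList("com.example:json"));

        System.out.println(findBloated(tree, used));
    }
}
```

Keeping the origin alongside each bloated dependency matters because the fix differs: a direct dependency can simply be deleted from the POM, while the other two kinds need some form of exclusion.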

With regard to the two-level de-bloating approach described earlier, it seems that DepClean skips step (1). Or is it that a declared but unused import in a Java source file adds no byte-code to the final artifact? Passing

import java.awt.Image;
import java.math.BigDecimal;

public class HelloWorld {

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0"); // `a` is unused.
        // Prints "Hello, World" to the terminal window.
        System.out.println("Hello, World");
    }

}

through the Procyon decompiler on an online Java decompiler gives back:

import java.math.BigDecimal;

//
// Decompiled by Procyon v0.5.36
//

public class HelloWorld
{
    public static void main(final String[] array) {
        final BigDecimal bigDecimal = new BigDecimal("1.0");
        System.out.println("Hello, World");
    }
}

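The unused java.awt.Image does not seem to make it into the .class file, which makes sense: an import only tells the compiler how to resolve names and emits no byte-code of its own. So DepClean does not lose anything by skipping step (1). Working backwards from the .class file already prunes out the unused import statements.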

Soto-Valero et al. [3] study the evolution and impact of bloated dependencies in the Java/Maven ecosystem. Bloated dependencies steadily increase over time, and 89.2% of the direct dependencies that are bloated remain bloated. 22% of dependency updates performed by developers were made on bloated dependencies.

Dependencies' Vulnerabilities

No matter the obfuscation in the source code, the malicious package will have to make system calls in order to do anything interesting. These system calls are easier to analyze. Furthermore, module recontextualization, a dynamic program analysis technique, can help detect unusual resources being used by an imported package.

Know your dependencies by heart. Know the maintainers. Be aware of problems going on in the project and help, e.g. patches, funding sources, etc.

Zhan et al. [6] build a vulnerability database (1,180 CVEs and 224 security bugs). With this database and their in-app third-party library (TPL) detector, they analyze 104k apps, and find that 9k apps include vulnerable TPL versions and 7k security bugs.

In-App Third-Party Library Detection

Static detection of third-party libraries is a solved problem when the code is using a dependency manager. However, some of the declared dependencies may be bloated, and there’s active research on de-bloating.

I’m not sure what in-app TPL detection entails, and how useful it is in practice.

Identifying in-app third-party libraries (TPLs) faces challenges like TPL dependency, code obfuscation & dead-code removal, and precise version representation.

Zhan et al. [6] propose ATVHunter, a better tool for identifying Android in-app TPLs. They build a TPL database (189k TPLs with 3m versions). To identify specific TPL versions, they extract the Control Flow Graphs (CFGs) to match potential TPLs, and then narrow down to the version by comparing the opcode sequences in each basic block of the CFG. ATVHunter outperforms existing tools, is resilient to common obfuscation techniques, and scales to large TPL detection tasks, e.g. vulnerability detection.
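To get a feel for the version-narrowing step, here is a minimal sketch that scores each candidate version by the fraction of the app's basic blocks whose opcode sequence also appears in that version, and picks the highest score. It is my own simplification, not ATVHunter's actual algorithm: the CFG matching step is skipped, blocks are compared by exact string equality, and VersionMatchSketch, similarity, bestVersion and the opcode strings are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; a simplification, not ATVHunter's real matching.
class VersionMatchSketch {
    // Each basic block is one string of space-separated opcodes.
    // Returns the fraction of app blocks that also occur in the candidate version.
    static double similarity(List<String> appBlocks, List<String> candidateBlocks) {
        if (appBlocks.isEmpty()) {
            return 0.0;
        }
        Set<String> candidate = new HashSet<>(candidateBlocks);
        int hits = 0;
        for (String block : appBlocks) {
            if (candidate.contains(block)) {
                hits++;
            }
        }
        return (double) hits / appBlocks.size();
    }

    // Returns the version label with the highest similarity, or null if there are no candidates.
    static String bestVersion(List<String> appBlocks, Map<String, List<String>> versions) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, List<String>> e : versions.entrySet()) {
            double score = similarity(appBlocks, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative opcode strings, not extracted from a real library.
        Map<String, List<String>> versions = new LinkedHashMap<>();
        versions.put("1.0", Arrays.asList("const/4 return", "invoke-static return-void"));
        versions.put("1.1", Arrays.asList("const/4 return", "invoke-virtual return-void"));

        List<String> app = Arrays.asList("const/4 return", "invoke-virtual return-void");
        System.out.println(bestVersion(app, versions));
    }
}
```

Exact equality is the weakest possible comparison; a tool that has to survive obfuscation would presumably need something fuzzier per block.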

Zhan et al. [6] do not have access to POM files; instead, they start by decompiling the Android app. I’m not really sure what constraints they are working under.

References

  1. Hunting for Malicious Packages on PyPI. Jordan Wright. jordan-wright.com. Nov 12, 2020.
  2. Dependencies and Maintainers. Drew DeVault. drewdevault.com. Feb 6, 2020.
  3. A Longitudinal Analysis of Bloated Java Dependencies. Soto-Valero, César; Durieux, Thomas; Baudry, Benoit. European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Proceedings of the 29th ACM Joint Meeting, Aug 2021, pp. 1021-1031. KTH Royal Institute of Technology. doi.org. scholar.google.com. Cited 0 times as of Jan 30, 2022.
  4. A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem. Soto-Valero, César; Harrand, Nicolas; Monperrus, Martin; Baudry, Benoit. Empirical Software Engineering, Vol 26, No. 3, 2021. doi.org. scholar.google.com. github.com. Cited 14 times as of Jan 30, 2022.
  5. List of software package management systems - Wikipedia. en.wikipedia.org. Accessed Jan 22, 2022.
  6. ATVHunter: Reliable Version Detection of Third-Party Libraries for Vulnerability Identification in Android Applications. Zhan, Xian; Fan, Lingling; Chen, Sen; Wu, Feng; Liu, Tianming; Luo, Xiapu; Liu, Yang. International Conference on Software Engineering, 43rd, 2021. The Hong Kong Polytechnic University; Nankai University; Tianjin University; Nanyang Technological University; Monash University. doi.org. scholar.google.com. Cited 10 times as of Jan 30, 2022.
  7. HERO: On the Chaos When PATH Meets Modules. Wang, Ying; Qiao, Liang; Xu, Chang; Liu, Yepang; Cheung, Shing-Chi; Meng, Na; Yu, Hai; Zhu, Zhiliang. International Conference on Software Engineering, 43rd, 2021, pp. 99-111. Northeastern University; Nanjing University; Southern University of Science and Technology; Hong Kong University of Science and Technology; Virginia Tech. doi.org. scholar.google.com. Cited 0 times as of Feb 6, 2022.