How Can We Measure Our Software’s Modularity and Dependencies?
What is modularity, and what metrics are available to measure modularity?
Introduction
In this article, I’ll talk about modularity: What is modularity? Why is it important? And how can we measure modularity?
Modularity consists of dividing a system into separate and independent parts called groups or modules.
- We can divide an application into separate technical layers: business, persistence, UI, and database
- We can divide an application into separate functional layers: user, payment, order, etc.
- We can group related methods inside a class (Car: move, accelerate, park, setPreferences, etc.)
Modularity has a direct impact on reuse and maintainability
Organizing and dividing modules well increases both clarity and maintainability.
The relationship between elements inside of the same group is defined according to a common property that can be functional (authentication, payment, delivery, etc.) or technical (business, service, persistence, UI, etc.).
This definition reminds us of a mathematical theory that has already dealt with groups, their elements, their relations, and their operations: set theory.
To explain modularity, I’ll refer to some set principles. And I’ll be doing this not from a microscopic perspective (e.g., a set as a data structure) but from a macroscopic perspective.
Then, because modularity is a mathematical concept, we can measure it: How modular is an architecture? How independent are the modules? How cohesive are the elements inside a module?
Let’s begin!
Modularity: Mathematical Aspect (Set)
A set is a collection of elements that share a common property and gather together according to this common property.
Inside each set, elements are strongly connected among themselves and only weakly connected to elements in other sets.
Other known types of sets:
- ∅ denotes the empty set
- Z denotes the set of integers
- R denotes the set of real numbers
- N denotes the set of natural numbers
“In mathematics, a Set is a well-defined collection of distinct elements or members. The elements that make up a set can be anything: people, letters of the alphabet, or mathematical objects, such as numbers, points in space, lines or other geometrical shapes, algebraic constants and variables, or other sets.”
— Jain Ahmad via Wikipedia
- Well-defined means that for any element whatsoever, the question “does this element belong to the collection?” has an unequivocal yes or no answer
- Elements being distinct means that no element in a set is counted twice
“In computer science, separation of concerns (SoC) is a design principle for separating a computer program into distinct sections such that each section addresses a separate concern.”—Wikipedia
“The DRY principle is stated as ‘Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.’”— Wikipedia
“The Single Responsibility Principle (SRP) states that each software module should have one and only one reason to change.” — Robert C. Martin
A software module is just a collection of sections or layers grouped together with a certain property in common (a concern). Common properties can be functional (authentication, payment, delivery, etc.) or technical (business, service, persistence, UI, etc.).
In object-oriented programming, a class is a set of member methods that transform member variables.
Sets are structurally independent of one another, but they can be assembled together to form other sets.
For example, we can combine our Clothes set, Transport set, and Foods set into a unique context/set called Human needs.
- A human needs to wear clothes
- A human needs a means of transport to go work, visit family, travel, etc.
- A human needs food to survive
We can group the different independent modules together into a single global context: the application.
By the way, Human needs is a set, and Clothes, Transport, and Foods are now subsets.
Subsets keep the same set properties.
We can also define independent subsets inside a subset:
When we have a complex global context, we can divide it into small independent subcontexts.
“A complex system can be managed by dividing it up into smaller pieces and looking at each one separately.”
—Carliss Y. Baldwin and Kim B. Clark in “Design Rules: The Power of Modularity”
Sets or subsets can also have shared elements
We can use/bring a map when we travel using an airplane, boat, train, or car:
The intersection of different subsets is the map subset.
{1, 2} ∩ {2, 3} = {2}.
{1, 2} ∩ {3, 4} = ∅.
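As a minimal sketch, the same intersections can be reproduced with Python’s built-in set type. The travel subsets and their item names below are hypothetical and only illustrate the map example.

```python
# Hypothetical travel subsets; the item names are illustrative only.
airplane = {"passport", "ticket", "map"}
boat = {"life jacket", "ticket", "map"}
car = {"driving licence", "map"}

# The intersection keeps only the elements present in every subset.
shared = airplane & boat & car
print(shared)  # {'map'}

# The two numeric examples from the text.
print({1, 2} & {2, 3})  # {2}
print({1, 2} & {3, 4})  # set()  (the empty set)

# Two sets with an empty intersection are called disjoint.
print({1, 2}.isdisjoint({3, 4}))  # True
```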
Each element must exist only one time inside a set:
“A set is a gathering together into a whole of definite, distinct objects of our perception and of our thought — which are called elements of the set.”
— Georg Cantor, the founder of the set theory
When there are duplicated or common elements between sets or subsets:
“A new set can also be constructed by determining which members two sets have ‘in common.’”— Wikipedia
If modules or classes share common elements, we create a common shared module or class (DRY: Don’t repeat yourself).
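One way to picture this step is to treat each module as a set of element names and move anything that appears more than once into a shared set. The sketch below assumes that simplified model; `extract_shared` and the module contents are hypothetical names chosen for illustration.

```python
from collections import Counter


def extract_shared(modules):
    """Split modules into a shared set and the per-module remainders.

    modules maps a module name to the set of element names it contains.
    Any element found in more than one module is moved to the shared set.
    """
    counts = Counter(elem for elems in modules.values() for elem in elems)
    shared = {elem for elem, n in counts.items() if n > 1}
    remaining = {name: elems - shared for name, elems in modules.items()}
    return shared, remaining


# Hypothetical modules that both define the same helper.
modules = {
    "payment": {"charge", "refund", "format_date"},
    "order": {"create_order", "format_date"},
}

shared, remaining = extract_shared(modules)
print(shared)  # {'format_date'}
```

After the extraction, each remaining module holds only its own elements, and the duplicated helper lives in exactly one place.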
What Can We Retain From Set Theory?
Set and elements
- Internal set cohesion (elements are strongly grouped together according to a common criterion)
- Elements inside a set are cohesive by default
- Each element inside a set exists only one time (DRY and SRP)
- A set is independent of the others (loose external coupling, SoC)
- Decomposition (subsets)
- New set for common elements between sets or subsets (DRY)
Software modularity is a mathematical aspect
- A module is a set of sections or layers
- A class is a set of member methods
Example of Set Theory Applied in Software: Layered Architecture
In a layered architecture, the common property is the technical role. We group files, classes, or code according to their technical roles: presentation, business, persistence, and database.
We have four sets
- Presentation set = {HTML, CSS}
- Business set = {UI logic adapters, interfaces}
- Persistence set = {ORM}
- Database set = {Tables, SQL queries}
Each set can also be divided into subsets.
Example of Set Theory Applied in Software: SOA Architecture
In an SOA architecture, we group files, classes, or code according to their functional roles: book, order, account, user, etc.
How to Measure Modularity
Because modularity is a mathematical concept, we can measure:
- Internal group cohesion
- Degree of interdependence between software modules, or coupling
Cohesion
Definition
“In computer programming, cohesion refers to the degree to which the elements inside a module belong together.”
— Edward Yourdon and Larry L. Constantine in “Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design”
Types
Functional cohesion is when parts of a module are grouped because they all contribute to a single well-defined task of the module.
Sequential cohesion is when parts of a module are grouped because the output from one part is the input to another part (e.g., a function which reads data from a file and processes the data).
Measuring cohesion
Elements inside the below set aren’t cohesive: Elements are grouped arbitrarily; the only relationship between the parts is they’ve been grouped together (coincidental cohesion).
When elements inside a set don’t share any common property, the set loses its well-defined characteristic, and we can’t ensure distinct criteria because there isn’t a property that defines belonging within that set.
Elements inside a set must be cohesive; otherwise, we’d have to keep decomposing until we have cohesive subsets.
Class cohesion (LCOM)
Example:
Consider a class C with three methods: M1, M2, and M3.
Let {Ii} = set of instance variables used by method Mi.
Let {I1} = {a,b,c,d,e}, {I2} = {a,b,e}, and {I3} = {x,y,z}.
{I1} ∩ {I2} is non-empty.
{I1} ∩ {I3} and {I2} ∩ {I3} are null sets.
LCOM = number of null intersections - number of non-empty intersections.
In our case, LCOM = 2 - 1 = 1.
General formula:
“P = {(Ii, Ij) | Ii ∩ Ij = φ} and Q = {(Ii, Ij) | Ii ∩ Ij ≠ φ}
Take each pair of methods in the class. If they access disjoint sets of instance variables, increase P by one. If they share at least one variable access, increase Q by one.
LCOM is a count of the number of method pairs whose similarity is zero.
LCOM = 0 indicates a cohesive class.
LCOM > 0 indicates that the class needs or can be split into two or more classes, since its variables belong in disjoint sets.”
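The pair-counting rule above can be sketched in a few lines of Python. The `lcom` function and the `usage` mapping are hypothetical names. The result is clamped at zero, which follows the common Chidamber and Kemerer formulation and is an assumption here, since the example above only shows a positive result.

```python
from itertools import combinations


def lcom(usage):
    """Return (P, Q, LCOM) for a class.

    usage maps each method name to the set of instance variables it accesses.
    P counts method pairs with no variable in common, Q counts pairs that
    share at least one variable.
    """
    p = q = 0
    for vars_a, vars_b in combinations(usage.values(), 2):
        if vars_a & vars_b:
            q += 1
        else:
            p += 1
    # Assumption: LCOM is clamped at 0 when Q is greater than or equal to P.
    return p, q, max(p - q, 0)


# The class C from the example above.
usage = {
    "M1": {"a", "b", "c", "d", "e"},
    "M2": {"a", "b", "e"},
    "M3": {"x", "y", "z"},
}

print(lcom(usage))  # (2, 1, 1)
```

Here M3 touches a disjoint group of variables, which is the hint that it may belong in a class of its own.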
Coupling
Definition
“In software engineering, coupling is the degree of interdependence between software modules; a measure of how closely connected two routines or modules are.”— Wikipedia
Types
Low coupling is often a sign of a well-structured computer system and a good design, and when combined with high cohesion, it’s a sign of high readability and maintainability.
Measuring coupling
Efferent coupling (CE)
CE measures the total number of classes inside a package that depend upon classes outside of the package.
In the above example, Module A has outgoing dependencies to three other classes.
A high value (CE > 20) indicates instability of a package: a change in any of the numerous external classes can force changes to the package. Preferred values for CE are in the range of 0 to 20; higher values make the code harder to maintain and evolve.
Afferent coupling (CA)
CA measures the total number of classes outside of a package that depend upon classes within that package.
In the above example, Module A has two incoming dependencies.
The CA metric is closely related to portability. Packages with a higher CA are problematic because they’re harder to replace: many other packages depend upon them. Preferred values for CA are in the range of 0 to 500.
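Both counts can be derived from a class-level dependency map. The sketch below applies the two definitions above literally. The `coupling` function, the class names, and the package names are all hypothetical, and real tools may count dependencies differently.

```python
def coupling(package_of, depends_on, package):
    """Return (CE, CA) for one package.

    package_of maps a class name to the package that contains it.
    depends_on maps a class name to the set of class names it depends on.
    """
    outgoing = set()  # classes inside the package that depend on outside classes
    incoming = set()  # classes outside the package that depend on inside classes
    for cls, deps in depends_on.items():
        for dep in deps:
            src, dst = package_of[cls], package_of[dep]
            if src == package and dst != package:
                outgoing.add(cls)
            elif src != package and dst == package:
                incoming.add(cls)
    return len(outgoing), len(incoming)


# Hypothetical classes and packages.
package_of = {
    "OrderService": "order",
    "OrderRepo": "order",
    "PaymentClient": "payment",
    "UserService": "user",
    "Mailer": "mail",
}
depends_on = {
    "OrderService": {"OrderRepo", "PaymentClient", "Mailer"},
    "OrderRepo": set(),
    "PaymentClient": {"OrderRepo"},
    "UserService": {"OrderService"},
    "Mailer": set(),
}

print(coupling(package_of, depends_on, "order"))  # (1, 2)
```

In this example only OrderService reaches outside the order package (CE = 1), while PaymentClient and UserService both depend on it from outside (CA = 2).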
Instability
This metric measures the instability of packages, where stability is measured by calculating the effort to change a package without impacting other packages within the application.
Instability I = CE / (CE + CA)
“This metric produces results in the range [0, 1]. A value I=0 indicates a maximally stable package that depends upon nothing and I=1 indicates a total Instable package that has no incoming dependencies but depends upon other packages. So instability negatively influences re-usability, maintainability and portability.”
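The formula is simple enough to compute directly. In this sketch, `instability` is a hypothetical helper. Returning 0.0 for a package with no dependencies in either direction is an assumption, because the formula itself is undefined when CE + CA = 0.

```python
def instability(ce, ca):
    """Instability I = CE / (CE + CA), in the range [0, 1]."""
    total = ce + ca
    if total == 0:
        # Assumption: an isolated package is treated as maximally stable.
        return 0.0
    return ce / total


# Module A from the examples above: three outgoing and two incoming dependencies.
print(instability(3, 2))  # 0.6

# Depends upon nothing, but others depend upon it: maximally stable.
print(instability(0, 5))  # 0.0

# Depends upon others, but nothing depends upon it: totally unstable.
print(instability(4, 0))  # 1.0
```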
Conclusion
In this article, we’ve discussed what modularity is, why it’s important, and how we can measure it.
Cohesion is a metric to measure how good a software design is in terms of the SRP and SoC principles: If elements inside a module or a class aren’t cohesive, this indicates multiresponsibilities and overresponsibilities. In this case, they should be divided.
A piece of code without separation of concerns and responsibilities is like a house without chairs, walls, doors, and windows: Everything is open. Everything changes together (regression tests must cover everything, related and unrelated code alike).
We lack modularity when we lack cohesion, SRP, and SoC. Without them, there are no borders or groups, and therefore no modules: there’s only a single big piece.
A class or module without cohesion is like an unordered garage, where we throw everything randomly and quickly. It’ll be hard to find things.
Cohesion measures how strongly related elements inside a group are — that’s why it must be high.
When cohesion is low, this indicates that the SRP and SoC principles have been broken.
On the other hand, the impact of the group on its environment, or the interaction between the group and its environment, must be as weak as possible and limited to what is strictly necessary.
Strong coupling with the outside has a strong impact on maintainability, refactoring, and stability. For each small change or evolution on a class or module, we should verify that every link still works.
When efferent coupling (CE), afferent coupling (CA), and instability are high, this indicates how weak the design is and how risky each change and refactoring will be.
I recommend using these metrics during software design and as you move forward in development so you don’t create a big debt.