Documenting project information with Maven

In most projects I’ve worked on, the project information is captured in some sort of README.md file. While this has its strengths, such as ability to write free form text and embed images, it doesn’t quite fit with the project because when the artifact is produced during the build time, the read me is not checked into the artifact repository along with it. So how do I know who the contributors were for that specific artifact version? How do I know where the project was hosted at the time?

Well, today I learned that maven provides ability to capture some of those attributes and some more! I like this because it means that when the artifact gets checked in (along with its pom.xml) the details about the project are captured for that version, frozen in time.

Great thing about these attibutes is that when you run mvn site they are used in the project information html page that maven produces. Technically, then you can upload this to any statically hosted site location, all versioned up and ready to be explored.

Ok so lets begin exploring these attributes. The first three attributes are the basic ones:

For people working in corporations and companies, the following might be useful, especially when open sourcing the project:

Additionally, the licenses tag can be used to describe some licensing information. How many times have you come across a project that has changed the licensing information half way through versions? This is a life saver (at least legally)!

Some source control information is also useful, however, whats more useful is knowing where to go in order to raise issues. In most companies, people use source control to store code but then use an alternative mechanism like Jira or Trello to mange issues. The scm and issueManagement tags are useful here in clarifying such information:

Developer information on projects is very useful. Companies don’t usually like this because, well, developers are supposed to be expendable, however, in my humble opinion, it is useful to list developers who were working on the project at the time. This way the versions can be tracked, at least to the lead working on it at the time. Use it thusly:

For contributors, people who have worked on the project, although not necessarily part of it exclusively, use:

Lastly, it is useful to capture some information about pipeline that was used to create the project. For this, maven has ciManagement tag:

The above is just a small example of what you can do with these attributes, there are many more sub-attributes that you can explore. Hope this is a good introduction and kicks off some ideas of what you can do with this information.

Finding classes in jar/war files

Recently I was out looking for classes that were present in my Web Application Archive (WAR) file. Why? Well, I was out combating Jar Hell. As I am allergic to doing things manually, here’s a small command I constructed to help me find classes within jar files:

The above command prints out class file as well as the name of the jar file that class is present in. If you’re looking just for the path where the classes are present, you can just run:

 

Update all pom versions at once using maven versions

Most of the projects I work on are multi-module projects so updating versions of pom files manually is a bit pain. As always, here’s a command you can run that will update all pom versions in one go:

Make sure all your modules are discoverable. You can do this by enabling all your profiles in case some of your sub-modules are not visible from the main pom.

Source: https://stackoverflow.com/a/5726412

An introduction to Mocks, Spies and Fakes in unit testing

When we’re unit testing, in addition to knowing the boundary conditions, it is important to know what classification our dependencies fall into, in testing terms. These classifications are:

  • Mocks
  • Spies
  • Fakes

Mocks

As the name suggests, a mock is a proxy or placeholder value that can be programmed to respond just like a real object would under any or matched conditions. A mock should be used where the real object dependency we have is very complicated to deal with during testing. This could be (but is not limited to):

  • Third party interactions (database/queue/api/files)
  • Classes that do really complicated stuff that take a lot of time
  • Things that work with random
  • Time based work

Mocking dependencies is a three phased approach:

  1. Create the mock of the dependency based on the class or interface.
  2. “Arm the mock” by preparing it work request/responses based on any or matched interactions
  3. Set the mock on the test subject

Creating the mock could be the easiest or the hardest thing of all of the three because it depends on what you’re mocking and how you’re mocking it. If the test subject is relying on a class directly rather than an interface, mocking could become a bit trickier depending on how that class was implemented. This is because when you’re mocking a class, it has to first create a new instance of that class (by calling its constructor and by extension constructing all its ancestors). If at any one point, if one of the ancestors is doing something funky with one of its dependencies within the constructor, you’ll have to mock that out. This requires some level of knowledge or understanding of the entire class hierarchy and this is why use of interfaces for your dependencies is preferred.

When mocking an interface, Mockito uses JDK Proxy (as opposed to CGLib when mocking a class) which plays much more nicely with everything else. Also, because interfaces cannot be instantiated, there is no hierarchy to follow and the JDK Proxy wraps around it without any problems (or worry about dependencies). Although note that I have yet to personally test how this works with Java 9 which allows interfaces to have implementation code in them.

Arming the mock is relatively straight forward and involves three steps:

  • Knowing which part of class to “arm” Which method are you mocking?
  • Knowing what to return (or do) What should that method return/do?
  • Knowing what to match Under which parameters should this mock work?

In most mocking frameworks, you can mock methods and have them either call the real thing (if its a spy; covered in next point) or have them return a custom value. In addition to this, you can specify the circumstances under which the mock should do what you’ve configured it to do. This is where the matchers come in. I am not going to cover how the matchers work in this post as that is a whole thing of its own, you can find out more here:

https://static.javadoc.io/org.mockito/mockito-all/1.9.0/org/mockito/Matchers.html

Once you’ve armed the mock, it is time to make it available to the test subject. Now, ideally, this should be done in a way that the test subject isn’t even aware of the mocking, in a most non-intrusive way. Here are a couple of ways you can do this in decreasing order of intrusion:

  • Setter methods with default access modifier
  • Constructor args
  • Inversion of control – field level dependency Injection magic

Obviously its great to aim for point three but sometimes its harder to get DI involved in a project that has never has it so my suggestion here would be to work backwards from that.

When testing, while mocks are great at controlling the behaviours of the dependencies that your implementation is relying on, it is also great to verify whether or not the dependency is used in certain flows. In mockito for example, here you can ask Mockito to verify whether a particular mocked method was called, how many times it was called and even checking in certain cases to ensure that it wasn’t called!

Spies

Spies are like mocks, the only difference being that there is a real object under there. This is not the “real” object that CGLib wraps around, it is the one that you create to then set a spy on. Generally, spies are used in cases where you want to track the interactions that your test subject is making. This is nasty in my opinion and should only be used where you have to mock one of the test subject’s own methods (because they are being called from one of its other methods) or when you want to verify that a certain method is called.

Why is it nasty? Well it is because technically after setting a spy on the test subject object, it is no longer the real test subject. In addition to that spying on the test subject makes it easy to accidentally leave a mock on one of the methods of the test subject’s spy without realising and have the test(s) accidentally pass!

Spying generally involves three steps:

  1. Create real object
  2. Create Spy around the real object
  3. Set (or use) the spy

Fakes

Faking is not mocking. Fakes are real objects but they have fake values in them. My suggestion here is to fake POJOs and other “simple” classes. The general rule of thumb that I think everybody should follow is that if the external dependency is more work to mock than fake then fake. Faking should extend, but not be limited to:

  • Data structures (Map, Queue, List)
  • POJOs
  • Builder classes

Like mocks, using a fake involves two simple steps:

  1. Create the fake with required values
  2. Set the fake

As always, be mindful of the inheritance hierarchy here to ensure it doesn’t become unreliable and make our lives harder than it should be. An unreliable fake is a marker for something that should really be mocked.

Of course, all the three classifications above can be used in conjunction with each other. For example, recently I had to spy on the test subject, mock one of its methods to return a mock object which then in turn returns a fake object when one of its methods is called!

Finding biggest files and folders on Linux/Unix systems

I was happily doing some builds on my Jenkins slave server at home and then suddenly boom! it broke. It took all the queued up builds with it because they all started failing. When I looked, it was a disk space issue.

Normally my builds don’t take up much disk space so I started investigating. I needed to find files that were occupying largest size on the disk. After some trial and error, following command came to rescue:

This was great, but then I wanted to find folders that were the biggest. No problem!

Notice the -type d above which differentiates it from the first command. Also note that the above command outputting the disk use by folder also includes subfolders, so generally the largest ones will be the top level folders and as you go up the list (its ascending in order by default) you’ll find the subfolders with their sizes. You can always pipe the whole thing through grep to only look for the folders you want like so:

While this is great and all, sometimes you just want to go upto certain depth within the file tree. Say no more!

Notice the -maxdepth 2 flag which sets the max depth to 2.

However, you could be one of those people who don’t like the find command at all. Maybe a past feud, or just a dislike. Well, the du command has your back!

The -m flag makes it print file sizes in megabytes, -d 2 sets max depth to 2 and --all tells it to work with files as well as directories. Its actually quite comprehensive because using some clever flags like -I to provide a mask for files and directories to ignore and -L to follow symbolic links (they are not followed by default) you can get quite a lot out of it. Also, a quick note before ending this post, you can switch the file size block from -m for megabytes (example above) to -g for gigabytes or even -k for kilobytes. You can use -h for human readable where it will automatically choose the closest block size but this will confuse the sort because it doesn’t quite take the size character in account and only sorts things using the numeric values.

Above commands have been tested on mac where they were installed as part of GNU CoreUtils homebrew package.