Educational

Basic Garbage Collection Explained

14th September 2020 / Leave a Comment

Imagine you are inside a pub, having a few drinks with your mates. Eventually, as time passes, your table starts to fill up with empty glasses. You have two options to reduce the clutter:

You clean up the table yourself by picking up the glasses and dropping them at the bar
You wait for one of the servers to come by and clean it up for you.

There is a third option where you raise your neck like a meerkat and try to make eye contact with one of the servers, hoping for them to come by your table wherein you order more drinks and in the process they clean up the table for you. We’ll discuss this option later.

Cleaning the glasses by yourself is hard but a sure way to clean things up. You know who’s still drinking and which glasses are clear so you can be precise. However, it is hard work because you have to be constantly mindful of the utilisation and have to keep doing it.

Image result for english pub table drinks

There is a better way.

If the pub you’re at has servers, one of them can come by and clear up the glasses. However, this approach is less precise because while they make sure they only clear empty glasses, sometimes they misjudge and can accidentally clear up ones that have that some drink left. Also the frequency of the cleanup would have to be fine tuned – although experienced servers and staff have this nailed down pretty well. If they clear up too frequently (imagine a server grabbing the empty glass out of your hand as soon as you’re done) it would be rude and would leave you out of a drink, but if they do it too late, glasses will pile up. Either extreme would result in annoyed patrons who will might leave to find another pub with better staff.

Manual GC

When programming using languages that don’t have “managed memory management”, the former is the case. When the program runs, it will create variables and objects that take up the memory space. If the memory space is not “managed”, the programmer has to make sure to put instructions in code to clear up the memory. This could be seen as a boon from performance perspective because the programmer would know when to and when not to clear the memory. Additionally, it gives programmer more control over what is running, i.e. there is no garbage collector process or thread present which could interfere with system’s performance.

At the same time, this could be problematic because they might forget to clear some parts which could eventually fill up the memory over time. Also, going back to the pub analogy, if you’re at the pub having a nice chat with your mates, it can get annoying and time consuming on your part to have to remember to clear up the table from time to time. If you forget and glasses keep piling up to a point when the table can no longer hold more glasses, adding one more glass onto the pile could make them all crash onto the floor, ruining everybody’s evening. The program, in the same way could crash if the programmer either forgets to clear up some objects or references or clears up objects or references that are still in use.

If the memory space is not “managed”, the programmer has to make sure to put instructions in code to clear up the memory

Some languages, like Java for instance, have built in garbage collection. What this means is, instead of the programmer having to clear up the memory manually, there is a small, low priority thread running in the background that watches the lifecycle of the objects and variables that are created, making sure that unused ones get cleared up. This approach is great from programming perspective because as a programmer, I don’t have to micro-manage the memory. It frees me up to do other things like writing elegant code and focusing more on the business logic. However, as we established earlier, this is not very precise because while the garbage collector is watching the objects and taking notes, it takes up a small, tiny amount of CPU power and memory. In the grand scale of things, this is small, however, arguably those resources could be better spent in processing more requests.

Serial GC

Now when it comes to clearing up the memory, the garbage collector can take different approaches. A simplistic approach is to simply freeze the program and clear up all objects and variables that are no longer referenced. While this is ok for small applications, it is not ideal because for that time, the program is unresponsive. Usually with small programs, this time is in micro or even nano seconds so it won’t matter, however with large applications using gigabytes of memory, this time could be a couple of seconds. This is actually how garbage collection algorithm used to work in the old days and you can still use it now using the -XX:+UseSerialGC flag when running your Java application.

If that wasn’t clear, lets go back to the pub analogy. Having Serial GC is like having one server in the entire pub who goes around and clears up the unused glasses off of people’s tables. But instead of doing it silently and courteously, they’ll just climb up on top of table and shout “EVERYBODY FREEZE”. Everybody freezes like a statue making it easy for them to go around and collect the empty glasses. When they’re done, they’ll shout “CONTINUE” and everybody would resume like nothing happened. Now, even if everybody is pretending, this could get annoying if the pub doesn’t have sufficient glasses and have to call “FREEZE!” a lot (read low available memory) or if the pub is really busy and everybody is drinking really fast (read highly variable memory utilisation).

A simplistic approach is to simply freeze the program and clear up all objects and variables that are no longer referenced.

This act of shouting “FREEZE!” and “CONTINUE!”, in technical terms is called a Stop The World (STW) action. If required, JVM will call STW to pause all threads and objects from processing anything, resuming only when garbage collection is done.

CMS GC

Going back to the pub analogy, there is another way to clear up the tables of unused glasses. When the pub is busy, instead of picking up and carrying the glasses at once, the server will watch the tables and mentally mark the glasses that are empty. If the glasses are not touched on say the third mark, they are collected. Simple right? In terms of garbage collection in Java, this approach is called the Concurrent Mark and Sweep (CMS).

A CMS garbage collector has two primary functions. Mark and Sweep. Marking is action where references to an object are marked. When an object no longer has any references, it is “sweeped”. CMS is a concurrent algorithm, taking full advantage of the modern multi core processors.

G1 GC

Like a couple of servers in a small pub, CMS works effectively upto a certain point. When heap sizes increase beyond 4GB, the efficiency that it brings starts to taper off. Relating back to the pub analogy, servers going around the whole pub marking and clearing glasses off the tables works efficiently up to a point. When the pub is too large say it has multiple levels or is large enough to hold say > 2000 people, going around the whole pub becomes tiring for the servers.

This is where the G1 Garbage Collector (G1GC) comes in. The basic premise of it’s algorithm is to divide up the heap space into sections such that each garbage collection thread has a small enough section to manage. Think of this as each server in a pub having its own dedicated area to manage, like it usually is in the real world! Here since each server is operating in their own area, it is not only easier to manage the objects but more importantly it reduces overall overhead.

Conclusion

Garbage collection is pretty interesting topic. There are several built in as well as custom algorithms available. The Java virtual machine allows for a lot of customisation not only in the garbage collection approach but also in tuning of the garbage collection algorithm itself. However, it is a bit of a black art in the sense that there is no silver bullet and each approach is unique to many things like the supporting hardware, application use case, application architecture, underlying system settings and features etc. Having said that though, it is important to note that there is no inherently bad garbage collection algorithm. Its all relative.

For instance, while the default garbage collection algorithm is the concurrent mark sweep, it is a bad choice if you are running single core machine (although I think it switches from cms automatically if the number of CPUs==1) and if your application is sort of a time-driven batch application. In that case, the serial GC is perfectly a fine choice despite the STW freezes. Why? Well because the batch application can do its thing, using reasonable amount of memory and when its done, who cares if the application freezes? Again, use case is king here.

In this post I’ve tried to simplify the concepts around garbage collection in Java to make it easy for everyone to understand. Understand that the choice of an optimal algorithm isn’t as clear cut as it seems and it might take some effort to actually find the right GC algorithm for your application. I’ll cover this some other time, but please do remember that even the best garbage collector algorithms are no replacement for bad code or bad architecture! Choosing a wrong algorithm could wreck the application performance as much as the right algorithm that could enhance it. Usually there are certain markers that should tell you if the garbage collector is working against you and this is the point at which you should start your analysis. Having said that though, the default CMS is perfectly acceptable in most cases and you probably don’t need to spend two sprints analysing your application.

On consenting to cookies

6th September 2020 / 1 Comment

Alright lets talk about cookies a second. Remember the EU Cookie Directive? All those pop ups asking to allow cookies are a direct result of that cookie directive.

State of things

Yes it is a little bit annoying, especially if you’re visiting a website only momentarily to look something up and the pop up covers your entire screen preventing access. To be honest, I didn’t pay much attention to them. In fact, I didn’t even read what the pop up said, instead my mind would seek a way out of it by looking for a highlighted button with words like Accept/Allow/OK/Dismiss just so that I could get to the content that I was trying to view. In all fairness, this was the case because I didn’t visit the site expecting to agree to some contract but to view the content of the website.

As an Engineer, I know what the cookie directive says. I know that I shouldn’t agree to it unless I really do but in most cases, like most normal people would treat terms and conditions, my instinct is to find a way out of it. So lets look at what the cookie directive actually says:

“Member States shall ensure that the storing of information, or the gaining of access to information already stored, in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned has given his or her consent, having been provided with clear and comprehensive information, in accordance with Directive 95/46/EC, inter alia, about the purposes of the processing. This shall not prevent any technical storage or access for the sole purpose of carrying out the transmission of a communication over an electronic communications network, or as strictly necessary in order for the provider of an information society service explicitly requested by the subscriber or user to provide the service.;”
https://cookiepedia.co.uk/eu-cookie-law

So to comply with this, websites started putting up those annoying pop ups so that they can provide “clarity” into whatever use case they have of storing cookies on my computer. How do they do it? Well, usually, these things have two action buttons, Allow All or Reject Cookies. I don’t know about you but it feels like a red pill / blue pill moment to me.

https://i0.wp.com/davidmmasters.com/wp-content/uploads/2017/04/red-pill-of-awareness-or-blue-pill-of-unconsciousness.jpg?w=700 — Blue pill / Red pill

There is however a third option. It usually resembles “Manage settings”, “more options” or in cases when they really don’t want you to click on it, “Learn more”. When you click on this option, the pop up usually changes into something a bit more helpful.

Now that makes sense – this is what BuzzFeed wants to do with my cookies. Note that everything is “Off” by default. This is what is supposed to be the default, but if you remember the previous screen, the primary button, designed to draw your attention and make you click was the Agree and Exit button.

What would happen if you clicked that? Well I checked and in this case, it stored 34 cookies in my browser – in addition to you permitting all the above use cases. Whoa thats a lot of cookies. No wonder my computer is fat! What if you clicked Disagree and Exit button? 4 cookies. Thats it. If you clicked More Options, here again, the primary button is Agree and Exit, seems like they really want to store cookies in your browser! Alas, if you click that, your browser will have 34 cookies again, same result as if you never had clicked on More options in the first place. Although, if you clicked Save and exit, the number of cookies stored are only 4. Here’s a small list to help you with the math.

Agree & Exit: 34 cookies
More Options -> Agree & Exit: 34 cookies
Disagree & Exit: 4 cookies
More Options -> Save & Exit: 4 cookies

Now BuzzFeed is actually one of the good websites so they have a direct option to Disagree & Exit which effectivelly turns off other cookies. Here’s a website called Healthline.com who don’t provide this option:

When you click Manage Settings, you get a nice page with all fancy cookie use cases turned off by default:

Had you agreed to the first pop up and clicked on “Accept and continue to site” you’d have consented healthline.com and its partners to do whatever they want to do under “Special purposes” and “Special features” (+ all the other use cases they have outlined) and received 12 cookies stored in your web browser. On the other hand, if you go into “Manage settings” and click “Save settings”, it’ll take you to anon.healthline.com – which is an ad free version of the website. It still stores 7 cookies but they are not storing any third party or marketing cookies on your web browser.

And of course, this is assuming the websites are actually complying to the cookie law and aren’t doing naughty things anyway. Of course there’s no real way of knowing that. Websites could store cookies in your browser to track you and still send your information to the third party, via a serverside route without you knowing it happened at all.

Reflection

The thing about this that is striking to me is that for most people, the EU cookie directive has become more pain than something that actually works. From a consumer standpoint, it is merely an annoying pop up that gives bad user experience. Users just want to find the quickest way out of it and the website developers are happy to provide that – using the same dopamine driven design psychology that they use to get consumers to “Buy”, “Like” and “Share” things.

As an Engineer, this is annoying on several levels. It interrupts the user experience flow. It breaks immersive experience. It is annoying to implement. And even after all that, it doesn’t give users what they deserve because they accept everything without reading anything.

I’ll explore the potential solutions some other time, but for now, the only real approach is to “take control” by reviewing the cookie consent and proceeding in an informed manner. Whether you’re on a well known recipe website just cheekily reading that korma recipe or sifting through tons of web pages trying to figure out whether aliens really made the pyramids, take a few seconds of your time to review what you are consenting to. Or better yet, if you’re only browsing to check something quickly, use something like Firefox Focus (or incognito/private mode in your favourite web browser), which effectively gives you a disposable browser where when you’re done with your session, you can wipe the slate clean.

If you want to explore the wonderful world of websites and the cookies they work with, you can try https://www.cookieserve.com. Type in a URL and it’ll explain which cookies serve what purpose. Its not bullet proof but is good at catching google and facebook cookies if a website is sneakily using them.

You could also use browser plugins that prevent tracking cookies from being stored. I use uBlock Origin (chrome/firefox) which is a general tracking blocker. Additionally, if you use a modern web browser like Mozilla Firefox, it comes with tracking protection baked in and turned on. Not sure about Google Chrome, it probably doesn’t since Google’s revenue depends on Ads. I’ve heard of an alternative called Chromium – which is supposed to be the core open source version of Google Chrome without the Google stuff but I’m not too sure about it so can’t strongly recommend.

Useful unix file editing commands

27th June 2018 / Leave a Comment

ManthanHD

Findings, thoughts and observations from the eye of a Software Engineer.

Basic Garbage Collection Explained

Manual GC

Serial GC

CMS GC

G1 GC

Conclusion

Like this:

Useful unix file editing commands

Manual GC

Serial GC

CMS GC

G1 GC

Conclusion

Did you find this post useful? Spread the word!

Like this:

State of things

Reflection

Did you find this post useful? Spread the word!

Like this: