This post is more of a brain dump, which will hopefully read as though there’s some kind of correlation. I’ve been bitching about my home dev system running super sluggish since I wiped everything clean and installed Debian 10. Normally, starting with a clean Linux install means things just work, and work well. Like most people who daily-drive Linux, I’ll do this every year or so, because in the course of development you make config changes and system updates and who-knows-what else, and this eventually decimates the stability of your system. You end up with weird quirks here and there because of some change you made a year ago, and you’ve all but forgotten why you made it. So – you wipe fresh and start over and things are great.
This time they weren’t.
You’re killin’ me, Smalls..
Right away, I noticed things were crawling. Like, Windows searching a network shared drive with thousands of files crawling. Or, for those that remember, viewing big images over 56k dialup slow. I didn’t think anything of it, and figured it was just something to do with Debian 10 and they’d get it worked out at some point with one of the updates. It wasn’t until I was listening to Spotify with a handful of windows open and far too many tabs that my system locked up. I loaded htop in a terminal as soon as I was able and, sure as shit, memory consumption was at 100% and all my swap space was gone. This, ladies and gentlemen, was a hard lockup. (Holds power button for a swift death.)
One detail I noticed right after realizing that 100% of the RAM was consumed was that htop said only 3.6GB was available. Now, when I bought this machine some seven or so years ago, it came with 16GB. Where TF did the other 12GB go? I have no clue, other than assuming some modules have given up the ghost over years of intensive use (I went from a script kiddie to a skilled dev in this time, on this machine). A lot has been asked of that RAM, and it deserves all the respects. However – it’s time to upgrade, because there are some projects in the pipeline and I can’t just be killing the box in the process. One of the bigger ones is that I think I’ll be migrating to FreeBSD as my daily driver, but that’s a whole other post in and of itself.
What does this have to do with anything, tho??
As I said, this post is a bit of a brain dump. A challenge I’m currently working on at work is determining the best course of action for persistent session storage across some recently upgraded legacy servers. We’ve moved them into a high-availability configuration, and I’ve subsequently implemented some new vulnerability mitigations for CSRF. While the servers are configured to deal with a good portion of that vulnerability class, just leaving it to the servers doesn’t fit the defense-in-depth model very well. As such, I’ve added a CSRF token that follows requests all the way through the user’s action cycle.
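To make the token idea concrete, here’s a minimal sketch in Python. It is not the code from the actual (legacy) codebase; the `session` dict and both function names are assumptions made up for illustration. The token is generated once, stored in the session, embedded in the form, and compared on submission.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Create a random token, remember it in the session, return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token  # hypothetical session key
    return token


def is_valid_csrf_token(session: dict, submitted: str) -> bool:
    """Check the submitted token against the stored one in constant time."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Compare as bytes so non-ASCII input can't raise a TypeError.
    return hmac.compare_digest(expected.encode(), submitted.encode())


session = {}
token = issue_csrf_token(session)
print(is_valid_csrf_token(session, token))     # matching token
print(is_valid_csrf_token(session, "forged"))  # anything else
```

The important part for this post is that the check only works if the server handling the submission can see the same session that issued the token.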
The problem is, the load balancer attempts to maintain a 50/50 split between the servers. So, a user hits the entry script, and this interaction occurs on server1. Somewhere along the way they have a delay, and by the time they submit the final form, server1 is carrying the majority of that 50/50 split, so the load balancer sends the form submission to server2. Before the CSRF mitigation, this didn’t matter. My predecessors had a real animosity towards using server sessions for some reason (not entirely sure why, it may have been this exact reason), and everything was OK-ish. I won’t go into details about the countless poor design decisions in this codebase, that’s for another time. Continuing…
Now the CSRF token is stored in the session. server1 and server2 are not aware of each other’s session stores, and as far as I know, sticky sessions are not currently enabled on the load balancer. Do you see the issue now? The token is issued on server1, the form lands on server2, and server2 has nothing to validate it against. We need to choose one of a couple of options at this point. The easiest would be to just turn on sticky sessions. That way, an experience that starts on server1 stays on server1, one that starts on server2 stays on server2, etc. Another option, and the more scalable one in my opinion, would be to spin up a small container with lots of memory, running nothing but Redis.
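Here’s a toy simulation of the failure, plus the sticky-session fix. Everything in it is hypothetical: the `Backend` class, the alternating picker, and especially the “sticky” picker, which hashes the session ID as a stand-in for whatever a real load balancer does (typically a cookie or source-IP affinity).

```python
import hashlib
import itertools
import secrets


class Backend:
    """One app server with its own, unshared session store."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> CSRF token, local to this server

    def start(self, session_id):
        token = secrets.token_urlsafe(16)
        self.sessions[session_id] = token
        return token

    def submit(self, session_id, token):
        return self.sessions.get(session_id) == token


def round_robin(backends):
    """Alternate between servers regardless of who is asking."""
    cycle = itertools.cycle(backends)
    return lambda session_id: next(cycle)


def sticky(backends):
    """Always send the same session to the same server."""
    def pick(session_id):
        digest = hashlib.sha256(session_id.encode()).digest()
        return backends[digest[0] % len(backends)]
    return pick


def run_flow(pick, session_id="abc123"):
    token = pick(session_id).start(session_id)          # entry script
    return pick(session_id).submit(session_id, token)   # final form


print(run_flow(round_robin([Backend("server1"), Backend("server2")])))
print(run_flow(sticky([Backend("server1"), Backend("server2")])))
```

With the alternating picker, the entry request lands on server1 and the form lands on server2, which has never seen the session, so validation fails. With the sticky picker both requests hit the same server and it passes.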
While Redis adds complexity, and increases both maintenance and infrastructure costs, I feel it will facilitate future scalability, security, and maintainability. It will also allow me to implement a feature to ensure that session collisions don’t occur, provide better persistent storage across the servers, and allow the load balancer to continue doing its thing as it does now – balancing the load equally across the backend infrastructure.
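A rough sketch of what the shared store could look like. To keep it runnable without a Redis server, `InMemoryStore` is a made-up stand-in that mimics the `set(key, value, ex=..., nx=...)` and `get(key)` shape of redis-py; the key naming, the TTL, and the helper functions are all assumptions, not anything from our actual setup.

```python
import secrets
import time


class InMemoryStore:
    """Stand-in for a shared Redis instance (illustration only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry time or None)

    def _purge(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= time.monotonic():
            del self._data[key]

    def set(self, key, value, ex=None, nx=False):
        self._purge(key)
        if nx and key in self._data:
            return None  # key already taken, nothing written
        expires = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires)
        return True

    def get(self, key):
        self._purge(key)
        entry = self._data.get(key)
        return entry[0] if entry else None


def create_session(store, ttl_seconds=1800):
    """Claim a fresh session ID; retry if that ID is somehow already in use."""
    while True:
        session_id = secrets.token_urlsafe(32)
        csrf_token = secrets.token_urlsafe(32)
        key = f"session:{session_id}:csrf"  # hypothetical key layout
        if store.set(key, csrf_token, ex=ttl_seconds, nx=True):
            return session_id, csrf_token


def check_csrf(store, session_id, submitted):
    """Any server can validate, because they all read the same store."""
    expected = store.get(f"session:{session_id}:csrf")
    if expected is None:
        return False
    return secrets.compare_digest(expected.encode(), submitted.encode())


store = InMemoryStore()  # imagine server1 and server2 both pointing here
session_id, token = create_session(store)
print(check_csrf(store, session_id, token))
```

The `nx=True` write is the collision guard: the session is only created if the key doesn’t already exist. With a real Redis client you’d also have to account for it returning bytes unless it’s configured to decode responses.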
The correlation might seem like grasping at straws, but both problems come down to memory, so I figured I could fit these thoughts into a single post. Hopefully I’ll be able to get that RAM soon, and we’ll figure out this persistent session storage issue. Nothing like being at a standstill on two fronts because of reliance on other people, and the fact that no one seems to have time to breathe anymore.