Monday, 10 November 2025

From Monolithic Chaos to Microservice Over-Engineering

There’s a lot of talk these days about software architecture, especially around monoliths versus microservices.

First, let's start with a simple definition (not fully accurate, for now): a monolithic system is mainly a single component that does everything, while microservices are many smaller systems working together. Let's dive deeper.

What is a Monolithic System?

As a very rough starting point, a monolithic system refers to an application where all business logic lives in one component. Everything — authentication, payments, notifications — is tightly packed together in a single deployable unit.

This is how most developers start, especially in college or during early projects. It’s simple to build, quick to launch, and adding new features is easy (at the beginning).

But as systems grow, monoliths tend to become painful to maintain, extend, or deploy.
A small change can lead to huge side effects. For example, updating a package for the login module might break something in twenty unrelated places because everything is connected.

So let's go through the common problems that appear in monolithic systems:

  1. Security 
    • All code has full access to all databases, infrastructure, and so on...
    • Why can order logic access the auth DB?
    • Why can notification access KYC data?
    • Why does Auth have permission to send notifications to users? 
  2. Testing
    • When a single project holds everything, metrics like test coverage and the mix of test types become misleading; some parts of the code need (must have) very high coverage, while fewer tests are fine for other parts.
    • Different parts of the system require different testing strategies, like UI visual tests for front-end components or profanity filters for communication channels, but in a monolith, all tests mix together. 
  3. Code change / deployment frequency (change control)
    • Some parts of the code live for years with no changes, while other parts change daily or weekly. For example, I faced a case where the payment service had close to zero changes for months, while the ordering service in the same system had around 2-3 changes per day.
    • In cases like this, most of the code hasn't changed, yet the commit history says otherwise; the team has to run every kind of test on things that never changed, and deploy everything for a change that represents less than 0.001% of the code base.
    • It also makes no sense to retest many lines of out-of-scope code just because you upgraded a common package or changed something in a common layer (like middleware).
  4. Separation and abstraction between different models, services, and domains
    • When everything lives in the same project, it's easy to abuse the abstractions between components. For example, the DB layer for auth data should only be reachable from the auth service layer, but it's all too easy to call it from another service and spread that DB logic across every component.
    • In most cases, different teams/squads/engineers own each area. Having everything so close together makes controlling changes and their scope hard, and requires communication even for the simplest things. Migrating from one tool/SDK to another is painful, because whoever wants the migration has to convince people who don't think they need it. The result is a lot of talk, and in some cases multiple conventions for the same thing in the same project, because some people needed a change and others didn't agree to it.
    • Some tickets become very hard to handle: there is an issue in the system, but you don't know which team should own it. You need many routing rules to direct it to the right person, and in some cases (like a DB timeout) you never really know. The investigator has to know the recent change history of the whole system plus any business changes (like onboarding a new customer with high traffic). Fights over deployment plans and over who should handle what become common here.
  5. Scaling issues
    • As the monolith grows, keeping consistent coding standards gets harder. 
    • New developers struggle to onboard, and scaling the infrastructure becomes increasingly inefficient. 

Now, Let’s Talk Microservices

Microservices break the big system into multiple smaller ones, each handling a specific responsibility and communicating with others.
This architecture solves many monolithic issues but brings its own challenges.

  1.  Communication Issues 
    • Unlike local in-process calls, microservices rely on network communication
      • Higher latency
      • Data transfer limits
      • The need to scale individual services during traffic spikes
      • Handling failures and retries. Imagine the request fails for any reason, especially in the middle of a transaction that should be ACID across systems; you need a full overview of what happened. Many patterns were invented just to handle these cases, like the inbox/outbox patterns, callbacks versus querying for status, and the saga pattern (see the outbox sketch after this list).
      • Availability across the system - if a service is down, what will happen to the system? 
    • Synchronous versus asynchronous design decisions become more critical. If someone designs a process as synchronous when it could be asynchronous, it costs a lot of money: some services have to scale up and down constantly when they don't actually need to, and services sit waiting for other services' responses when they don't need to.
  2.  Cross-development features and local debugging 
    • In a monolith, one pull request can change everything. In a microservices world, you might need multiple PRs across different repositories.
    • Debugging means running several services on different ports — or connecting your local setup to staging environments just to test one feature.
  3. Over-flexibility
    • Each service can use its own stack. This gives developers the power to pick the most suitable tools/framework/language for each service; some see this as a great advantage, but it can be a critical issue.
    • Navigating the system requires more and more knowledge; you can't easily move someone from one part to another.
    • The limits of the whole system become unknown, as each subsystem has its own limitations and issues.
    • Reusing code in the form of SDKs/packages becomes much harder when each subsystem has its own stack; shared code needs an adapter layer (and in many cases, re-implementation is easier in the short run).
  4. Larger initial cost (development and hosting) compared to a monolith
  5. Centralizing the public interface
    •  Users shouldn't be aware of the subsystems.
    • A bad example I faced while integrating with a company: there were no response standards. Some services returned errors as {"error": {"code": ..., "msg": ...}} while others returned {"err_code": ..., "err_msg": ...}, and the same error code meant different things depending on which service returned it, and so on (see the envelope sketch after this list).
  6. Testing the full flow (e2e tests) becomes harder.
  7. Defining the subsystems' scopes is very tricky 
    • There are some known standards for some parts, like auth, payment, and communication, but the other parts will be based on business vision and context.
    • There are many ways to describe how to define a service, like SOLID and DDD, but there is much flexibility based on the business.
    • The two things to balance are:
      • Loose coupling, so there are no strong relations between two or more components (otherwise you end up with nano-services, which have many issues).
      • Single responsibility, where each service handles only a single domain concept (otherwise you get a hybrid between monolith and microservices, which some people call SOA, Service-Oriented Architecture).
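
To make the failure-handling point in item 1 concrete, here is a minimal outbox-pattern sketch in C#. Order, OutboxMessage, and _db are hypothetical names, and it uses System.Text.Json; the idea is that the outgoing event is stored in the same database transaction as the business change, and a background worker delivers it later with retries:

// Minimal outbox sketch (hypothetical names, not a production implementation).
public async Task PlaceOrderAsync(Order order)
{
    await using var tx = await _db.BeginTransactionAsync();

    await _db.Orders.InsertAsync(order);                // the business change
    await _db.Outbox.InsertAsync(new OutboxMessage(     // the event, stored atomically with it
        Topic: "order-placed",
        Payload: JsonSerializer.Serialize(order)));

    await tx.CommitAsync();                             // both rows commit, or neither does
}

// A separate background worker polls the Outbox table and publishes each message
// with retries, so a network failure can neither lose the event nor roll back the order.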
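
And for the public-interface point in item 5, one simple fix is to agree on a single error envelope that every service (or the API gateway) returns. A minimal sketch, with a field shape that is just an example rather than a standard:

// One shared error envelope for every service (the exact shape is an assumption).
public record ApiError(string Code, string Message, string Service);

public record ApiResponse<T>(bool Success, T? Data, ApiError? Error);

// Every failure then serializes the same way, no matter which service failed:
// { "success": false, "data": null,
//   "error": { "code": "AUTH_401", "message": "token expired", "service": "auth" } }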

 The Truth: Monolithic Isn’t Evil and Microservices Aren’t Magic

A common misconception is that one is good and the other is bad. I’ve seen terrible monoliths and terrible microservice systems alike. Re-architecting won’t fix deeper process or design problems.

If your team ignores good engineering principles — SOLID, DRY, KISS, YAGNI, proper layering — no architecture will save you.
Even if you rebuild the system a hundred times in a new architecture, the same issues will come back.

Monolithic architectures are great for quick launches or early-stage startups. They let you move fast, validate your product, and keep things simple.
If you apply modular monolith principles (clear boundaries, isolated domains, and limited shared dependencies), you can avoid most monolithic pain points. It also makes migration to microservices much easier when your system actually needs it.
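
As a rough sketch of what those boundaries can look like in .NET (the project layout and names here are hypothetical): each domain lives in its own project, and everything except its narrow contract is marked internal, so other modules physically cannot reach into its data layer.

// Orders module: only the contract is public; the implementation and the
// DbContext stay internal to the Orders project. (Order.From is a made-up helper.)
public interface IOrderService
{
    Task<OrderResult> PlaceOrderAsync(PlaceOrderRequest request);
}

internal sealed class OrderService : IOrderService
{
    private readonly OrdersDbContext _db;   // invisible outside this module

    public OrderService(OrdersDbContext db) => _db = db;

    public async Task<OrderResult> PlaceOrderAsync(PlaceOrderRequest request)
    {
        var order = Order.From(request);    // domain logic stays inside the boundary
        _db.Orders.Add(order);
        await _db.SaveChangesAsync();
        return new OrderResult(order.Id);
    }
}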

While I agree that a monolith gives developers the power to abuse the boundaries between domains, applying a modular monolith fixes most of the separation issues, which brings us back to the process issues.

Finally, and this is the thing I really want people to understand: it's not always going to be "monolithic vs. microservices". You don't have to pick a single option; you can build one part of the system as a monolith and another as microservices to take the advantages of both. And if you apply the modular monolith approach, migrating to microservices later, when the system actually needs it, becomes very easy.

Friday, 24 October 2025

Programming Principles: Short Readable Functions -> Better Debugging

The longer you spend writing or reading code, the more you’ll realize how true this is.
A long function isn’t readable — even if you use perfect variable names and have a clear logic flow.
A long class isn’t readable either — even if you apply every best practice you know.

Actually, the longer the code gets, the harder it becomes to apply any best practice in the first place.

Easy to apply, and it helps you apply the other principles

Keeping your functions and classes short doesn’t just make your code look clean — it also helps you apply other important programming principles without even trying, like:

  • DRY (Don’t Repeat Yourself)
    When your functions are small and focused, you naturally start noticing repeated logic.
    Instead of copy-pasting code, you’ll think, “wait, I already wrote something that does this.”
    Refactoring becomes easier because your building blocks are small and reusable.
    In long messy functions, repetition hides easily — you could be doing the same thing three times in different places without realizing it.

  • Single Responsibility
    This one fits perfectly with short functions.
    A function that’s short usually does one job.
    When it starts doing two or three things, you’ll feel it — the size grows, the naming gets harder, and debugging becomes confusing.
    If your function name sounds like a sentence (“fetchAndProcessAndSaveData”), that’s your sign to split it (see the sketch after this list).

  • Code for Humans First, Machines Second
    The compiler doesn’t care how your code looks — but humans do.
    Machines will run a 200-line function just fine, but humans will suffer trying to read it.
    Writing short, clear functions means you respect the next person who’ll read your code — and most of the time, that next person will be you in a few months.
    Code should tell a story, and short functions keep the story easy to follow.
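
To make that concrete, here is a minimal sketch of splitting such a function; _api, _db, and Record are made-up placeholders, and it assumes LINQ (System.Linq):

// Before: one name doing three jobs (the sign it should be split):
// async Task FetchAndProcessAndSaveDataAsync() { /* 50+ lines */ }

// After: three short functions, each with one job, each debuggable on its own.
async Task SyncDataAsync()
{
    var raw = await FetchDataAsync();
    var valid = ProcessData(raw);
    await SaveDataAsync(valid);
}

async Task<List<Record>> FetchDataAsync() =>
    await _api.GetRecordsAsync();

List<Record> ProcessData(List<Record> raw) =>
    raw.Where(r => r.IsValid).ToList();

async Task SaveDataAsync(List<Record> records) =>
    await _db.InsertAsync(records);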

I like this rule because it’s simple and easy to catch.
Even a beginner programmer can look at your code and instantly tell you, “This function is too long.”
Compare that to something like the Open/Closed Principle, where you have to read the whole class and think about multiple edge cases just to spot the issue.

How short should code be?

There’s no strict rule. It depends on the language, but generally, functions between 10 and 20 lines are a good range — maybe closer to 20 for front-end code.

Each line should do one clear thing.
For example, in .NET, it’s tempting to chain multiple actions in one line, like this:

var result = this.data.GetAll().Where(x => x.Age > minAge).OrderBy(x => x.CreationDate).Take(10);

This isn’t a “line” in the readable sense — it’s a wall of logic.
Instead, it should be written like this:

var result = this.data.GetAll()
    .Where(x => x.Age > minAge)
    .OrderBy(x => x.CreationDate)
    .Take(10);

Now each part is easy to scan, easy to debug, and clear in intention.
You instantly know what happens in each step without mental parsing.

Another small but useful thing — limit each line’s maximum number of characters.
Once you start scrolling horizontally, readability is gone.

Why this matters for debugging

When your code is short and focused, debugging becomes faster.
You can isolate issues easily because your functions do one thing — and only one thing.
You don’t have to jump between a giant function trying to trace where something went wrong.

In real projects, this rule saves time, reduces bugs, and helps other developers (and your future self) understand what’s going on.
Long code might “work,” but short code lives longer.

Monday, 1 September 2025

Technical Design Document

In software development, we often move fast—especially in agile environments. But moving fast doesn’t mean skipping structure. One of the most important tools to ensure alignment, clarity, and quality is the Technical Design Document (TDD).

It’s not just for architects or senior engineers; it’s something every developer should know how to write. A well-written design document saves time, reduces misunderstandings, and ensures the team is building the right solution for the right business problem.

Why a Technical Design Document?

Whenever we need to introduce a new system component into an existing system or change the current architecture, it must be done in a way that aligns with company policies. 

In a waterfall setup, this document is usually written only by an architect with many years of experience. But in an agile environment, it’s often created by any member of the team. That means even someone with just a year of experience should know how to write it.

What Needs to Be Included in the TDD

I believe the TDD should have 3 sections, and each one should be finalized and approved before moving on to the next.

1. Introduction Section

The programmer needs to define the scope of the problem and explain why it’s an issue—mainly from the business perspective, not the technical one. 

If the problem has no business impact, why solve it? Some problems may not seem to have a direct business effect (like poor database design or spaghetti code), but in reality they do—causing timeouts, bugs, and delays in adding new features. This section should determine whether the reader continues with the document or not. 

Highlight the business requirements and translate them into a checklist that any solution must satisfy to be acceptable. I recommend finalizing this section before diving deeper, since changes here can affect the entire document.

Good examples: lack of systemic communication with users, poor retention rate, 1 out of every 3 orders failing permanently due to database errors, or order processing time being 12 minutes slower than the market average.
Bad examples: bad DB schema with many nulls, code being hard to onboard or edit, frequent deployment failures.

2. Solution Discussion

A design document should propose a single solution. There may be many possible approaches, but you need to compare them, pick the best one, and justify your choice. Once you’ve selected a solution, dive deeper and describe possible sub-solutions.

For example, if your solution is to design a microservice, then sub-solutions could cover which database to use, how the service communicates with others, and what the abstract interface looks like. 

Another example is migrating a service—sub-solutions here might include how to handle outage time, and how to sync old data with the new system.

This section should also include the high-level flows. Finally, explain how this solution fully addresses the problem, ensuring that every item in the checklist is covered.

3. Timeline and Components Discussion

After agreeing on the solution, it’s important to write a section that breaks the work into features, assigning each with estimated time, priority, and whether it depends on another phase. 

A common mistake here is dividing the work vertically (layer-based) instead of horizontally (feature-based). Remember, in agile we want to deliver as soon as possible. If you build the entire database layer, then the business layer, but don’t deliver a single usable UI or API, you’ll only get feedback very late in the process—and that will backfire.

This section also gives stakeholders the space to decide whether this solution should be developed right now or if other priorities come first—meaning it might be delayed.

Final Thoughts

At the end of the day, a technical design document is not just paperwork—it’s a communication tool. It aligns engineers, product managers, and stakeholders on why a problem matters, how we plan to solve it, and what it will take to get there. Writing a good one takes practice, but it’s a skill every engineer should develop early. It’s one of the best ways to ensure we’re building solutions that are not only technically sound but also meaningful for the business.

Another big benefit is onboarding: when each part of your system has a document describing what it is, why it exists, and how it works, new team members can ramp up much faster.

Thursday, 10 July 2025

Interviewing in the Age of AI

One thought I’ve been wanting to talk about for a while is how hiring should change now that AI is changing everything around us—including how people apply and how companies hire.

A Quick Look Back

A few years back, things were simpler. There was a quick screening call, maybe an onsite interview, and that was enough to decide whether the person was a good fit or not. Then companies started using online tools to make the process easier: assessments, take-home tasks, Zoom interviews, and systems to auto-filter CVs (ATS). It made scheduling and logistics easier, and although there were some concerns about cheating, it was still manageable. Tools could catch it, and honestly, cheating wasn’t as common or as easy as it is now.

The AI-Driven Landscape

Today, we need to look again at every step in this process. Most CVs now are optimized using AI just to pass the ATS and make the candidate seem like a perfect fit—even if they’re not. The tasks that used to take a day or two can be done in a few minutes. The online assessments can be solved instantly with AI. All of this makes the modern hiring process harder than ever. Personally, I’ve seen cases where I was sure the person was cheating, and suddenly I’m in a full Detective Conan episode trying to prove it.

At the same time, HR teams and hiring managers are depending more and more on AI because it saves time. And it does help in some ways—it can give feedback on a candidate’s fit, suggest improvements, and even guide people on what they should learn. But the downsides are becoming a real issue.

The Paradox of AI in Hiring

For example, AI-generated CVs often include exaggerated or fake qualifications. I’m pretty sure many people just generate them, don’t even read the final version, and then submit directly. That makes ranking or evaluating real candidates harder than it should be. On top of that, the new online assessment tools track behavior and movement to detect AI usage or cheating, but sometimes they just make people more anxious. I remember before, I used to pause for a couple of minutes just to think about the best way to approach a question. Now, some systems flag that as suspicious behavior—like maybe I’m using a second device. It’s frustrating.

Online interviews aren’t better. Instead of focusing on the actual content, I often find myself trying to figure out if someone is using AI behind the scenes. It’s no longer just a technical interview—it’s like a game of mental chess.

As for take-home tasks, if you say it’s okay to use AI, then fine—it’s fair for everyone. But if you clearly say “don’t use AI,” then you’re basically punishing the honest people. The ones who play fair (and these are usually the kind of people you want to hire) end up at a disadvantage. So ironically, you end up filtering out the candidates with integrity.

Are We Hiring the Right People?

I don’t have exact data on how much AI affects hiring results, but I liked this post that shows how weird the current state is for both candidates and companies:
https://www.linkedin.com/posts/nasserjr_recruitment-process-activity-7267260107128778753-_mnF

And another meme from the same person that really hits home:
https://www.linkedin.com/posts/nasserjr_well-thanks-activity-7292851244702834689-QW0M

The real problem is that while AI saves the company time, we need to ask: Are we actually hiring the right people? And even if we find a great match, what’s the guarantee they’ll even accept the offer?

Final Thoughts

I’m not against using AI in hiring, but we need to adapt. We can’t keep applying old-school interview methods in a world where AI is involved on both sides. The system needs to evolve, or we’ll keep making the wrong calls, for both candidates and companies.

Friday, 4 July 2025

Off-Topic in Software Engineering — Does It Really Not Matter?

Although I’ll focus mainly on software engineers in this post, what I’m saying applies just as much to data scientists, data engineers, QA, DevOps, and honestly anyone working in tech.

While talking with fresh or junior engineers, I’ve noticed something: many topics I encountered during my learning journey are now often seen as “unnecessary for work.” Things like how the internet works, how data flows through systems, or how computers handle memory — they’re often brushed off as irrelevant or “too low-level.”

Perhaps you don’t need them in your daily tasks for now. But that doesn't mean they won’t be useful later.

For example, I’ve had conversations where I mention (in very abstract terms) how browsers work, only to realize the other person doesn't know how data even travels over the internet. Or when I bring up data storage or database internals and hear something like: "I’m a full-stack developer" or "I'm doing ML models — this is outside my scope."

But the truth is: understanding these things — even at a high level — makes you better at what you do. Whether you're debugging, optimizing, scaling, or building something new, having that foundational context helps.

You don’t need to dive deep into every topic. But knowing just enough about what's behind the scenes helps you make smarter decisions. A QA engineer who understands backend behavior can write better test strategies. A data scientist who knows how pipelines are built can spot issues earlier. A software engineer who understands how memory works will write more efficient code.

Think of it like this: doctors study the entire human body before they specialize, not because they’ll use all of it every day, but because it helps them see the full picture. The same goes here.

So if you're early in your career — don’t dismiss things as “off-topic” too quickly. The things you skip today might be the exact things you need a few years down the line. Knowledge compounds — and it always pays off.

Sunday, 22 June 2025

Why Students Should Think Twice Before Overusing AI Tools in College

In recent years, I’ve noticed a growing trend: many students and fresh graduates are heavily relying on AI tools during their college years. While I’m a strong believer in the power of large language models (LLMs) — for code generation, documentation, testing, deployment, infrastructure support, and more — I want to explain why you should not become overly dependent on them during your learning journey.

1. College Is for Learning, Not Just Finishing Tasks

Most college assignments and projects have been done countless times before. So why do professors still ask you to do them?

Because these exercises are not about the final output — they’re about the thinking process. They’re designed to help you build a deep understanding of computer science fundamentals. When you shortcut that process by asking an AI to do the thinking for you, you miss the real purpose: learning how to solve problems yourself.

There are public repositories where you can copy solutions and make your projects run instantly. But that’s not the point — your goal in college is not to finish, it’s to understand.

2. If AI Can Do Your Job, Why Would a Company Hire You?

If your only skill is knowing how to prompt AI tools, you’re making yourself easy to replace.

I’ve seen many people ace online assessments — solving problems involving dynamic programming, binary search, graph theory, and more — only to struggle with the basics during on-site interviews. They couldn’t analyze the complexity of a simple nested loop or explain how to choose between two sorting algorithms.

Overusing AI creates a false sense of competence. If you constantly rely on it to get things done, what happens when you face a challenge in real life — one that requires your own reasoning?

3. LLMs Aren’t Always Reliable for Complex or In-Depth Work

Despite all the hype, AI tools are not always accurate.

LLMs can give different answers to the same question depending on how it’s phrased. They sometimes produce code with compile errors or hallucinate incorrect explanations. Unless you understand the underlying concept, you won’t be able to judge whether the AI’s response is correct — and that’s risky.

AI should assist your thinking, not replace it.

4. Don’t Treat Private Code Like It’s Public

A major concern when using public AI tools is data leakage. Once you paste your code, tasks, or documentation into an online AI model, you have no real control over where that information ends up. Future users asking similar questions might get your proprietary logic as part of their output.

I saw this firsthand with an intern we were onboarding. After being assigned a task (with no pressure or deadline), he immediately started pasting a large portion of our internal code and task descriptions into GPT. He took the AI’s response, submitted it as a pull request — and didn’t even test it.

When I asked him about a specific line in the code, he had no idea what it did. I told him clearly: do not upload internal code, models, documents — anything — to GPT. If you need help or more time, just ask. You’re here to learn, not to impress us with how fast you can finish something.

Unfortunately, he kept doing the same thing. Eventually, our manager had to send out a formal email reminding everyone not to share internal content with public AI tools. Whether it was because of this intern or others, the message was clear: this isn’t acceptable. Yet he still relied on GPT for everything, and we all agreed — he had become someone who couldn’t write a line of code without help.


Final Thoughts

AI is a powerful tool — no doubt. But if you rely on it too early and too heavily, especially during your formative learning years, you’re sabotaging your own growth. Use it to assist you, not to bypass the learning process. Learn the foundations first. Think independently. Struggle, fail, and get better.

You’ll thank yourself later — when you're the one solving real problems, not just prompting AI to do it for you.

For example: this post was mainly written by me. I used AI to review it, then I reviewed the AI’s suggestions and made further improvements. That’s how you should be using these tools — not as a crutch, but as a sounding board to help you grow.

Sunday, 1 June 2025

Why Alarms Feel Broken (and How to Fix Them)

I love talking about common myths in software engineering, and here’s the first one: alarms.

The purpose of alarms is simple — visibility without manual checks. Instead of fetching data, the system pushes alerts when something's wrong. It sounds great, right? So why do alarms often feel like a nightmare?

Let’s break it down.

The Manager's View vs The On-Call Engineer's Reality

From a management perspective, more alarms = more safety. They want visibility over every metric to avoid any incident slipping through the cracks. If two metrics signal the same issue, they often prefer two separate alarms — just to be extra safe.

But from the on-call engineer’s perspective, this turns into chaos. Alarms with no clear action, duplicated alerts for the same issue, and false positives just create noise. Nobody wants to be woken up at 3 AM for something that doesn’t need immediate attention.

The core problem? Neither side feels the pain of the other.

  • Higher-level managers may not have been on-call in 10–20 years — or ever. A dozen P0 alerts a day? Not their problem.

  • Junior engineers on-call may not grasp the full system overview. If it doesn't trigger an alarm, they assume it's fine — which isn’t always true.

So, How Do We Fix It?

Balancing these two viewpoints is the responsibility of senior engineers and mid-level managers. They’re the bridge between hands-on pain and high-level priorities.

Let’s be real: execs won’t care about reducing alarm noise unless it affects a KPI. So change has to start lower down.

Tips to Improve Your Alarm System

  1. Define Clear Priority Levels

    If everything is a P0, your system isn't production-ready. Aim for at least three levels:

    • Level 0 (P0): Needs immediate action (e.g., business-critical outage).

    • Level 1 (P1): Important but can wait a few hours.

    • Level 2 (P2): Can wait days without impact.

    Within each level, use FIFO. If someone asks you to drop a P1 to work on a "more important" P1, your priorities are misaligned; either the more important one should be a P0, or the other should be a P2. (A small triage sketch follows this list.)

  2. Align Alarms with Business Impact

    A true P0 should reflect measurable business loss, like a bug letting users use services for free.

    A crash affecting 10 users out of 30 million? That’s a P2. It’s annoying, sure, but it’s not urgent.

  3. Set Realistic Expectations for Each Priority Level

    Use volume thresholds per environment:

    • Prod: Max 1 P0/week, 1 P1/day. The rest should be P2+.

    • This helps you track the system’s health over time.

      If your system can't stay within these production thresholds, it's a broken system: the team needs full capacity to fix it, not to bring more features (and bugs) into it.

  4. Treat Long Fixes as Tasks, Not Alerts

    If a "bug fix" takes the entire on-call week, it's not a bug — it's a feature request or tech debt task. Don’t let it sit in your incident queue.

The goal is to build a system where alarms are actionable, meaningful, and matched to business priorities — not just noise that trains people to ignore real problems.

Having a better alarm system will boost the on-call experience, bring tech debt under control, and make fixing incidents way faster.
