Saturday, 5 October 2024

Database Decisions: Choosing Between Relational, Document, and Graph Models for Your System

Choosing the right database is one of the most critical decisions in system architecture. Whether you're dealing with structured or unstructured data, normalized or denormalized data, the choice you make will affect your system's scalability, performance, and maintainability.

This article aims to guide you through the differences between relational, document, and graph databases—highlighting when each type is most suitable, the challenges of using them together in a single system, and key factors to consider for making the best decision for your use case. We'll also explore whether it's feasible for a system to incorporate multiple database types and discuss potential pitfalls of such an approach. By the end, you'll be better equipped to select a database strategy that aligns with your business needs and technical requirements.

Choosing Between Relational and Document Databases

When choosing the right database for your system, it's important to first understand your business needs and use cases. Know the access patterns, the type of data you're storing, and how the business plans to utilize that data.

A common but overly simplistic guideline is: if you have structured data, use a relational database; if it's unstructured, use a document database. However, this approach is misleading. In reality, unstructured data can be stored in a relational database, and structured data can also be efficiently stored in a document database. The choice is less about structure and more about how the data is used and how relationships between data are managed.

Here are some key questions to help guide your decision:

  • Do you need to query on multiple fields frequently?
  • Do you often need to access full records in a single query?
  • What kinds of relationships exist between different records or tables?
  • Does your business require frequent locks and transactions?
  • Does your data have a natural grouping, or does it vary significantly from record to record?
  • How complex are the relationships between your data points?

From my experience, there's a general rule: don't use a relational database without a strong reason. Relational databases provide a lot of power, including support for locks, transactions, relationships, and constraints in a native way. While some document databases offer these features, they often come with trade-offs, like added complexity or performance penalties.

On the other hand, choosing a document database without fully understanding your access patterns could lead to challenges like:

  • Frequent Full Table Scans: Without appropriate understanding of query patterns, you may end up scanning entire collections frequently, increasing costs.
  • Data Consistency Issues: Ensuring data consistency, like unique constraints across collections, can be complex in a document database.
  • Data Duplication: To support access patterns, you might end up duplicating data across collections, leading to the headache of keeping that data in sync.
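
To make the consistency trade-off concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module (the table and field names are invented for illustration). A relational database enforces a unique constraint natively, while a document-style store leaves the same check to application code:

```python
import sqlite3

# Relational: the database itself enforces uniqueness and atomicity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
try:
    conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
except sqlite3.IntegrityError:
    print("duplicate rejected by the database")

# Document-style: uniqueness across documents is the application's job.
collection = [{"email": "a@example.com", "profile": {"name": "Alice"}}]

def insert_user(doc):
    # This check is easy to forget, and it is not atomic under concurrency.
    if any(d["email"] == doc["email"] for d in collection):
        raise ValueError("duplicate email, checked in application code")
    collection.append(doc)

try:
    insert_user({"email": "a@example.com", "profile": {"name": "Bob"}})
except ValueError as e:
    print(e)
```

The point is not that one model is better, but that the document version moves an invariant out of the database and into every code path that writes to the collection.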

Understanding Graph Databases

Graph databases can be thought of as a specialized type of document database, but with a focus on modeling relationships. They were created to solve performance issues related to complex relationships in relational databases by storing data as a network of entities and relationships. This type of structure allows graph databases to efficiently handle use cases with a lot of interconnected data.

A graph database uses graph theory to model and perform operations on data relationships, making it an excellent choice for scenarios where relationships are central to the data model. Some natural use cases include:

  • Social Networks: Representing people and the relationships between them.
  • Fraud Detection: Identifying suspicious patterns based on connected entities.
  • Network Management: Modeling and analyzing computer networks.

While I haven’t used graph databases in practice—my knowledge is mostly theoretical—it's clear that they can significantly improve performance when dealing with complex and numerous relationships.
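
To see why traversal-heavy queries favor this model, here is a toy sketch in plain Python (not a real graph database; the names are invented). A graph database stores edges directly, so a question like "how many hops separate two people?" is a simple traversal rather than a chain of self-joins over a relational table:

```python
from collections import deque

# A tiny social graph as an adjacency list: node -> set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob", "eve"},
    "eve": {"dave"},
}

def degrees_of_separation(start, goal):
    """Breadth-first search: length of the shortest friendship chain."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in friends[node] - seen:
            seen.add(neighbor)
            queue.append((neighbor, dist + 1))
    return None  # not connected

print(degrees_of_separation("alice", "eve"))  # alice-bob-dave-eve -> 3
```

In a relational schema the same query at depth k requires k joins (or a recursive query), which is exactly the kind of workload graph databases were built to avoid.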

Can a System Use Multiple Types of Databases?

Using two different types of databases in the same system comes with several challenges.

In a microservices architecture, it is sometimes argued that if a service requires multiple databases, it could be split into two separate services, each with its own database. This kind of approach aligns with the single responsibility principle and allows each service to scale independently, using the best database for its specific needs.

However, in a monolithic system, using multiple databases can introduce complications:

  • It gives developers too much flexibility, pushing design decisions into implementation. This means developers will have to constantly make choices like "Which database should I use for this case?"—a decision that should ideally be made during design, not development.
  • It reduces isolation between the business layer and the database layer, since the business logic becomes aware of specific implementation details across multiple databases.

While I've seen systems that use multiple databases simultaneously, I've also seen ways to avoid this approach. There may be use cases where this is justifiable, although I haven't encountered or thought of them all. One potential reason for using multiple databases is cost reduction—specifically, when there is a need to lower operational costs, but the resources required to migrate to a better-architected system are not available. In such cases, maintaining an old database while integrating a new one may seem like a practical, albeit temporary, solution.

Final Advice

The decision of which database to use is not one to take lightly. It requires a deep understanding of your application's needs, the nature of your data, and how you intend to scale. Relational, document, and graph databases each have their strengths and limitations, and selecting the right one can significantly impact your system's performance and maintainability.

Migrating from one database model to another can be a time-consuming and challenging process, especially when large volumes of data are involved. It’s best to thoroughly evaluate your needs and validate your decision before committing to a database model.

Conclusion

Choosing the right database is not a one-size-fits-all decision. Each type of database has its unique strengths, and understanding your business requirements and technical constraints is key to making the right choice. It’s also crucial to understand the challenges of using multiple database types within a single system, as doing so can add unnecessary complexity and impact maintainability.

In the next articles, we'll dive deeper into some common challenges: Why does a current database have poor performance, and how can we fix it? We'll also explore the differences between popular database management tools—comparing MySQL to MS SQL, and DynamoDB to Cosmos DB.

Friday, 5 July 2024

What Every Junior Developer Should Know: Insights for Starting Your Career



In this article, I share my perspective on what fresh software developers should understand in their early career. This viewpoint comes from my experience across several companies, and I acknowledge that not all companies have the same expectations.

Most companies outline the needed skill set in their job descriptions. So, if you receive an offer, there's no need to worry about being unqualified. Many aspects of being a good software developer, like teamwork, are difficult to evaluate in an interview. Companies also understand that you'll learn and grow over time.

Before diving into the key topics, it's important to emphasize that I mean these as practices, not just theoretical concepts. No one will ask you to define "teamwork" or a "design pattern" in your job, but they'll expect you to act based on that knowledge. The challenging part is applying these ideas when it counts—for instance, remembering to use the Factory pattern where it fits, rather than just knowing about it.

Object-Oriented Programming (OOP)

Many systems today are built using Object-Oriented Programming, and there will be even more in the future. Being familiar with OOP concepts and applying them effectively is crucial for your growth.

A helpful exercise for understanding OOP better is to draft a low-level design for a system, add a simple high-level implementation, and then evaluate: Which features make this design flawed or require refactoring? Is there duplicate code? Is the code easy for others to understand and use? Would the current design allow for 100% test coverage?

Design Patterns

For fresh developers, design patterns are perhaps the most important topic. Common patterns like Singleton or Factory are used in nearly every project, while others are less frequent. Instead of memorizing them or writing practice code just to apply them, revisit your existing projects and see how a pattern could improve things.

The way to approach design patterns is:

  1. Identify what's wrong with the current state or could go wrong in the future.
  2. Understand which design pattern can help.
  3. Figure out how to apply it.
  4. Assess if the problem is solved.

As you gain experience, you'll start to identify these patterns naturally. You shouldn't memorize them, nor should you try to reinvent them. The widely-used names and conventions are important so others can quickly understand your code.
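
As a concrete illustration of the Factory pattern mentioned above, here is a minimal Python sketch (the class and channel names are invented for the example). The caller asks for a channel by name and never touches the concrete classes:

```python
from abc import ABC, abstractmethod

class Notifier(ABC):
    @abstractmethod
    def send(self, message: str) -> str: ...

class EmailNotifier(Notifier):
    def send(self, message: str) -> str:
        return f"email: {message}"

class SmsNotifier(Notifier):
    def send(self, message: str) -> str:
        return f"sms: {message}"

def notifier_factory(channel: str) -> Notifier:
    """Factory: adding a new channel means one new class and one new
    entry here, with no changes at any call site."""
    notifiers = {"email": EmailNotifier, "sms": SmsNotifier}
    try:
        return notifiers[channel]()
    except KeyError:
        raise ValueError(f"unknown channel: {channel}")

print(notifier_factory("email").send("hello"))  # email: hello
```

This follows the four steps above: the problem is call sites hard-wired to concrete classes, the pattern is Factory, applying it centralizes construction, and the fix is verified by adding a channel without touching callers.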

SOLID Principles

SOLID principles are commonly used in most large projects. Not adhering to them can lead to cumbersome code, which makes adding features, fixing bugs, or conducting tests much more challenging.

Learning to identify code changes that violate these principles is crucial early on. Otherwise, your code reviews (CR) may frequently get rejected, or, worse, pass unnoticed and cause problems down the line.
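
For example, here is a minimal sketch (the classes are invented) of the kind of single-responsibility violation worth catching in review, next to a split version that is easier to test and extend:

```python
# Violation: one class parses, applies a business rule, and formats, so a
# change to reporting, parsing, or tax rules all force edits here.
class InvoiceReport:
    def run(self, raw: str) -> str:
        amount = float(raw)            # parsing
        total = amount * 1.2           # business rule (20% tax)
        return f"total: {total:.2f}"   # presentation

# Split: each function has one reason to change and can be tested alone.
def parse_amount(raw: str) -> float:
    return float(raw)

def apply_tax(amount: float, rate: float = 0.2) -> float:
    return amount * (1 + rate)

def format_total(total: float) -> str:
    return f"total: {total:.2f}"

print(format_total(apply_tax(parse_amount("100"))))  # total: 120.00
```

In a code review, the question to ask about the first version is exactly the one from the exercise earlier: which future change would force a refactor here?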

Dealing with Legacy Code

Most of the time, you'll work on extending and maintaining existing code rather than building new systems from scratch. Fresh developers often can’t start a project alone because they lack experience in system architecture and design.

You’ll encounter legacy code with multiple issues while adding new features. One of the biggest mistakes is to think that fixing old code is straightforward. It's a complex process, so it’s crucial to pick your battles wisely—remember, rewriting is often the simplest answer.

Teamwork

One common pitfall is focusing solely on solo skills, often a habit developed during college by working alone or letting others do the work.

As a software developer, you'll work in a team. Teamwork means:

  • Writing code that anyone on the team can understand without asking for explanations.
  • Avoiding unnecessary changes to others' code.
  • Treating code reviews as collaborative discussions rather than simple approvals. Always ask if something isn't clear and propose alternatives where you think they're better. Reviewing other people's code helps you learn more than just reading your own work.

System Patterns

System design patterns are not expected knowledge for fresh developers, but you will be exposed to them immediately. No one will ask you to choose the right architecture as a beginner. Instead, you're expected to follow the existing patterns, and with time, contribute new projects following those patterns.

A good way to learn is to read about the patterns in your project to understand their limitations, then explore alternatives. Common system patterns include MVC, MVP, MVVM, DDD, clean architecture, and layered architecture.

Testing

Testing is a broad area, and no one expects fresh developers to master it (except in testing-focused roles). Basic knowledge of mocks, unit tests, and test coverage is enough.

Learning about testing also provides insight into other concepts like SOLID and design patterns. Sometimes, you'll understand the value of a particular approach only after trying to test different versions of the same code.
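
As a small hypothetical example of both points, the sketch below (using the standard-library unittest.mock; the gateway API is invented) tests payment logic without a real network call, and the test is only possible because the dependency is injected rather than hard-coded:

```python
from unittest.mock import Mock

def charge_user(gateway, user_id: str, amount: float) -> str:
    """Charge via an injected gateway; returns a status string."""
    if amount <= 0:
        return "invalid amount"
    response = gateway.charge(user_id, amount)
    return "ok" if response["status"] == "success" else "failed"

# The mock stands in for the real gateway, so no network is needed.
gateway = Mock()
gateway.charge.return_value = {"status": "success"}

print(charge_user(gateway, "user-1", 9.99))      # ok
gateway.charge.assert_called_once_with("user-1", 9.99)
print(charge_user(gateway, "user-1", -5))        # invalid amount
```

If `charge_user` constructed its own gateway internally, this test could not exist, which is the kind of lesson about design that testing teaches.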

Documentation

Imagine if every SDK or API you used had no documentation; most of us would look for an alternative instead. Documentation comes in many forms beyond a README or a PDF, but some form must exist to guide users and collaborators.

Working Frameworks and Concepts (Agile, Scrum, Waterfall)

You'll encounter different delivery methods from day one—Agile, Scrum, Waterfall, etc. Although these sound complicated when you first read about them, they're simple to apply in practice. Understand the general concepts and focus on learning how your team applies them, as there are often different variations.

Conclusion

These concepts are not universally applicable in every case, but I believe they are essential for developers working in companies that value high code quality. Mastering these practices will set you on the path to becoming an effective and adaptable software developer.

Friday, 12 January 2024

Let the Docs Do the Talking

Years ago, when I was a student, I hated anything related to the documentation process; back then I had a poor opinion of it. In college, a team working on the same project was usually about four people, and they were friends who met and talked all the time. The projects themselves were not very complex compared to real-world projects. So documentation at that time was just paper that nobody would read. Even if someone wanted to know something, they would simply ask the friend in front of them instead of opening a poorly written document and searching for an answer. Maybe we were too lazy back then to write good documents; most of what students delivered were template documents you just filled in after coding.

Anyway, while I admit the college documentation process had things that needed improvement, what I really dislike is when people carry that attitude into professional work.

There are many things I consider documentation: design documents, meeting agendas and action items, COEs, deployment notes, and so on. In fact, anything that needs to be well written for historical or communication purposes can be considered a document, which means even emails count.

Why is documentation far more important in the real world?

Time Efficiency

When it comes to communication, it's much easier to share a technical document than to book meetings with everyone. Talking to each person may work when you have a team of around five members, but as the team grows it consumes a lot of time: some people will be on vacation, some won't have clear context, and some will be busy with other work. Instead of forcing people to talk to everyone, you can send a well-written document that includes all the context they may need. People who already have the context can skim it, while others can take their time to absorb it.

Compared to ordinary chat, a document is usually better organized and written in clearer language, and it helps avoid a stream of calls and pings about the same issue.

Feedback Handling and Timeline Visibility

What I like most is how easy it is to give feedback on a single point, highlight it for discussion, and then publish the resulting action item; this flows much more smoothly through a document than through calls. When someone adds comments, you can easily see whether they were addressed in the next version of the document.

Referenceability

Anyone can reference or search a document in the future, which helps a lot with timeline estimation, requirement clarification and tracking, onboarding new people, and so on.

I can't count how many times I've had to check a document to remember exactly what the requirements were. Believe me, you'll need this a lot when you're handling many threads at the same time, or even when picking up a new task.

When should I write a document?

A document is something people will need in the future, even if only for archival purposes. I haven't found a case where people shouldn't write one; the right question, for me, is: how much time should people spend writing it?

There is one rule I follow, and I believe it makes a lot of sense: the more important a document is, the more time you should spend on it. For example, an architecture or business document that will affect many people for months or even years can take days or weeks. An investigation document for a minor incident with no real loss should take a few hours, a recap of a call with an external party should take a few minutes, and so on.

What should a good document look like?

Here are my thoughts on what makes a good document. Some apply to all documents, others only to long ones:

  • A good title isn't enough: a descriptive title is nice, but no matter how good it is, the document must have an introduction section defining its purpose. One of the most annoying things for any reader is going through an entire document just to discover its goal.
  • Assume readers with no context: never assume readers have any unmentioned context. The current situation and the motivation for the document are not things readers will know. Readers with context can skip this section, but readers without it cannot imagine it; for example, a writer might mention a number assuming the reader already knows whether that number is good or bad.
  • Define who will read the document: identifying the audience is very important, because different roles need different levels of abstraction.
  • Separate what is critical from what is less significant: most documents have core critical sections and others that are less essential or optional. Can the reader easily spot the critical sections? Separation can be done with text formatting, ordering, or even moving the extra content into separate files.
  • Support claims with facts: remember this is a technical document. Saying "X is better than Y" requires justification; the document should state facts that support the writer's point of view, and anything else should be avoided.
  • A no-meeting mindset: one of the most common root causes of a bad document is a writer who relies on the discussion meeting, or on people contacting them if something is unclear. For me, if something isn't mentioned in the document, it shouldn't be brought to the table in the meeting. When you write a document, write it so that no one should need to contact you about it; otherwise, it's missing important information.
  • A second pair of eyes is always better: there's a difference between a document that has been reviewed and one that hasn't. If the document is long enough, it's a good idea to ask someone to review it before publishing it to the team or company.
  • Use domain and business standards: it's always better to use widely known conventions that anyone in the field recognizes instead of reinventing the wheel: vocabulary, templates, charts, and so on.
  • Learn from other people's mistakes: when everyone says a document is great, it's worth following its style. On the other hand, when people say a document is bad, hard to understand, or misleading, try to understand why they think so and avoid those mistakes.

Tuesday, 2 January 2024

The Art of Effective Business Communication

In today's digital age, businesses have access to a multitude of communication channels. But with so many options, reaching and engaging existing users can feel like navigating a crowded marketplace. This article delves into the power of strategic marketing communication for fostering loyalty, driving up engagement, and ultimately boosting your bottom line.

Communication in this age can mean many things, including notifications, emails, SMS, calls, even social media posts, home visits, or meetings. Business communication focuses mainly on cases where the sender is the business and the recipient is one of its users.

There are mainly two types of business communication:

  • Operational communication: expected communication, like notifying the client about an order's state, a renewal date, etc.
  • Marketing communication: communication driven by business goals, for example notifying the user about new items added to the catalog.

While operational communication is straightforward, marketing communication is harder and needs more resources.

Why Marketing Communication?

There are several major reasons why we need marketing communication, but they all come down to one fact: it is still easier to reach existing users than to find new ones. Many machine-learning models collect large amounts of user data to make reaching new users easier, but so far the click and conversion rates of most of these methods are not efficient. So how can we grow a service using the users who already pay for it?

  • Cross-selling: users who pay for one service can pay for another.
  • Increasing total amount or frequency: users can pay with a higher frequency, amount, or tier.
  • Increasing retention rate: re-engaging users who paid and then stopped.
  • Increasing conversion rate: converting users who signed up but never paid for any service.
  • User-based marketing: users refer the application to others.

Communication Plan

Conversion Metrics: When does the sender consider a communication successful?

The most important thing is to answer these questions: How do we decide whether this communication succeeded or failed? How do we measure the target metrics? What side effects could occur?

There are always metrics the business wants to improve, like total transaction amount, but many sub-metrics feed into such a metric, and most of the time the sender can't move all of them at once. Knowing that, the sender should target a smaller set of metrics to maximize the result.

A general target like "payments increased" is not very practical. The sender needs to define a concrete target based on market and historical data, and by default that target should be hard but achievable. Additionally, when to measure is also essential: should the business expect results after a couple of hours, or a couple of weeks?

Lastly, the business needs to be aware of side metrics; this helps determine whether the communication had a good or bad overall effect. For example, suppose communication Z improved the retention rate of users who paid for service X, but as a side effect the retention rate for service Y dropped. Should the business consider this communication a success or a failure?

Know the Customers

The key to successful communication is knowing as much as possible about the customer: can the business build a good picture of the users' age distribution, gender, financial state, device type, active hours, time zone, and so on? This information later determines many things, including how frequently the business should communicate with different users, when, and in which language and through which channel.

The more information the business has, the easier communication becomes. Questions like when to communicate, which words to use, and which channel to use all depend on who the user is.

A business without this information is pushed into a corner and forced to use a generic approach that will never maximize its metrics.

Communication Bandwidth

Every type of customer has limited bandwidth, and understanding the user base makes communication much more effective. For example, different age groups have different resources, including time, money, and attention.

The No-Cost Communication Myth

Many senders calculate the cost of a communication by its direct cost alone. Channels like notifications, emails (and SMS, for a telecom company) look like zero-cost channels, so they send as much as they can without any budget constraints.

While this looks true at first sight, the hidden truth is that the cost of a communication should never be measured only by its direct cost, but also by its indirect impact. Sending a stream of irrelevant communications raises future costs: users who receive too many messages stop paying attention to them. Eventually, some will block the channel altogether, for example by blocking or ignoring notifications, email, or SMS, and spam systems will filter more of these messages. In the end, a much more expensive communication channel will be needed to reach a large group of users, where in a healthy business it would have been required for only a small group.

Who should receive this communication?

Determining whom to contact, how, and when is not an easy job at all; in fact, companies maintain large data-analysis teams to answer exactly this. Even with a lot of data, the sender will never be 100% sure about the best way to reach each user.

Smaller, well-targeted groups make communication much more effective in most cases. This means building a data lake that includes user profiles and their action history, training models to cluster similar users, identifying these groups and finding what they have in common, defining the best way to communicate with each of them, and so on.

All this work takes a lot of time and effort, but in most cases it's worth it: it gives users the feeling that the communication came from someone close to them, which makes it much harder to ignore.

A/B Testing

Defining how to validate the numbers is also important. Suppose someone claims that after sending a notification, X% of users tend to convert, and people argue about whether X is good or bad. My most important question here is: X compared to what? If, for example, people convert at rate Y without the notification and Y > X, then no matter how high X is, the communication has a negative impact and should be eliminated.

A/B testing answers this question: the sender divides the target group into two random groups and sends the communication to only one of them. Why? Because the sender needs to remove external dependencies as much as possible; sometimes external factors (a holiday, long-term growth, or pricing changes) affect the overall trend far more than the communication does. To make sure the metrics measure only the effect of that communication, the sender needs two similar groups; comparing the metrics across both groups gives the business much more insight.

How do you divide the main group into two similar groups? Simply by assigning people randomly from the main group to one of the two groups. The sender should never try to split the group based on any shared attribute.

Should the two groups be the same size? Not necessarily. The simplest split is 50-50%, but there are other options like 80-20%; it mainly depends on how confident the sender is about the experiment, and both groups must be large enough that random error can be ignored.

Can we divide it into more than two groups? Yes, but more groups make the experiment more complex. Also, be aware of two things:

  1. A control group is mandatory: the sender must have a baseline to compare against.
  2. No group can be very small; otherwise random error will distort the results, and the sender will never be sure about them (degree of confidence).

Should we run the test more than once? It's generally good practice to run the experiment more than once, since time and events can affect the results, but make sure to adjust the group sizes based on the recent results.

One study, mentioned in the references, found that push notifications sent after A/B testing had a 10% higher click-through rate than those sent without it.
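
The random split and the baseline comparison described above can be sketched in plain Python (standard library only; the group sizes and conversion counts are invented for illustration). The split uses pure chance, never a shared attribute, and the comparison asks whether the treatment rate really differs from the control baseline:

```python
import math
import random

def split_ab(user_ids, treatment_share=0.5, seed=42):
    """Randomly assign users to treatment/control; only chance decides,
    never any attribute the users have in common."""
    rng = random.Random(seed)
    shuffled = user_ids[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * treatment_share)
    return shuffled[:cut], shuffled[cut:]  # treatment, control

def z_score(conv_t, n_t, conv_c, n_c):
    """Two-proportion z-test: is the treatment conversion rate really
    different from the control (baseline) rate, or just noise?"""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    p = (conv_t + conv_c) / (n_t + n_c)          # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se

users = [f"user-{i}" for i in range(10_000)]
treatment, control = split_ab(users)

# Invented outcomes: 540 of 5000 treated converted vs 480 of 5000 controls.
z = z_score(540, len(treatment), 480, len(control))
print(round(z, 2))  # |z| > 1.96 means significant at the usual 5% level
```

This also shows why each group must be large: the standard error in the denominator shrinks only as the group sizes grow, which is the "degree of confidence" point above.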

Sunday, 2 July 2023

Fresh Interviews


Recently, I had the chance to shadow and conduct interviews with fresh graduates, which was a great experience for me. I noticed common mistakes that interviewers keep pointing out to interviewees, so here I am trying to give fresh graduates some general advice. Most of it is basic, but people keep failing at it.

Why interview?

An interview isn't a simple process where there is a question and you pass if you answer it correctly or fail otherwise. While part of that is true, it's not that simple: good interviews weigh many things in the total outcome. If the company just wanted the problem solved, it could simply send it over, with no need for all this extra cost. Having interviews means the company isn't really focused on the solution itself, which means you can solve the problem and get rejected, or struggle with it and get hired.

A proper interview process is hard to run. Holding long and varied interviews is tricky, but doing them right all but guarantees that the remaining candidates are of higher quality. Interviews take time out of employees' working hours to prepare, conduct, and evaluate; by its nature this process costs the company money and time to find someone who may not even join.

Comparing candidates is also tricky: the company may have candidates with technical levels very close to each other and only one open position. What should it do then? Working with a binary system (hire/no hire) means the company effectively picks first come, first served. That system also makes it hard to combine more than one interview, and the company still has to handle cases like a candidate caught cheating or behaving very badly with an interviewer; these cases need to be handled smoothly in the hiring system, where "hire/no hire" can mean many things.

Some companies use a level system or an area system. In the level system, each interviewer records an outcome such as "strong no hire, no hire, weak hire, hire, strong hire", and each of these maps to the hiring policy. Similarly, some companies divide the feedback into areas, for example green (+1), yellow (-1), and red (-100), and only candidates with a score above some threshold are considered.

Candidates earn and lose points without noticing. They may think something is no big deal, like ignoring instructions, while it costs many points even if they manage to solve the problem; likewise, answering clearly why the problem can't be solved using X earns points.

Common Mistakes in Interviews

Most of these mistakes are easy to avoid after a few interviews. I encourage you to take as many interviews as possible and never feel down when you fail one.

  1. Silent solver: you need to talk through what you're thinking, the possible solutions, and the complexity of each of them. Thinking in silence leads the interviewer to give you the wrong hints, to assume you can't solve the problem with algorithm x because you never mentioned it, or even to suspect you saw the problem before and are faking the thinking process.
  2. Only the optimal solution: given a problem, the interviewer expects you to mention the brute-force or other naive solutions first and then try to improve them. Skipping them may suggest you can't solve the problem at all, even with a bad solution.
  3. Ignoring hints: when an interviewer gives you a hint, it is meant to help you. Following hints should lead you to a solution or at least earn you some points; ignoring them is a very bad indicator.
  4. Fitting the solution to the judge: many fresh interviewees focus on getting an "accepted answer" as they would on codeforces.com or a similar website, so they ask things like "what is the value of n?". When the interviewer says "find the optimal solution", they mean the best big-O: an O(n*n) answer can be a bad solution when the optimal one runs in O(log(n)).
  5. Assuming things and never asking about them: some interviewers deliberately leave things out to see whether you can find the optimal solution for certain cases or handle the corner cases. For example, someone may say "number" without telling you whether it is an integer, or may not mention that the value range is small (say, only values from 1 to 100). Asking about these things earns you points and may lead you to a better solution.
  6. Asking the wrong questions: "What is the time complexity of the optimal solution?", "What is the interview outcome?", "Where do you get your questions from?", etc. When the interviewer asks whether you have questions, they expect you to ask about the company or the working environment.
  7. Not listening to the interviewer: things the interviewer states very clearly, like "Let's move on from this point" or "That's enough here", are about the interview process itself. Ignoring them and deciding to keep going signals that you don't listen to others.
  8. Zero info about the company: it's fine to know little about the company at the initial screening, but reaching an advanced stage without knowing the basics indicates that you don't care whether it's a good or bad company, whether it has a strong technical team, and so on. People who join a company knowing nothing about it are more likely to leave it as soon as possible.
  9. Questioning the interview process: the interview process is rarely perfect, but it is something other people have put a lot of effort into, so questions like "Why did you ask me this question?" or "Does it really matter?" are an awful thing to ask.
  10. No interview preparation: always try to be in your top form before the interview. Read about related topics beforehand, relax, and push tasks until after the interview.
  11. Weak CV: a CV without your projects, Git account, LinkedIn account, etc. makes it hard for people to learn more about you from it and wastes interview time on questions about the missing information. Check careercup.com/resume for more info.

Summary

At the end of the day, an interview is just a meeting with someone more experienced than you. Be polite and try to show your good side. 99% of the time the interviewer will not try to make you fail or give you wrong hints; some may try to trick you to test your technical fundamentals, but the majority will keep it straightforward and try to help you.

Being rejected in an interview isn't necessarily a bad thing. Ask for feedback to improve yourself and focus on the next interview; there are many reasons you might not pass, so don't blame yourself for things that are out of your control.

Also, never assume that being good means you should pass every interview. Maybe the position needs a different scope, or the interview process at that company simply isn't good enough.

Books like "Cracking the Coding Interview" really helped me do better in interviews. Here are also some videos (in Arabic) that may help you (some are generic for all levels, not only fresh graduates).

Finally, good luck with your next interview.

Saturday, 29 April 2023

SDKs

What are SDKs?

SDK stands for software development kit. It is generally ready-to-use code published as a package that you can install and use with minimal effort.

There are many use cases for this. For example, when you implement a UI unit that the company will use in many projects, it is faster to publish it as a package so that everyone on the other projects doesn't need to implement or copy-paste it. Another example: when a company provides a service to other companies, it is much faster to publish an SDK that contains all the functions that company provides than to talk to each client individually, send long documents with DTOs, APIs, endpoints, etc., and hold meetings to assist each new company.

There are other package types; for example, the framework you code with is just a group of packages, as are the applications you use on your devices. But for simplicity, let's talk only about SDKs.

Why? Many of the programming problems you face today, someone else faced yesterday. Packages prevent things like duplicating code across many projects and the headache of updating every copy, and they speed up the integration process between different software components.

Package versioning 

Every package contains a header that defines it. The header holds useful info such as who published it, the targeted framework, etc., but the most important field is the package name, so you can install, reference, and use the package in your project; the second is the package version number.

The package version number is the tricky one; you will need to understand it deeply (you'll see why in the dependency resolution section). Since there is always something new in the software development world, every package has to keep up with those changes, which results in multiple versions of the same package. So what's the problem? Why not just give each release an auto-incremented id so users know the latest version? Because a single number makes the difference between versions too ambiguous: there is no guarantee about compatibility, and no way to tell whether a release is just a small fix or a major change that requires upgrading the framework and re-checking the package interface. To solve these problems, let's take a look at Semantic Versioning.

Semantic Versioning (SemVer)

It's a standard versioning system for defining a package version. It states that the version number contains three elements separated by "." in the format MAJOR.MINOR.PATCH. Here are the definitions of the three elements:

  1. MAJOR version when you make incompatible API changes
  2. MINOR version when you add functionality in a backward-compatible manner (will explain later)
  3. PATCH version when you make backward-compatible bug fixes

There are many rules, but these are the ones you need to be aware of:
  1. Each element contains only natural numbers without leading zeroes.
  2. Each element MUST increase numerically, and when you increase an element, you must reset the elements to its right to zero. (1.11.23 -> 2.0.0 and 2.4.23 -> 2.5.0)
  3. The contents of a version MUST NOT be modified after the versioned package has been released.
  4. A pre-release version is denoted by appending a hyphen and identifiers after the patch number, using only ASCII alphanumerics. (1.0.0-beta)
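
The bumping and comparison rules above can be sketched in a few lines. This is a minimal illustration, not a full SemVer implementation (it ignores pre-release and build-metadata precedence, for example), and the function names are my own:

```python
# Minimal sketch of SemVer parsing and version bumping.
# Not a full implementation: pre-release and build metadata are ignored.

def parse_semver(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)

def bump(version: str, part: str) -> str:
    """Increase one element and reset the elements to its right to zero."""
    major, minor, patch = parse_semver(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(bump("1.11.23", "major"))  # 2.0.0
print(bump("2.4.23", "minor"))   # 2.5.0
# Tuple comparison orders versions numerically, element by element:
print(parse_semver("1.2.10") > parse_semver("1.2.9"))  # True
```

Note that comparing tuples of ints, not strings, is what makes 1.2.10 correctly rank above 1.2.9.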

The incompatible changes

Sometimes you will bump the package version just to add logs, for example. Such changes have zero effect on code usage: consuming packages don't care about the implementation as long as everything runs smoothly, so whether they use the older or the newer version, there are no compatibility issues.

But other times you will need to change something more critical, like an interface or a data model. In those cases, if the project uses the older version it will crash at runtime, complaining, for example, that it expects an interface with three functions but the package doesn't have it.

Such changes are marked as Major, and if a project ends up with two different major versions of a package, the tooling will simply tell you to resolve the conflict before the project can run. That is exactly why dependency resolution exists.

Dependency resolution

The formal definition is: the process of finding and installing the correct version of a package and its dependencies. Before diving deep, let's look at a situation that highlights the problem.

Normally, packages can use other packages, but there is no guarantee that every package uses the latest version of its dependencies. Imagine you publish package A and then package B, where B uses A at version 1.0.0. Later, you update package A to add features needed by package C, and package C also uses package B. The question: which version of package A will the project end up with?

You can check the following diagram for another example:

Most of the time, the package version graph contains constraints like these:

  • ">=x" means the developer doesn't care which version you use as long as it is greater than or equal to x and less than the next major version. Generally speaking, the package manager warns you about incompatible versions.
  • "=x" means you can only use version x; otherwise, you must update the package reference even for tiny changes. This style has limited use cases; generally, don't use it unless you need to.
  • "=x.*" means use the latest stable version whose number starts with x. This removes the headache of manually upgrading every consuming package and repo each time you release a minor or patch change; it can also be customized to accept only patch changes.
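
As a sketch of how those three range styles behave, here is a hypothetical matcher. Real package managers (NuGet, npm, etc.) each have their own, richer range syntax; this only illustrates the idea:

```python
# Hypothetical version-range matcher for the three styles above.
# Real package managers use richer, tool-specific range syntax.

def satisfies(version: str, spec: str) -> bool:
    """Check a MAJOR.MINOR.PATCH version against a range spec.
    '>=1.2.0' : at least 1.2.0, but below the next major version.
    '=1.2.0'  : exactly 1.2.0.
    '=1.*'    : any version whose number starts with major 1.
    """
    v = tuple(int(p) for p in version.split("."))
    if spec.startswith(">="):
        base = tuple(int(p) for p in spec[2:].split("."))
        # Greater or equal, but stop before the next major change.
        return base <= v < (base[0] + 1, 0, 0)
    body = spec[1:]  # strip the leading '='
    if body.endswith(".*"):
        prefix = tuple(int(p) for p in body[:-2].split("."))
        return v[:len(prefix)] == prefix
    return v == tuple(int(p) for p in body.split("."))

print(satisfies("1.4.2", ">=1.2.0"))  # True
print(satisfies("2.0.0", ">=1.2.0"))  # False: next major version
print(satisfies("1.9.9", "=1.*"))     # True: prefix matches
print(satisfies("1.2.1", "=1.2.0"))   # False: exact match only
```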

There are some common rules here:

Lowest applicable version: the project uses only the lowest version that satisfies every request, which is the highest version any package asked for, without upgrading to the latest unless a floating version ("x.*") is used.

For example, if packages ask for package X at versions (1.0.0, 1.2.10, and 1.2.1), the project selects version 1.2.10 across the whole project, even if package X has a newer release (for example 1.3.0).
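
The lowest-applicable-version rule can be sketched as follows. The helper is hypothetical and assumes every request is a minimum-version (">= x") request, in which case the lowest version that satisfies them all is simply the highest one asked for:

```python
# Sketch of the "lowest applicable version" rule, assuming every
# package states a minimum-version requirement for its dependency.

def lowest_applicable(requested: list[str]) -> str:
    """Pick the lowest version that satisfies every request.
    With minimum-version requests, that is the highest version any
    package asked for -- never the latest published version.
    """
    as_tuples = [tuple(int(p) for p in v.split(".")) for v in requested]
    return ".".join(str(p) for p in max(as_tuples))

# Packages ask for X at 1.0.0, 1.2.10 and 1.2.1; 1.3.0 may exist
# on the feed, but it is never pulled in automatically:
print(lowest_applicable(["1.0.0", "1.2.10", "1.2.1"]))  # 1.2.10
```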

Direct dependency wins: when the package graph for an application contains different versions of a package in the same subgraph, and one of those versions is a direct dependency in that subgraph, that version is chosen for that subgraph and the rest are ignored.

Sometimes this forces an update. For example, if you use package Y, which uses package X at version 2.0.0, but you tell the project to build with package X at =3.0.0 or =1.0.0, the project will ask you either to change your direct reference to version 2.0.0 or to go into package Y and upgrade package X.

Other rules depend on your use case and framework, and I want to keep this as simple as possible.

Resources:

Semantic Versioning 2.0.0 | Semantic Versioning (semver.org)

NuGet Package Dependency Resolution | Microsoft Learn

Monday, 20 February 2023

Assume Breach

Assume Breach is a mindset, a mentality, or a group of questions aimed at minimizing the damage of any cyber-attack if it happens.

The traditional security mindset is to prevent any outside attack. Assume Breach (AB) looks at a cyber-attack from a different angle: it tells the engineer the attack will happen and will somehow make it through the organization's outer security layers, then asks a very important question: what will the damage be in that case?

Some will argue that we should just make sure no cyber-attack gets through the layers instead of assuming one will. But cyber-attack history shows that even when an organization adds many strong layers of security, attacks still sometimes get through.

So AB asks the organization's security team: assuming someone manages to get through the defenses, what damage will the organization take? This question generates many others:

  • What critical info will the attacker be able to reach?
  • How long will it take to undo the damage?
  • Is the access and permission given to each person reasonable?
  • What happens when someone gains a permission?
  • When will the organization discover the attack?
  • How can the organization monitor critical info and permissions?
  • Does the organization have a good policy that keeps track of actions and permissions?
  • Can the organization detect unusual actions and eliminate their causes?

Nowadays there are many third-party tools that help organizations and make these things much easier. The challenge is that small companies don't think it is worth investing in these tools at the beginning, and as time passes, introducing them becomes much harder.
