Friday, 21 October 2022

Software monitoring

How to Debug Production Code?

In real-life applications, stepping through the code with a debugger is usually not an option. Developers need data about what happened, when, and where. Debugging an application with many flows and states by hand can take forever before the bug is found, and sometimes the fix includes a non-technical part (for example, refunding orders that failed to deliver).
Application monitoring is divided into two parts: server-side and client-side.

Server-side monitoring

This is the most essential monitoring to get right, because the server side contains all the critical pieces: databases, security methods, business logic, etc. There are different types of monitoring:

Infrastructure monitoring: targets database health, hardware usage, service/microservice health, etc. If something happens to any of these, it affects the whole business.

It also targets some business-facing measurements, like response time and the number of requests per second to certain APIs, which help in understanding whether the application needs to scale up or whether a problem is preventing users from calling the APIs.

These incidents are usually triggered by time (application growth, old hardware that needs replacing, etc.), but in rare cases the root of the problem is a code or business change (code reviews and tests should prevent any change that triggers these events).

High-priority incidents: most applications have invalid states that should be unreachable (like requesting a payment without signing in, signing in without signing up, or trying to insert a null value into a non-nullable column). If any of these states becomes reachable for any reason, it needs to be fixed immediately. The root of these incidents is usually a recent code change or a bad design that didn't handle the state.
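As a rough illustration of the two measurements mentioned above, here is a minimal sketch of a sliding-window tracker for one API. The class name `ApiMetrics`, the 60-second window, and the idea of passing timestamps in explicitly are all assumptions made for the example; real projects usually rely on a monitoring tool rather than hand-written code like this.

```python
from collections import deque


class ApiMetrics:
    """Tracks recent requests to one API within a sliding time window."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self._samples = deque()  # (finished_at, duration_seconds)

    def record(self, finished_at: float, duration_seconds: float) -> None:
        self._samples.append((finished_at, duration_seconds))
        self._drop_old(finished_at)

    def _drop_old(self, now: float) -> None:
        # Discard samples that fell out of the window.
        while self._samples and self._samples[0][0] <= now - self.window_seconds:
            self._samples.popleft()

    def requests_per_second(self, now: float) -> float:
        self._drop_old(now)
        return len(self._samples) / self.window_seconds

    def average_response_time(self, now: float) -> float:
        self._drop_old(now)
        if not self._samples:
            return 0.0
        return sum(duration for _, duration in self._samples) / len(self._samples)
```

A sudden drop in requests per second can hint that users can't reach the API, while a rising average response time can hint that it is time to scale up.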

Event Logging

Event logging is mainly used for debugging, to understand what happened. Log data is stored differently from normal data: it is much larger, it is written once (no update operations), and it contains more technical information that, in many cases, will never be used by the business. Later on, this data is moved to high-latency storage (an archive), and some applications delete it entirely.

Auto logging: most projects define a framework for automatic logging. It provides the developer with standard information, like "API x was called with parameter y" or "API a returned result b". The developer sometimes needs more specific logs, but auto logging has a big advantage: every entry has the same format, and it doesn't rely on the developer's memory (remembering to add logs).

The one thing a developer must be aware of (for logging in general) is to never log sensitive data. For example, imagine an application that logs login info (passwords) or payment info, while the logs can be accessed by anyone in the company or on the technical team.
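One common way to get this kind of automatic logging is a decorator that wraps every API handler. The sketch below is only an illustration: the decorator name `auto_log`, the logger name, and the `get_order_total` handler are made up for the example and are not part of any specific framework.

```python
import functools
import logging

logger = logging.getLogger("auto_log")


def auto_log(func):
    """Log every call and result of the wrapped function in one fixed format."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.info("return %s result=%r", func.__name__, result)
        return result

    return wrapper


@auto_log
def get_order_total(order_id: int, tax_rate: float = 0.0) -> float:
    # Hypothetical API handler used only to demonstrate the decorator.
    base_price = 100.0
    return base_price * (1 + tax_rate)
```

Because the format lives in one place, every handler logs the same way and nobody has to remember to add the log lines.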
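A simple way to enforce this is to mask sensitive fields before anything reaches the logger. This is a minimal sketch; the field names in `SENSITIVE_KEYS` and the helper names `redact` and `log_request` are assumptions for the example, and a real application would need its own list.

```python
import logging

# Assumed field names; every application needs its own list.
SENSITIVE_KEYS = {"password", "card_number", "cvv"}

logger = logging.getLogger("api")


def redact(params: dict) -> dict:
    """Return a copy of params with sensitive values masked."""
    return {
        key: "***" if key.lower() in SENSITIVE_KEYS else value
        for key, value in params.items()
    }


def log_request(api_name: str, params: dict) -> None:
    # Only the redacted copy is ever written to the log.
    logger.info("API %s called with %r", api_name, redact(params))
```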

Client-side monitoring

Application monitoring is mainly used to detect problems as soon as possible, keep track of what is happening in real usage, and understand how users interact with the app.

While most data collected on the server side serves technical purposes and targets only developers, client-side data targets many teams (development, product, data analytics, design, etc.). This means the information needs to be written as clearly as possible; assume that the typical reader has never written a single line of code.

One thing to keep in mind is that data coming from the client side can't be fully trusted, as anyone can edit the APK and send fake data. So, for example, whether the user paid for the service is something you should always check in the server-side monitoring data.

Application instrumentation is the process of adding code to your application so you can understand its inner state. It measures what the code is doing when it responds to requests by collecting data such as metrics, events, logs, etc. An instrumented application tracks as much information as possible about the service’s operations and behavior. It provides more detail about what is happening, so you can see relationships between requests.

Instrumentation is key to ensuring the application performs at its best. It focuses on the whole picture rather than a single piece of information: it aims to produce things like flows, relations, and charts, rather than "user x sent request y with parameters {a, b, c}".

Friday, 14 October 2022

Design Principles: Encapsulation

I'd love to talk about one important concept that is easy to apply but greatly improves code readability and makes the code easier to change in the future: Encapsulation.

The idea of the Encapsulation design principle is simple: don't mix dynamic code with static code.

Let's take an example: imagine you write the code for an online store that sells 3 types of t-shirts. In this example, something like the order flow will not change frequently, while the number of products, their names, and other product-related details are highly likely to change over time. Tightly coupling the products with logic like the order flow, which shouldn't change easily, makes the code very hard to change in the future, or even to understand.

Many fall into a common misunderstanding here (the fake scaling view). In our example, a developer will think "Oh, it's only used in one place and it's only 2 or 3 lines" or "It's only duplicated once". Take a look at the business model for a minute and you will realize that 3 can grow to hundreds or thousands in a few years. Then imagine a function that handles all these cases and also does other business logic, with all these cases repeated multiple times in different places in a single class. It will be a nightmare for any developer to change or debug those thousands of lines.

Some may have heard about over-engineering and think it's alright to ignore this for now and postpone it until after the product grows. But unlike something like microservices, the Encapsulation design principle doesn't add complexity to the code. It just tells the developer to move dynamic code out of the logic, and, alongside the single-responsibility principle, to break the logic into code blocks that are as small as possible.
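To make the t-shirt example concrete, here is a minimal sketch of keeping the dynamic part (the product list) out of the static part (the order flow). The product IDs, names, and prices are invented for the illustration, and in a real store the catalog would more likely come from the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    price: float


# Dynamic part: changes often, so it lives in one place as data
# (hypothetical values; a real system would likely load them from the DB).
CATALOG = {
    "tshirt-basic": Product("Basic T-shirt", 10.0),
    "tshirt-logo": Product("Logo T-shirt", 15.0),
    "tshirt-premium": Product("Premium T-shirt", 25.0),
}


def order_total(items: dict, catalog: dict = CATALOG) -> float:
    """Static part: the order flow never mentions a specific product."""
    total = 0.0
    for product_id, quantity in items.items():
        if product_id not in catalog:
            raise KeyError(f"unknown product: {product_id}")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        total += catalog[product_id].price * quantity
    return total
```

Adding the fourth or the four-hundredth t-shirt only touches the catalog; `order_total` stays exactly the same.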

Ignoring this concept will result in code with a high chance of bugs in the future, for two reasons:

  • The next developer who wants to change the code will not understand the difference between static code (which shouldn't be changed without deep thought and a proper design discussion) and dynamic code that changes frequently (the solution is to move it to the DB if possible).
  • Dynamic code usually exists in multiple places. In most cases that results in changing some places and forgetting the rest, and no one will notice (except the person who wrote the original code), which often results in inconsistent behavior.

This principle is also related to other concepts like the DRY principle ("Don't repeat yourself"), which mainly tells the developer to avoid duplicated code.

One great old piece of advice I heard when I was a student is: "When a developer writes a function that exceeds ~15 lines (this value depends on the language), he/she can probably divide it into 2 or more functions". This relates more to the single-responsibility principle; however, understanding where to put the break points in your code, like "these 2 blocks of code can/should be separated", is also related to the Encapsulation design principle.

Saturday, 8 October 2022

Biometric Authentication


What is biometric authentication and how does it store my data?

Let's talk about one of the sign-in methods that recently became popular: biometric login, which includes fingerprint, face ID, voice recognition, and other methods. Mainly, let's talk about the most secure of them, the fingerprint method.

Why are we talking about fingerprints, not face IDs or other methods?

Mainly, the other methods have a weakness: they struggle to distinguish the real user from a digital copy. You can get any person's voice from online videos or recordings, and you can get almost any person's face from the internet. The sign-in system can ask the user for specific things, like making a random facial expression or saying a displayed sentence; however, someone can still break this using an ML model trained on public data to talk like any person or show any facial expression.

The fingerprint is the only thing that proves it's the real user. But if we use fingerprints, doesn't that mean a fingerprint will soon be shared data if it leaks?

Actually, how fingerprints work is really interesting, as it's very similar to the hashed-password concept, but device-based instead of server-based. These days no mainstream devices store the fingerprint as raw data: devices hash the fingerprint using a unique key before storing it, so if a device gets hacked there is no fingerprint to steal. The hashed value is useless elsewhere, as it differs from device to device, and the device itself doesn't know the original value.

When a company suspects a password has leaked, it asks the user to change it. The fingerprint doesn't have this option, so a leaked fingerprint is something the user can never recover from.

So how does the online business use a fingerprint without knowing the fingerprint of each user?

What happens is that the device usually contains a token generator; when the device verifies the fingerprint, it just returns a new token to use instead of the hashed value. Biometric auth is just a way to verify the user who is using the device right now. There are also dedicated auth devices that only activate using biometric auth.

This explains why the user needs to log in first on each new device to enable biometric auth, even if the user already enabled it on other devices. Biometric auth supports only one user per device for now, because applications do not have access to the hashed values: the application just asks the device, and the device returns true or false (or an auth token if it's an auth device).

image references: [link1]

Friday, 23 September 2022

Third Party Authentication

In the last article, we talked about what authentication tokens are. In this article, we talk about one of the most popular applications of tokens: third-party sign-up/login.

Third-Party Authentication

What is Third-party authentication? Well, a simple example is when the user signs in to an account at application x using Google, which means instead of application x authenticating the user, it will just ask Google to authenticate the user and then tell the application who the user is. There are three main players involved in the social login process:

  • The User that wants to access an application.
  • The Application that wants to identify a user and get related information.
  • The Authorization provider that confirms the user's identity and provides the needed data.

One of the motivations is that smaller businesses usually can't afford a strong authentication service. The user also wants a faster and more secure login experience, and the third party gets valuable data about the user from these requests, so it is a win for all sides. To recap the advantages:

  • The authentication responsibility moves to a stronger authentication system.

  • The user doesn't need to enter information that already exists somewhere else. Also, if the user changes their info, it updates automatically on all linked websites (if they fetch the data each time).

  • There is no need to verify some data (like the email when the user signs in with Google).

  • The password is not reused across multiple applications, so the user worries less about leaking it, as it is entered less often.

  • The user controls which info is shared and can view and disable third-party sign-in from the original account.

The disadvantage is that a third-party account has wide access to a collection of accounts, so hacking the third-party authentication becomes more valuable as time passes. It mainly acts like a password manager, but with the ability to deactivate any account.

The other disadvantage is that the application is limited by the third party's restrictions, privacy policy, etc., which may affect the sign-up process and add extra steps that make the third-party option less practical.

How does third-party authentication work?

Third-party authentication is only possible using tokens; no one would feel secure if random applications asked for the login credentials of an important account.

The sign-up and sign-in flows are very similar. The application redirects the user to the third-party website along with some information (like the identity of the application and the access permissions (scope) the application needs). The third party verifies the user's identity and gives back a code in the callback. This code lets the application request a token with the mentioned scope; the code works only for this application and yields a token that can fetch only the scope displayed to the user. Finally, using this token, the application fetches the data and inserts it into its database (if needed).

There may be minor adjustments to the flow depending on the application: some applications want a more secure flow, and others need more data that can't be fetched from the third party (missing or hidden).

The token will mainly be an ID token or an access token with a limited scope, depending on the application.

One problem appears when the application tries to integrate with several third-party services: the application needs to implement each integration according to its provider, which results in a lot of duplicated effort across small applications. One suggested solution is, instead of integrating directly with each one (Google, Facebook, etc.), to integrate with a provider that has already integrated with them (like Auth0, Firebase, etc.).

Here is an example of OAuth2 authorization. You can check different third-party providers and notice the slight differences between them.
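The flow described above matches the OAuth2 authorization code flow, and its three application-side steps can be sketched roughly as follows. The URLs, client ID, and function names are placeholders invented for the example, not a real provider; the `state` value is an extra anti-forgery parameter that the text above doesn't mention but that OAuth2 defines. Each real provider documents its own endpoints and small differences.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder values for illustration only; not a real provider.
AUTHORIZE_URL = "https://provider.example/oauth2/authorize"
CLIENT_ID = "my-app-client-id"
REDIRECT_URI = "https://my-app.example/callback"


def build_authorization_url(scope: str, state: str) -> str:
    """Step 1: the URL the application redirects the user to."""
    query = urlencode({
        "response_type": "code",
        "client_id": CLIENT_ID,        # identity of the application
        "redirect_uri": REDIRECT_URI,
        "scope": scope,                # access permissions requested
        "state": state,                # random value echoed back by the provider
    })
    return f"{AUTHORIZE_URL}?{query}"


def read_callback(callback_url: str, expected_state: str) -> str:
    """Step 2: extract the one-time code the provider sent in the callback."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [""])[0] != expected_state:
        raise ValueError("state mismatch: possible forged callback")
    return params["code"][0]


def token_request_body(code: str, client_secret: str) -> dict:
    """Step 3: form fields the application POSTs to the token endpoint."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": client_secret,
    }
```

The sketch stops before the actual HTTP call; the response to step 3 is where the application receives the token it then uses to fetch the user's data.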

c# - How to create custom login provider like third party login provider? -  Stack Overflow




Sunday, 18 September 2022

The Success Secret

 What is Success? - Kenzie Academy

This small article is about a hot topic that is widely discussed these days: "the way to success".

Over the last few months, I have skipped several books on different topics (most of them came highly recommended and were labeled "best-selling in XXXX"). There are multiple reasons that made me decide to skip them, like poor writing or padding the book with random things unrelated to its topic. One of them is that the authors use a pattern I find annoying: the author just throws statements at you as if they were known facts. These "facts" aren't based on experiments or proven knowledge, only on the author's POV, and many times this POV is based on things like "survivorship bias (or survivor bias)" and "the holy success".

Survivorship bias

This term means looking only at the survivors instead of the whole group. This bias is one of the best-known causes of false survey results and false beliefs, and it leads to wrong decisions in the future.

Many speakers market themselves by mentioning "X attended my courses" or "Y people attended and then became successful". They never mention the thousands who attended and still failed.

I remember that a few years ago one of the lecturers I love to listen to said that if you pick a random group (with the same ideology and business), most of the time it will have about the same distribution of success and failure as a group that attended the "success course".

That doesn't mean success is totally random or that self-improvement is a myth. What I want to say is that "doing x improves y" is a relationship that needs a lot of experiments and surveys to prove, not a random guy telling you it is proven because he believes it or tried it.

It worked for me

Sometimes experiments that measure the relationship between two variables consume millions of dollars and a lot of time. The most critical part of most of these experiments is not the measurement process; it's finding all the other variables that could affect the relationship and making sure they don't change during the experiment. Even after this step, the experiment needs to be repeated multiple times to make sure the result stays the same.

It's foolish to think a single experiment is enough. After stating the new "facts", the author will face many cases showing they aren't true, and the author's answer is predictable: "That's because they didn't do it the right way". That "right way" will be nothing like clear steps or a set of rules; it will be just a bunch of words in the author's mind. Each time the author is asked about it, the answer will probably be a cloudy one whose exact meaning no one understands, along with the sentence "if you understand what I said, you will succeed".

This point isn't against sharing self-improvement experiences, but tying actions to results isn't reliable. Imagine someone who trained for 4 years and then had an accident before the Olympiad, so he couldn't compete. Does that mean everything he did was wrong? Imagine a person who did nothing in his entire life except sleep, eat, and play, yet gained millions from his rich father. Does that mean doing the same will make you gain millions?

The survey approval

Many times you hear someone cite a survey that you will never find, no matter how much you search. There is another kind of survey where you can't find the details (like the selected groups, units, other variables, etc.). It's easy to go to a school for rich families and come out with a survey saying the average pocket money for a school student is $100 per day, or to go to a village where most women don't work and say that men earn $30K more than women.

Whoever runs the survey is human, which means he/she can be biased, a liar, or stupid; that naturally applies to organizations too. Even when someone wants to get at the truth, some things can affect the result without being noticed: the same people can give different answers to the same questions depending on the order of the questions, the wording, the environment, the time, etc.

The holy success packages

When talking about success, many people have a really strong bias toward the survivors, which in many cases results in following their every step without thinking about whether it's logical. If someone does x, y, z and becomes the most successful person in people's eyes, most of them will just think "let's do x, y, z", regardless of whether x, y, z are good or bad.

For them, it's unthinkable that x was a minor factor in the success, y was not a factor at all, and z was the most critical factor in the process.

I am not saying that I know the way to success. All I want to say is that if success were an easy thing that anyone could achieve by reading a badly written book, it would be different from what success means in my mind.

the image reference : [https://kenzie.snhu.edu/blog/what-is-success/]

Thursday, 8 September 2022

Authentication Tokens


Token-based Authentication ✋ - Definition, Types, Pros and cons

In the last article, I discussed the password method of identifying the user. Before diving deep into the other methods, I want to discuss the authentication token.

After the user logs in and the system recognizes them, it would be bad practice to send the login credentials (password) with each request: storing and sending sensitive data frequently is something to avoid, it forces every API to repeat the login process at the back end to confirm it's the real user, and it becomes impossible when the sign-in secret changes each time (like OTP or 2FA). Two methods handle this: the session and the token.

Session:

This method solves the problem of the browser needing to store the password and send it to the server each time. The server creates a new session, adds the user data (ID, IP, browser, action time, etc.) to it, and returns the session ID to the user, where it is stored in the cookies. There are some problems with this method. One of them is that the session data is stored in the database, which means each request makes an extra call to the database, and that makes the database a bottleneck that can be attacked.

However, the session method is still used worldwide. Its advantages are that it's simple to understand and implement, it can track user actions easily in one place, and the server can disable a session just by deleting its row from the database.

The other method I want to discuss today is the token: the server returns a token to the user, the user sends it with each request, and as long as the token is valid the server will not ask the user to log in again.

What is the authentication token?

A token is generally an authentication string that contains some information and is signed (and sometimes encrypted) in a way that makes it impossible for a third party to modify it and have it remain valid. That means no one can simply change the user ID and use the token as someone else.
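The session method can be sketched in a few lines. Here a plain dictionary stands in for the sessions table, and the function names are made up for the example; a real server would store sessions in a database or cache and set the ID in a cookie.

```python
import secrets

# Stand-in for the sessions table; a real server would use a database.
SESSIONS = {}


def create_session(user_id: int, ip: str) -> str:
    """Called after a successful login; the ID goes into the user's cookie."""
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = {"user_id": user_id, "ip": ip}
    return session_id


def get_user_id(session_id: str):
    """Called on every request: this lookup is the extra database call."""
    session = SESSIONS.get(session_id)
    return session["user_id"] if session else None


def revoke_session(session_id: str) -> None:
    """Disabling a session is just deleting its row."""
    SESSIONS.pop(session_id, None)
```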

An authentication token is formed of three key components: the header, payload, and signature:
  • The header: defines the token type being used, as well as the signing algorithm involved.
  • The payload: is responsible for defining the token issuer and the token’s expiration details. It also provides information about the user plus other metadata.
  • The signature: verifies the authenticity of a message and that a message has not changed while in transit.

The big advantage here is that the server doesn't need to store any token-related data in the database; it just validates the token to make sure no one changed the message. Another advantage is that tokens enable giving different scopes to different users, which supports something like third-party sign-in, where the user wants to share only parts of their information with some apps without giving the app the ability to change their data, post something, or see their payment methods.

The additional data can be very useful, as the server can include data that is needed in some cases, like the user type that determines whether the user is authorized to perform an action. This data should be static and shouldn't change during the token's normal life span; if it isn't, it shouldn't be included in the token. This is one of the main weak points of the token: if the server wants to change anything about the token, it can't do anything except send a new token to the user and hope no one uses the old token until it expires, or add a database that handles this case.

The weakest point the server should be aware of is that someone could still steal the token from the device and pretend to be the user. Providers are aware of that: some give the user an option to disable the token if needed, and some just issue short-life tokens so that the damage is limited if one is stolen (this can explain why changing the password requires re-entering the current password, while the forgot-password flow does not).

The problem of "how to disable the token" is a bit interesting. When the server makes the token include everything it needs (user ID, user type, access type (scope), etc.) and something changes (for example, the user is blocked, or the user wants to destroy the token because it's no longer needed), the server has 2 options to handle it:

  1. Move any data that may change, like the user type, out of the token and into the database. This reintroduces the need to check the database, but the cost can be limited if the server checks these parameters only at critical APIs, like payment and changing user data. It means the server limits the token instead of disabling it, as someone can still view data with it; if viewing the data is itself sensitive, the server can't use this method. It's mainly a trade-off that depends on the application.
  2. Make the token expire early, so that if someone manages to get the token, it expires quickly. However, making the token life too short adds load on the token generator, and users will be unhappy about logging in frequently. That can be solved by having constraints inside the application itself for payment and security changes, and by replying to the login credentials with a refresh token rather than only a short-life token.

There are different types of tokens depending on the expiration date and the data provided with the token. Let's mention some here:

  1. Access tokens: these are credentials used to access protected resources, and they are used as bearer tokens. The access token allows the user to access resources and perform actions based on its scope. It is meant to be a short-life token that expires after some minutes or a few hours, but you can see some applications using access tokens that last for months; that mainly depends on how they handle tokens overall.
  2. ID tokens: these are proof that the user has been authenticated. This token mainly carries the user info, is used to verify identity, and never gives access to any protected resource. It is also a short-life token that usually lasts a few hours.
  3. Refresh tokens: these are long-life tokens mainly used to generate new tokens of the other types. They solve the problem of the user having to log in again when a token expires. Of course, these tokens are tracked, and the server can check whether they have been disabled, which is affordable because they are used far less often than the other tokens. These tokens work like a hashed password, except that they expire after months and grant tokens that may have a limited scope.
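The relationship between a short-life access token and a tracked refresh token can be sketched as below. This is a simplified illustration under several assumptions: the 15-minute lifetime is arbitrary, the access token is shown as a plain dictionary instead of a signed string, and a dictionary stands in for the server-side table of refresh tokens.

```python
import secrets
import time
from typing import Optional

ACCESS_TTL_SECONDS = 15 * 60  # assumed lifetime for the example

# Refresh tokens are tracked server-side so they can be disabled.
REFRESH_TOKENS = {}  # refresh token -> user id


def issue_access_token(user_id: int, now: Optional[float] = None) -> dict:
    now = time.time() if now is None else now
    return {"user_id": user_id, "expires_at": now + ACCESS_TTL_SECONDS}


def is_expired(access_token: dict, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    return now >= access_token["expires_at"]


def issue_refresh_token(user_id: int) -> str:
    token = secrets.token_urlsafe(32)
    REFRESH_TOKENS[token] = user_id
    return token


def refresh(refresh_token: str, now: Optional[float] = None) -> dict:
    """Trade a still-enabled refresh token for a new access token."""
    user_id = REFRESH_TOKENS.get(refresh_token)
    if user_id is None:
        raise PermissionError("refresh token is unknown or was disabled")
    return issue_access_token(user_id, now)


def disable_refresh_token(refresh_token: str) -> None:
    REFRESH_TOKENS.pop(refresh_token, None)
```

Only `refresh` touches the stored table, so the database check happens once per access-token lifetime instead of once per request.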

Notes:

I talked mainly about online tokens, but there are other types, like hardware tokens, where the token is stored on a dedicated device. 2FA uses a type of token called disconnected tokens, but we will discuss it in a separate article.

If you are interested in how a signature is made and why it's impossible to edit the message without invalidating the signature, you can start by reading about public-key cryptography.

Image reference: https://www.wallarm.com/what/token-based-authentication


Friday, 2 September 2022

Account Security using Password


Password Security Guidelines: Everything You Need to Know | SpyCloud 

In this article, I will talk about the first and oldest security method, the password. This article is part of a series about account security methods and why we hear the word "password-less" more often lately.

Talking about account security takes us to the “password-less” topic, which many big tech companies are putting a lot of effort into these days. In the old days, account security mainly targeted admins and employees, as there was no real value in stealing random emails with nothing attached to them. However, the importance of account security increases every day, as many websites support online purchases and accounts hold payment cards and other valuable things.

Let’s first look at how the sign-up/sign-in process is done. There are three parties in that process: user, network, and server. The network is mostly out of scope for our discussion, but there are security methods supported by the network layer to protect the user from a man-in-the-middle who sits on the connection between users and servers and steals users’ info.

In the past, there were few methods to sign in, and the most popular among them was the password. Applications stored passwords in the database and, for each login request, compared the given password with the stored one. This was bad practice, because anyone who got into the database could see all the passwords and reuse them on this website or on others.

Currently, most companies rely on the fact that the user has only one password and the server only needs to know whether it matches the input text. So they hash passwords before storing them in the database; when a login request is received, they hash the given text and compare it with the hashed string in the database. It may look like there is no difference, but with this step the database no longer holds the password value: the hashed strings can't be reversed to the original password, so you eliminate one of the big sources passwords can leak from.

So the problem is solved; why password-less? Let’s first talk about why the password isn’t as strong a method as users imagine. Here are the main reasons passwords can’t scale as an account security method:
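A minimal sketch of this hash-then-compare idea, using only Python's standard library, could look like the following. The per-user random salt and the slow PBKDF2 function are additions that the paragraph above doesn't mention but that are standard practice; the iteration count is an assumed value, and the function names are invented for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your hardware


def hash_password(password: str):
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the given text the same way and compare with the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)
```

The salt means two users with the same password get different stored digests, so one cracked entry doesn't reveal the others.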

  • Constant: most applications let the password stay constant (for years); each day that passes with the same password makes the account easier to hack.

  • Personal: the password mainly depends on the user's mind and personal info. A user with no security background (which is the majority) will choose an easy-to-remember password, which is easy to guess as well. The problem is that people don’t notice they construct passwords from public information that anyone can get online (name and birth date).

  • Constraints: when websites don’t enforce constraints on passwords, some users set very weak ones. However, adding constraints didn’t improve the overall result; it made passwords more predictable. A constraint like "must have a capital letter" results in the majority of passwords having a capital first letter. One important point here is that password constraints are visible to everyone, so hackers also know the constraints that every password satisfies.

  • One for all: the majority of users use the same password for everything on the internet: email, social accounts, bank accounts, online shopping websites, and random websites with zero security (websites that send the username-password pair as a parameter in the URL at login). Most hackers know that as well, so getting email-password pairs from a random website is worth the effort, as these pairs can be used on other websites.

  • Not random: most people think that brute-forcing a password is hard because it’s a random string. In most cases that doesn't apply: the user types a password in English, the first letter is usually capitalized, and the letters form a name or a word, etc. Thinking carefully about these facts makes you realize that brute-forcing a password isn’t that hard.

  • Common: there are common passwords that many users use, like “Monkey”, “welcome”, “password”, etc., and there are also famous patterns like Name + ’@’ + birth year.

  • Attacks became stronger: after each attack, hackers know more about password patterns and develop more powerful tools that guess passwords more accurately, and each new generation of devices gives hackers more computing power to guess passwords and run stronger tools.

  • Hard to fix from the server side without side effects: trying to handle any of the above points from the server side has a high chance of decreasing the sign-up conversion rate, which hits online business growth. Forcing users to use strong passwords also rapidly increases “forgot password” traffic.
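To put rough numbers on the "Not random" and "Common" points above, the sketch below compares the search space of a truly random 8-character password with that of the Name + '@' + birth year pattern. The pattern's inputs (a list of 10,000 candidate names and a 100-year range of birth years) are assumptions chosen only to illustrate the gap.

```python
import math


def bits(combinations: int) -> float:
    """Size of a search space expressed in bits."""
    return math.log2(combinations)


# 8 characters drawn at random from the 94 printable ASCII symbols.
random_8_chars = 94 ** 8

# Pattern: capitalized name + '@' + birth year.
# Assumed: 10,000 candidate names and 100 plausible birth years.
name_at_year = 10_000 * 100

print(f"random 8 chars: {bits(random_8_chars):.1f} bits")
print(f"Name@year:      {bits(name_at_year):.1f} bits")
```

Under these assumptions the random password is worth about 52 bits and the patterned one about 20 bits, even though the patterned password is usually longer than 8 characters.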

Some people use a method where each group of accounts shares one common password (accounts are grouped by security level or importance), but this still has the One for all problem: breaking into one account gives hackers access to other accounts, and most of the time these different passwords share common patterns. Another solution is a password generator, which provides the user with a new randomly generated password for each account, but the password generator has many problems:

  • The way to access it is still a non-randomly generated password, which means you have only grouped the above problems into one big problem.

  • It saves all passwords in one place with no hashing, which means any hack into it is valuable.

  • It is not realistic to ask users to change passwords for 100 or more accounts when someone successfully hacks their password manager.

  • Password generators will not fix user behaviors, such as sharing a password or logging in on random devices, and most users will not use them to their full potential.

  • It is missing advanced features that give users more flexibility, which means you need to use it only on personal devices and don't have the option to log in on a temporary device and then mark it as a strange device (I will talk about this more when discussing break detection).

Still, the password generator is a better solution for websites with less value to the user (ones that don't contain payment or personal contact info that may lead to a chain of hacks), and you will not need to worry about low-security accounts if the password leaks from the server side.

I will stop here so this article stays short and easy to read. In the next article, I will discuss the other authentication methods.

You can read more about password anti-patterns in this paper: [password-guidance]

You can read more about how easy it is to hack using password patterns: [Choosing Secure Passwords]

Image reference: https://spycloud.com/solutions/password-security/




