Thursday, 25 August 2022

Competitive Programming


One of the topics I have long wanted to talk about is algorithms. First, let's define some terms so we're on the same page:
  • Algorithms and data structures: bodies of knowledge grounded in mathematics and computer science; both give us abstract solutions.
  • Problem-solving: applying that knowledge to real problems to reach the best solution.
  • Competitive programming: an activity where participants compete to get the largest number of accepted solutions to a set of problems within a limited time.
Let me tell you a story about four students: Joe, Lee, Ana, and Mao.
  • Joe decides he doesn't need to learn algorithms and data structures at all, since there are packages somewhere that can do the job for him, and there are other things he wants to learn.
  • Lee knows that algorithms and data structures are important, so he reads a lot about the theory but never uses it to solve real problems.
  • Ana thinks it is pointless to search for an answer (algorithms and data structures) without knowing the question (real problems), so she prefers to learn problem-solving and how this knowledge applies to everyday problems.
  • Mao thinks that solving real problems isn't enough of a challenge, so he joins a competitive programming student community and regularly enters online programming contests.

By this point, most readers will have formed opinions about them: some will think X is the better programmer, or that Y won't make a good engineer in the future.

From my point of view, there is no right or wrong here:
  • Some developers spend most of their time inside a framework that already does the optimization work, so Joe may go years without implementing a single algorithm by hand.
  • Lee only needs the general concepts for research purposes.
  • Ana wants a job focused on designing architecture and delivering high-quality code in a reasonable time.
  • Mao wants to join a demanding environment with a ton of requests per second, where everything needs to be implemented quickly and efficiently.

Having done competitive programming for some years myself, I want to be clear that the next few lines don't mean it's bad; I only want to mention some points that many people don't understand early on.

There is no "Accepted" verdict for real-world problems. Contests encourage a particular style of thinking: seeing constraints like (1 < n < 10^4), many competitors will immediately reach for an O(n^2) solution. Real-world problems don't come with such constraints; you just have a problem that needs the best solution.
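To make that contrast concrete, here is a minimal Python sketch (the function names and the pair-counting task are made up for illustration): the quadratic scan is the contest reflex when constraints promise n ≤ 10^4, while the hash-map version is what you reach for when nothing bounds the input size.

```python
from collections import Counter

def count_pairs_quadratic(nums, target):
    # O(n^2): a typical contest choice when constraints promise n <= 10**4,
    # since ~10**8 simple operations still fit in a contest time limit.
    count = 0
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                count += 1
    return count

def count_pairs_linear(nums, target):
    # O(n) with a frequency map: the safer default when no one
    # promises you a bound on n.
    seen = Counter()
    count = 0
    for x in nums:
        count += seen[target - x]  # pairs completed by earlier elements
        seen[x] += 1
    return count
```

Both return the same answer; the difference only matters once n grows past what the constraints (or reality) allow.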

Clean code vs. optimized code: bitwise tricks, arrays of fixed maximum size, global variables, and the like are good examples of how trying to optimize (or to write code faster) can leave you with code full of bad practices that could never ship in production. Of course, there are cases (like a function called thousands of times per second) that need every possible optimization, but that isn't the general case; it applies only to specific hot spots.

There is a hidden coefficient (let's name it H) you need to be aware of: H refers to how many times a piece of code will run. Some code runs thousands of times in under a second, other code runs once a month, and some code is written to run exactly once, ever. Knowing H determines how much the code should be optimized. For some problems, even a brute-force solution will do a great job, and in some cases less-optimized code that is easier to implement and understand beats complex code that no one understands and that saves one minute per month. You can estimate H by understanding the core business and how it might scale in the future.

When it comes to calculating complexity, most programmers will think code X is better than code Y because X has O(n) time complexity and Y has O(n^2), which is true only under conditions like these:
  • n is expected to grow into a large number.
  • The worst case will actually happen in the real world.
  • Both use about the same amount of memory.

As an example, if you know that n will never exceed 10 or 100 (like the number of letters, or levels of seniority, etc.), there is a high chance there is no real difference, and sometimes Y is even better than X: Y may have a small constant factor (light operations, run only once) and never hit its worst case (n^2), while X may have a very large constant factor (heavy operations like division, run many times, each time doing its full work). Also, Y may consume O(1) memory while X consumes O(n).
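As a hedged sketch of that point (the helper names and the duplicate-check task are hypothetical): for n around 10, the "slow" O(n^2) nested loop uses O(1) extra memory and can exit early, while the "fast" O(n) version pays hashing costs and O(n) extra memory, so the asymptotic label alone doesn't decide the winner.

```python
import timeit

def has_duplicate_quadratic(items):
    # O(n^2) time but O(1) extra memory; the inner loop can exit early.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_linear(items):
    # O(n) time, but pays hashing costs and O(n) extra memory.
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False

small = list(range(10))  # n = 10, no duplicates: both do their full work
print("quadratic:", timeit.timeit(lambda: has_duplicate_quadratic(small), number=50_000))
print("linear:   ", timeit.timeit(lambda: has_duplicate_linear(small), number=50_000))
```

Run the timings on your own machine; the point is only that at this scale the constant factors, not the Big-O, dominate.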

Sometimes the worst-case complexity is practically impossible in real life, such as when you assume random inputs but actually receive validated real-world data. A common example is four nested loops: on paper the complexity is certainly O(n^4), but for small n (like 100) the actual running time can be much closer to O(n^3). Another example is a loop over n where you know that more than 98% of the time it stops after one or two iterations, and even reaching n/10 is rare.
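The early-exit case can be sketched like this (a hypothetical scenario with made-up names): a scan whose worst case is O(n), but whose measured average iteration count stays near 1/p when a fraction p of the entries match.

```python
import random

def first_active_index(flags):
    # Worst case O(n): scans everything only when no flag is set.
    for i, flag in enumerate(flags):
        if flag:
            return i
    return -1

# If ~90% of entries are True, the scan almost always stops within the
# first one or two iterations, even though the worst case is O(n).
random.seed(0)
trials = 10_000
total_iterations = 0
for _ in range(trials):
    flags = [random.random() < 0.9 for _ in range(1_000)]
    total_iterations += first_active_index(flags) + 1
print(total_iterations / trials)  # close to 1 / 0.9, i.e. about 1.11
```

This is the hidden coefficient H again: the nominal worst case and the cost you actually pay can be far apart.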

Another example is iterating over each user's children when you know the mean number of children per user is x and the maximum will never exceed y in the project's market. Even if this number changes after 5 or 10 years, the change will be small in most cases. It is also not good practice to see a number with a decreasing trend each year and assume it will grow 5x. There is a difference between handling unexpected inputs in the code and assuming impossible cases.

Also, there are assumptions that simply don't make sense: that all customers have the same age, that 1e8 users will log in to the same server at the same moment, or that all shop departments make the same sales each day. Sometimes looking at the business and real-world statistics will give you more value than writing code that assumes impossible cases.

I won't say never assume the worst, but there are many cases where the worst case simply never happens. Handle it to protect the business from things like cyber-attacks, not because you treat it as the normal case.

The memory-time trade-off also deserves more thought. A popular idea says memory doesn't matter as long as everything fits, but in real life there are cases where you prefer a slower process over a huge memory waste. Take installation as an example: imagine you start installing a program and find your whole system frozen because the installer has taken all the memory so it can finish in 20 minutes instead of 30. Some users would prefer the faster installation, but the majority would accept 50% more time if they could keep using their system normally.
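A small illustration of trading a little time for bounded memory (hypothetical function names, standard library only): hashing a file all at once versus in fixed-size chunks.

```python
import hashlib

def checksum_all_at_once(path):
    # Simplest to write, but holds the entire file in memory at once.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def checksum_chunked(path, chunk_size=1 << 16):
    # A bit more code (and possibly a touch slower), but peak memory
    # stays bounded by chunk_size no matter how large the file is.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Both produce the same digest; the chunked version is the one you'd want when the file can be gigabytes and the machine is doing other work.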

In the end, none of this makes competitive programming a bad thing; it just reminds people of things they may be skipping. I also hope people do it with friends and for fun, rather than to prove they are better or to get into a strong company. I have friends who joined an ACM-ICPC student community for 4 years and others who didn't, and many from both groups became great programmers and joined strong companies (or went down the research path).
