Concurrency in Web Development

What is Concurrency?

In computer science, concurrency refers to the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the final outcome.
(wikipedia.org)

What is Concurrency in operating systems?

Process management in operating systems can be classified broadly into three
categories:
- Multi-programming involves multiple processes on a system with a single processor.
- Multi-processing involves multiple processes on a system with multiple processors.
- Distributed processing involves multiple processes on multiple systems.
All of these involve cooperation, competition, and communication between processes that either run simultaneously or are interleaved in arbitrary ways to give the appearance of running simultaneously.
Concurrent processing is thus central to operating systems and their design.
(teaching.csse.uwa.edu.au)

What is difference between multi-threading and multi-processing?

The key difference is that multi-processing uses multiple CPUs (or cores) to run multiple processes truly simultaneously, whereas multi-threading lets a single process spawn multiple threads on a CPU to increase the throughput of a system.
(techdifferences.com)

What is Concurrency in programming language?

Concurrency is when two tasks overlap in execution. In programming, this arises, for example, when the kernel assigns two processes to different cores on a machine and both cores execute process instructions at the same time.
(quora.com)

- Concurrency in very simple terms means that two or more processes (or threads) run together, but not at the same time. Only one process executes at once.
- Parallelism on the other hand means that the processes (or threads) run in parallel (surprise surprise); meaning they start at the same time and execute alongside each other at the same time.
(medium.com)

Node.js has good concurrency support, whereas Django's concurrency support is severely limited by Python itself (chiefly the Global Interpreter Lock).

Concurrency with Node.js

JavaScript has a concurrency model based on an "event loop". This model is quite different from models in other languages like C and Java. A JavaScript runtime uses a message queue, which is a list of messages to be processed. Each message has an associated function which gets called in order to handle the message. At some point during the event loop, the runtime starts handling the messages on the queue, starting with the oldest one.
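The queue behavior can be observed with a minimal sketch (assuming Node.js or any JavaScript runtime): a setTimeout callback with a 0 ms delay is placed on the message queue and is only handled after the currently running code finishes.

```javascript
// Demonstrates event-loop message ordering: the timer callback is queued
// as a message and only runs after the current code runs to completion.
const order = [];

setTimeout(() => {
  order.push("timeout callback");   // queued message, handled later
  console.log(order.join(" -> "));  // prints: sync start -> sync end -> timeout callback
}, 0);

order.push("sync start");
order.push("sync end");
```

Even with a 0 ms delay, the callback cannot interrupt the running code; it waits its turn on the queue.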

The application environment (i.e. your code) only has access to a single thread, but Node.js transparently hands off the I/O to a separate thread without the user needing to deal with it.

According to nodejs.org,- “Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.”

This way of serving web requests, or any other type of server request, is called non-blocking I/O.

- What is the event loop in Node.js?

Reality: there is only one thread that executes JavaScript code, and this is the thread where the event loop is running. The execution of callbacks (note that essentially all user-land code in a running Node.js application runs as callbacks) is done by the event loop.

- What role does asynchronous programming play in concurrency?

Asynchronous programming is a form of concurrent programming in which a unit of work runs separately from the main application thread and notifies the calling thread of its completion, failure, or progress.

- Non-blocking I/O

For many other programming languages, invoking an I/O operation halts the entire process. It has to wait for the result before moving to the next line of the code. In Node.js, the APIs around I/O functions are very different from those traditional APIs. Instead of taking in parameters and spitting out results, I/O functions in Node.js are famous for not returning anything meaningful. And your code will proceed to the next line as if the I/O function call never happened.
Who’s taking care of the I/O operation? Another thread. But Node.js is said to be single-threaded! Yes and no. Node.js is single-threaded for YOUR code (including all those callbacks). Each time you request an I/O operation, Node.js hands it off to a worker thread (or to the operating system's asynchronous facilities) and delivers the result back to your thread via the event loop.

This is what non-blocking I/O means: it runs I/O operations in parallel with your code.

- Concurrency with Async/Await and Promises

Promises do exactly what they are named after: they promise the eventual completion (or failure) of an asynchronous block of code. They give us an escape route from callback hell, letting us write much cleaner code that seems to run line-by-line like the synchronous code we're used to seeing, but actually runs asynchronously behind the scenes.

Async and await make asynchronous code even more readable and maintainable than chained promises. They allow us to write asynchronous code that looks like the "normal" synchronous (blocking) code we'd usually see in other programming languages.

* await can ‘wait’ for an async function (or any promise) to resolve or reject, and allows us to handle the resulting value without using .then() or .catch()
* async is a special keyword attached to function declarations to let the JavaScript engine know that it’s an asynchronous function.
e.g. async function foo(){...} or async () => {...}
* Important! await can only be used inside an async function.
(stackoverflow.com & javacodegeeks.com & blog.battlefy.com & developer.mozilla.org)
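A short sketch tying promises and async/await together; delay here is a hypothetical helper, not a built-in:

```javascript
// delay() wraps setTimeout in a promise that resolves with `value`
// after `ms` milliseconds (a stand-in for any async operation).
function delay(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// `async` marks the function; `await` suspends only this function (never
// the thread) until the promise settles, so the code reads top-to-bottom.
async function main() {
  const a = await delay(10, "first");    // waits ~10 ms
  const b = await delay(10, "second");   // sequential: ~20 ms total so far
  // Promise.all starts both delays at once: concurrent, ~10 ms total.
  const [x, y] = await Promise.all([delay(10, "x"), delay(10, "y")]);
  return [a, b, x, y];
}

main().then((result) => console.log(result.join(", ")));  // first, second, x, y
```

Note the difference between awaiting promises one by one (sequential) and awaiting Promise.all (concurrent); both read like straight-line code.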

Concurrency with Django

- What is a thread in Python?

A thread is a lightweight process or task. A thread is one way to add concurrency to your programs. If your Python application is using multiple threads and you look at the processes running on your OS, you would only see a single entry for your script even though it is running multiple threads.

- What's the difference between Python threading and multi-processing?

With threading, concurrency is achieved using multiple threads, but due to the GIL only one thread can be running Python code at a time. In multi-processing, the original process is forked into multiple child processes, bypassing the GIL. Each child process has a copy of the entire program's memory.

- How are Python multi-threading and multi-processing related?

Both multi-threading and multi-processing allow Python code to run concurrently. Only multi-processing will allow your code to be truly parallel. However, if your code is IO-heavy (like HTTP requests), then multi-threading will still probably speed up your code.

- Recap?

Python multi-threading is limited to having one thread at a time accessing the Python interpreter. This means that a Python program can only make good use of multi-threading if most time is spent waiting for I/O, or if most CPU time is spent inside libraries that release the GIL (e.g. extensions written in C or C++). You can combine multi-processing and multi-threading by asking the web server (e.g. Nginx or Apache) in front of your web app to run the app in four separate processes, one per core.

- What does this mean for Python Concurrency?

Let's say that we have a Python script that must download 91 images sequentially from Imgur’s API, and that this takes 19.4 seconds.

Threading is one of the most well-known approaches to attaining Python concurrency and parallelism.
Let's create a pool of eight threads, making a total of nine threads including the main thread. I chose eight worker threads because my computer has eight CPU cores and one worker thread per core seemed a good number for how many threads to run at once.

* Create a worker class with Thread:
from threading import Thread

class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # Get the work from the queue and expand the tuple
            directory, link = self.queue.get()
            try:
                # download_link (not shown) fetches and saves a single image
                download_link(directory, link)
            finally:
                self.queue.task_done()

* The main application/thread will create a queue and threads:
from queue import Queue

queue = Queue()
# Create 8 worker threads
for x in range(8):
    worker = DownloadWorker(queue)
    # Setting daemon to True will let the main thread exit even though the workers are blocking
    worker.daemon = True
    worker.start()
# Put the tasks into the queue as (directory, link) tuples
for link in links:
    queue.put((download_dir, link))
# Causes the main thread to wait for the queue to finish processing all the tasks
queue.join()

Running this Python threading example script on the same machine used earlier results in a download time of 4.1 seconds! That’s 4.7 times faster than the previous example.

To use multiple processes, we create a multiprocessing Pool. With the map method it provides, we pass the list of URLs to the pool, which in turn spawns eight new processes and uses each one to download the images in parallel. This is true parallelism, but it comes with a cost: the entire memory of the script is copied into each sub-process that is spawned.
Beyond a single machine, you can distribute work with a task queue such as Celery, using RabbitMQ or Redis as the message broker. You can spawn multiple worker applications (like the main application) that read tasks from the broker and create threads to download the images.

RabbitMQ is an open source message broker software that originally implemented the Advanced Message Queuing Protocol and has since been extended with a plug-in architecture to support Streaming Text Oriented Messaging Protocol, Message Queuing Telemetry Transport, and other protocols
Redis (pronounced "RE-dis") is an open-source in-memory data structure project implementing a distributed, in-memory key-value database with optional durability. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, and spatial indexes.

- Concurrency with Django Models

Install django-concurrency, an optimistic locking library for Django models.
It prevents users from doing concurrent editing in Django, both from the UI and from Django commands.
How does it work? django-concurrency adds a concurrency.fields.VersionField to each model; each time a record is saved, the version number changes (the algorithm used depends on the implementation of concurrency.fields.VersionField chosen).
This is very useful when exposing Django models through a REST API, and it prevents incorrect database updates.
(quora.com & toptal.com & django-concurrency.readthedocs.io)

What is concurrency in databases?

Concurrency can be defined as the ability for multiple processes to access or change shared data at the same time. The greater the number of concurrent user processes that can execute without blocking each other, the greater the concurrency of the database system.
(technet.microsoft.com)

What is concurrency control in database?

Concurrency control is a database management systems (DBMS) concept that is used to address conflicts with the simultaneous accessing or altering of data that can occur with a multi-user system.
(databasemanagement.wikia.com)

- How does Oracle support concurrency?

Oracle allows simultaneous access to the same data by many users. A multi-user database management system must provide adequate concurrency controls so that data cannot be updated or changed improperly, compromising data integrity. Control of data concurrency and data consistency is therefore vital in a multi-user database. Oracle maintains data consistency in a multi-user environment by using a multi-version consistency model and various types of locks and transactions. Oracle uses the information maintained in its rollback segments to provide these consistent views; the rollback segments contain the old values of data that have been changed by uncommitted or recently committed transactions.

* Data concurrency means that many users can access data at the same time.
* Data consistency means that each user sees a consistent view of the data, including visible changes made by the user's own transactions and transactions of other users.
* Multiversion concurrency control means that Oracle automatically provides read consistency to a query, so that all the data the query sees comes from a single point in time (statement-level read consistency). Oracle can also provide read consistency to all of the queries in a transaction (transaction-level read consistency).

- How does MongoDB support concurrency?

MongoDB allows multiple clients to read and write the same data. In order to ensure consistency, it uses locking and other concurrency control measures to prevent multiple clients from modifying the same piece of data simultaneously. Together, these mechanisms guarantee that all writes to a single document occur either in full or not at all and that clients never see an inconsistent view of the data.

MongoDB uses multi-granularity locking, which allows operations to lock at the global, database, or collection level, while individual storage engines may implement their own concurrency control below the collection level. It uses reader-writer locks that allow concurrent readers shared access to a resource, such as a database or collection. In MMAPv1, a single write operation gets exclusive access; for most read and write operations, WiredTiger instead uses optimistic concurrency control.

Concurrency control ensures that database operations can be executed concurrently without compromising correctness. Pessimistic concurrency control, such as that used in systems with locks, will block any potentially conflicting operations even if they may not turn out to actually conflict. Optimistic concurrency control, the approach used by WiredTiger, delays checking until after a conflict may have occurred, aborting and retrying one of the operations involved in any write conflict that arises.
WiredTiger (storage engine) uses only intent locks at the global, database and collection levels. When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation.
* MMAPv1 (storage engine) uses collection-level locking as of the 3.0 release series, an improvement on earlier versions in which the database lock was the finest-grain lock. Third-party storage engines may either use collection-level locking or implement their own finer-grained concurrency control.
(oracle.com & mongodb.com)

