In the world of software architecture, which is still young in absolute terms, change remains rapid and structural. Deep paradigms can be overturned, reshaping the structure of information systems. As coupling became a key issue for many information systems, new architectures emerged, notably SOA (Service-Oriented Architecture), followed by microservices. These two widespread architectures do solve coupling problems, but as always there are trade-offs to be made. The complexity of testing is one of them.

Like a balanced equation, an information system is an equivalence between a technical expression and a functional expression; if one changes, we need to be able to guarantee that the equivalence between the two still holds. To do so, we need to be able to test all parts of the system using tools that can establish this equality. When releasing a new version of a microservice, it is fairly easy to run unit and even integration tests before and during deployment to validate its internal operation. But how do you guarantee that its integration with the other microservices will remain valid? How can you be sure that a system made up of a constellation of microservices will behave as expected if you only run tests on each component? How can we guarantee that the new technical expression of our equation will still yield the same functional expression? By testing it. The testing can be manual, of course, but it can also be automated. Let's take a look at this automation, using two technologies we have used ourselves: Selenium and Karate. The aim of this study is not to make yet another theoretical comparison, but a concrete one: if a developer wants to practice behavior-driven development today, what will they have to do with each of these options? The study will first provide a quick analysis of the functionality offered by both frameworks.
We will then delve into the technical aspects, using a specific use case with a focus on programming and CI/CD. Finally, we will examine the communities surrounding both frameworks. Selenium will not be studied on its own; in order to compare a level of functionality equivalent to that of Karate, it will be used with Cucumber. This makes it possible to test technical stacks that allow automated tests to be written in natural language, thus satisfying a BDD requirement. In our case, we will opt for the Java version of Selenium, although other alternatives do exist.

Features

Selenium/Cucumber

Selenium

Selenium IDE: Enables recording of actions performed on a browser. This Firefox plug-in saves recorded scenarios as "side" files for future use.

Selenium WebDriver: A toolkit for interacting with different web browsers using drivers such as GeckoDriver and ChromeDriver. This is the toolkit we will use if we opt for Selenium. It is available in several languages, including Java, JavaScript, and Python.

Selenium Grid: Enables WebDriver scripts to be executed on remote (or real) machines by sending commands from the client to remote browser instances. The aim is to provide a straightforward way of running tests concurrently on multiple machines.

Cucumber

Cucumber is an open-source tool for behavior-driven development (BDD). It describes expected software behavior in a natural language that can be understood by both technical and non-technical stakeholders. This language is called Gherkin and is used to describe functionality in a clear and structured manner. Each test can be automated through code (for example, by automating the behavior with Selenium). This program is known as glue code and can be written in various languages such as Java, C#, and Ruby, among others. However, in line with the scope set out in the introduction, we will focus solely on the Java implementation. Cucumber can also produce comprehensive execution reports to make test results easier to read.
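Under the hood, Cucumber binds each Gherkin step to its glue method by matching the step text against the annotation's regular expression and passing the captured groups as method arguments. A minimal, framework-free sketch of that mechanism in plain Java (the pattern and step text are illustrative, not taken from a real project):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StepMatcherSketch {
    public static void main(String[] args) {
        // A pattern in the same style as a Cucumber @When annotation value
        Pattern step = Pattern.compile("^I search \"([^\"]*)\" in google search bar$");

        // A step line as it would appear in a Gherkin scenario
        Matcher m = step.matcher("I search \"Martin Fowler\" in google search bar");

        if (m.matches()) {
            // The captured group is what Cucumber passes to the glue method
            System.out.println(m.group(1)); // prints: Martin Fowler
        }
    }
}
```

This is all the "magic" there is: unmatched steps are reported as undefined, and each capture group must line up with a glue method parameter.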
Karate

This framework was originally based on Cucumber until release 0.8.0, when it was separated from it. This decision proved to be beneficial. Nevertheless, it still uses Gherkin expressions for improved clarity, readability, and test organization, similar to Cucumber.

API test automation: Karate's original foundation is the creation of API tests from Gherkin files. Other features have subsequently been integrated to enhance its capabilities. It is a direct competitor to REST Assured.

Mocks: This part of the framework facilitates the generation of API mocks, which are highly advantageous in microservice scenarios or for decoupling front-end and back-end teams.

Performance testing: Building on its API testing, Karate incorporates Gatling so that existing API tests can be reused for performance testing without rewriting user flows.

UI automation: Finally, Karate provides UI tests that automate user behavior by interacting with the DOM. These tests are written in the Karate DSL, which is based on the Gherkin language.

Programming

Use Case Description

1. Open a Google browser page.
2. Search for Martin Fowler.
3. Click on the first result that contains Martin Fowler.
4. Check that you are on "https://martinfowler.com/".

Selenium/Cucumber

Here, we will work in three stages:

1. Write the Gherkin scenarios that describe the test cases.
2. Create the glue code that links the scenario steps to the code, using the Cucumber framework.
3. Use the Selenium library to interact with the browser and write any necessary utility functions.
Gherkin:

Plain Text
Feature: Demonstration use case

Scenario: search for Martin Fowler website
  Given I navigate to "https://google.com"
  And I search "Martin Fowler" in google search bar
  When I click on result containing "Martin Fowler"
  Then the current url is "https://martinfowler.com/"

Glue Code:

Java
@When("^I navigate to \"([^\"]*)\"$")
public void navigate_to_url(final String urlToTest) {
    navigateToUrl(urlToTest);
}

@When("^I search \"([^\"]*)\" in google search bar$")
public void search_data_in_google(final String searchText) {
    WebElement element = getElementByName("q");
    fillElementWithText(element, searchText);
    clickElementByContain("Recherche Google");
}

@When("^I click on result containing \"([^\"]*)\"$")
public void click_on_first_result(final String searchText) {
    clickElementByContain(searchText);
}

@When("^the current url is \"([^\"]*)\"$")
public void current_url_test(final String urlToTest) {
    checkCurrentUrl(urlToTest);
}

Then we will use the Selenium library. Our Selenium toolbox exposes this.config.getDriver(), giving access to functions like navigate() or findElement(…).

Java
public void navigateToUrl(final String url) {
    this.config.getDriver().navigate().to(url);
}

public WebElement getElementByName(final String name) {
    return this.config.getDriver().findElement(By.name(name));
}

public void checkCurrentUrl(final String urlToCheck) {
    assertEquals(this.config.getDriver().getCurrentUrl(), urlToCheck);
}

public void clickElementByContain(final String contain) {
    WebElement element = this.config.getDriver().findElement(
        By.xpath(String.format("//*[contains(text(),'%s')]", contain)));
    element.click();
}

Karate

With Karate, things are much faster: all you need is a scenario file using the Karate DSL (domain-specific language) to achieve the same outcome.
Plain Text
Feature: Demonstration use case

Scenario: search for Martin Fowler website
  Given driver 'https://google.com'
  And input('input[name=q]', 'Martin Fowler')
  And click('{^}Recherche Google')
  When waitForText('Martin Fowler', 'APPEARED').click('{^h3}Martin Fowler')
  Then match driver.url == 'https://martinfowler.com/'

Analysis

Here we can see a distinct difference in the amount of code required. Karate integrates the DOM interaction functions directly at the Gherkin language level, which is a significant advantage in terms of development speed. However, this may affect the readability of the scenario file, particularly in a BDD context. As a result, it is reasonable to question whether BDD can be effectively practiced with Karate. The answer may vary depending on the project's context, its users, and the technical expertise of those involved. On the other hand, using Karate can greatly reduce maintenance costs, since there is less code and therefore fewer bugs. This is a critical factor in the profitability of automated testing, which depends on its simplicity, maintainability, and durability.

CI/CD, Performance, and Scalability

In both cases, we assume that we will be using the following process:

Basic automation process

The issue of test data is not relevant in our case. Although it is an important factor when discussing test automation, both Selenium and Karate face the same problem, and it is unrelated to their core functionality. So, our main focus will be on how both technologies can be integrated into a CI/CD environment.

Selenium

Here we will explore the use of Selenium Grid in order to compare the full range of features offered by Selenium.

Required Components

Selenium Grid Hub: The central control point of the Selenium Grid architecture, which manages the distribution of test execution to different nodes (machines or virtual environments).
The hub receives test requests from test scripts and routes them to available nodes according to the desired capabilities, such as browser, platform, and version.

Nodes: Individual machines or virtual environments that are responsible for executing the tests. Each node registers with the hub and advertises its capabilities, including supported browsers and operating systems. Test scripts connect to the hub, which in turn redirects them to appropriate nodes for execution based on the desired capabilities.

WebDriver Instances: WebDriver instances are indispensable for interacting with browsers and automating UI tests. The Remote WebDriver instance is used in the test script to send commands to browsers running on the nodes. These instances act as a bridge between the test script and the browser, enabling actions like clicking, inputting text, and validating content.

The architecture of the aforementioned components is as follows:

Selenium Grid components architecture

Another option is to use Selenoid, an open-source project that offers a lightweight and efficient way to implement Selenium Grid through Docker containers. It simplifies the process of running Selenium tests across various browsers and versions. Selenoid brings the benefits of containerization to Selenium Grid, which facilitates the handling of test execution environments and reduces resource overhead. Selenoid also offers built-in video recording of test sessions. This is especially helpful for diagnosing test failures, as you can watch the video to understand the failure context.

Selenium test execution on Selenoid

The key distinction lies in the technology employed. Selenoid uses Docker containers to achieve browser isolation, whereas Selenium Grid relies on separate nodes with Remote WebDriver instances. The objective of both approaches is to provide uniform and reproducible browser environments for test execution, alleviating problems that may arise from shared browser instances.
In summary, both Selenium Grid and Selenoid use dedicated browser instances for every test session to guarantee a stable and isolated testing environment. Though the approaches vary, the fundamental principle of browser isolation persists.

Karate

With Karate, things are much simpler. Two Docker images are available and should be deployed on the CI server in order to emulate the browser. You can then deposit your Karate scenarios and launch them in different ways:

Using a standalone version of Karate (in that case, you will prefer to use this Docker image)
Using a Java jar containing the Karate library

Karate CI/CD architecture with standalone Karate jar

It is important to note that Karate natively supports multithreading. Instead of using multiple browser instances to run tests, tests can be executed concurrently by adding a custom parameter. The figure below shows multithreading inside a container with three threads.

Karate multi-threading

Communities and Usage

It is noteworthy that when it comes to e2e testing, Selenium is the leading framework and enjoys widespread adoption in the community. Therefore, we will begin with a comparative analysis, followed by a closer examination of the activity surrounding Karate.

Comparative Analysis

GitHub Stars

This initial metric measures the number of "stars" granted to the various repositories by the GitHub community. However, this criterion alone is not conclusive, as bots may artificially inflate the value.
As a result, we used the Astronomer tool, which provides a confidence score for GitHub repositories based on the following criteria:

The average amount of lifetime contributions among stargazers
The average amount of private contributions
The average amount of publicly created issues
The average amount of publicly authored commits
The average amount of publicly opened pull requests
The average amount of public code reviews
The average weighted contribution score (weighted by making older contributions more trustworthy)
Every 5th percentile, from 5 to 95, of the weighted contribution score
The average account age (older is more trustworthy)

Analysis of the intuit/karate repository with Astronomer

The achieved grade of "A" confirms the quality of the information analyzed within the repository. Therefore, we consider the "stars" criterion reliable. The figure below also displays Cypress and Cucumber as additional comparison points besides the two examined frameworks. The y-axis represents the number of GitHub stars, and the x-axis shows the date. As expected, Selenium surpasses its rival. However, it is worth mentioning that Karate has gained significant ground and has even overtaken Cucumber, a highly prevalent framework for BDD development with Gherkin. Cypress remains popular, particularly within the JavaScript community, due to its strong reputation.

Comparison of the number of GitHub Stars — Star History

Stack Overflow Trends

We will now examine the "trends" criterion on Stack Overflow to gauge the activity of the community around each technology. By correlating the number of questions with the corresponding tag on Stack Overflow, we can assess the level of support available for the technology, as the site is extensively used by the developer community. This support comes courtesy of the community, given that these are open-source projects.
The greater the number of occurrences, the easier it is to find solutions to specific problems. The first graph examines the following technologies: Selenium, Cucumber, Cypress, and Karate. The y-axis presents the proportion of questions posted on Stack Overflow that carry the corresponding tag, while the x-axis displays the months/years.

Stack Overflow trends — including Selenium

Once again, Selenium is in the lead, confirming the previous result. To refine our analysis, we will display the same graph without Selenium to avoid compressing the curves (the drop in the Selenium curve is due to the fact that the tag has been moved by Stack Exchange to another site dedicated to software quality).

Stack Overflow trends — without Selenium

Karate shows a high percentage, having experienced a significant rise since its inception. Cucumber has remained stable, closely trailing Karate. Cypress is still on top but seems to be experiencing a notable decline. A correlation can be drawn between the acceleration depicted in the "GitHub Stars" chart and the level of occurrence here.

Conclusion

We note that Karate is a more code-efficient framework, enabling simple test writing because it was designed to avoid Selenium's complexity. Its CI/CD capabilities are powerful enough for most projects. However, Selenium Grid still offers specific features that Karate does not for certain integrations. The established and strong community around Selenium is a valuable asset, as is the variety of supported programming languages. On the other hand, Karate only offers one "language" — its own DSL. While this is quite easy to learn and intuitive for programmers, it can still be a bit complicated for non-technical users, especially in a BDD context. The community around this framework is growing, and many improvements have been made since its inception. Peter Thomas is very responsive on Stack Overflow and his own GitHub, answering questions promptly, and the extensive documentation is clear and exhaustive.
However, the project is still very closely tied to Peter Thomas for the moment. Also, it is important to bear in mind that we have only covered Karate UI here; the framework provides several other capabilities, such as API testing and performance testing with Gatling built on those API tests, which is quite compelling. Karate is a modern and interesting testing framework that presents a viable option to consider for your project, depending on its specific characteristics.
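For illustration, a minimal Karate API test is as terse as the UI scenarios shown earlier (the endpoint below is a placeholder, not one used in this study):

```gherkin
Feature: API smoke test

Scenario: fetch a resource
  Given url 'https://example.com/api/users/1'
  When method get
  Then status 200
  And match response.id == 1
```

The same Gherkin keywords drive HTTP calls instead of a browser, which is why existing API tests can be reused directly for Gatling-based performance runs.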
Microservices architecture has become extremely popular in recent years because it allows complex applications to be built as a collection of discrete, independent services, improving scalability, flexibility, and resilience. The distributed nature of microservices, however, increases complexity and presents special difficulties for testing and quality control, so comprehensive testing is essential to guarantee the reliability and scalability of the software. In this guide, we will delve into the world of microservices testing and examine its significance, methodologies, and best practices to guarantee the smooth operation of these interconnected parts.

Understanding Microservices

The functionality of an application is provided by a collection of independent, loosely coupled microservices. Each microservice runs independently, has its own database, and implements its own business logic. This architecture supports continuous delivery, scalability, and flexibility. To build a strong foundation, we must first understand the fundamentals of microservices architecture. Microservices are small, independent services that join forces to form a complete software application. Each service carries out a particular business function and communicates with other services through well-defined APIs. This modular approach lets organizations develop, deploy, and scale applications more effectively. However, as the number of services grows, thorough testing is essential to find and fix potential problems.

Challenges in Microservices Testing

Testing microservices introduces several unique challenges, including:

Distributed nature: Microservices are distributed across different servers, networks, and even geographical locations.
This requires testing to account for network latency, service discovery, and inter-service communication.

Dependency management: Microservices often rely on external dependencies such as databases, third-party APIs, and message queues. Testing must consider these dependencies and ensure their availability during testing.

Data consistency: Maintaining data consistency across multiple microservices is a critical challenge. Changes made in one service should not negatively impact the functionality of other services.

Deployment complexity: Microservices are typically deployed independently, and coordinating testing across multiple services can be challenging. Versioning, rollbacks, and compatibility testing become vital considerations.

Integration testing: Microservices architecture demands extensive integration testing to ensure seamless communication and proper behavior among services.

Importance of Microservices Testing

Microservices testing plays a vital role in guaranteeing the overall quality, reliability, and performance of the system. The following points highlight its significance:

Isolation and independence: Testing each microservice individually ensures that any issues or bugs within a specific service can be isolated, minimizing the impact on other services.

Continuous integration and delivery (CI/CD): Microservices heavily rely on CI/CD pipelines to enable frequent deployments. Effective testing enables faster feedback loops, ensuring that changes and updates can be delivered reliably without causing disruptions.

Fault isolation and resilience: By testing the interactions between microservices, organizations can identify potential points of failure and design resilient strategies to handle failures gracefully.

Scalability and performance: Testing enables organizations to simulate high loads and stress scenarios to identify bottlenecks, optimize performance, and ensure that microservices can scale seamlessly.
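Fault isolation is often verified by exercising resilience patterns directly. The sketch below is a deliberately simplified circuit breaker in plain Java (a stand-in for what libraries such as Resilience4j provide, not a production implementation) showing the kind of behavior a resilience test would assert:

```java
// A deliberately simplified circuit breaker: after a threshold of consecutive
// failures it "opens", so callers fail fast instead of hammering a broken
// downstream service; a single success "closes" it again.
public class CircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    public CircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    public boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    public void recordFailure() { consecutiveFailures++; }

    public void recordSuccess() { consecutiveFailures = 0; }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3);
        breaker.recordFailure();
        breaker.recordFailure();
        breaker.recordFailure();
        System.out.println(breaker.isOpen());  // prints: true
        breaker.recordSuccess();
        System.out.println(breaker.isOpen());  // prints: false
    }
}
```

A resilience test drives the breaker with simulated failures and asserts that it opens exactly at the threshold and closes again after a success, which is the failure-handling behavior the points above are meant to guarantee.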
Types of Microservices Testing

Microservices testing involves various types of testing to ensure the quality, functionality, and performance of individual microservices and the system as a whole. Here are some important types of testing commonly performed in a microservices architecture:

Unit Testing

Unit testing focuses on testing individual microservices in isolation. It verifies the functionality of each microservice at a granular level, typically at the code level. Unit tests ensure that individual components or modules of microservices behave as expected and meet the defined requirements. Mocking frameworks are often used to isolate dependencies and simulate interactions for effective unit testing.

Integration Testing

Integration testing verifies the interaction and integration between multiple microservices. It ensures that microservices can communicate correctly and exchange data according to the defined contracts or APIs. Integration tests validate the interoperability and compatibility of microservices, identifying any issues related to data consistency, message passing, or service coordination.

Contract Testing

Contract testing validates the contracts or APIs exposed by microservices. It focuses on ensuring that the contracts between services are compatible and adhere to the agreed-upon specifications. Contract testing verifies the request and response formats, data structures, and behavior of the services involved. This type of testing is essential for maintaining the integrity and compatibility of microservices during development and evolution.

End-to-End Testing

End-to-end (E2E) testing evaluates the functionality and behavior of the entire system, including multiple interconnected microservices, databases, and external dependencies. It tests the complete flow of a user request through various microservices and validates the expected outcomes. E2E tests help identify issues related to data consistency, communication, error handling, and overall system behavior.

Performance Testing

Performance testing assesses the performance and scalability of microservices. It involves testing the system under different loads, stress conditions, or peak usage scenarios. Performance tests measure response times, throughput, resource utilization, and other performance metrics to identify bottlenecks, optimize performance, and ensure that the microservices can handle expected loads without degradation.

Security Testing

Security testing is crucial in a microservices architecture due to its distributed nature and the potential exposure of sensitive data. It involves assessing the security of microservices against various vulnerabilities, attacks, and unauthorized access. Security testing encompasses techniques such as penetration testing, vulnerability scanning, authentication, authorization, and data protection measures.

Chaos Engineering

Chaos engineering is a proactive testing approach in which deliberate failures or disturbances are injected into the system to evaluate its resilience and fault tolerance. By simulating failures or stress scenarios, chaos engineering validates the system's ability to handle failures, recover gracefully, and maintain overall stability. It helps identify weaknesses and ensures that microservices can handle unexpected conditions without causing a system-wide outage.

Data Testing

Data testing focuses on validating the accuracy, integrity, and consistency of data stored and processed by microservices. It involves verifying data transformations, data flows, data quality, and data integration between microservices and external systems. Data testing ensures that data is correctly processed, stored, and retrieved, minimizing the risk of data corruption or inconsistency.

These are some of the key types of testing performed in a microservices architecture.
The selection and combination of testing types depend on the specific requirements, complexity, and characteristics of the microservices system being tested. A comprehensive testing strategy covering these types of testing helps ensure the reliability, functionality, and performance of microservices-based applications.

Best Practices for Microservices Testing

Microservices testing presents unique challenges due to the distributed nature of the architecture. To ensure comprehensive testing and maintain the quality and reliability of microservices, it is essential to follow best practices. Here are some key best practices for microservices testing:

Test at Different Levels

Microservices testing should be performed at multiple levels, including unit testing, integration testing, contract testing, end-to-end testing, performance testing, and security testing. Each level of testing verifies specific aspects of the microservices and their interactions. Comprehensive testing at various levels helps uncover issues early and ensures the overall functionality and integrity of the system.

Prioritize Test Isolation

Microservices are designed to be independent and loosely coupled. It is crucial to test each microservice in isolation to identify and resolve issues specific to that service without impacting other services. Isolating tests ensures that failures or changes in one microservice do not cascade to other parts of the system, enhancing fault tolerance and maintainability.

Use Mocking and Service Virtualization

Microservices often depend on external services or APIs. Mocking and service virtualization techniques allow for testing microservices independently of their dependencies. By replacing dependencies with mocks or virtualized versions of the services, you can control behavior and responses during testing, making it easier to simulate different scenarios, ensure test repeatability, and avoid testing delays caused by external service availability.
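The mocking approach can be made concrete with a hand-rolled test double. The example below is an illustrative sketch in plain Java (the InventoryClient and OrderService names are hypothetical); in practice, frameworks such as Mockito generate these doubles automatically:

```java
// The dependency a microservice would normally reach over the network.
interface InventoryClient {
    int stockFor(String sku);
}

// The unit under test: pure business logic, no network access.
class OrderService {
    private final InventoryClient inventory;

    OrderService(InventoryClient inventory) { this.inventory = inventory; }

    boolean canFulfill(String sku, int quantity) {
        return inventory.stockFor(sku) >= quantity;
    }
}

public class OrderServiceTest {
    public static void main(String[] args) {
        // Mock: a fixed canned response instead of a real HTTP call.
        InventoryClient mock = sku -> 5;
        OrderService service = new OrderService(mock);

        System.out.println(service.canFulfill("ABC-123", 3));  // prints: true
        System.out.println(service.canFulfill("ABC-123", 10)); // prints: false
    }
}
```

Because the double returns a fixed value, the test is repeatable and runs without the real inventory service being available, which is exactly the isolation benefit described above.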
Implement Contract Testing

Microservices rely on well-defined contracts or APIs for communication. Contract testing verifies the compatibility and compliance of these contracts between services. By testing contracts, you ensure that services can communicate effectively, preventing integration issues and reducing the risk of breaking changes. Contract testing tools like Pact or Spring Cloud Contract can assist in defining and validating contracts.

Automate Testing

Automation is crucial for effective microservices testing. Implementing a robust test automation framework and CI/CD pipeline allows for frequent and efficient testing throughout the development lifecycle. Automated testing enables faster feedback, reduces human error, and facilitates the continuous delivery of microservices. Tools like Cucumber, Postman, or JUnit can be leveraged for automated testing at different levels.

Emphasize Performance Testing

Scalability and performance are vital aspects of microservices architecture. Conduct performance testing to ensure that microservices can handle expected loads and perform optimally under various conditions. Load testing, stress testing, and performance profiling tools like Gatling, Apache JMeter, or Locust can help assess the system's behavior, identify bottlenecks, and optimize performance.

Implement Chaos Engineering

Chaos engineering is a proactive testing methodology that involves intentionally injecting failures or disturbances into a microservices environment to evaluate its resilience. By simulating failures and stress scenarios, you can identify weaknesses, validate fault tolerance mechanisms, and improve the overall robustness and reliability of the system. Tools like Chaos Monkey, Gremlin, or Pumba can be employed for chaos engineering experiments.

Include Security Testing

Microservices often interact with sensitive data and external systems, making security testing crucial. Perform security testing to identify vulnerabilities, ensure data protection, and prevent unauthorized access. Techniques such as penetration testing, vulnerability scanning, and adherence to security best practices should be incorporated into the testing process to mitigate security risks effectively.

Monitor and Analyze System Behavior

Monitoring and observability are essential during microservices testing. Implement monitoring tools and techniques to gain insights into the behavior, performance, and health of microservices. Collect and analyze metrics, logs, and distributed traces to identify issues, debug problems, and optimize the system's performance. Tools like Prometheus, Grafana, the ELK stack, or distributed tracing systems aid in monitoring and analyzing microservices.

Test Data Management

Managing test data in microservices testing can be complex. Ensure proper test data management by using techniques like data virtualization or synthetic data generation. These approaches allow for realistic and consistent test scenarios, minimizing dependencies on production data and external systems.

By following these best practices, organizations can establish a robust testing process for microservices, ensuring quality, reliability, and performance in distributed systems. Adapting these practices to specific project requirements, technologies, and organizational needs is important to achieve optimal results.

Test Environment and Infrastructure

Creating an effective test environment and infrastructure is crucial for successful microservices testing. A well-designed test environment ensures that the testing process is reliable and efficient and replicates the production environment as closely as possible.
Here are some key considerations for setting up a robust microservices test environment and infrastructure: Containerization and Orchestration Containerization platforms like Docker and orchestration tools such as Kubernetes provide a flexible and scalable infrastructure for deploying and managing microservices. By containerizing microservices, you can encapsulate each service and its dependencies, ensuring consistent environments across testing and production. Container orchestration tools enable efficient deployment, scaling, and management of microservices, making it easier to replicate the production environment for testing purposes. Environment Configuration Management Maintaining consistent configurations across different testing environments is crucial. Configuration management tools like Ansible, Chef, or Puppet help automate the setup and configuration of test environments. They allow you to define and manage environment-specific configurations, such as database connections, service endpoints, and third-party integrations, ensuring consistency and reproducibility in testing. Test Data Management Microservices often interact with databases and external systems, making test data management complex. Proper test data management ensures that test scenarios are realistic and cover different data scenarios. Techniques such as data virtualization, where virtual test data is generated on the fly, or synthetic data generation, where realistic but non-sensitive data is created, can be employed. Additionally, tools like Flyway or Liquibase help manage database schema migrations during testing. Service Virtualization Service virtualization allows you to simulate or virtualize the behavior of dependent microservices that are not fully developed or available during testing. It helps decouple testing from external service dependencies, enabling continuous testing even when certain services are unavailable or undergoing changes. 
Tools like WireMock, Mountebank, or Hoverfly provide capabilities for creating virtualized versions of dependent services, allowing you to define custom responses and simulate various scenarios.

Continuous Integration and Delivery (CI/CD) Pipeline

A robust CI/CD pipeline is essential for continuous testing and seamless delivery of microservices. The CI/CD pipeline automates the build, testing, and deployment processes, ensuring that changes to microservices are thoroughly tested before being promoted to higher environments. Tools like Jenkins, GitLab CI/CD, or CircleCI enable the automation of test execution, test result reporting, and integration with version control systems and artifact repositories.

Test Environment Provisioning

Automated provisioning of test environments helps in reducing manual effort and ensures consistency across environments. Infrastructure-as-Code (IaC) tools like Terraform or AWS CloudFormation enable the provisioning and management of infrastructure resources, including virtual machines, containers, networking, and storage, in a programmatic and reproducible manner. This allows for quick and reliable setup of test environments with the desired configurations.

Monitoring and Log Aggregation

Monitoring and log aggregation are essential for gaining insights into the behavior and health of microservices during testing. Tools like Prometheus, Grafana, or the ELK (Elasticsearch, Logstash, Kibana) stack can be used for collecting and analyzing metrics, logs, and traces. Monitoring helps identify performance bottlenecks, errors, and abnormal behavior, allowing you to optimize and debug microservices effectively.

Test Environment Isolation

Isolating test environments from production environments is crucial to prevent any unintended impact on the live system. Test environments should have separate infrastructure, networking, and data resources to ensure the integrity of production data.
Techniques like containerization, virtualization, or cloud-based environments provide effective isolation and sandboxing of test environments.

Scalability and Performance Testing Infrastructure

Microservices architecture emphasizes scalability and performance. To validate these aspects, it is essential to have a dedicated infrastructure for load testing and performance testing. This infrastructure should include tools like Gatling, Apache JMeter, or Locust, which allow simulating high loads, measuring response times, and analyzing system behavior under stress conditions.

By focusing on these considerations, organizations can establish a robust microservices test environment and infrastructure that closely mirrors the production environment. This ensures accurate testing, faster feedback cycles, and reliable software delivery while minimizing risks and ensuring the overall quality and reliability of microservices-based applications.

Test Automation Tools and Frameworks

Microservices testing can be significantly enhanced by utilizing various test automation tools and frameworks. These tools help streamline the testing process, improve efficiency, and ensure comprehensive test coverage. In this section, we will explore some popular microservices test automation tools and frameworks.

Cucumber

Cucumber is a widely used tool for behavior-driven development (BDD) testing. It enables collaboration between stakeholders, developers, and testers by using a plain-text format for test scenarios. With Cucumber, test scenarios are written in a Given-When-Then format, making it easier to understand and maintain test cases. It supports multiple programming languages and integrates well with other testing frameworks and tools.

Postman

Postman is a powerful API testing tool that allows developers and testers to create and automate tests for microservices APIs. It provides a user-friendly interface for sending HTTP requests, validating responses, and performing functional testing.
Postman supports scripting and offers features like test assertions, test data management, and integration with CI/CD pipelines.

Rest-Assured

Rest-Assured is a Java-based testing framework specifically designed for testing RESTful APIs. It provides a rich set of methods and utilities to simplify API testing, including support for request and response specification, authentication, data validation, and response parsing. Rest-Assured integrates well with popular Java testing frameworks like JUnit and TestNG.

WireMock

WireMock is a flexible and easy-to-use tool for creating HTTP-based mock services. It allows you to simulate the behavior of external dependencies or unavailable services during testing. WireMock enables developers and testers to stub out dependencies, define custom responses, and verify requests made to the mock server. It supports features like request matching, response templating, and record/playback of requests.

Pact

Pact is a contract testing framework that focuses on ensuring compatibility and contract compliance between microservices. It enables teams to define and verify contracts, which are a set of expectations for the interactions between services. Pact supports various programming languages and allows for generating consumer-driven contracts that can be used for testing both the provider and consumer sides of microservices.

Karate

Karate is an open-source API testing framework that combines API testing, test data preparation, and assertions in a single tool. It uses a simple and expressive syntax for writing tests and supports features like request chaining, dynamic payloads, and parallel test execution. Karate also provides capabilities for testing microservices built on other protocols like SOAP and GraphQL.

Gatling

Gatling is a popular open-source tool for load and performance testing. It allows you to simulate high user loads, measure response times, and analyze system behavior under stress conditions.
Gatling provides a domain-specific language (DSL) for creating test scenarios and supports distributed load generation for scalability. It integrates well with CI/CD pipelines and offers detailed performance reports.

Selenium

Selenium is a widely used web application testing framework that can also be leveraged for testing microservices with web interfaces. It provides a range of tools and APIs for automating browser interactions and performing UI-based tests. Selenium supports various programming languages and offers capabilities for cross-browser testing, test parallelization, and integration with test frameworks like TestNG and JUnit.

These are just a few examples of the many tools and frameworks available for microservices test automation. The choice of tool depends on factors such as project requirements, programming languages, team expertise, and integration capabilities with the existing toolchain. It's essential to evaluate the features, community support, and documentation of each tool to select the most suitable one for your specific testing needs.

Monitoring and Observability

Monitoring and observability are essential for gaining insights into the health, performance, and behavior of microservices. Key monitoring aspects include:

Log Aggregation and Analysis: Collecting and analyzing log data from microservices helps in identifying errors, diagnosing issues, and understanding the system's behavior.

Metrics and Tracing: Collecting and analyzing performance metrics and distributed traces provides visibility into the end-to-end flow of requests and highlights bottlenecks or performance degradation.

Alerting and Incident Management: Establishing effective alerting mechanisms enables organizations to proactively respond to issues and incidents. Integrated incident management workflows ensure timely resolution and minimize disruptions.
Distributed Tracing: Distributed tracing techniques allow for tracking and visualizing requests as they traverse multiple microservices, providing insights into latency, dependencies, and potential bottlenecks.

Conclusion

The performance, scalability, and reliability of complex distributed systems depend on the reliability of their microservices. Organizations can lessen the difficulties brought about by microservices architecture by adopting a thorough testing strategy that includes unit testing, integration testing, contract testing, performance testing, security testing, chaos testing, and end-to-end testing. The overall quality and resilience of microservices-based applications are improved by incorporating best practices like test automation, containerization, CI/CD, service virtualization, scalability testing, and efficient monitoring, which results in better user experiences and successful deployments.

The performance, dependability, and quality of distributed software systems all depend on the results of microservices testing. Organizations can find and fix problems at different levels, from specific microservices to end-to-end scenarios, by implementing a thorough testing strategy. Teams can successfully validate microservices throughout their lifecycle with the right test environment, infrastructure, and monitoring tools, facilitating quicker and more dependable software delivery. In today's fast-paced technological environment, adopting best practices and using the appropriate testing tools and frameworks will enable organizations to create robust, scalable, and resilient microservices architectures, ultimately improving customer satisfaction and business success.
Microservices architecture has revolutionized modern software development, offering unparalleled agility, scalability, and maintainability. However, effectively implementing microservices necessitates a deep understanding of best practices to harness their full potential while avoiding common pitfalls. In this comprehensive guide, we will delve into the key best practices for microservices, providing detailed insights into each aspect.

1. Defining the "Micro" in Microservices

Single Responsibility Principle (SRP)

Best Practice: Microservices should adhere to the Single Responsibility Principle (SRP), having a well-defined scope of responsibility that encapsulates all tasks relevant to a specific business domain.

Explanation: The Single Responsibility Principle, a fundamental concept in software design, applies to microservices. Each microservice should focus on a single responsibility, encapsulating all the tasks relevant to a specific business domain. This approach ensures that microservices are concise and maintainable, as they don't try to do too much, aligning with the SRP's principle of a class having only one reason to change.

Simplifying Deployment

Best Practice: Combine small teams with complete ownership, discrete responsibility, and infrastructure for continuous delivery to reduce the cost of deploying microservices.

Explanation: The combination of small, self-sufficient teams, each responsible for a specific microservice, simplifies the deployment process. With complete ownership and infrastructure supporting continuous delivery, the cost and effort required to move microservices into production are significantly reduced.

2. Embracing Domain-Driven Design (DDD)

Best Practice: Apply Domain-Driven Design (DDD) principles to design microservices with a strong focus on specific business domains rather than attempting to create universal solutions.
Explanation: Domain-driven design (DDD) is a strategic approach to designing software systems, emphasizing the importance of aligning the software's structure with the organization's business domains. When implementing microservices, it's crucial to use DDD principles to ensure that each microservice accurately represents a specific business domain. This alignment helps in modeling and organizing microservices effectively, ensuring that they reflect the unique requirements and contexts of each area.

3. Encouraging Reusability

Best Practice: Promote reuse of microservices within specific domains while allowing for adaptation for use in different contexts.

Explanation: Reuse is a valuable principle in microservice design, but it should be restricted to specific domains within the organization. Teams can collaborate and agree on communication models for adapting microservices for use outside their original contexts. This approach fosters efficiency and consistency while avoiding unnecessary duplication of functionality.

4. Microservices in Comparison to Monolithic Systems

Fostering Service Encapsulation

Best Practice: Keep microservices small to ensure that a small group of developers can understand the entirety of a single microservice.

Explanation: The size of microservices should be such that a small team or even a single developer can fully comprehend the entire service. This promotes agility, reduces complexity, and facilitates faster development and maintenance.

Promoting Standardized Interfaces

Best Practice: Expose microservices through standardized interfaces (e.g., RESTful APIs or AMQP exchanges) to enable reuse without tight coupling.

Explanation: Microservices should communicate with each other through standardized interfaces that abstract the underlying implementation. This approach enables other services and applications to consume and reuse microservices without becoming tightly coupled to them, promoting flexibility and maintainability.
Enabling Independent Scaling

Best Practice: Ensure that microservices exist as independent deployment artifacts, allowing them to be scaled independently of other services.

Explanation: Microservices should be designed to function as independent units that can be deployed and scaled separately. This flexibility allows organizations to allocate resources efficiently based on the specific demands of each microservice, improving performance and resource utilization.

Automating Deployment

Best Practice: Implement automation throughout the software development lifecycle, including deployment automation and continuous integration.

Explanation: Automation is essential for microservices to achieve rapid development, testing, and deployment. Continuous integration and automated deployment pipelines allow organizations to streamline the release process, reducing manual intervention and ensuring consistent and reliable deployments.

5. Service Mesh and Management Practices

Command Query Responsibility Segregation (CQRS)

Best Practice: Consider separating microservices into command and query responsibilities, especially for high-traffic requirements.

Explanation: In situations where specific business capabilities experience high traffic, it may be beneficial to separate the microservices responsible for handling queries (information retrieval) from those handling commands (state-changing functions). This pattern, known as Command Query Responsibility Segregation (CQRS), optimizes performance and scalability.

Event Sourcing

Best Practice: Embrace eventual consistency by storing changes to state as journaled business events.

Explanation: To ensure consistency among microservices, especially when working asynchronously, consider adopting an event-sourcing approach. Instead of relying on distributed transactions, microservices can collaborate using domain events published to a message broker.
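A minimal sketch of the event-sourcing idea described above, assuming a toy account domain (the event names and fields are invented for the example): state is never stored directly, only derived by replaying journaled business events.

```python
# Append-only event journal; current state is derived by replaying events.
journal = []

def append_event(event_type, payload):
    """Record a business event; events are never updated or deleted."""
    journal.append({"type": event_type, **payload})

def balance(account_id):
    """Rebuild an account balance by replaying its journaled events."""
    total = 0
    for event in journal:
        if event.get("account") != account_id:
            continue
        if event["type"] == "Deposited":
            total += event["amount"]
        elif event["type"] == "Withdrawn":
            total -= event["amount"]
    return total

append_event("Deposited", {"account": "A-1", "amount": 100})
append_event("Withdrawn", {"account": "A-1", "amount": 30})
print(balance("A-1"))  # state derived from events, not stored directly
```

In a real system the journal would live in an event store or message broker, and a CQRS read model would keep a pre-computed projection instead of replaying on every query.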
This approach ensures eventual consistency once all microservices have completed their work.

Continuous Delivery of Composed Applications

Best Practice: Implement continuous delivery for composed microservice applications to ensure agility and real-time verification of business objectives.

Explanation: Continuous delivery is essential for achieving agility and verifying that composed microservice applications meet their business objectives. Short release cycles, fast feedback on build failures, and automated deployment facilities are critical components of this approach.

Reduce Complexity With Service Mesh

Best Practice: Implement a service mesh architecture to simplify microservice management, ensuring secure, fast, and reliable service-to-service communications.

Explanation: A service mesh is an architectural pattern that simplifies the management of microservices by providing secure and reliable communication between services. It abstracts governance considerations and enhances the security and performance of microservices interactions.

6. Fault Tolerance and Resilience

Best Practice: Implement fault tolerance and resilience mechanisms to ensure that microservices can withstand and recover from failures gracefully.

Explanation: Microservices should be designed to handle failures without causing widespread disruptions. This includes strategies such as circuit breakers, retry mechanisms, graceful degradation, and the ability to self-heal in response to failures. Prioritizing fault tolerance and resilience ensures that the system remains stable and responsive under adverse conditions.

7. Monitoring and Logging

Best Practice: Establish comprehensive monitoring and logging practices to gain insights into the health and performance of microservices.

Explanation: Monitoring and logging are essential for understanding how microservices are behaving in production.
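The circuit breaker mentioned among the fault-tolerance strategies above can be sketched in a few lines. This is an illustrative toy with arbitrary thresholds, not a production implementation; hardened libraries such as Resilience4j or Hystrix provide the real thing:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after max_failures consecutive failures the
    circuit opens and calls fail fast until reset_after seconds elapse."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable clock, convenient for testing
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # half-open: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0           # a success closes the circuit again
        return result

breaker = CircuitBreaker(max_failures=2, reset_after=5.0)

def flaky():
    raise ConnectionError("downstream service unavailable")

for _ in range(3):
    try:
        breaker.call(flaky)
    except Exception as exc:
        print(type(exc).__name__, exc)
```

After two consecutive failures the third call no longer reaches the flaky downstream at all; it fails fast, protecting both the caller and the struggling service.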
Implement robust monitoring tools and logging frameworks to track key performance metrics, detect anomalies, troubleshoot issues, and gain actionable insights. Proactive monitoring and logging enable timely responses to incidents and continuous improvement of microservices. By incorporating these two additional best practices — fault tolerance and resilience, and monitoring and logging — organizations can further enhance the reliability and manageability of their microservices-based systems.

8. Decentralize Data Management

Best Practice: In microservices architecture, each microservice should maintain its own copy of the data, avoiding multiple services accessing or sharing the same database.

Explanation: Microservices benefit from data decentralization, where each microservice manages its own data independently. It is crucial not to set up multiple services to access or share the same database, as this would undermine the autonomy of microservices. Instead, design microservices to own and manage their data. To enable controlled access to a microservice's data, implement APIs that act as gateways for other services. This approach enforces centralized access control, allowing developers to incorporate features like audit logging and caching seamlessly. Strive for a data structure that includes one or two database tables per microservice, ensuring clean separation and encapsulation of data.

9. Promoting Loose Coupling Strategies

Best Practice: Embrace strategies that promote loose coupling between microservices, both in terms of incoming and outgoing dependencies.

Explanation: In a microservices architecture, maintaining loose coupling between services is crucial for flexibility and scalability. To achieve this, consider employing various strategies that encourage loose coupling:

Point-to-Point and Publish-Subscribe: Utilize messaging patterns such as point-to-point and publish-subscribe.
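As a toy illustration of publish-subscribe decoupling, the following in-process sketch shows publishers and subscribers that share only a topic name and a message shape. The service names are invented for the example; a real system would use a broker such as Kafka or RabbitMQ rather than an in-memory dictionary:

```python
from collections import defaultdict

class Broker:
    """Tiny in-process publish-subscribe broker: publishers and subscribers
    never hold direct references to each other, only a shared topic name."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()
received = []

# The "billing" service subscribes without knowing who publishes...
broker.subscribe("order.placed", lambda msg: received.append(("billing", msg)))
# ...and the "shipping" service subscribes to the same topic independently.
broker.subscribe("order.placed", lambda msg: received.append(("shipping", msg)))

# The "order" service publishes without knowing who listens.
broker.publish("order.placed", {"order_id": 7})
print(received)
```

Either side can be replaced, scaled, or taken offline without changing the other; the contract is reduced to the topic name and the message structure, exactly the isolation the pattern aims for.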
These patterns help decouple senders and receivers, as they remain unaware of each other. In this setup, the contract of a reactive microservice, like a Kafka consumer, is defined by the name of the message queue and the structure of the message. This isolation minimizes dependencies between services.

API-First Design: Adopt a contract-first design approach, where the API is designed independently of existing code. This practice prevents the creation of APIs tightly coupled to specific technologies and implementations. By defining the contract first, you ensure that it remains technology-agnostic and adaptable to changes, promoting loose coupling between services.

By incorporating these strategies, you can enhance the loose coupling between microservices, making your architecture more resilient and adaptable to evolving requirements.

Conclusion

The core design principles outlined above serve as a solid foundation for crafting effective microservice architectures. While adhering to these principles is essential, the success of a microservice design goes beyond mere compliance. It requires a thorough understanding of quality attribute requirements and the ability to make informed design decisions while considering trade-offs. Additionally, familiarity with design patterns and architectural tactics that align with these principles is crucial. Equally important is a deep understanding of the available technology choices, as they play a pivotal role in the implementation and operation of microservices. Ultimately, a holistic approach that combines these design principles with careful consideration of requirements, design patterns, and technology options paves the way for successful microservice design and implementation.
This is an article from DZone's 2023 Database Systems Trend Report. For more: Read the Report

Database design is a critical factor in microservices and cloud-native solutions because a microservices-based architecture results in distributed data. Instead of data management happening in a single process, multiple processes can manipulate the data. The rise of cloud computing has made data even more distributed. To deal with this complexity, several data management patterns have emerged for microservices and cloud-native solutions. In this article, we will look at the most important patterns that can help us manage data in a distributed environment.

The Challenges of Database Design for Microservices and the Cloud

Before we dig into the specific data management patterns, it is important to understand the key challenges with database design for microservices and the cloud:

In a microservices architecture, data is distributed across different nodes. Some of these nodes can be in different data centers in completely different geographic regions of the world. In this situation, it is tough to guarantee consistency of data across all the nodes. At any given point in time, there can be differences in the state of data between various nodes. This is also known as the problem of eventual consistency.

Since the data is distributed, there's no central authority that manages data as in single-node monolithic systems. It's important for the various participating systems to use a mechanism (e.g., consensus algorithms) for data management.

The attack surface for malicious actors is larger in a microservices architecture since there are multiple moving parts. This means we need to establish a more robust security posture while building microservices.

The main promise of microservices and the cloud is scalability. While it becomes easier to scale the application processes, it is not so easy to scale the database nodes horizontally.
Without proper scalability, databases can turn into performance bottlenecks.

Diving Into Data Management Patterns

Considering the associated challenges, several patterns are available to manage data in microservices and cloud-native applications. The main job of these patterns is to help developers address the various challenges mentioned above. Let's look at each of these patterns one by one.

Database per Service

As the name suggests, this pattern proposes that each microservice manages its own data. This implies that no microservice can directly access or manipulate the data managed by another. Any exchange or manipulation of data can be done only by using a set of well-defined APIs. The figure below shows an example of the database-per-service pattern.

Figure 1: Database-per-service pattern

At face value, this pattern seems quite simple. It can be implemented relatively easily when we are starting with a brand-new application. However, when we are migrating an existing monolithic application to a microservices architecture, the demarcation between services is not so clear. Most of the functionality is written in a way where different parts of the system access data from other parts informally. There are two main areas to focus on when using a database-per-service pattern: defining bounded contexts for each service, and managing business transactions that span multiple microservices.

Shared Database

The next important pattern is the shared database pattern. Though this pattern supports microservices architecture, it adopts a much more lenient approach by using a shared database accessible to multiple microservices. For existing applications transitioning to a microservices architecture, this is a much safer pattern, as we can slowly evolve the application layer without changing the database design. However, this approach takes away some benefits of microservices: Developers across teams need to coordinate schema changes to tables.
Runtime conflicts may arise when multiple services are trying to access the same database resources.

CQRS and Event Sourcing

In the command query responsibility segregation (CQRS) pattern, an application listens to domain events from other microservices and updates a separate database for supporting views and queries. We can then serve complex aggregation queries from this separate database while optimizing the performance and scaling it up as needed. Event sourcing takes it a bit further by storing the state of the entity or the aggregate as a sequence of events. Whenever we have an update or an insert on an object, a new event is created and stored in the event store. We can use CQRS and event sourcing together to solve a lot of challenges around event handling and maintaining separate query data. This way, you can scale the writes and reads separately based on their individual requirements.

Figure 2: Event sourcing and CQRS in action together

On the downside, this is an unfamiliar style of building applications for most developers, and there are more moving parts to manage.

Saga Pattern

The saga pattern is another solution for handling business transactions across multiple microservices. For example, placing an order on a food delivery app is a business transaction. In the saga pattern, we break this business transaction into a sequence of local transactions handled by different services. For every local transaction, the service that performs the transaction publishes an event. The event triggers a subsequent transaction in another service, and the chain continues until the entire business transaction is completed. If any particular transaction in the chain fails, the saga rolls back by executing a series of compensating transactions that undo the impact of all the previous transactions. There are two types of saga implementations: orchestration-based sagas and choreography-based sagas.

Sharding

Sharding helps in building cloud-native applications.
It involves separating rows of one table into multiple different tables. This is also known as horizontal partitioning, but when the partitions reside on different nodes, they are known as shards. Sharding helps us improve the read and write scalability of the database. It also improves query performance, because a given query has to deal with fewer records as a result of sharding.

Replication

Replication is another very important data management pattern. It involves creating multiple copies of the database, each identical and running on a different server or node, with changes made to one copy propagated to the others. There are several types of replication approaches, such as single-leader replication, multi-leader replication, and leaderless replication. Replication helps us achieve high availability and boosts reliability, and it lets us scale out read operations, since read requests can be diverted to multiple servers. Figure 3 below shows sharding and replication working in combination.

Figure 3: Using sharding and replication together

Best Practices for Database Design in a Cloud-Native Environment

While these patterns can go a long way in addressing data management issues in microservices and cloud-native architecture, we also need to follow some best practices to make life easier. Here are a few:

We must try to design the solution for resilience. Faults are inevitable in a microservices architecture, and the design should accommodate failures and recover from them without disrupting the business.

We must implement proper migration strategies when transitioning to one of the patterns. Some of the common strategies that can be evaluated are schema first versus data first, blue-green deployments, or using the strangler pattern.

Don't ignore backups and well-tested disaster recovery systems. These things are important even for single-node databases.
However, in a distributed data management approach, disaster recovery becomes even more important.

Constant monitoring and observability are equally important in microservices and cloud-native applications. For example, techniques like sharding can lead to unbalanced partitions and hotspots. Without proper monitoring solutions, any reaction to such situations may come too late and may put the business at risk.

Conclusion

We can conclude that good database design is vital in a microservices and cloud-native environment. Without proper design, an application will face multiple problems due to the inherent complexity of distributed data. Multiple data management patterns exist to help us deal with data in a more reliable and scalable manner. However, each pattern has its own challenges and its own set of advantages and disadvantages. No pattern fits all possible scenarios, and we should select a particular pattern only after weighing the various trade-offs.
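As a toy illustration of the sharding-plus-replication combination covered in this article, the following in-memory sketch hash-routes rows to shards and keeps each shard on a leader plus replicas. The shard count, key names, and the synchronous write strategy are arbitrary simplifications for the example:

```python
import hashlib

NUM_SHARDS = 4
REPLICAS_PER_SHARD = 2

# Each shard is a leader dict plus replica dicts (single-leader replication).
shards = [{"leader": {}, "replicas": [{} for _ in range(REPLICAS_PER_SHARD)]}
          for _ in range(NUM_SHARDS)]

def shard_for(key):
    """Route a row to a shard by hashing its key (horizontal partitioning)."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def write(key, value):
    shard = shards[shard_for(key)]
    shard["leader"][key] = value
    for replica in shard["replicas"]:   # synchronous replication, for simplicity
        replica[key] = value

def read(key, replica_index=0):
    """Reads can be served from any replica, spreading the query load."""
    return shards[shard_for(key)]["replicas"][replica_index].get(key)

write("user:42", {"name": "Ada"})
print(read("user:42"))
```

Real systems replicate asynchronously and rebalance shards as nodes join and leave, which is precisely where the hotspot and eventual-consistency concerns discussed above come from.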
In part three of this series, we saw how to deploy our Quarkus/Camel-based microservices on Minikube, one of the most commonly used local Kubernetes implementations. While such a local implementation is very practical for testing purposes, its single-node nature doesn't satisfy real production environment requirements. Hence, in order to check our microservices' behavior in a production-like environment, we need a multi-node Kubernetes implementation, and one of the most common is OpenShift.

What Is OpenShift?

OpenShift is an open-source, enterprise-grade platform for container application development, deployment, and management based on Kubernetes. Developed by Red Hat as a component layer on top of a Kubernetes cluster, it comes both as a commercial product and a free platform, and as both an on-premise and a cloud infrastructure. The figure below depicts this architecture.

As with any Kubernetes implementation, OpenShift has its complexities, and installing it as a standalone on-premise platform isn't a walk in the park. Using it as a managed platform on a dedicated cloud like AWS, Azure, or GCP is a more practical approach, at least in the beginning, but it requires a certain enterprise organization. For example, ROSA (Red Hat OpenShift Service on AWS) is a commercial solution that facilitates the rapid creation and simple management of a full Kubernetes infrastructure, but it isn't really a developer-friendly environment for quickly developing, deploying, and testing cloud-native services. For this latter use case, Red Hat offers the OpenShift Developer Sandbox, a development environment that gives immediate access to OpenShift without any heavy installation or subscription process, and where developers can start practicing their skills and learning cycle even before having to work on real projects.
This totally free service, which requires only a Red Hat account and no credit card, provides a private OpenShift environment in a shared, multi-tenant Kubernetes cluster that is pre-configured with a set of developer tools, like Java, Node.js, Python, Go, and C#, including a catalog of Helm charts, the s2i build tool, and OpenShift Dev Spaces. In this post, we'll be using the OpenShift Developer Sandbox to deploy our Quarkus/Camel microservices.

Deploying on OpenShift

In order to deploy on OpenShift, Quarkus applications need to include the OpenShift extension. This might be done using the Quarkus CLI, of course, but given that our project is a multi-module Maven one, a more practical way is to directly include the following dependencies in the master POM:

XML

<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-openshift</artifactId>
</dependency>
<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-container-image-openshift</artifactId>
</dependency>

This way, all the sub-modules will inherit the dependencies. OpenShift is supposed to work with vanilla Kubernetes resources; hence, our previous recipe, where we deployed our microservices on Minikube, should also apply here. After all, both Minikube and OpenShift are implementations of the same de facto standard: Kubernetes. If we look back at part three of this series, our Jib-based build and deploy process was generating vanilla Kubernetes manifest files (kubernetes.yaml) as well as Minikube-specific ones (minikube.yaml). We then had the choice between using the vanilla Kubernetes resources or the more specific Minikube ones, and we preferred the latter. While the Minikube-specific manifest files only work when deployed on Minikube, the vanilla Kubernetes ones are supposed to work the same way on Minikube as on any other Kubernetes implementation, like OpenShift.
However, in practice, things are a bit more complicated and, as far as I'm concerned, I failed to successfully deploy on OpenShift the vanilla Kubernetes manifests generated by Jib. What I needed to do was rename most of the properties matching the pattern quarkus.kubernetes.* to quarkus.openshift.*. Also, some vanilla Kubernetes properties, for example quarkus.kubernetes.ingress.expose, have a completely different name for OpenShift: in this case, quarkus.openshift.route.expose. But with the exception of these almost cosmetic alterations, everything remains the same as in our previous recipe from part three. Now, in order to deploy our microservices on the OpenShift Developer Sandbox, proceed as follows.

Log In to the OpenShift Developer Sandbox

Here are the steps required to log in to the OpenShift Developer Sandbox:

- Fire up your preferred browser and go to the OpenShift Developer Sandbox site.
- Click on the Login link in the upper right corner (you need to have already registered with the OpenShift Developer Sandbox).
- Click on the red button labeled Start your sandbox for free in the center of the screen.
- In the upper right corner, unfold your user name and click on the Copy login command button.
- In the new dialog labeled Log in with ..., click on the DevSandbox link.
- A new page is displayed with a link labeled Display Token. Click on this link.
- Copy and execute the displayed oc command, for example:

```shell
$ oc login --token=... --server=https://api.sandbox-m3.1530.p1.openshiftapps.com:6443
```

Clone the Project From GitHub

Here are the steps required to clone the project's GitHub repository:

```shell
$ git clone https://github.com/nicolasduminil/aws-camelk.git
$ cd aws-camelk
$ git checkout openshift
```

Create the OpenShift Secret

In order to connect to AWS resources, like S3 buckets and SQS queues, we need to provide AWS credentials. These credentials are the Access Key ID and the Secret Access Key.
There are several ways to provide these credentials, but here we chose to use Kubernetes secrets. Here are the required steps:

First, encode your Access Key ID and Secret Access Key in Base64, as follows:

```shell
$ echo -n <your AWS access key ID> | base64
$ echo -n <your AWS secret access key> | base64
```

Then edit the file aws-secret.yaml and amend the following lines, replacing ... with the Base64-encoded values:

```yaml
AWS_ACCESS_KEY_ID: ...
AWS_SECRET_ACCESS_KEY: ...
```

Finally, create the OpenShift secret containing the AWS access key ID and secret access key:

```shell
$ kubectl apply -f aws-secret.yaml
```

Start the Microservices

In order to start the microservices, run the following script:

```shell
$ ./start-ms.sh
```

This script is the same as the one in our previous recipe from part three:

```shell
#!/bin/sh
./delete-all-buckets.sh
./create-queue.sh
sleep 10
mvn -DskipTests -Dquarkus.kubernetes.deploy=true clean install
sleep 3
./copy-xml-file.sh
```

The copy-xml-file.sh script, which is used here to trigger the Camel file poller, has been amended slightly:

```shell
#!/bin/sh
aws_camel_file_pod=$(oc get pods | grep aws-camel-file | grep -wv -e build -e deploy | awk '{print $1}')
cat aws-camelk-model/src/main/resources/xml/money-transfers.xml | oc exec -i $aws_camel_file_pod -- sh -c "cat > /tmp/input/money-transfers.xml"
```

Here, we replaced the kubectl commands with oc ones. Also, given that OpenShift has the particularity of creating pods not only for the microservices but also for the build and deploy commands, we need to filter out of the list of running pods the ones whose names contain the strings build and deploy. Running this script might take some time.
Once finished, make sure that all the required OpenShift controllers are running:

```shell
$ oc get is
NAME              IMAGE REPOSITORY                                                                                                      TAGS                                                       UPDATED
aws-camel-file    default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/aws-camel-file    1.0.0-SNAPSHOT                                             17 minutes ago
aws-camel-jaxrs   default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/aws-camel-jaxrs   1.0.0-SNAPSHOT                                             9 minutes ago
aws-camel-s3      default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/aws-camel-s3      1.0.0-SNAPSHOT                                             16 minutes ago
aws-camel-sqs     default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/aws-camel-sqs     1.0.0-SNAPSHOT                                             13 minutes ago
openjdk-11        default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/openjdk-11        1.10,1.10-1,1.10-1-source,1.10-1.1634738701 + 46 more...   18 minutes ago

$ oc get pods
NAME                       READY   STATUS      RESTARTS   AGE
aws-camel-file-1-build     0/1     Completed   0          19m
aws-camel-file-1-d72w5     1/1     Running     0          18m
aws-camel-file-1-deploy    0/1     Completed   0          18m
aws-camel-jaxrs-1-build    0/1     Completed   0          14m
aws-camel-jaxrs-1-deploy   0/1     Completed   0          10m
aws-camel-jaxrs-1-pkf6n    1/1     Running     0          10m
aws-camel-s3-1-76sqz       1/1     Running     0          17m
aws-camel-s3-1-build       0/1     Completed   0          18m
aws-camel-s3-1-deploy      0/1     Completed   0          17m
aws-camel-sqs-1-build      0/1     Completed   0          17m
aws-camel-sqs-1-deploy     0/1     Completed   0          14m
aws-camel-sqs-1-jlgkp      1/1     Running     0          14m

$ oc get svc
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                               AGE
aws-camel-jaxrs     ClusterIP   172.30.192.74   <none>        80/TCP                                11m
modelmesh-serving   ClusterIP   None            <none>        8033/TCP,8008/TCP,8443/TCP,2112/TCP   18h
```

As shown in the listing above, all the required image streams have been created, and all the pods are either completed or running. The completed pods are the ones associated with the build and deploy operations; the running ones are associated with the microservices.
There is only one service running: aws-camel-jaxrs. This service makes it possible to communicate with the pod running the aws-camel-jaxrs microservice by exposing the route to it. This happens automatically, thanks to the quarkus.openshift.route.expose=true property. The aws-camel-sqs microservice needs, as a matter of fact, to communicate with aws-camel-jaxrs and, consequently, it needs to know the route to it. To get this route, you may proceed as follows:

```shell
$ oc get routes
NAME              HOST/PORT                                                                      PATH   SERVICES          PORT   TERMINATION   WILDCARD
aws-camel-jaxrs   aws-camel-jaxrs-nicolasduminil-dev.apps.sandbox-m3.1530.p1.openshiftapps.com          aws-camel-jaxrs   http                 None
```

Now open the application.properties file associated with the aws-camel-sqs microservice and modify the rest-uri property so that it reads as follows:

```properties
rest-uri=aws-camel-jaxrs-nicolasduminil-dev.apps.sandbox-m3.1530.p1.openshiftapps.com/xfer
```

Here, you have to replace the namespace nicolasduminil-dev with the value that makes sense in your case. Now, you need to stop the microservices and start them again:

```shell
$ ./kill-ms.sh
...
$ ./start-ms.sh
...
```

Your microservices should now run as expected, and you may check the log files by using commands like:

```shell
$ oc logs aws-camel-jaxrs-1-pkf6n
```

As you can see, in order to get the route to the aws-camel-jaxrs service, we need to start our microservices, stop them, and start them again. This solution is far from elegant, but I didn't find any other, and I'm counting on the well-advised reader to help me improve it. It's probably possible to use the OpenShift Java client to perform, in Java code, the same thing the oc get routes command does, but I didn't find out how, and the documentation isn't too explicit. I would like to apologize for not being able to provide the complete solution here, but enjoy it nevertheless!
Istio's virtual services and destination rules help DevOps engineers and cloud architects apply granular routing rules and direct traffic around the mesh. They also provide features to ensure and test network resiliency so that applications operate reliably. In this article, we will explore both of these Istio capabilities: traffic routing and network resilience testing.

Traffic Routing in CI/CD With Istio

Istio can split traffic between services or service subsets with ease. Traffic splitting is done based on the weights/percentages (refer to the image below) defined in the corresponding virtual service and destination rule resources.

Traffic splitting between different versions of a service using Istio

Istio's traffic-splitting capability gives DevOps engineers and cloud architects granular control over how traffic is routed to different versions of a service. The feature is especially useful for performing canary or blue/green deployments.

Canary Deployments With Istio

Canary deployment is a software release strategy where only a fraction of live traffic is routed to the newly released software or service. If the performance and quality of the new version prove as stable as the existing version's, more traffic is routed to the new version, and the older one is phased out gradually. Canary deployments allow a controlled release and help organizations minimize the impact of potential bugs or issues during releases. Istio provides two ways to carry out canary rollouts:

- Istio routes traffic between the canary and the stable version deployed as two different services.
- Istio routes traffic between the canary and the stable version defined as two subsets of a single service.

We have covered a tutorial on implementing canary deployments using Istio and Argo Rollouts. Check it out here: How to implement Canary for Kubernetes apps using Istio.
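As an illustration of the second approach, a weight-based canary split might look like the following sketch (the istio-support service and the stable/canary subset names are hypothetical here, and the subsets are assumed to be defined in a matching DestinationRule):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: istio-support
spec:
  hosts:
  - istio-support
  http:
  - route:
    - destination:
        host: istio-support
        subset: stable
      weight: 90    # 90% of live traffic stays on the stable version
    - destination:
        host: istio-support
        subset: canary
      weight: 10    # 10% is routed to the canary
```

As confidence in the canary grows, the weights are shifted step by step until the old version can be phased out entirely.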
Blue/Green Deployments Using Istio

Blue/green deployment is another progressive delivery strategy, where a new version (green) of a service runs in parallel with the existing version (blue). The load balancer switches production traffic to the new version (refer to the image below), and traffic is rolled back to the older version in case of any issues with the new release.

Blue/green deployment with Istio

Blue/green deployments help minimize application downtime by providing the ability to instantly roll back to the older version. The older version acts as a reliable backup during the deployment process. Istio's ability to seamlessly split traffic between service subsets without changing the application code makes carrying out blue/green deployments effortless. The following sample VirtualService implements a blue/green strategy for the istio-support service. All incoming traffic is routed to the green version (weight: 100), while the blue version (weight: 0) remains a backup, handling zero requests:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: istio-support
spec:
  hosts:
  - istio-support
  http:
  - route:
    - destination:
        host: istio-support
        subset: green
      weight: 100
    - destination:
        host: istio-support
        subset: blue
      weight: 0
```

Network Resilience and Testing With Istio

Apart from implementing routing rules for different scenarios, Istio provides opt-in failure recovery and fault injection features, which help enterprises maintain a resilient infrastructure. These features prevent localized failures from cascading to other nodes and help DevOps engineers and architects meet SLOs such as error rate, latency, and uptime. Some of the network resilience and testing features provided by Istio are circuit breakers, retries, timeouts, and traffic mirroring.

Circuit Breaking

In a web application, circuit breakers set limits on concurrent connections to a service and prevent it from being overloaded with requests.
This is highly useful for B2B and B2C SaaS applications, where service responsiveness can make or break the customer experience. With circuit breaking, if the number of connection requests to an upstream service goes over the specified limit, the excess requests become pending in a queue. If the number of pending requests breaches its limit, further requests are denied until the pending ones are processed. The circuit breaker is tripped so that client requests fail quickly, without exhausting the services and cascading the failure to the overall system. Istio uses a DestinationRule to configure circuit breakers. Here is a sample DestinationRule with circuit breaker rules:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: istio-support
spec:
  host: istio-support
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
```

The above DestinationRule sets both the maximum number of connections and the maximum number of pending requests to the istio-support service to 1. If istio-support receives three requests simultaneously, for example, one request will establish a connection, one will wait in the queue, and the third one (or any additional request) will be denied until the pending one is processed (refer to the image below).

Circuit breaking with Istio

The outlierDetection section towards the end defines the rules for evicting unhealthy pods from the load-balancing pool. It means that if any pod of istio-support triggers a 5xx (server) error, the pod will be considered an outlier, or unhealthy. It will then be ejected for 3 minutes before being allowed to rejoin the load-balancing pool.

Timeouts

A timeout is the amount of time the Envoy proxy of the source should wait for a response from the destination service.
Timeouts fail or succeed a call within a specific timeframe, which ensures that services do not wait indefinitely for a response.

Timeout with Istio

You can implement timeouts in your environment using Istio. Istio allows you to create timeout policies and apply them at the source Envoy sidecars. Below is a sample VirtualService that configures a 10-second timeout for requests to the istio-support service. In other words, calls to the istio-support service will either fail or succeed within 10 seconds:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: istio-support
spec:
  hosts:
  - istio-support
  http:
  - route:
    - destination:
        host: istio-support
    timeout: 10s
```

Timeouts should be neither too short nor too long. Timeouts that are too short will result in unnecessarily failed requests, especially when upstream services face transient issues, such as a temporarily overloaded network. Timeouts that are too long cause increased latency, especially when a call waits for a response from a failed service. If timeouts have to be configured on traffic to a destination outside the mesh, the destination service should first be added to Istio's internal service registry using a ServiceEntry resource. Virtual service rules can then be applied to that traffic. With Istio, you can easily configure timeouts on traffic to any specific service or subset at runtime.

Retries

The retry setting specifies the number of times an Envoy proxy should attempt to connect to a service if the initial request fails. It helps to improve service availability when services face temporary issues, like resource contention, network problems, etc.

Retries with Istio

The following VirtualService sets the maximum number of retries to 4 when calling the istio-support service.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: istio-support
spec:
  hosts:
  - istio-support
  http:
  - route:
    - destination:
        host: istio-support
    retries:
      attempts: 4
      perTryTimeout: 2s
```

In the above resource, attempts represents the maximum number of retries allowed for a given request; exceeding it will activate a circuit breaker. perTryTimeout defines the timeout per attempt, applying to the initial call and each retry. A retryOn subfield can be added under the retries field to set the conditions under which a retry takes place. The conditions can be valid HTTP status codes or named retry policies, and one or more of them can be specified. For example, retryOn: connect-failure,refused-stream,503 means that Istio will initiate a retry if the connection to the upstream service fails, if the stream is refused, or if the service returns an HTTP 503 status code.

Fault Injection

Fault injection is a testing method that consists of introducing errors while forwarding HTTP requests to the destination specified in a route. Istio lets DevOps engineers and cloud architects test the resiliency and failure recovery capacity of applications by injecting faults. With Istio, faults can be injected at the application layer: more relevant failures can be introduced, such as HTTP error codes, instead of killing pods, delaying packets, or corrupting packets at the TCP layer. Istio lets users inject two types of faults using the VirtualService resource:

- Delays: Used to delay requests to upstream services and simulate network latency or an overloaded upstream service.

Fault injection by delaying requests using Istio

- Aborts: Used to abort HTTP request attempts and return error codes to the downstream service, in order to simulate a faulty upstream service.

Fault injection by aborting request forwarding using Istio

The following VirtualService will inject a 5-second delay on all the requests going to the istio-support service.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: istio-support-delay
spec:
  hosts:
  - istio-support
  http:
  - fault:
      delay:
        percentage:
          value: 100
        fixedDelay: 5s
    route:
    - destination:
        host: istio-support
```

Similarly, the VirtualService resource below configures the istio-support service to return an HTTP 500 error for each received request:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: istio-support-500
spec:
  hosts:
  - istio-support
  http:
  - fault:
      abort:
        percentage:
          value: 100
        httpStatus: 500
    route:
    - destination:
        host: istio-support
```

Note: A fault rule must have a delay, an abort, or both. Also, specifying delay and abort faults simultaneously does not create any dependency between them.

Traffic Mirroring

Traffic mirroring refers to sending a copy of live traffic to a mirrored service. It is useful for testing, monitoring, and analyzing a newly deployed application before releasing it and routing production traffic to it.

Traffic mirroring with Istio

The mirrored traffic does not affect the performance of the primary service, as it is separate from the main flow of requests served by the primary service. Also, responses from mirrored services are discarded. The following route rule sends 100% of the traffic to subset v1 while mirroring the same requests to istio-support v2:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: istio-support-mirror
spec:
  hosts:
  - istio-support
  http:
  - route:
    - destination:
        host: istio-support
        subset: v1
      weight: 100
    mirror:
      host: istio-support
      subset: v2
    mirrorPercentage:
      value: 100.0
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: istio-support
spec:
  host: istio-support
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```

The value field under mirrorPercentage allows users to further control the traffic by mirroring only a fraction of the requests instead of all of them.

Video: Advanced Traffic Management Using Istio

Now it's time for some action.
Watch the following video to see the demo on advanced traffic management with Istio. You will see a tutorial on traffic management, retries, circuit breaking, and fault injection with the application deployed in the Kubernetes cluster along with the Istio ingress gateway.
With the clear dominance of microservices architecture, communication between the different components of a system is a critical aspect of today's software paradigm. Two popular methods of achieving this communication are REST (direct communication) and message brokers (indirect communication). Each approach has its own set of advantages and trade-offs, making it essential for developers to understand the differences between them in order to make informed decisions when designing and building their systems. Although the two may feel like they serve completely different use cases that never intertwine, in many systems and architectures they do. In this article, we'll delve into the differences between REST and message brokers as communication styles, helping you make the right choice for your specific use case.

REST

REST, a widely used architectural style for designing networked applications, relies on stateless communication between client and server. Here are some key features of REST communication:

- Request-Response Paradigm: REST operates on a simple request-response model. Clients initiate a request to the server, and the server responds with the requested data or an appropriate status code.
- HTTP Verbs: REST communication is based on HTTP verbs such as GET, POST, PUT, and DELETE, which correspond to CRUD (Create, Read, Update, Delete) operations.
- Resource-Oriented: REST revolves around the concept of resources identified by URLs. Each resource represents a distinct entity, and interactions are performed using these URLs.
- Stateless: REST is designed to be stateless, meaning that each request from the client must contain all the information required for the server to fulfill it. This simplifies server-side management and scalability.
- Caching: REST communication benefits from HTTP's built-in caching mechanisms, which can improve performance and reduce server load by serving cached responses when appropriate.
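The features above can be condensed into a toy sketch: each (verb, URL) pair maps to a CRUD operation on a resource store, and every request carries everything the server needs (statelessness). No framework is used here, and the store, handler, and URLs are invented purely for illustration:

```python
# Toy model of REST's resource-oriented request-response style.
# `store` stands in for the server-side resources; `handle` plays the server.
store = {}  # resource URL -> current representation

def handle(verb, url, body=None):
    # A request is self-contained: verb + URL + optional body.
    if verb == "POST":      # Create
        store[url] = body
        return 201, body
    if verb == "GET":       # Read
        return (200, store[url]) if url in store else (404, None)
    if verb == "PUT":       # Update (full replacement)
        store[url] = body
        return 200, body
    if verb == "DELETE":    # Delete
        return (204, store.pop(url)) if url in store else (404, None)
    return 405, None        # method not allowed

print(handle("POST", "/orders/1", {"amount": 42}))  # (201, {'amount': 42})
print(handle("GET", "/orders/1"))                   # (200, {'amount': 42})
print(handle("DELETE", "/orders/1"))                # (204, {'amount': 42})
print(handle("GET", "/orders/1"))                   # (404, None)
```

Note how the caller always gets an immediate answer: the client blocks on each call, which is exactly the synchronous, direct style contrasted with message brokers below.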
Message Brokers

Message brokers facilitate asynchronous communication between components by allowing them to exchange messages, meaning the sender never knows, at any given time, whether a receiver exists or who it is. Here's what you need to know about this approach:

- Decoupled Architecture: Message brokers promote decoupling between sender and receiver, allowing components to communicate without being aware of each other's existence.
- Publish-Subscribe Model: In the publish-subscribe model, producers (publishers) send messages to specific topics, and consumers (subscribers) interested in those topics receive the messages. This enables broadcasting information to multiple consumers.
- Message Queues: Message brokers also support point-to-point communication through message queues. Producers send messages to queues, and consumers retrieve messages from those queues, ensuring that each message is processed by a single consumer.
- Reliability: Message brokers ensure message delivery, even in cases of component failures. This reliability is achieved through features like message persistence and acknowledgment mechanisms.
- Scalability: Message brokers can be scaled horizontally to handle increasing message volumes and provide load balancing across consumers.

The Story of Microservices

Representational State Transfer (REST) is often used together with the popular API Gateway architectural pattern, and it can serve as a good example of the synchronous communication type. Requests reach a service that acts as an internal router, routing them based on different values, headers, and query params. Message brokers/queues are widely used in microservices architectures as well, following the asynchronous pattern. In this type of architecture, a service sends a message without waiting for a response, and one or more services process the message asynchronously.
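The publish-subscribe decoupling described above can be sketched in a few lines. This is an in-process toy, not a real broker (a production system would use something like RabbitMQ or Kafka), and the class, topic, and message contents are invented for illustration:

```python
# Minimal in-process sketch of the publish-subscribe model.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher never learns who (if anyone) consumed the message:
        # sender and receivers stay fully decoupled.
        for callback in self._subscribers[topic]:
            callback(message)

broker = Broker()
received_a, received_b = [], []
broker.subscribe("orders", received_a.append)
broker.subscribe("orders", received_b.append)  # broadcast to multiple consumers
broker.publish("orders", {"id": 1, "amount": 42})
# both subscriber lists now contain the published message
print(received_a, received_b)
```

Replacing the list of callbacks with a single consumer pulled from a queue would turn this into the point-to-point (message queue) pattern instead.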
Asynchronous messaging provides many benefits but also brings challenges, such as idempotency, message ordering, poison message handling, and the complexity of the message broker itself, which must be highly available. It is important to note the difference between asynchronous I/O and an asynchronous protocol. Asynchronous I/O means that the calling thread is not blocked while I/O operations are executed; this is an implementation detail in terms of software design. An asynchronous protocol means the sender does not wait for a response.

Choosing the Right Approach

The decision between REST communication and message brokers depends on various factors, including the nature of your application, communication patterns, and requirements.

REST is suitable when:

- Direct request-response interactions are preferred.
- Your app requires simplicity in communication patterns.
- You have very strict communication rules with an almost 1:1 sender/receiver ratio.
- Scale is small, and so are the number of communicating services, the workloads, and the amount of transferred data.

Message brokers are beneficial, and sometimes a must, when:

- Asynchronous communication is needed and allowed.
- A many-to-many communication pattern is needed.
- Components are loosely coupled, allowing for independent scaling.
- Reliability and guaranteed message delivery are paramount.
- Publish-subscribe or message queue patterns align with the application's communication needs.
- Great scale is needed to support billions of requests in a short period of time, and scaling the microservices themselves would be overkill.

In conclusion, both REST and message brokers offer distinct advantages for different scenarios. REST provides simplicity and direct interactions, while message brokers enable decoupled, asynchronous, reliable, and much more scalable communication.
The choice between these approaches should be made based on your system’s requirements, the specific communication patterns your application demands, and the maturity of both the environment and the developers themselves.
In recent years, microservice architecture has taken the lead in most software solutions, and in many cases it is chosen by default as the architecture from which we start development. However, it's worth asking yourself whether this is always the optimal choice. Moreover, if you choose microservices as a set of rules you want to stick to, are you sure you are aware of the consequences of this choice?

The Advantages of Microservices

In my opinion, microservices offer two main benefits:

- Independent deployments without downtime.
- A logical (sometimes technical) division of the system (including the database) into business modules and sub-modules.

The "Distributed Monolith" Problem

Unfortunately, in most cases when microservice architecture is chosen, the team ends up creating a so-called "distributed monolith." If, from the beginning of the work, you rely on dependencies between services or the database, and in the end you deploy 90% of the services simultaneously, you should admit that it would be easier to ship them as a single deployment unit. It would reduce the effort related to the implementation, automation, and maintenance of microservices and allow you to focus intensively on business problems. Long story short, you have to remember that the microservice architectural style is not simple and carries a lot of technological complexity. Don't crack a nut with a sledgehammer! Running a Kubernetes cluster for one application or service doesn't make sense, because the costs of infrastructure and all the configuration will exceed the costs of development. There are other, "simpler" cloud solutions, e.g., AWS ECS, AWS Fargate, AWS Elastic Beanstalk, or even EC2 plus simple load balancing (other cloud providers have similar offerings). Below are some heuristics that can help you decide which architecture to choose. What do you need?
| | Microservices | Monolith |
|---|---|---|
| Independent implementation units | Yes, but only with good logical separation | Yes |
| Simple and quick to build infrastructure | No | Yes |
| Dynamic scaling | Yes | No |
| Business logic autonomy | Yes – if we divide the domains correctly | Yes – if we divide the domains correctly |
| Dynamic horizontal scaling of specific system components | Yes | No |
| Technological autonomy | Yes | No |
| Independent development teams | Yes | No |
| Quick project start (development kick-off) | No | Yes |

A Monolith Doesn't Have To Be a Bad Thing – Especially With a Modular Software Architecture

The term "monolith" is often used as a synonym for legacy applications. By properly designing a monolithic application (an appropriate selection of internal architecture), you can definitely shorten the development kickoff without excluding possible changes in the future – and you can still transition to microservices later. This is where the modular monolith approach – or, in particular, the "monolith first" approach – comes in handy. If you don't know what scope the project will have in the coming years, and you don't know how fast it will grow, then starting with a well-structured monolithic application may be a good idea. What does the modular monolith offer?

- A single deployment unit
- Easier maintenance
- An open road to a possible subsequent migration to a distributed architecture
- Simple infrastructure
- Tidier code

Of course, there are many other factors that may affect the choice of system architecture. However, taking into account the ones I mentioned above, it might be better not to start the project with the assumption that it'll be based on microservices. If you're unsure how the whole system will look in 2–3 years, it's usually better to go for a modular architecture. You start with a well-planned monolith and then – if needed – gradually transform it into a modular monolith.
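To make the modular monolith idea a bit more concrete, a hypothetical module layout (all names invented for illustration) might look like this:

```
shop-monolith/
├── orders/          # business module: ordering domain
├── invoicing/       # business module: invoicing domain
├── shared-kernel/   # small set of types shared between modules
└── application/     # assembles all modules into one deployment unit
```

Each business module hides its internals behind its own API, so extracting one of them into a separate service later remains a realistic option rather than a rewrite.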
You should also keep in mind the available simple solutions that require almost no system architecture at all – serverless offerings such as AWS Lambda. They can work well for relatively simple, not overly complex problems. At Pretius, we focus on long-term projects with a long life cycle (maintenance and development), which is why sustainability and expandability are usually very important to us (but, of course, your particular case might differ). The choice of system architecture often boils down to the method of deployment and, consequently, the infrastructure where the system is launched. However, from the perspective of software developers, infrastructure and application deployment are slowly becoming secondary topics – there are specialists in dedicated positions (e.g., DevOps engineers) who make the important decisions and take care of these problems. We simply want the system to be well-maintained, to work without problems, and not to incur technological/business debt. For that to happen, we need to focus on the application code.

Internal Application Architecture

Once you decide on the system architecture, it's time to focus on the architecture of the individual applications. Unfortunately, most projects are based on layers (n-tier architecture). In the case of complex projects, the end result is usually the same: a Big Ball of Mud (example in the screenshot below). It's not a bad choice for simple CRUD apps or a few (or a dozen) services, but when you try to put complex logic into such a scheme, it quickly turns out that the network of dependencies starts to cause serious problems. It's a good idea to be aware of the existence of different styles and to get to know them in more detail. Application architecture styles:

- Layered architecture
- Hexagonal/onion architecture
- Pipes and filters
- Microkernel

It is worth adding that you can mix and match these styles. You don't need to fixate on a specific solution – simply choose the tools that'll make your life easier.
This also applies to system architecture. For example, if it turns out you need to create a larger (e.g., monolithic) application in your microservice system, then you shouldn't try to force it to fit into the microservice architecture. Instead, choose an appropriate internal architecture, divide it into separate modules/business domains, and thereby reduce the costs of DevOps/configuration work.

The Golden Middle Ground

You can also try to find a middle ground while choosing an architecture. To be perfectly honest, no such thing as the "golden middle" exists – but you can get close. Before choosing a specific architectural style, it is worth collecting some metrics about the designed system. The more information you have available, the more reliable your decision will be.

System complexity (shallow vs. deep):

- Shallow – all CRUD-type systems with no (or a negligible amount of) business logic.
- Deep – high complexity (business/technological).

Complexity can be determined by such characteristics as:

- Communicability
- Business rules
- Algorithms
- Coordination

Time perspective:

- What's going to change
- Likely/impossible changes

Separation of responsibility – is there a need for independent teams to work on the solution?

A shallow system is easy to recognize because its user interface fully reflects the structures in the database and in the application code (→ CRUD). There are no complex integrations or complicated algorithms. The main characteristic of deep systems, on the other hand, is that a number of operations invisible to the user – more or less complex – take place from the moment of user interaction to the final effect. A Google search engine or a system for processing leasing applications are good examples of deep systems. It's hard to talk about hard principles for selecting an architecture – "heuristics" that can direct us to a specific choice seems to be a better term.
For shallow, uncomplicated systems developed by a small team, you can choose simpler architectures, e.g., layered, monolithic applications. However, when you know from the beginning that the complexity will be high, that it will be a project spanning years, and that it will require the work of one large team or many teams, the appropriate division of applications will be crucial, and you should strive for separation of responsibilities. This can be achieved both in the microservice architecture and in the previously mentioned modular monolith. There are also technical aspects that must be taken into account when choosing a specific style. Good old brainstorming – or its newer form, event storming – is one of the tools you can use to create an initial outline of modules/services. The purpose of such exercises is to form an initial outline of the domains – or lack thereof – in a given business. Then, you should be able to estimate the size of the system. Below is an example of the first outline of domains/modules after analyzing the object diagram (marked with colors).

Database

The database model is often neglected when teams choose the architecture for their project. They divide the services into smaller or larger ones, they run them in the cloud, and they build the entire envelope related to the production of microservices, but they still design the database as if it were part of a large, tangled monolithic application. It’s not a good approach – if you decide on microservices, the database needs to reflect the logical division that you used in the application layer. Service binding at the database level is one of the major problems in modern microservice architectures, leading to the Big Ball of Mud (shown in one of the images above) and subsequent maintenance difficulties.

Schemas

If you cannot afford separate databases per service, a division into schemas is enough to start with. Schemas will also work in modular monoliths.
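To make the schema-per-module idea concrete, here is a small sketch. It uses Python's built-in sqlite3 with `ATTACH` to emulate named schemas (in PostgreSQL you would use `CREATE SCHEMA billing` instead); the `billing`/`orders` module names are invented for the example:

```python
import sqlite3

# Emulate per-module schemas: each business module gets its own namespace.
# sqlite's ATTACH gives a named separation similar to CREATE SCHEMA in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS billing")
conn.execute("ATTACH DATABASE ':memory:' AS orders")

conn.execute("CREATE TABLE billing.invoice (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE orders.purchase (id INTEGER PRIMARY KEY, item TEXT)")

# Each module touches only its own schema; cross-schema joins are the
# coupling smell that this division is meant to surface early.
conn.execute("INSERT INTO billing.invoice (amount) VALUES (120.0)")
conn.execute("INSERT INTO orders.purchase (item) VALUES ('keyboard')")
```

Because every table reference is schema-qualified, a later split into separate physical databases per service becomes a mechanical change rather than an untangling exercise.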
When it comes to the data layer with microservices, you should take advantage of the opportunities they offer – namely, you should match the database type to the business model and not the other way around. With a little effort, everything can be flattened into a relational model, but the freedom of the microservices architecture allows us not to do so. A common way of designing a database is to create structures based on the data we need to collect (e.g., based on information received from the client). It is a good exercise to instead try to design a data model based on the functionalities that the application must fulfill. The resulting model is usually much “thinner” than the original assumptions, and it often turns out to be fully sufficient.

ORM vs. SQL

We’ve used the MyBatis tool to communicate with the DB in many Pretius projects. It allows you to maintain 100% control over what’s happening in the database, but it also has an often unnoticed side effect – an Anemic Domain Model. Writing complex SQL statements often discourages developers from creating complex relationships within the model on the application-code side. Another consequence is a departure from the object-oriented programming paradigm (logic ends up in services instead of being encapsulated in the model). Pretty much any developer can learn and handle solutions such as MyBatis (and alternatives like Hibernate), so they’re very much worth considering. However, they’re just tools, and they won’t release you from the obligation to think about the side effects and consequences of your decisions.

Communication

When it comes to communication in the microservices architecture, the Asynchronous by Default approach is preferred, but you certainly can’t fully avoid REST or some other synchronous form of communication (gRPC, SOAP, etc.). Technology is one thing, but there are more important questions you should ask yourself. Why do you communicate? Are these REST calls necessary?
Has the logic been properly separated into another module/service? Often, microservices become a network of mutual connections – to put it figuratively, everything talks to everything. This is usually a consequence of bad division/decomposition into domains and is already a serious indication that you have built a distributed monolith, not microservices. This problem is not limited to communication at the level of applications/services. You have to remember that the same rules apply at the code level – to services, facades, classes, and packages. In both cases, it is worth familiarizing yourself with the concept of low coupling, high cohesion. Communication in the technical context is usually not a problem, but distributed transactions, sagas, compensations, and fallbacks (which are sometimes a consequence of distributed architecture) are. This is an area to focus on. A few patterns for communication in distributed systems which you should know:

- Outbox pattern
- Saga pattern
- Messaging
- CQRS

Creating the Basic Project Structure

The structure of the project should be closely related to the selected system and application architecture. Apart from the components that should be common – for example, as part of DevOps requirements (this applies to system architecture) – the internal structure of the application itself will be determined by the selected application architecture. Beyond architecture, in order to maintain better readability of the code in a business context, you can use the “package by feature” approach. The idea is to place code grouped within packages for a specific domain/functionality/area and not – as is usually the case – divided into technical packages, i.e., controllers, services, mappers, etc. Also, to be perfectly honest, I don’t recommend creating application skeletons that can be re-used within the company.
There are free tools, such as Spring Initializr, which you can use to create such a project outline in a few seconds, so it’s better to consider each case independently. Copying something from other projects can lead to technical debt from the very beginning – since you don’t even check whether a newer/better option is available. However, once you choose a structure, it’s a good idea to stick to it throughout the project in order to maintain consistency and transparency.

Testing Strategies

The testing pyramid (screenshot above) is not fully applicable in the case of distributed architectures. In addition to testing within one implementation unit, you must ensure testing at the interfaces between these units, because you have to assume that such communication will be present. So, in addition to standard integration tests between modules/services, you can also use contract tests as part of API testing between modules (additional information on that is available here). Also, the testing pyramid will look different depending on the module. In the case of shallow modules (let’s say CRUD), it doesn’t make sense to run every kind of test because, in the end, you’ll simply be checking the same thing several times – the main difference will be how you invoke the tests. For deep, complex modules, where the business logic can be complicated and extensive, unit tests are the way to go. Modules whose job is integration, on the other hand, will necessarily require more integration tests than unit tests.

Centralized Log Storage

A distributed (microservice) architecture requires an appropriate method of collecting application logs. Services can exist in a production environment in many instances and run on completely different physical machines. As a result, logging into a server and manually searching logs in text files becomes time-consuming, and sometimes it may even be impossible.
The solution to this problem is to implement a centralized log store – for example, a document database in which logs from all applications in the system are collected using auxiliary tools. The current market standard for implementing this requirement is a set of three tools:

1. Logstash – an application responsible for collecting logs from applications, processing and aggregating them, and then sending them to the database.
2. OpenSearch/Elasticsearch – a document database that serves as the data warehouse.
3. OpenSearch Dashboards/Kibana – an application responsible for the visualization of the collected logs.

The specific database and log visualization tool will depend on the project; the two most likely options are mentioned in the points above. The diagram below shows a simplified flow of how logs from services/applications reach the database engine. You can integrate with Logstash via the REST API, which allows you to use it regardless of the programming language used in the application. If it is not possible to integrate the application directly with Logstash, you can use the Beats tool, which is responsible for reading logs from text files, parsing them, and then sending them to Logstash.

Monitoring

One of the consequences of using a microservice architecture is quite high technological complexity. Applications are distributed across many physical machines; the system may include dozens of applications and various supporting components. Therefore, a very important aspect of maintaining distributed systems is continuous monitoring, i.e., checking the condition of the system. Thanks to this, when we notice a malfunction or a system error, we are able to shorten the reaction time as much as possible. Paradoxically, in order to control the growing complexity of the architecture, we need to implement even more tools to help us do so.

What Should You Monitor?
Business applications: The applications/services provided by the software development teams constitute the core of the platform and meet the main business requirements. Therefore, they are a key element that you need to monitor. There are several areas worth paying attention to:

- Technical metrics (may apply to a specific application/service): These are provided by ready-made libraries, specific to each programming language and the technologies used. The application’s technical parameters are similar to those in the infrastructure section (CPU usage, RAM, free disk space), but they apply to a specific application or its instance.
- Business metrics: This is data that the development teams collect themselves. It may sometimes seem redundant, but it can serve as an alternative to statistics built by other teams.
- Application logs: You will be able to extract both technical and business information from application logs.

System infrastructure: In addition to business applications, an important aspect is the condition of the infrastructure and physical machines on which the platform runs. Such monitoring allows you, for example, to check how much free disk space you have left and informs you about it in advance. Examples of infrastructure monitoring parameters include:

- Free space on hard drives
- Free/used memory
- CPU usage
- Web traffic

Alerts and notifications: These are an inseparable element of monitoring that eliminates the need for a person to watch the collected metrics. For example, when the system finds – based on the collected metrics – that a given application has been using 100% of the available RAM for 10 minutes, an email is automatically sent to the maintenance team, which can then take appropriate action.

Tools

The same tools can be used for all types of metrics and alerts, and they can be integrated from any programming language.
InfluxDB: An open-source time series database (TSDB) – e.g., for CPU utilization values read from sensors every second.
Grafana: An open-source, cross-platform web application for analytics and interactive visualization. After connecting to supported data sources (SQL, NoSQL, logs, InfluxDB), you can create various types of charts and set up appropriate notifications.

Domain Approach to Business Logic

Applications are often built in a layered architecture. By this, I mean a technical division into packages of the following types: controllers, services, a model, etc. In the code, the model is usually a simple mapping of DB columns to fields in POJOs (no relations, only references by ID). Business logic, on the other hand, is fully implemented in the “services” packages. Such a division is not wrong, but it can work a bit worse in large projects. N-tier architecture problems:

- No logical division of responsibility (you create code based on technical aspects instead of business ones). It’s easier to create an “everything talks to everything” situation.
- Anemic Domain Model – there’s often no encapsulation whatsoever.
- Difficulties in separating the abstraction layer – POJOs are typically shared across the application.
- It’s easier to test individual components, but because the logic is scattered across different areas, it’s difficult to test specific, comprehensive business cases.
- Classes with a lot of dependencies – harder to test.
- Over time, a complete lack of readability and difficulties in navigation. For example, all 100 services of the whole application can sit in one package.
- A 100% coupling between business logic and frameworks, such as Spring. It’s not a huge problem, because you rarely change a framework during a project, but it can obfuscate business logic.
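To make the Anemic Domain Model problem above concrete, here is a minimal sketch contrasting a bare data holder with an entity that guards its own invariants. Python is used for brevity (the article's ecosystem references are Java-centric) and the `Account` example is invented for illustration:

```python
# Anemic style: a bare data holder; all rules live in some distant "service",
# and any code anywhere can mutate the state directly.
class AnemicAccount:
    def __init__(self) -> None:
        self.balance = 0.0

# Rich style: the entity encapsulates its own invariants, so invalid
# states cannot be reached no matter which service calls it.
class Account:
    def __init__(self) -> None:
        self._balance = 0.0

    @property
    def balance(self) -> float:
        return self._balance

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount: float) -> None:
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

account = Account()
account.deposit(100.0)
account.withdraw(30.0)
```

With the rich model, business rules can be unit-tested directly on the entity, without wiring up the framework or the service layer.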
Here are a couple of different approaches you can use instead of tiers:

- Hexagonal architecture (Ports and Adapters)
- Package by Feature
- A rich domain model combined with business logic – object-oriented programming combined with elements of tactical DDD (Aggregates, Value Objects, etc.)

The logical division at the level of system architecture is not the end – we often forget about the separation within the applications themselves. Rarely does one service equal one functionality. You should also separate the domains and subdomains (there can be many of them) that together form a specific service or application. The principles of communication between them should be similar to those used at the system architecture level: fixed APIs, communication through interfaces, encapsulation, etc.

Sale Offer vs. Reality – The Transition to Implementation

The offer is usually made by a different person (or team) than the one that takes care of the implementation. Because of this, it’s worth engaging a third party for a quick validation of the architecture proposal at the bidding stage. Cross-checking will definitely pay off, regardless of your experience. First of all, it’s worth checking whether the architectural design isn’t overkill considering the client’s needs. If you start with microservices as your first idea, you can easily choose something way too complicated. For example, you need to design one rather small application (one domain, several functionalities), but you match it with a microservice architecture launched on Kubernetes, with the ELK stack, monitoring, etc. Do you really want to do something like this for one application if you don’t know what the plans for future project development are? You may also encounter the opposite situation – your proposition will be insufficient for the client’s needs. This will probably happen less often because offers are usually planned with solid margins – but if it does, you’ll need to find good arguments to support your approach.
Also, it’s a good idea to limit yourself to the system architecture at the offer stage and leave the application architecture for the implementation stage. What if you give up on microservices, and a year later, it turns out that they are needed after all? If you build a monolithic application with an appropriate structure and a division into domains and subdomains, changing the architecture shouldn’t require a revolution – it’ll simply be a natural transition you can carry out without too much effort.

Technical and Business Limitations

Decisions in these areas will strictly depend on the client’s requirements – it is difficult to define universal rules of conduct. Some elements to pay attention to:

- Physical arrangement of machines
- Legal aspects of data retention
- Availability of technology with a given cloud provider
- Heavy algorithmic calculations (servers can be adapted specifically for this)
- User load on the system

Heavy Traffic

Regardless of the type of system architecture you choose, the application should be able to handle the traffic specified by the client and be ready for possible “unexpected” spikes. You have to be careful not to fall into the trap of premature optimization, but if you already have specific requirements, you have to adjust the solution to them from the very beginning. Some things to consider:

- Horizontal and vertical scaling
- Caching
- Autoscaling
- Load balancers
- Performance/load tests
- SQL optimization
- Database replication

The best scenario is when the client is able to define specific metrics regarding the performance of the system, such as the number of logins per hour, X visits to a specific page within X time, ten processed lease applications, or 100 offer searches per minute. Based on such metrics, you can prepare appropriate tests and properly adjust the system.

Processing Large or Many Files

There are two aspects to consider here: 1.
Architectural aspect: The separation of “processing” functionalities into independent technical services that can be scaled according to current needs. 2. Technical aspect: In most cases, processing large files comes down to memory problems. One of the methods that can help is stream processing (you don’t load the entire file into memory). Another thing is integration with external file-handling services such as AWS S3. It’s always a good idea to review the provider’s documentation carefully and use the solutions it offers – in the case of S3, an API for processing files in parts: multipart uploads and downloads. Consider this at the very beginning, even if you don’t have specific requirements as to the size of the processed files. It doesn’t require too much work, and it will keep you safe in the future.

Summary

As you can see, a modular system can serve as a good alternative to microservices or the traditional monolith architecture. Each of the approaches has its benefits and limitations. Which should you consider for your project? This will depend entirely on the nature of the system you’ll be working on. To make the decision a bit easier, consult the following tables, which summarize the main advantages and disadvantages.

Microservices

Advantages:
- Cloud native
- Physical separation of modules
- Horizontal scaling
- Deployment autonomy
- Technological autonomy
- Multiple teams can collaborate on independent or loosely coupled components
- Smooth CI – no conflicts in cooperation
- Loose coupling

Drawbacks:
- Difficult communication analysis (API design and versioning)
- Complicated infrastructure (logging, messaging, monitoring, etc.)
- High overhead for network communication
- Multiple deployment units
- High effort at the start of the project
- Difficult to test all business functions
- CI/CD processes are definitely more complicated
- Transactions are often distributed
Monolith

Advantages:
- Simple infrastructure
- Communication speed and reliability
- A single deployment unit
- Transactionality
- Communication security
- Quick kickoff
- Testability of all business functions

Drawbacks:
- Difficulty maintaining structure (Big Ball of Mud risk)
- Difficult to maintain – one place where something can go wrong
- More difficult horizontal scaling (more resources needed, at least in theory)
- Difficult transition to a distributed architecture
- Difficult CI – lots of conflicts and work on common components
- Stability – a bug in one software module can affect the entire application
- Permanent attachment to one technology

Modular Monolith

Advantages:
- Simple infrastructure
- Communication speed and reliability
- A single deployment unit
- Communication security
- Quick kickoff
- Easy migration to a distributed architecture
- Business-autonomous modules
- Modular structure
- Testability of all business functions
- An open path to transition from monolith to microservices
- Loose coupling

Drawbacks:
- Data duplication
- More difficult to maintain data integrity
- Difficult to maintain – one place where something can go wrong
- More difficult horizontal scaling (more resources needed, at least in theory)
- Permanent attachment to one technology

Analyze your needs, talk to your trusted team members, and choose the architecture that’ll serve the project best – not just today, but tomorrow and even many years into the future.
Distributed tracing is now a staple of the modern observability stack. With the shift to microservices, we needed a new way to observe how our services interacted. Distributed tracing provides that view by allowing us to do request tracing – i.e., trace a request across the components in our distributed system. Today, distributed tracing is used for identifying performance bottlenecks, debugging issues, and understanding how our systems interact in production. However, implementing distributed tracing is complex, and how much value teams get from it depends a fair bit on how it is implemented. Implementation mechanics like which components are instrumented, the sampling rate, and the quality of trace visualization all influence the value companies get from tracing, which in turn influences developer adoption. Additionally, this space is continuously evolving, with new tools and techniques emerging all the time. In this article, let us look at best practices for distributed tracing in 2023.

What Is Distributed Tracing?

Distributed tracing refers to a mechanism that allows us to track a single request as it traverses multiple services in a distributed environment.

Why we need distributed tracing

To enable this, distributed tracing tools insert a unique trace context (trace ID) into each request's header and implement mechanisms to ensure that the trace context is propagated throughout the request path. Each network call made in the request's path is captured and represented as a span. A span is the basic unit of a trace – it represents a single event within the trace, and a trace can have one or multiple spans. A span consists of log messages, time-related data, and other attributes that provide information about the operation it tracks.

Anatomy of a distributed trace

Through this unique view, distributed tracing unlocks several new use cases and improves existing ones.
It allows us to understand service interdependencies (for example, who is calling my service?), identify performance bottlenecks (which specific DB call is degrading my latency?), quickly identify failure points for debugging (which API is causing this 500 error?), and also set more granular SLOs.

Components of a Distributed Tracing System

To implement any distributed tracing system, we install four distinct components:

- Instrumentation library
- Collector (pre-processor)
- Storage back-end
- Visualization layer

Today, there are several options available for each of these components – you could use one single platform that does all four, or piece together your own distributed tracing framework by using different solutions for different components.

Components of a tracing system

Instrumentation Library

This is the part that is integrated into each application or service. When an application executes, the instrumentation library ensures that trace IDs are added to each request and that the trace context (trace ID) is propagated into the next span. The library sends this data to a collector.

Collector

The collector is an intermediary between the instrumentation library and the storage back-end. It gathers traces, processes them (e.g., aggregating spans, sampling), and prepares them for storage.

Storage Back-End

The storage back-end persists and indexes trace data. It typically uses a distributed storage system capable of handling large volumes of data, and it allows for efficient querying and retrieval.

Visualization Layer

This is the user interface of the distributed tracing system. It allows developers and operators to interact with trace data. This layer provides tools for querying, searching, and filtering trace data based on various criteria. It presents the trace data in a visually meaningful way, often as a trace graph or timeline, allowing users to analyze the sequence of events and identify bottlenecks or issues.
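The first two components above can be sketched in a few lines. This is a hypothetical, toy illustration of what an instrumentation library does – reuse or mint a trace ID, propagate it to the next hop, and record one span per hop for the collector; the function and header names are invented and do not reflect any real SDK's API:

```python
import time
import uuid

# Stands in for the collector/storage back-end.
collected_spans = []

def instrumented_call(name, headers, work):
    # Reuse the incoming trace ID, or start a new trace at the system's edge.
    trace_id = headers.get("trace-id") or uuid.uuid4().hex
    span = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": headers.get("span-id"),  # links this span to its caller
        "name": name,
        "start": time.perf_counter(),
    }
    # Propagate the context to whatever this hop calls next.
    outgoing = {"trace-id": trace_id, "span-id": span["span_id"]}
    result = work(outgoing)
    span["duration_s"] = time.perf_counter() - span["start"]
    collected_spans.append(span)
    return result

# Two toy "services": the gateway calls the orders service.
def orders_service(headers):
    return instrumented_call("orders.get", headers, lambda out: "order-42")

response = instrumented_call("gateway.request", {}, orders_service)
```

Every span shares the same `trace_id`, and each child records its parent's `span_id` – which is exactly the information the visualization layer needs to rebuild the trace tree.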
Implementing Distributed Tracing Systems Is Complex

While there are several benefits, implementing distributed tracing systems (especially well) is not yet an easy, "solved" task. It requires the implementing team to make several decisions, and those decisions meaningfully impact the amount of value the rest of the engineering team gets from tracing. It’s not uncommon for companies to implement distributed tracing and pay half a million dollars annually, only to have the average developer use it twice a year. See below for some best practices on how to implement tracing well.

Best Practices for Distributed Tracing

Pick OTel for Instrumentation

There are several popular open-source tracing frameworks, such as OpenTelemetry, Jaeger, and Zipkin. Today, in 2023, OTel has become somewhat of an obvious choice for the following reasons:

- Wide coverage: OTel has instrumentation libraries and SDKs for different programming languages and frameworks and has broad coverage now. See here for what OTel supports.
- Vendor neutrality: By now, most vendors support OTel instrumentation, so you could instrument with OTel and push the data to any vendor of your choice. You'd have vendor interoperability and portability over time (should you choose to change vendors). This is a list of observability vendors that natively support OTel data, and here's a registry of libraries and plugins for connecting OTel with other vendors.
- Maturity and stability: OTel has been maturing for several years, with wide community support. It is now the second-largest project in the CNCF ecosystem in terms of contributors, next only to Kubernetes itself. The strong community ensures it continues to evolve and add support for new technologies rapidly.

Leverage Automatic Instrumentation Where Possible

OpenTelemetry provides two ways to instrument applications and components – manual instrumentation and automatic instrumentation.
If you're on Kubernetes and most of your services are in Java, NodeJS, or Python, leverage automatic instrumentation extensively, as it reduces implementation effort.

Manual instrumentation: The OTel code has to be added to the application by the developer, so this requires a code change. Manual instrumentation allows for more customization of spans and traces. Most languages are covered for manual instrumentation – C++, .NET, Go, Java, Python, etc. Refer here for the latest list.

Automatic instrumentation: This is a way to instrument applications/services without making code changes or having to recompile the application. An intelligent agent gets attached to an application, reads its activity, and extracts the traces. This is possible if you are on Kubernetes. OTel today supports automatic instrumentation for Java, NodeJS, Python, etc. (refer here for the latest list). Customization of spans and traces is limited with automatic instrumentation (vs. manual instrumentation) but is sufficient for most use cases.

Start With Critical Paths and Expand From There

It is impractical to instrument every service/component of a large distributed system in one go, so it is important to thoughtfully pick out which paths to instrument first and how to expand from there. Some guidelines/principles to follow here:

Go Outside-In/Start Close to the Users

It is often best to begin from the outside and move inward. This means starting at the points where a request enters your application – incoming requests from users or external clients. By starting at the entry points, it is easier to get a holistic view of how requests flow through the system.

Pick the Most Critical Paths in the System and Instrument Them First

The general guideline is to identify the most important request paths in your system; these may be the ones that are most frequently accessed or have the most significant impact on overall application performance.
Start by instrumenting these critical paths first, so you can demonstrate value to the overall organization, and then expand from there.

Always Instrument Request Paths End-to-End So a Trace Doesn’t Break

Whichever paths you choose, ensure that each path is instrumented end-to-end – which means each service and component in the request path is instrumented to propagate the context (trace ID) and generate spans as required. Any gaps result in incomplete or broken traces, which negate the effort invested in instrumenting upstream services.

Be Intentional About Sampling

In 99% of cases, companies want to sample their traces. This is because if you store every single trace, you might be storing and managing a massive amount of data. Let's take an example. Assume each span is 500 bytes (including tagging and logging). If your application serves 2,000 requests per second and has 20 different services, it ends up generating 20 MB of data every second, or 72 GB per hour – roughly 1.7 TB each day – for a simple 20-service setup. This is why most companies end up storing only a sample of their distributed traces. It is important to select the right sampling strategy so you still get visibility into what you care about while keeping control over costs. Broadly, there are two categories of sampling:

1. Upfront/Head-Based Sampling

This is a simple way to decide which spans to keep before any spans have been generated for a given request. It is called head-based sampling because the decision is made at the beginning, or “head,” of the request. It is sometimes referred to as unbiased sampling when decisions are made without even looking at the request. Within head-based sampling, there are several mechanisms commonly in use:

- Probabilistic or fixed-rate sampling: Randomly selecting a subset of traces to keep based on a fixed sampling rate – say, 1%.
- Rate-limiting sampling: Setting a fixed limit on the number of requests to be traced per unit of time.
For instance, if the rate limit is set to 100 requests per minute, only the first 100 requests in that minute will be traced.

- Priority-based sampling: Priority-based sampling assigns different priorities to requests, and the sampling rate is adjusted accordingly. Requests with a higher priority (e.g., critical transactions) are sampled at a higher rate, and lower-priority requests at a lower rate.

2. Tail-Based Sampling

Tail sampling is where the decision to sample is made based on the responses within the trace, e.g., high latency or errors. This method ensures that "interesting" requests are traced even when overall sampling rates are low. However, tail-based sampling is much harder to implement than the simpler methods, as you have to buffer all traces until the response comes back. This guide covers tail-based sampling in some depth. Most organizations typically resort to a simple head-based probabilistic sampling mechanism, with a rate of 1-3%. See here for how to configure fixed-rate sampling in OTel.

Be Selective in Implementing Custom Tracing

Distributed tracing is powerful in that it allows us to report custom tracing spans. Custom spans allow us to enrich distributed traces with additional, domain-specific information, making tracing data more meaningful. It’s possible to capture and log error states as part of a span or create child spans that further describe the functioning of a service. Effectively tagged spans can, in turn, significantly reduce the number of logging statements required by your code. In the context of tracing, breadth refers to the number of services or components being instrumented, while depth refers to the level of detail captured within each span. Striking the right balance between breadth and depth is crucial to implementing an effective tracing mechanism while also controlling costs. In general, it is a good idea to go as broad as possible and to be selective about where you go deep.
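The head-based mechanisms described above are simple enough to sketch directly. This is an illustrative, from-scratch version (not any real SDK's sampler API) of probabilistic and rate-limiting samplers, where the keep/drop decision is made before any spans exist:

```python
import random

def probabilistic_sampler(rate, rng=random.random):
    """Keep roughly `rate` of all traces (e.g., rate=0.01 for 1%)."""
    def should_sample(trace_id):
        return rng() < rate
    return should_sample

def rate_limiting_sampler(max_per_window):
    """Keep only the first N traces per window; reset_window() starts a new one."""
    state = {"seen": 0}

    def should_sample(trace_id):
        state["seen"] += 1
        return state["seen"] <= max_per_window

    def reset_window():
        state["seen"] = 0

    return should_sample, reset_window

# Rate limit of 100 per window: of 250 requests, only the first 100 are traced.
limited, reset_window = rate_limiting_sampler(100)
decisions = [limited(f"trace-{i}") for i in range(250)]
```

A production sampler would additionally propagate the decision downstream (so all services agree on whether to record a given trace), but the decision logic itself is this small.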
Integrate Tracing With Your Monitoring and Logging Systems

Make sure to connect tracing with your existing monitoring and logging systems so that developers can correlate across the three datasets while troubleshooting. Typically, this is done through:

- Log injection: injecting trace IDs/span IDs directly into logs using logging frameworks or libraries. This way, each log message carries a trace ID that can be used to query the relevant logs directly.
- Metrics tagging: including trace-related tags or labels when recording metrics. These tags can be trace IDs, span names, or other trace-specific metadata. This lets developers filter and aggregate metrics around tracing data and makes distributed systems easier to understand.

Protocols like OpenTelemetry already allow you to do this easily.

Pick a Modern Trace Visualization Front End

There is a meaningful difference across solutions in terms of the front end. After collecting tracing data, you need to be able to visualize it. A good tracing visualization lets you see the flow of traced requests through a system and identify performance bottlenecks. However, not all tracing solutions provide an intuitive, user-friendly way to visualize and analyze this data directly. Some tools excel at the collection and storage of tracing data but offer only basic visualization (e.g., Jaeger, Zipkin, AWS X-Ray), while others focus on extracting insight from tracing data and, as a result, have invested in more sophisticated visualization and analytics (e.g., Honeycomb, Lightstep, Helios). Good visualization tools should offer out-of-the-box dashboards that automatically give you service dependency maps, provide Gantt and waterfall trace visualizations, and allow detailed querying and filtering of traces.

Explore Next-Generation Tools That Combine AI and Tracing

With OTel maturing rapidly, instrumentation has become fairly standardized.
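The log injection described above can be sketched with a thread-local trace ID, similar in spirit to an MDC entry in common logging frameworks. This is an illustrative toy, not a real framework API:

```java
// Illustrative sketch of trace-ID log injection: every log line carries the
// current request's trace ID so logs and traces can be correlated by a
// single query.
public class TraceLogger {
    // Holds the trace ID for the current request thread, like an MDC entry.
    private static final ThreadLocal<String> currentTraceId = new ThreadLocal<>();

    public static void setTraceId(String traceId) {
        currentTraceId.set(traceId);
    }

    // Prefix each log line with the active trace ID (or "none").
    public static String format(String level, String message) {
        String traceId = currentTraceId.get();
        return String.format("[traceId=%s] %s %s",
                traceId == null ? "none" : traceId, level, message);
    }

    public static void main(String[] args) {
        // Would normally be set by middleware when the request arrives.
        TraceLogger.setTraceId("4bf92f3577b34da6");
        System.out.println(TraceLogger.format("INFO", "order created"));
        // prints: [traceId=4bf92f3577b34da6] INFO order created
    }
}
```

In practice, instrumentation libraries populate this context automatically; the point is simply that every log line becomes queryable by the same ID that identifies the trace.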
Similarly, storage and querying have become broadly commoditized across the observability industry over the last few years. Today there is some differentiation in the visualization and analytics layer, although even that is not dramatic. An emerging class of solutions applies AI to distributed tracing data to generate inferences about the causes of issues. These solutions also tend to have the most modern tracing stacks and make implementation and management dramatically simpler. For example, solutions like ZeroK allow you to do the following:

- Install distributed tracing across all your components in one go, without any code changes; all services, databases, and queues are covered right away using OTel and eBPF.
- Eliminate the need for sampling: they process 100% of traces and use AI to automatically identify anomalous or "interesting" ones to store (e.g., error traces, high-latency traces).
- Append anomalous traces with additional context (e.g., logs) to aid debugging as required.
- Apply LLMs to these traces to automatically identify the likely causes of production issues.

Invest in Developer Onboarding

This is an often overlooked but critical factor in the success of distributed tracing in your organization. Remember that distributed tracing is complex, and it is difficult for new developers to get up to speed on how to use it effectively. It is not at all uncommon for companies to have just a handful of power users on the tracing platform, and even those using it only a few times a quarter. Developers need to be taught how to interpret trace data, understand the relationships between different microservices, and troubleshoot problems using distributed tracing tools. They must be guided on best practices such as consistent naming conventions, proper instrumentation, and trace context propagation. Planning developer onboarding for distributed tracing is a strategic investment.
It not only accelerates the integration of tracing within the system but also fosters a culture in which developers are active participants in the continuous improvement of system visibility, reliability, and performance.

Conclusion

We looked at distributed tracing best practices and what you can do to make the journey easier. Distributed tracing is no longer a novelty; it has evolved into a crucial part of the observability stack.
The rise of microservices architecture has changed the way developers build and deploy applications. Spring Cloud, part of the Spring ecosystem, aims to simplify the complexities of developing and managing microservices. In this comprehensive guide, we will explore Spring Cloud and its features and demonstrate its capabilities by building a simple microservices application.

What Is Spring Cloud?

Spring Cloud is a set of tools and libraries that provide solutions to common patterns and challenges in distributed systems, such as configuration management, service discovery, circuit breakers, and distributed tracing. It builds upon Spring Boot and makes it easy to create scalable, fault-tolerant microservices.

Key Features of Spring Cloud

- Configuration management: Spring Cloud Config provides centralized configuration management for distributed applications.
- Service discovery: Spring Cloud Netflix Eureka enables service registration and discovery for better load balancing and fault tolerance.
- Circuit breaker: Spring Cloud Netflix Hystrix helps prevent cascading failures by isolating points of access between services.
- Distributed tracing: Spring Cloud Sleuth and Zipkin enable tracing requests across multiple services for better observability and debugging.

Building a Simple Microservices Application With Spring Cloud

In this example, we will create a simple microservices application consisting of two services: a user-service and an order-service. We will also use Spring Cloud Config and Eureka for centralized configuration and service discovery.
Prerequisites

Ensure that you have the following installed on your machine:

- Java 8 or later
- Maven or Gradle
- An IDE of your choice

Dependencies

XML

<!-- Maven -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-config-server</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-config</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-server</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

OR

Groovy

// Gradle
implementation 'org.springframework.cloud:spring-cloud-config-server'
implementation 'org.springframework.cloud:spring-cloud-starter-config'
implementation 'org.springframework.cloud:spring-cloud-starter-netflix-eureka-server'
implementation 'org.springframework.cloud:spring-cloud-starter-netflix-eureka-client'
implementation 'org.springframework.boot:spring-boot-starter-web'

Step 1: Setting up Spring Cloud Config Server

Create a new Spring Boot project using Spring Initializr (https://start.spring.io/) and add the Config Server and Eureka Discovery dependencies. Name the project config-server.
Add the following properties to your application.yml file:

YAML

server:
  port: 8888

spring:
  application:
    name: config-server
  cloud:
    config:
      server:
        git:
          uri: https://github.com/your-username/config-repo.git # Replace with your Git repository URL

eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/

Enable the Config Server and Eureka Client by adding the following annotations to your main class:

Java

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.config.server.EnableConfigServer;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

@EnableConfigServer
@EnableEurekaClient
@SpringBootApplication
public class ConfigServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(ConfigServerApplication.class, args);
    }
}

Step 2: Setting up Spring Cloud Eureka Server

Create a new Spring Boot project using Spring Initializr and add the Eureka Server dependency. Name the project eureka-server.

Add the following properties to your application.yml file:

YAML

server:
  port: 8761

spring:
  application:
    name: eureka-server

eureka:
  client:
    registerWithEureka: false
    fetchRegistry: false

Enable the Eureka Server by adding the following annotation to your main class:

Java

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

@EnableEurekaServer
@SpringBootApplication
public class EurekaServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(EurekaServerApplication.class, args);
    }
}

Step 3: Creating the User Service

Create a new Spring Boot project using Spring Initializr and add the Config Client, Eureka Discovery, and Web dependencies. Name the project user-service.
Add the following properties to your bootstrap.yml file:

YAML

spring:
  application:
    name: user-service
  cloud:
    config:
      uri: http://localhost:8888

eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/

Create a simple REST controller for the User Service:

Java

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserController {

    @GetMapping("/users/{id}")
    public String getUser(@PathVariable("id") String id) {
        return "User with ID: " + id;
    }
}

Step 4: Creating the Order Service

Create a new Spring Boot project using Spring Initializr and add the Config Client, Eureka Discovery, and Web dependencies. Name the project order-service.

Add the following properties to your bootstrap.yml file:

YAML

spring:
  application:
    name: order-service
  cloud:
    config:
      uri: http://localhost:8888

eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/

Create a simple REST controller for the Order Service:

Java

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderController {

    @GetMapping("/orders/{id}")
    public String getOrder(@PathVariable("id") String id) {
        return "Order with ID: " + id;
    }
}

Step 5: Running the Application

Start the applications in this order: config-server, eureka-server, user-service, order-service. Once all services are running, you can open the Eureka dashboard at http://localhost:8761 and see the registered services. You can then access the User Service at http://localhost:<user-service-port>/users/1 and the Order Service at http://localhost:<order-service-port>/orders/1.
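Note that the Git repository referenced in Step 1 is expected to contain one properties file per service, named after each service's spring.application.name. A minimal sketch of what that config-repo might contain (the file names follow the convention, but the port values are illustrative assumptions):

YAML

# config-repo/user-service.yml -- served to user-service on startup
server:
  port: 8081

# config-repo/order-service.yml -- served to order-service on startup
server:
  port: 8082

Keeping per-service configuration in one versioned repository is the main payoff of the Config Server: a port or property change becomes a Git commit rather than a redeploy of every service.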
Conclusion

In this comprehensive guide, we explored Spring Cloud and its features and demonstrated its capabilities by building a simple microservices application. By leveraging the power of Spring Cloud, you can simplify the development and management of your microservices, making them more resilient, scalable, and easier to maintain. Embrace the world of microservices with Spring Cloud and elevate your applications to new heights.