Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Put simply, AI is the ability of a computer system to mimic human intelligence through math and logic, while ML builds on AI by developing methods that "learn" from experience rather than explicit instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.
To ensure traffic safety in railway transport, non-destructive inspection of rails is regularly carried out using various approaches and methods. One of the main approaches to determining the operational condition of railway rails is ultrasonic non-destructive testing [1]. Currently, the search for images of rail defects in the resulting flaw patterns is performed by a human. Advances in algorithms for searching and classifying data make it possible to apply machine learning methods to identify rail defects and to reduce the workload on humans by creating expert systems. The complexity of creating such systems is described in [1, 3-6, 22] and is due, on the one hand, to the variety of graphic images obtained during multi-channel ultrasonic inspection of rails and, on the other hand, to the small number of data instances with defects (class imbalance).

One possible way to create expert systems in this area is an approach based on decomposing the complex task of analyzing the entire multichannel defectogram into individual channels, or sets of channels, that characterize individual types of defects. One of the most common rail defects is a radial bolt-hole crack, referred to in the literature as a "star crack" (Fig. 1). This type of defect is mainly detected by a flaw detector channel with a central inclined angle of ultrasound input in the range of 38°-45° [1-2]. Despite the systematic introduction of continuous welded rail on the railway network, the diagnosis of bolt holes remains an important task [1-2], which is why it was singled out for consideration in this work.

Figure 1 – Example of a radial crack in a rail bolt hole

Purpose of the work: to compare the effectiveness of various machine learning models in solving the problem of classifying the states of bolt holes of railway rails during their ultrasonic examination. This purpose is achieved by solving the following problems:

- Data set generation and preparation
- Exploratory data analysis
- Selection of protocols and metrics for evaluating the operation of algorithms
- Selection and synthesis of implementations of classification models
- Evaluating the effectiveness of models on test data

Data Set Generation and Preparation

When a flaw detector equipped with piezoelectric transducers (PZTs) travels along a railway track, ultrasonic pulses are emitted into the rail at a specified period, and the receiving PZTs register the reflected waves. The detection of defects by the ultrasonic method is based on the principle of waves reflecting from metal inhomogeneities, since cracks and other inhomogeneities differ in their acoustic resistance from the rest of the metal [1-5]. During ultrasonic scanning of rails, their structural elements and defects produce acoustic responses, which are displayed on the defectogram in the form of characteristic graphic images (scans). Figure 2 shows examples of defectograms in the form of B-scans ("brightness scans") of sections of rails with a bolted joint, obtained by measuring systems of various types at an inclined angle of ultrasound input.

Figure 2 – Examples of individual flaw detection channels (B-scan) when scanning bolt holes of rails by flaw detectors of various companies

Individual scans of bolt holes (frames) can be separated from such flaw patterns, for example, by applying amplitude criteria (Fig. 3).

Figure 3 – Selection of frames with alarms of bolt holes from B-scan

The width W and length L of each frame are the same (Fig. 3) and are selected based on the maximum possible dimensions of bolt-hole signals and their flaws.
Each such frame (instance) represents a part of the B-scan and therefore contains, for each point, data on its coordinates, depth measurements, and amplitudes from each of the two ultrasonic input channels on the rail (±40°). In [3], such data frames are converted into grayscale matrices of shape (60, 75), i.e., 60 * 75 = 4500 elements, and a classification network based on deep learning methods is built and successfully trained. However, that work does not consider alternative, less capacious data frame formats and does not show the capabilities of basic machine learning methods and models, so the present work is intended to fill this gap.

Various shapes of radial cracks in rail bolt holes, their locations, and the reflective properties of the surface lead to varying graphic images and, together with the defect-free state, generate a data set with 7 distinguishable classes. In binary classification practice, it is common to assign class "1" to the rarer outcomes or conditions of interest, and class "0" to the common condition. For defect identification, we therefore define the common, frequently encountered defect-free state as class "0" and the defective states as classes "1"-"6". Each defect class is displayed on the flaw pattern as a characteristic image that is visible to experts during data interpretation (Fig. 4). Although the presence or absence of a flaw (binary classification) is what matters during operation of the railway track, we will consider the capabilities of classification algorithms and quantify which types of defects or flaws are more likely to be falsely classified as defect-free, which is a dangerous case in rail diagnostics. The classification problem in this work is therefore treated strictly as a multiclass classification.

Figure 4 – Examples of B-scan frames (60, 75) with characteristic images of bolt holes with different radial cracks assigned to one of the 7 classes

Each instance of a class can be represented as a basic structure: rectangular data. To equalize the size of the instances, we set the length of the table format to k = 60 records (30% more than the maximum possible) and fill empty cells with zero values (Fig. 5a). The original instance can then have the shape (6, 60) or be flattened into an array described in a 6 * 60 = 360-dimensional space (Fig. 5c); on the B-scan graph it looks like Fig. 5b.

Figure 5 – Representation of a rectangular data instance

Selecting an Assessment Protocol

Collecting and annotating data from ultrasonic testing of rails involves significant difficulties, which are described in [3], so we will use a synthesized data set obtained through mathematical modeling. The essence of this approach is reflected in Fig. 6, and its applicability is shown in [3]. The term "synthesized data" is widely discussed in the context of creating realistic visual objects, for example, on the NVIDIA blog [23]. This work extends the application of synthesized data to the field of non-destructive testing.

Figure 6 – Application of the ML model

A sufficiently large number of data instances obtained through mathematical modeling allows us to avoid the rare-class problem and to choose an evaluation protocol based on separate balanced sets: training, validation, and testing. We limit the data sets to 50,000 training instances, 50,000 test instances, and 10,000 validation instances.
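To make the described layout concrete, here is a minimal sketch of padding a frame to k = 60 records and flattening it into a 360-dimensional vector. It is an illustration under assumptions: the helper name, parameter ordering, and random input are hypothetical and not from the original work.

```python
# Hypothetical sketch of the rectangular-instance layout: 6 recorded
# parameters per alarm point, zero-padded to k = 60 records, then
# flattened to a 6 * 60 = 360-dimensional feature vector.
import numpy as np

K = 60        # fixed number of records per instance
N_PARAMS = 6  # parameters stored for each record

def to_fixed_frame(records: np.ndarray) -> np.ndarray:
    """records: (n, 6) array of alarm points, n <= 60."""
    frame = np.zeros((N_PARAMS, K))
    frame[:, :records.shape[0]] = records.T  # fill; the rest stays zero
    return frame.reshape(-1)                 # flat 360-dimensional vector

raw = np.random.rand(42, N_PARAMS)  # a synthetic frame with 42 points
print(to_fixed_frame(raw).shape)    # (360,)
```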
Choosing a Measure of Success

The absence of a difference in the relative sizes of the classes (class balance) allows us to choose accuracy, the ratio of correctly classified instances to their total number, as the measure of success when training the algorithms. A single metric cannot evaluate all aspects of a model's applicability, so the model testing step also uses a confusion matrix and per-class precision and recall.

Exploratory Data Analysis

Information about the class balance in the training, test, and validation sets is presented in Fig. 7.

Figure 7 – Data quantity summary

The distribution of normalized alarm depths and their coordinates for the positive measuring channel Ch +40° and classes 0, 2, 3, and 6 is shown in Fig. 8. The distributions for Ch −40° and classes 0, 1, 4, and 5 have a symmetrical pattern.

Figure 8 – Distribution of normalized values of coordinates and depths of data from the Ch +40° measurement channel for classes 0, 2, 3, 6

Principal Component Analysis (PCA) was used for exploratory data analysis and for determining data redundancy; its two-dimensional representation is shown in Fig. 9. The central class is class 0, with classes 2, 3, 6 and 1, 4, 5 located on opposite sides of it, which corresponds to their graphical display on the B-scan.

Figure 9 – Visualization of the two-dimensional representation of the PCA method

In general, the two-dimensional representation of the classes shows weak clustering, which indicates the need to use a higher data dimension for classification, up to the original flat size of 6 * 60 = 360. The graph of cumulative explained variance as a function of the number of PCA components (Figure 10a) shows that 80 components already explain 98% of the variance, which indicates a high level of redundancy in the original data. This can be explained by the sparseness of the data: the 80 obtained PCA components are independent of the zero values (Fig. 10b).

Figure 10 – PCA: a) cumulative explained variance as a function of the number of PCA components; b) contributions of the predictive variables of a flat data array in projections onto the PCA axes

Consider the assessment of how densely the data instances of each class are filled with non-zero values (Fig. 11).

Fig. 11 – Assessment of the non-zero-value occupancy of class instances

Note the similarity of the ranges and quartiles of the classes:

- Class 0 has the lowest median, as the defect-free condition of the bolt hole lacks additional crack alarms.
- Classes 5 and 6 have the highest medians, indicating high data filling due to the presence of alarms from both the lower and upper radial cracks of the bolt hole.
- Classes 1-4 have similar medians, indicating that they are filled with data due to alarms from only the upper or only the lower radial crack of the bolt hole.
- Classes 1 and 2, 3 and 4, and 5 and 6, respectively, have similar medians and distributions due to the symmetry of the data relative to the center of the bolt hole.

The level of 80 PCA components is lower than the median for classes 1-6 but is sufficient to describe 98% of the variance, which may indicate redundancy caused not only by zero values in the data. A possible explanation is that the amplitude values of the alarms do not change much within each class and therefore have a weak effect on the variance.
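As an aside, a minimal sketch of the redundancy check described above, assuming a flattened (n_samples, 360) data matrix like the one prepared in the listings below; the random array here is only a placeholder:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: substitute the real flattened (n_samples, 360) frames
X = np.random.rand(1000, 360)

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
# first component count explaining at least 98% of the variance
n_components_98 = int(np.searchsorted(cumvar, 0.98)) + 1
print(n_components_98)  # ~80 on the real data, per Fig. 10a
```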
The weak influence of amplitude is also confirmed in flaw detection practice, where operators rarely rely on the amplitude parameter when searching for defects.

To assess the complexity of the upcoming classification task, the multidimensional structure of the input data was studied using manifold learning techniques:

- Random projection embedding
- Isometric Mapping (Isomap)
- Standard Locally Linear Embedding (LLE)
- Modified Locally Linear Embedding (MLLE)
- Local Tangent Space Alignment (LTSA) embedding
- Multidimensional Scaling (MDS)
- Spectral embedding
- t-distributed Stochastic Neighbor Embedding (t-SNE)

We also applied techniques for supervised dimensionality reduction that allow data to be projected into a lower dimension:

- Truncated SVD embedding
- Random Trees embedding
- Neighborhood Components Analysis (NCA)
- Linear Discriminant Analysis (LDA)

The results of embedding 3,000 samples of the original shape (6, 60) into two-dimensional space are presented in Fig. 12.

Figure 12 – Embedding data into two-dimensional space using various techniques (the color of the dots represents the class)

For the manifold learning methods, the data in the graphs are poorly separated in the parametric space, which predicts the difficulty of classifying these data with simple supervised algorithms. Note also that the supervised dimensionality reduction method Linear Discriminant Analysis shows good data grouping and is a candidate for a classification model.

Development of Data Classification Models

Basic Model

The prediction accuracy of each of the seven possible classes with a random classifier is 1 / 7 = 0.143 and serves as the starting point for assessing the statistical power (quality) of future models. As the base model, we choose Gaussian Naive Bayes, which is often used in such cases. A code fragment for fitting the model on the training data and predicting on the test data (Gen_2D_Orig_Arr is the data generator used throughout this work):

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from time import time
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

batch = 50000

# Load and flatten the training data: (50000, 6, 60, 1) -> (50000, 360)
train_gen = Gen_2D_Orig_Arr(train_dir, batch_size=batch, num_classes=7)
Xtrain, ytrain = next(train_gen)  # Xtrain.shape = (50000, 6, 60, 1), ytrain.shape = (50000, 7)
Xtrain = np.reshape(Xtrain, (batch, Xtrain.shape[1] * Xtrain.shape[2]))  # (50000, 360)
ytrain = np.argmax(ytrain, axis=1)  # one-hot -> class labels, shape (50000,)

test_gen = Gen_2D_Orig_Arr(test_dir, batch_size=batch, num_classes=7)
Xtest, ytest = next(test_gen)
Xtest = np.reshape(Xtest, (batch, Xtest.shape[1] * Xtest.shape[2]))
ytest = np.argmax(ytest, axis=1)  # ytest.shape = (50000,)

model = GaussianNB()
model.fit(Xtrain, ytrain)

start_time = time()
y_model = model.predict(Xtest)  # y_model.shape = (50000,)
timing = time() - start_time

acc = accuracy_score(ytest, y_model)
print('acc = ', acc)
print(classification_report(ytest, y_model, digits=4))

mat = confusion_matrix(ytest, y_model)  # mat.shape = (7, 7)
fig = plt.figure(figsize=(10, 6), facecolor='1')
ax = plt.axes()
sns.heatmap(mat, square=True, annot=True, cbar=False, fmt="d", linewidths=0.5)
plt.xlabel('class qualifiers')
plt.ylabel('true value')
print(f'{timing:.3f} s - Predict time GaussianNB')
```

The resulting confusion matrix and summary report on model quality are presented in Fig. 13a and 13b. The trained model has statistical power, since its overall accuracy of 0.5819 is roughly 4 times higher than that of a random classifier.
Despite the rather low accuracy of the model, let us consider the relationship between its qualitative performance indicators and the graphical representation of the data projected with the Linear Discriminant Analysis method (Fig. 13c).

Figure 13 – Summary quality assessments of the Gaussian Naive Bayes model: a) confusion matrix showing the model's misclassification rates; b) report on the qualitative performance indicators of the model in the form of various accuracy metrics; c) projection of the data into two-dimensional space using the Linear Discriminant Analysis method

The projected data of class 6 are the most distant from most points of the other classes (Fig. 13c), which is reflected in the high precision of its classifier, 0.9888; however, the proximity of the class 3 representation reduced the recall of classifier 6 to 0.5688 due to false negative predictions, expressed by the error count of 2164 in the confusion matrix. The projection of class 5 is also distant, which is reflected in the high precision of its classifier, 0.9916; however, it intersects with classes 1 and 4, which reduced its recall to 0.4163 due to erroneous predictions with counts of 2726 and 1268 for classifiers 1 and 4, respectively. The projection of class 1 intersects with classes 5, 4, and 0; accordingly, classifier 1 has false positives with a count of 2726 for class 5 and false negatives with counts of 2035 and 3550 in favor of classifiers 0 and 4. Similar relationships are observed for the other classes.

The behavior of classifier 0 is particularly interesting. Defect-free class 0 sits in the middle of the projections, which corresponds to its graphic image: closest to classes 1, 2, 3, and 4 and most distinguishable from classes 5 and 6 (Fig. 4). Classifier 0 recognizes its own class well, which yields the highest recall score of 0.9928, but it produces numerous false positives on classes 1, 2, 3, and 4, with a precision of only 0.4224. That is, classes with defects are often classified as the defect-free class (class 0), which makes the Gaussian Naive Bayes model completely unsuitable for flaw detection purposes. The resulting Gaussian Naive Bayes classifier is too simple to describe the complex data structure.

Linear Discriminant Analysis (LDA) Classifier Model

Preliminary analysis based on dimensionality reduction showed good grouping of classes under the Linear Discriminant Analysis method (Fig. 12), which motivated its use as the next model:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Fit LDA on the same flattened training data used for the base model
lda_model = LinearDiscriminantAnalysis(solver='svd', store_covariance=True)
lda_model.fit(Xtrain, ytrain)
y_model = lda_model.predict(Xtest)

acc = accuracy_score(ytest, y_model)
print(classification_report(ytest, y_model, digits=4))

mat = confusion_matrix(ytest, y_model)
fig = plt.figure(figsize=(10, 5), facecolor='1')
sns.heatmap(mat, square=True, annot=True, cbar=False, fmt='d', linewidths=0.5)
plt.xlabel('class qualifiers')
plt.ylabel('true value')
```

The results of its training and prediction are presented in Fig. 14. The overall accuracy of the model was 0.9162, which is 1.57 times better than the accuracy of the base model. However, classifier 0 has a large number of false positives for classes 2 and 4, and its precision is only 0.8027, which is still not a satisfactory indicator for practical application.
Figure 14 – Summary assessments of the quality of the Linear Discriminant Analysis (LDA) classifier

The hypothesis that a larger training data set might increase the accuracy of the LDA model is not confirmed: the learning curve in Fig. 15 shows high convergence of the training and validation accuracy at a level of 0.92 with a training set size of 5,000-6,000 items:

```python
from sklearn.model_selection import learning_curve

gen = Gen_2D_Orig_Arr(train_dir, batch_size=8000, num_classes=7)
x1, y1 = next(gen)  # x1.shape = (8000, 6, 60, 1), y1.shape = (8000, 7)
x1 = np.reshape(x1, (x1.shape[0], x1.shape[1] * x1.shape[2]))  # (8000, 360)
y1 = np.argmax(y1, axis=1)  # y1.shape = (8000,)

N, train_sc, val_sc = learning_curve(lda_model, x1, y1, cv=10,
                                     train_sizes=np.linspace(0.04, 1, 20))
rmse_tr = np.std(train_sc, axis=1)
rmse_vl = np.std(val_sc, axis=1)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(N, np.mean(train_sc, 1), '-b', marker='o', label='training')
ax.plot(N, np.mean(val_sc, 1), '-r', marker='o', label='validation')
ax.hlines(np.mean([train_sc[-1], val_sc[-1]]), 0, N[-1],
          color='gray', linestyle='dashed')
ax.fill_between(N, np.mean(train_sc, 1) - 3 * rmse_tr,
                np.mean(train_sc, 1) + 3 * rmse_tr, color='blue', alpha=0.2)
ax.fill_between(N, np.mean(val_sc, 1) - 3 * rmse_vl,
                np.mean(val_sc, 1) + 3 * rmse_vl, color='red', alpha=0.2)
ax.legend(loc='best')
ax.set_xlabel('training size')
ax.set_ylabel('score')
ax.set_xlim(0, 6500)
ax.set_ylim(0.7, 1.02)
```

Figure 15 – High convergence of the training and validation accuracy at a level of 0.92 with a training set size of 5,000-6,000 items

When trying to create such a classification system, manufacturers of flaw detection equipment face the difficulty of estimating how the predictive accuracy of the system depends on the number of items in the data set that must be collected during rail diagnostics. The learning curve obtained on model data makes it possible to estimate this number at 5,000-6,000 instances to achieve an accuracy of 0.92 within the LDA algorithm. The declining training-score branch of the learning curve (blue in Fig. 15) shows that the LDA classifier is too simple for the complex data structure, so there is a justified need for a more complex model to improve prediction accuracy.

Dense Network

One option for increasing prediction accuracy is the use of fully connected networks. With the structure shown in Figure 16a, whose optimal parameters were found using the Keras Tuner tool, the accuracy of the model increased to 0.974 (Figure 16b) compared to the previous method, and the precision of the class 0 classifier increased to 0.912.

Fig. 16 – Model structure and accuracy indicators

The progressive increase in prediction accuracy achieved by using more complex (computationally costly) machine learning algorithms justifies building increasingly complex models.

Support Vector Machine (SVM)

Using a support vector machine with a radial basis function (RBF) kernel and a grid search over the model hyperparameters with the GridSearchCV class from the scikit-learn library made it possible to obtain a model with improved quality indicators (Fig. 17).
Fig. 17 – Summary of classifier performance estimates based on the Support Vector Machine (SVM)

The use of the SVM method increased both the overall prediction accuracy, to 0.9793, and the precision of the class 0 classifier, to 0.9447. However, the average running time of the algorithm on a test data set of 50,000 instances, each of initial dimension 360, was 9.2 s, the maximum among the considered classifiers. Reducing the model's running time through pipelines combining dimensionality reduction techniques with the SVM algorithm did not allow the achieved accuracy to be maintained.

Random Forest Classifier

The RandomForestClassifier, based on an ensemble of random trees and implemented in the scikit-learn package, is another candidate for increasing the classification accuracy of the data under consideration:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=50)
rf.fit(Xtrain, ytrain)  # same flattened training data as above
```

The performance estimates of the random forest of 50 trees on the test set are shown in Fig. 18. The algorithm increased both the overall prediction accuracy, to 0.9991, and the important indicator of class 0 precision, to 0.9968. The class 0 classifier makes most of its mistakes on classes 1-4, which have similar graphical representations. The precision of classifiers 1-4 is high and is reduced only by errors in favor of classes 5 and 6, which is not critical for identifying flaws. The average prediction time on the test data set on the CPU was 0.7 s, which is 13 times less than the running time of SVM, with an accuracy gain of about 0.02 (two percentage points).

Fig. 18 – Summary assessments of the quality of the random-forest-based classifier

The learning curve of the RandomForestClassifier, presented in Fig. 19, shows a high level of optimality of the constructed model:

- With increasing training data, the efficiency of the model does not decrease, which indicates the absence of underfitting.
- The efficiency estimates at the training and testing stages converge to high values with a difference of no more than 0.028, which suggests that the model is not overfitted.

Fig. 19 – Learning curve of the RandomForestClassifier

The resulting learning curve allows us to estimate the minimum number of samples required to achieve an acceptable accuracy of about 0.98: 1,550 data instances, that is, roughly 1550 / 7 ≈ 220 samples for each of the 7 classes.

The high accuracy and speed of the random forest algorithm make it possible to assess the influence (importance) of all 360 predictive variables on the overall accuracy of the model. The evaluation was carried out by measuring the average decrease in model accuracy when one of the variables is randomly shuffled, which effectively removes its predictive power.
Fig. 20 shows the result of the code fragment below, which assesses the importance of the variables for model accuracy:

```python
from sklearn.model_selection import train_test_split

rf = RandomForestClassifier(n_estimators=10)

gen = Gen_2D_Orig_Arr(train_dir, batch_size=8000, num_classes=7)
x1, y1 = next(gen)  # x1.shape = (8000, 6, 60, 1), y1.shape = (8000, 7)
x1 = np.reshape(x1, (x1.shape[0], x1.shape[1] * x1.shape[2]))  # (8000, 360)
y1 = np.argmax(y1, axis=1)  # y1.shape = (8000,)

Xtrain, Xtest, ytrain, ytest = train_test_split(x1, y1, test_size=0.5)
rf.fit(Xtrain, ytrain)
acc = accuracy_score(ytest, rf.predict(Xtest))
print(acc)

# Permutation importance: shuffle one column at a time and record the
# relative drop in accuracy, averaged over 50 random splits
scores = np.array([], dtype=float)
for _ in range(50):
    train_X, valid_X, train_y, valid_y = train_test_split(x1, y1, test_size=0.5)
    rf.fit(train_X, train_y)
    acc = accuracy_score(valid_y, rf.predict(valid_X))
    for column in range(x1.shape[1]):
        X_t = valid_X.copy()
        X_t[:, column] = np.random.permutation(X_t[:, column])
        shuff_acc = accuracy_score(valid_y, rf.predict(X_t))
        scores = np.append(scores, (acc - shuff_acc) / acc)

scores = np.reshape(scores, (50, 360))
sc = scores.mean(axis=0)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(sc, '.-r')
ax.set_xlabel('Predictive variables')
ax.set_ylabel('Importance')
ax.set_xlim([0, 360])
ax.xaxis.set_major_locator(plt.MaxNLocator(6))
```

Figure 20 – Assessment of the importance of the predictive variables via the accuracy of the RandomForestClassifier-based model

The graph in Fig. 20 shows that the most important predictive variables are 60 to 85 for channel +40° and 240 to 265 for channel −40°, which determine the depth of the alarm. The peaks at the beginning and end of each range indicate the even greater predictive importance of the depths at the beginning and end of alarms. The total number of important variables can be estimated at 50. The importance of the variables determining the coordinates and amplitude of the alarm in each data instance is much lower. This assessment is consistent with the assumptions made during the exploratory analysis. Training the RandomForestClassifier on the entire training data set without amplitudes showed an overall accuracy of 0.9990; without amplitudes and coordinates, 0.9993. Excluding the amplitude and coordinate parameters from each data instance reduces its size to (2, 60) = 120 predictive variables without reducing accuracy. This result allows us to use only the alarm depth parameter for data classification.

The accuracy achieved by the RandomForestClassifier is sufficient and solves the problem of classifying defects in bolt holes. However, to generalize the capabilities, let us consider a class of deep learning models based on a convolutional neural network.

Deep Learning (DL) Model

Synthesizing and training a convolutional network is an iterative process of searching for the best structures and optimizing their hyperparameters. Fig. 21 shows the final version of a simple network structure in the form of a linear stack of layers and the course of its training.

Figure 21 – Model structure and report on its training process
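The exact architecture and training report are those of Fig. 21. Purely as an illustration (an assumed, comparable linear stack, not the published model), such a network for (6, 60, 1) inputs and 7 classes could be defined in Keras as follows:

```python
# Hypothetical sketch of a small linear-stack CNN for (6, 60, 1) frames;
# the layer sizes are assumptions, not the authors' exact architecture.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(6, 60, 1)),
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(1, 2)),   # pool only along the 60-record axis
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(1, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(7, activation='softmax'),   # 7 bolt-hole state classes
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # labels are one-hot (N, 7)
              metrics=['accuracy'])
```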
The results of training and prediction of the convolutional neural network are presented in Fig. 22. The overall accuracy of the model on the test data was 0.9985, which is 1.71 times better than the accuracy of the base model. The number of false positives of classifier 0 is 2 + 24 + 6 + 2 = 34 out of all 42,893 defective instances (Fig. 22a). The average time for predicting the test data on the CPU was 4.55 s.

Figure 22 – Summary assessments of the quality of the trained CNN-based network on test data

One of the important tasks of the resulting classifier in practical use is the accurate determination of the defect-free class (class 0), which eliminates the false classification of defective samples as non-defective. The number of false positives for the defect-free class can be reduced by changing the probability threshold. To estimate an applicable cut-off threshold, the multiclass problem was binarized by separating the defect-free state from all defective states, which corresponds to the "one vs. rest" strategy. By default, the threshold for binary classification is 0.5 (50%). With this approach, the binary classifier has the quality indicators shown in Fig. 23.

Fig. 23 – Quality indicators of the binary classifier at a cut-off threshold of 0.5

The resulting precision for the "No defect" class was 0.9952, the same as for class "0" of the multiclass classifier. The sklearn.metrics.precision_recall_curve function shows how the precision and recall of the binary classifier change with the cut-off threshold (Fig. 24a). At a cut-off threshold of 0.5, there are 34 false positives (Fig. 24b). The best balance of precision and recall is achieved at the intersection point of their graphs, which corresponds to a cut-off threshold of 0.66. At this point, the classifier reduces the number of false positives for the "No defect" class to 27 (Fig. 24b). Increasing the threshold to 0.94 reduces the false positives to 8 at the cost of increasing the false negatives to 155 samples (Fig. 24b), i.e., decreasing the recall of the classifier. A further increase in the cut-off threshold reduces the recall of the classifier to an unacceptable level (Fig. 24a).

Figure 24 – Effect of the cut-off threshold: a) precision and recall as functions of the cut-off threshold (precision-recall curve); b) confusion matrices at different cut-off thresholds

With the cut-off threshold set to 0.94, the quality assessments of the classifier are shown in Fig. 25. Precision for the "No defect" class increased to 0.9989.

Fig. 25 – Quality indicators of the binary classifier at a cut-off threshold of 0.94

The eight false-positive samples, with the characteristic graphical signs of defects highlighted, are shown in Fig. 26.

Fig. 26 – Eight false-positive classified samples

Among these graphic images, the samples marked "controversial" are debatable: they indicate the presence of a very short radial crack that is difficult to classify as a defect. The remaining 6 samples are classifier errors. Note the qualitative indicator of the absence of false classification of samples with defects in the form of a radial crack of significant length; such samples are the easiest to classify during manual analysis of flaw patterns.

A further increase in model accuracy is possible through ensembles of the resulting models: DL and RandomForestClassifier. Models obtained with a different input data format, including the direct B-scan format as shown in [3], can also be added to the ensemble.
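A minimal sketch of this threshold analysis, assuming binarized labels `y_true` (1 = "No defect") and predicted probabilities `y_scores` from the network; the random arrays below are placeholders for the real data:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Placeholders: substitute the binarized labels and CNN probabilities
y_true = np.random.randint(0, 2, size=50000)
y_scores = np.random.rand(50000)

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# threshold where precision and recall are closest (their curves cross)
i_cross = int(np.argmin(np.abs(precision[:-1] - recall[:-1])))
print('balanced threshold ~', thresholds[i_cross])

# a stricter cut-off trades recall for precision, as with 0.94 in the text
y_pred = (y_scores >= 0.94).astype(int)
```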
Conclusions and Discussion

The main quality indicators of the developed models for classifying defects in bolt holes are summarized in the diagram in Fig. 27. The gradual and reasoned complication of the classification models is reflected in the diagram as an increase in both the overall accuracy of the models (blue) and the important indicator of class 0 precision (orange). Maximum accuracy rates above 0.99 were achieved by the models based on random forest and convolutional neural networks, with the random forest model having the advantage of less prediction time.

Fig. 27 – Highlighted quality indicators of the considered classification models

In this work:

- The possibility of searching for defects in ultrasonic flaw patterns is shown by decomposing them into separate data channels and selecting individual diagnostic sections.
- The influence of the amplitude and coordinate predictive variables on the quality of classification is assessed.
- An estimate is given of the amount of data required to build a classification model of defects in bolt holes with an accuracy of 98%, which can serve as a guide for manufacturers of flaw detection equipment when creating automatic expert systems.
- The possibility of achieving high accuracy in classifying the states of rail bolt holes with classical machine learning algorithms is shown.
- Quality assessments of the deep learning model are obtained, showing the possibility and feasibility of using a convolutional neural network architecture for the synthesis of segmentation networks that search for defects in continuous flaw patterns (B-scans).

References

[1] Markov AA, Kuznetsova EA. Rail Flaw Detection. Formation and Analysis of Signals. Book 2: Decoding of Defectograms. Saint Petersburg: Ultra Print; 2014.
[2] Markov AA, Mosyagin VV, Shilov MN, Fedorenko DV. AVICON-11: new flaw detector for one hundred percent inspection of rails. NDT World Review. 2006; 2(32): 75-78. Available from: http://www.radioavionica.ru/activities/sistemy-nerazrushayushchego-kontrolya/articles/files/razrab/33.zip [Accessed 14th March 2023]
[3] Kaliuzhnyi A. Application of model data for training the classifier of defects in rail bolt holes in ultrasonic diagnostics. Artificial Intelligence Evolution. 2023; 4(1): 55-69. Available from: doi:10.37256/aie.4120232339
[4] Kuzmin EV, Gorbunov OE, Plotnikov PO, Tyukin VA, Bashkin VA. Application of neural networks for recognizing rail structural elements in magnetic and eddy current defectograms. Modeling and Analysis of Information Systems. 2018; 25(6): 667-679. Available from: doi:10.18255/1818-1015-2018-6-667-679
[5] Bettayeb F, Benbartaoui H, Raouraou B. The reliability of the ultrasonic characterization of welds by the artificial neural network. 17th World Conference on Nondestructive Testing; 2008; Shanghai, China. [Accessed 14th March 2023]
[6] Young-Jin C, Wooram C, Oral B. Deep learning-based crack damage detection using convolutional neural networks. Computer-Aided Civil and Infrastructure Engineering. 2017; 32(5): 361-378. Available from: doi:10.1111/mice.12263
[7] Heckel T, Kreutzbruck M, Rühe S. High-speed non-destructive rail testing with advanced ultrasound and eddy-current testing techniques. 5th International Workshop of NDT Experts - NDT in Progress 2009 (Proceedings). 2009; 5: 101-109. [Accessed 14th March 2023]
[8] Papaelias M, Kerkyras S, Papaelias F, Graham K. The future of rail inspection technology and the INTERAIL FP7 project. 51st Annual Conference of the British Institute of Non-Destructive Testing 2012, NDT 2012. 2012. [Accessed 14th March 2023]
[9] Rizzo P, Coccia S, Bartoli I, Fateh M. Non-contact ultrasonic inspection of rails and signal processing for automatic defect detection and classification. Insight. 2005; 47(6): 346-353. Available from: doi:10.1784/insi.47.6.346.66449
[10] Nakhaee MC, Hiemstra D, Stoelinga M, van Noort M. The recent applications of machine learning in rail track maintenance: a survey. In: Collart-Dutilleul S, Lecomte T, Romanovsky A (eds.) Reliability, Safety, and Security of Railway Systems. Modelling, Analysis, Verification, and Certification. RSSRail 2019. Lecture Notes in Computer Science, vol 11495. Cham: Springer; 2019. pp. 91-105. Available from: doi:10.1007/978-3-030-18744-6_6
[11] Jiaxing Y, Shunya I, Nobuyuki T. Computerized ultrasonic imaging inspection: from shallow to deep learning. Sensors. 2018; 18(11): 3820. Available from: doi:10.3390/s18113820
[12] Jiaxing Y, Nobuyuki T. Benchmarking deep learning models for automatic ultrasonic imaging inspection. IEEE Access. 2021; 9: 36986-36994. Available from: doi:10.1109/ACCESS.2021.3062860
[13] Cantero-Chinchilla S, Wilcox PD, Croxford AJ. Deep learning in automated ultrasonic NDE - developments, axioms, and opportunities. arXiv:2112.06650. 2021. Available from: doi:10.48550/arXiv.2112.06650
[14] Cantero-Chinchilla S, Wilcox PD, Croxford AJ. A deep learning-based methodology for artifact identification and suppression with application to ultrasonic images. NDT & E International. 2021; 126: 102575. Available from: doi:10.1016/j.ndteint.2021.102575
[15] Chapon A, Pereira D, Toews M, Belanger P. Deconvolution of ultrasonic signals using a convolutional neural network. Ultrasonics. 2021; 111: 106312. Available from: doi:10.1016/j.ultras.2020.106312
[16] Medak D, Posilović L, Subasic M, Budimir M. Automated defect detection from ultrasonic images using deep learning. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control. 2021; 68(10): 3126-3134. Available from: doi:10.1109/TUFFC.2021.3081750
[17] Virkkunen I, Koskinen T. Augmented ultrasonic data for machine learning. Journal of Nondestructive Evaluation. 2021; 40: 4. Available from: doi:10.1007/s10921-020-00739-5
[18] Veiga JLBC, Carvalho AA, Silva IC. The use of artificial neural network in the classification of pulse-echo and TOFD ultra-sonic signals. Journal of the Brazilian Society of Mechanical Sciences and Engineering. 2005; 27(4): 394-398. Available from: doi:10.1590/S1678-58782005000400007
[19] Posilović L, Medak D, Subašić M, Budimir M, Lončarić S. Generative adversarial network with object detector discriminator for enhanced defect detection on ultrasonic B-scans. arXiv:2106.04281v1 [eess.IV]. 2021. Available from: https://arxiv.org/pdf/2106.04281.pdf
[20] Markov AA, Mosyagin VV, Keskinov MV. A program for 3D simulation of signals for ultrasonic testing of specimens. Russian Journal of Nondestructive Testing. 2005; 41: 778-789. Available from: doi:10.1007/s11181-006-0034-3
[21] Shilov MN. Methodical, algorithmic, and software support for registration and analysis of defectograms during ultrasonic testing of rails [dissertation]. Saint Petersburg: Saint Petersburg State University of Aerospace Instrumentation; 2007. 153 p.
[22] Kaliuzhnyi A. Using Machine Learning To Detect Railway Defects.
[23] NVIDIA Blog.
In today's rapidly evolving technological landscape, Artificial Intelligence (AI) is getting a lot of attention. Everywhere you look on social media, there are new AI startups, prompt engineering tools, and Large Language Model (LLM) solutions. And it's not surprising, because AI feels almost like magic! ChatGPT, for instance, really got everyone excited: it reached 100 million users just two months after becoming publicly available, making it super popular, super fast. Now everyone's wondering: what does this AI wave mean for me, my work, and my products? This article explores what the AI trend means for those of us at the forefront of building digital products and applications using APIs.

APIs Make AI Accessible to All

Major companies have been quick to establish dedicated AI research labs, recruiting data scientists to craft AI models. But what about smaller entities without the massive computing resources and GPUs for an AI research lab? Are they to just observe as larger firms capitalize on the AI revolution? The answer is no. For many AI applications, particularly those centered around natural language, there's no need for a special AI research lab. Instead, existing public AI models, such as LLMs, can be utilized. This means that developers don't need to be AI experts; they just need to be proficient with APIs. Through prompt engineering, fine-tuning, and embeddings, these models can be customized to serve specific requirements.

The magic word here is "APIs." APIs encapsulate the complexities of their contents, making AI models accessible to all developers, regardless of their expertise in AI. This separation of concerns ensures that while a select few data scientists create AI models and package them as APIs, a larger pool of developers can integrate these models into their applications, crafting "smart" solutions proficient in natural language processing. The outcome is that APIs level the playing field, granting developers at companies of all sizes access to powerful AI models.

AI and API Patterns for Modern Applications

APIs are key to linking your product to everything else; they're great at connecting different software components. With AI, this connection is even more crucial, because AI needs to work with different data sources and tools to be useful. Modern applications consistently leverage both AI and APIs: AI imparts "smartness" to applications, enabling them to comprehend human language and intent, while APIs facilitate data access and system connections. These technologies aren't just parallel entities; their combined use can be synergistic. There are three primary patterns for their integration.

Pattern 1: Call AI Services via API

AI models, like OpenAI's ChatGPT, are often packaged as APIs. Through these APIs, developers can invoke the AI by sending prompts as input, allowing them to integrate AI into their applications seamlessly. A commonly used architecture for a new AI app utilizes two OpenAI API endpoints, Vector Embeddings and Chat Completion, as shown in the diagram below. This method first creates vector embeddings through the OpenAI API for each input document (text, image, CSV, PDF, or other structured/unstructured data), then indexes the generated embeddings and saves them into storage such as a vector database for fast retrieval; the retrieved documents are then presented to ChatGPT along with the user's question as a prompt.
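A rough sketch of this pattern, assuming the 2023-era `openai` Python package; the document list, the model names shown, and the in-memory index standing in for a vector database are illustrative assumptions:

```python
# Minimal sketch of Pattern 1 (retrieval-augmented ChatGPT), with a tiny
# in-memory "index" in place of a real vector database.
import numpy as np
import openai

openai.api_key = "YOUR_API_KEY"  # in practice, load from env/config

documents = [  # hypothetical knowledge base
    "Our refund policy lasts 30 days.",
    "Support is available 24/7 via chat.",
]

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in resp["data"]])

doc_vectors = embed(documents)  # index step: one vector per document

def answer(question):
    q_vec = embed([question])[0]
    # cosine similarity against stored vectors; take the best match
    sims = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec))
    context = documents[int(np.argmax(sims))]
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp["choices"][0]["message"]["content"]

print(answer("How long do refunds last?"))
```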
With this added custom knowledge, ChatGPT can respond intelligently to user queries.

Pattern 2: AI Services Call APIs

The output from an AI model in response to a prompt is typically textual. To translate these "ideas" into actionable outcomes, AI services need to call APIs. These APIs can initiate actions in the real or digital world, such as making payments, booking appointments, sending messages, or adjusting room temperatures. In essence, APIs act as the hands of an AI service, enabling it to interact with its environment. A good example is ChatGPT custom plugins. This article explains how to build a custom plugin using the APISIX API gateway: APISIX can sit in front of APIs to route AI requests to the intended backend services, implement security measures like authentication, authorization, and rate limiting, cache similar responses from APIs, and gather valuable insights about API usage, performance, and potential issues.

Pattern 3: AI Connects APIs

Years ago, the only way to make two software systems or APIs communicate was manual coding: software engineers would craft complex and fragile code sequences. This task was solely for developers, and every modification meant more coding, leading to a tangled web of interconnecting code. With the advent of generative AI, interacting with an Integration Platform as a Service (iPaaS) could be as straightforward as making a request in a chat. If you want data from one platform to synchronize with another, you don't need to understand the technicalities; you just need to specify your requirements. For instance, you might say, "Sync customer lead scores from Marketo to Salesforce," or ask the AI to move data from one API to another. The AI will then handle the process, test its compatibility, and fix any issues autonomously. The APIs used in your integrations are always changing, and this can sometimes cause problems. AI can monitor the health of your data integrations and keep fixing errors, or simply send alert notifications in natural language when an entry in an API request or response requires attention.

Safeguarding API Usage

With AI's capability to call APIs that initiate actions in the real or digital world, it's critical to implement safeguards. This safeguarding, ideally implemented at the API management system level, is required to ensure the responsible and secure use of AI. This post explores how an API gateway can help ChatGPT plugin developers expose, secure, manage, and monitor their API endpoints.

In Conclusion

APIs provide the perfect building blocks for AI-driven software development, and the combination of APIs and AI technologies is vital for developing powerful applications. The three identified patterns (calling AI services via APIs, AI services invoking APIs to take action, and AI connecting APIs) offer a roadmap for leveraging AI in application development. As the AI landscape continues to evolve, the focus on APIs and the strategies for their integration will become even more important.
Developers craft software that both delights consumers and delivers innovative applications for enterprise users. This craft requires more than just churning out heaps of code; it embodies a process of observing, noticing, interviewing, brainstorming, reading, writing, and rewriting specifications; designing, prototyping, and coding to the specifications; reviewing, refactoring, and verifying the software; and a virtuous cycle of deploying, debugging, and improving. At every stage of this cycle, developers consume and generate two things: code and text. Code is text, after all.

Developer productivity is limited by real-world realities: challenging timelines, unclear requirements, legacy codebases, and more. To overcome these obstacles and still meet deadlines, developers have long relied on adding new tools to their toolbox, for example, code generation tools such as compilers, UI generators, ORM mappers, and API generators. Developers have embraced these tools without reservation, progressively evolving them to offer more intelligent functionality. Modern compilers do more than just translate; they rewrite and optimize the code automatically. SQL, developed fifty years ago as a declarative language with a set of composable English templates, continues to evolve and improve the data access experience and developer productivity. Developers have access to an endless array of tools to expand their toolbox.

The Emergence of GenAI

GenAI is a new, powerful tool for the developer toolbox. GenAI, short for Generative AI, is a subset of AI capable of taking prompts and then autonomously creating many forms of content (text, code, images, videos, music, and more) that imitate and often mirror the quality of human craftsmanship. Prompts are instructions in the form of expository writing; better prompts produce better text and code. The seismic surge surrounding GenAI, supported by technologies such as ChatGPT and Copilot, positions 2023 to be heralded as the "Year of GenAI." GenAI's text generation capability is expected to revolutionize every aspect of developer experience and productivity.

Impact on Developers

Someone recently noted, "In 2023, natural language has emerged as the fastest programming language." While the previous generation of tools focused on incremental improvements to productivity for writing code and improving code quality, GenAI tools promise to revolutionize these and every other aspect of developer work. ChatGPT can summarize a long requirement specification, give you the delta between two versions, or help you come up with a checklist for a specific task. For coding, the impact is dramatic: since these models have been trained on the entire internet, with billions of parameters and trillions of tokens, they've seen a lot of code. With a good prompt, you can make them write a large piece of code, design APIs, and refactor code. And in just one sentence, you can ask ChatGPT to rewrite everything in a brand-new language. All these possibilities were simply science fiction just a few years ago. GenAI makes mundane tasks disappear, hard tasks easier, and difficult tasks possible. Developers are relying more on ChatGPT to explain new concepts and clarify confusing ideas. Apparently, this trend has reduced traffic to Stack Overflow, a popular Q&A site for developers, by anywhere between 16% and 50% on various measures! Developers choose the winning tool. But there's a catch. More than one, in fact.
The GenAI tools of the current generation, although promising, are unaware of your goals and objectives. These tools, developed through training on a vast array of samples, operate by predicting the succeeding token, one at a time, rooted firmly in the patterns they have previously encountered. Their answers are guided and constrained by the prompt. To harness their potential effectively, it becomes imperative to craft detailed, expository-style prompts. This nudges the technology to produce output that is closer to the intended goal, albeit with a style and creativity bounded by their training data. They excel at replicating styles they have been exposed to but fall short at inventing unprecedented ones. Multiple companies and groups are busy training LLMs for specific tasks to improve their content generation. I recommend heeding the advice of Satya Nadella, Microsoft's CEO, who suggests it is prudent to treat the content generated by GenAI as a draft, requiring thorough review to ensure its clarity and accuracy. The onus falls on the developer to delineate between routine tasks and those demanding creativity, a discernment that remains beyond GenAI's reach, at least for now.

Despite this, with justifiable evidence, GenAI promises improved developer experience and productivity. OpenAI's ChatGPT raced to 100 million users in record time. Your favorite IDEs have plugins to exploit it. Microsoft has promised to use GenAI in all its products, including its revitalized search offering, bing.com. Google has answered with its own suite of services and products; Facebook and others have released multiple models to help developers progress. It's a great time to be a developer. The revolution has begun, promptly.

At Couchbase, we've introduced generative AI capabilities into our Database-as-a-Service, Couchbase Capella, to significantly enhance developer productivity and accelerate time to market for modern applications. The new capability, called Capella iQ, enables developers to write SQL++ and application-level code more quickly by delivering recommended sample code.
This is an article from DZone's 2023 Automated Testing Trend Report.

Artificial intelligence (AI) has revolutionized the realm of software testing, introducing new possibilities and efficiencies. The demand for faster, more reliable, and more efficient testing processes has grown exponentially with the increasing complexity of modern applications. To address these challenges, AI has emerged as a game-changing force in automated software testing. By leveraging AI algorithms, machine learning (ML), and advanced analytics, software testing has undergone a remarkable transformation, enabling organizations to achieve unprecedented levels of speed, accuracy, and coverage in their testing endeavors. This article delves into the profound impact of AI on automated software testing, exploring its capabilities, benefits, and the potential it holds for the future of software quality assurance.

An Overview of AI in Testing

This introduction aims to shed light on the role of AI in software testing, focusing on the key aspects that drive its transformative impact.

Figure 1: AI in testing

Elastically Scale Functional, Load, and Performance Tests

AI-powered testing solutions enable the effortless allocation of testing resources, ensuring optimal utilization and adaptability to varying workloads. This scalability ensures comprehensive testing coverage while maintaining efficiency.

AI-Powered Predictive Bots

AI-powered predictive bots are a significant advancement in software testing. Bots leverage ML algorithms to analyze historical data, patterns, and trends, enabling them to make informed predictions about potential defects or high-risk areas. By proactively identifying potential issues, predictive bots contribute to more effective and efficient testing processes.

Automatic Update of Test Cases

With AI algorithms monitoring the application and its changes, test cases can be dynamically updated to reflect modifications in the software. This adaptability reduces the effort required for test maintenance and ensures that the test suite remains relevant and effective over time.

AI-Powered Analytics of Test Automation Data

By analyzing vast amounts of testing data, AI-powered analytical tools can identify patterns, trends, and anomalies, providing valuable information to enhance testing strategies and optimize testing efforts. This data-driven approach empowers testing teams to make informed decisions and uncover hidden patterns that traditional methods might overlook.

Visual Locators

Visual locators, a type of AI application in software testing, focus on visual elements such as user interfaces and graphical components. AI algorithms can analyze screenshots and images, enabling accurate identification of and interaction with visual elements during automated testing. This capability enhances the reliability and accuracy of visual testing, ensuring a seamless user experience.

Self-Healing Tests

AI algorithms continuously monitor test execution, analyzing results and detecting failures or inconsistencies. When issues arise, self-healing mechanisms automatically attempt to resolve the problem, adjusting the test environment or configuration. This intelligent resilience minimizes disruptions and optimizes the overall testing process.

What Is AI-Augmented Software Testing?

AI-augmented software testing refers to the utilization of AI techniques, such as ML, natural language processing, and data analytics, to enhance and optimize the entire software testing lifecycle.
It involves automating test case generation, intelligent test prioritization, anomaly detection, predictive analysis, and adaptive testing, among other tasks. By harnessing the power of AI, organizations can improve test coverage, detect defects more efficiently, reduce manual effort, and ultimately deliver high-quality software with greater speed and accuracy.

Benefits of AI-Powered Automated Testing

AI-powered software testing offers a plethora of benefits that revolutionize the testing landscape. One significant advantage lies in its codeless nature, which eliminates the need to memorize intricate syntax. Embracing simplicity, it empowers users to effortlessly create testing processes through intuitive drag-and-drop interfaces. Scalability becomes a reality, as the workload can be efficiently distributed among multiple workstations, ensuring efficient utilization of resources. The cost savings are remarkable, as minimal human intervention is required, resulting in substantial reductions in workforce expenses. With tasks executed by intelligent bots, accuracy reaches unprecedented heights, minimizing the risk of human error. Furthermore, this automated approach amplifies productivity, enabling testers to achieve exceptional output levels. Regardless of the software type, be it a web-based, desktop, or mobile application, the flexibility of AI-powered testing seamlessly adapts to diverse environments, revolutionizing the testing realm altogether.

Figure 2: Benefits of AI for test automation

Mitigating the Challenges of AI-Powered Automated Testing

AI-powered automated testing has revolutionized the software testing landscape, but it is not without its challenges. One of the primary hurdles is the need for high-quality training data. AI algorithms rely heavily on diverse and representative data to perform effectively. Therefore, organizations must invest time and effort in curating comprehensive and relevant datasets that encompass various scenarios, edge cases, and potential failures. Another challenge lies in the interpretability of AI models. Understanding why and how AI algorithms make specific decisions can be critical for gaining trust and ensuring accurate results. Addressing this challenge requires implementing techniques such as explainable AI, model auditing, and transparency. Furthermore, the dynamic nature of software environments poses a challenge to maintaining AI models' relevance and accuracy. Continuous monitoring, retraining, and adaptation of AI models become crucial to keeping pace with evolving software systems. Additionally, ethical considerations, data privacy, and bias mitigation must be diligently addressed to maintain fairness and accountability in AI-powered automated testing.

AI models used in testing can sometimes produce false positives (incorrectly flagging a non-defect as a defect) or false negatives (failing to identify an actual defect). Balancing the precision and recall of AI models is important to minimize false results. AI models can also exhibit biases and may struggle to generalize to new or uncommon scenarios. Adequate training and validation of AI models are necessary to mitigate biases and ensure their effectiveness across diverse testing scenarios.

Human intervention plays a critical role in designing test suites, as testers leverage their domain knowledge and insights.
They can identify critical test cases, edge cases, and scenarios that require human intuition or creativity, while leveraging AI to handle repetitive or computationally intensive tasks. Continuous improvement becomes possible by encouraging a feedback loop between human testers and AI systems. Human experts can provide feedback on the accuracy and relevance of AI-generated test cases or predictions, helping improve the performance and adaptability of AI models. Human testers should also play a role in the verification and validation of AI models, ensuring that they align with the intended objectives and requirements. They can evaluate the effectiveness, robustness, and limitations of AI models in specific testing contexts. AI-Driven Testing Approaches AI-driven testing approaches have ushered in a new era in software quality assurance, reshaping traditional testing methodologies. By harnessing the power of artificial intelligence, these approaches optimize and enhance various aspects of testing, including test coverage, efficiency, accuracy, and adaptability. This section explores the key AI-driven testing approaches, including differential testing, visual testing, declarative testing, and self-healing automation. These techniques leverage AI algorithms and advanced analytics to elevate the effectiveness and efficiency of software testing, ensuring higher-quality applications that meet the demands of the rapidly evolving digital landscape: Differential testing assesses discrepancies between application versions and builds, categorizes the variances, and utilizes feedback to enhance the classification process through continuous learning. Visual testing utilizes image-based learning and screen comparisons to assess the visual aspects and user experience of an application, thereby ensuring the integrity of its look and feel. Declarative testing expresses the intention of a test in a natural or domain-specific language, allowing the system to autonomously determine the most appropriate approach to execute the test. Self-healing automation automatically rectifies element selection in tests when there are modifications to the user interface (UI), ensuring the continuity of reliable test execution. Key Considerations for Harnessing AI for Software Testing Many contemporary test automation tools infused with AI provide support for open-source test automation frameworks such as Selenium and Appium. AI-powered automated software testing encompasses essential features such as auto-code generation and the integration of exploratory testing techniques. Open-Source AI Tools To Test Software When selecting an open-source testing tool, several factors are worth considering: verify that the tool is actively maintained and supported, assess whether it aligns with the skill set of the team, and evaluate its features, benefits, and challenges to ensure they are in line with your specific testing requirements and organizational objectives.
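To make the self-healing idea above concrete in Selenium terms, here is a minimal, hypothetical sketch of element lookup with ordered fallback locators. The element names, locators, and URL are invented for illustration; production tools typically rank candidate locators with ML and persist the winning one rather than walking a fixed list.

Python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# Ordered fallback locators for the same logical element (illustrative values).
CANDIDATE_LOCATORS = [
    (By.ID, "checkout-button"),
    (By.CSS_SELECTOR, "button[data-test='checkout']"),
    (By.XPATH, "//button[normalize-space()='Checkout']"),
]

def find_with_healing(driver):
    """Try each candidate locator in turn; 'heal' by falling back when the UI changes."""
    for how, what in CANDIDATE_LOCATORS:
        try:
            return driver.find_element(how, what)  # a real tool would persist the winner
        except NoSuchElementException:
            continue
    raise NoSuchElementException("All candidate locators failed")

driver = webdriver.Chrome()
driver.get("https://example.com/cart")  # placeholder URL
find_with_healing(driver).click()
driver.quit()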
A few popular open-source options include, but are not limited to:
Carina – AI-driven, free forever, scriptless approach to automate functional, performance, visual, and compatibility tests
TestProject – Offered the industry's first free Appium AI tools in 2021, expanding upon the AI tools for Selenium that they had previously introduced in 2020 for self-healing technology
Cerberus Testing – A low-code and scalable test automation solution that offers a self-healing feature called Erratum and has a forever-free plan
Designing Automated Tests With AI and Self-Testing AI has made significant strides in transforming the landscape of automated testing, offering a range of techniques and applications that revolutionize software quality assurance. Some of the prominent techniques and algorithms are provided in the tables below, along with the purposes they serve:
KEY TECHNIQUES AND APPLICATIONS OF AI IN AUTOMATED TESTING
Machine learning – Analyze large volumes of testing data, identify patterns, and make predictions for test optimization, anomaly detection, and test case generation
Natural language processing – Facilitate the creation of intelligent chatbots, voice-based testing interfaces, and natural language test case generation
Computer vision – Analyze image and visual data in areas such as visual testing, UI testing, and defect detection
Reinforcement learning – Optimize test execution strategies, generate adaptive test scripts, and dynamically adjust test scenarios based on feedback from the system under test
Table 1
KEY ALGORITHMS USED FOR AI-POWERED AUTOMATED TESTING
Clustering algorithms (segmentation) – k-means and hierarchical clustering are used to group similar test cases, identify patterns, and detect anomalies
Sequence generation models such as recurrent neural networks or transformers (text classification and sequence prediction) – Trained to generate sequences such as test scripts or sequences of user interactions for log analysis
Bayesian networks (dependencies and relationships between variables) – Test coverage analysis, defect prediction, and risk assessment
Convolutional neural networks (image analysis) – Visual testing
Evolutionary algorithms such as genetic algorithms (natural selection) – Optimize test case generation, test suite prioritization, and test execution strategies by applying genetic operators like mutation and crossover on existing test cases to create new variants, which are then evaluated against fitness criteria
Decision trees, random forests, support vector machines, and neural networks (classification) – Classification of software components
Variational autoencoders and generative adversarial networks (generative AI) – Generate new test cases that cover different scenarios or edge cases through test data generation, creating synthetic data that resembles real-world scenarios
Table 2
Real-World Examples of AI-Powered Automated Testing AI-powered visual testing platforms perform automated visual validation of web and mobile applications. They use computer vision algorithms to compare screenshots and identify visual discrepancies, enabling efficient visual testing across multiple platforms and devices. NLP and ML are combined to generate test cases from plain English descriptions. They automatically execute these test cases, detect bugs, and provide actionable insights to improve software quality.
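To ground one row of Table 2, below is a minimal sketch of clustering test-run telemetry with k-means so that anomalous runs surface as their own cluster. The feature choice and the data are invented for illustration.

Python
import numpy as np
from sklearn.cluster import KMeans

# Each row describes one test run: [duration_sec, steps_executed, failed_assertions]
runs = np.array([
    [12, 30, 0], [11, 28, 0], [13, 31, 0],   # typical runs
    [55, 29, 4], [12, 30, 1], [54, 27, 5],   # slow, failure-heavy outliers mixed in
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(runs)
for run, label in zip(runs, km.labels_):
    print(run, "-> cluster", label)  # the anomalous runs group into their own cluster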
The platforms described above also provide self-healing capabilities by automatically adapting test cases to changes in the application's UI, improving test maintenance efficiency. Quantum AI-Powered Automated Testing: The Road Ahead The future of quantum AI-powered automated software testing holds great potential for transforming the way testing is conducted. Figure 3: Transition of automated testing from AI to Quantum AI Quantum computing's ability to handle complex optimization problems can significantly improve test case generation, test suite optimization, and resource allocation in automated testing. Quantum ML algorithms can enable more sophisticated and accurate models for anomaly detection, regression testing, and predictive analytics. Quantum computing's capacity for parallel computation can greatly accelerate the execution of complex test scenarios and large-scale test suites. Quantum algorithms can help enhance security testing by efficiently simulating and analyzing cryptographic algorithms and protocols. Quantum simulation capabilities can be leveraged to model and simulate complex systems, enabling more realistic and comprehensive testing of software applications in domains such as finance, healthcare, and transportation. Parting Thoughts AI has significantly reshaped the traditional landscape of testing, enhancing the effectiveness, efficiency, and reliability of software quality assurance processes. AI-driven techniques such as ML, anomaly detection, NLP, and intelligent test prioritization have enabled organizations to achieve higher test coverage, early defect detection, streamlined test script creation, and adaptive test maintenance. The integration of AI in automated testing not only accelerates the testing process but also improves overall software quality, leading to enhanced customer satisfaction and reduced time to market. As AI continues to evolve and mature, it holds immense potential for further advancements in automated testing, paving the way for a future where AI-driven approaches become the norm in ensuring the delivery of robust, high-quality software applications. Embracing the power of AI in automated testing is not only a strategic imperative but also a competitive advantage for organizations looking to thrive in today's rapidly evolving technological landscape.
Prerequisites Genetic algorithms are an advanced topic. Even though the content has been prepared keeping in mind the requirements of a beginner, the reader should be familiar with the fundamentals of programming and basic algorithms before starting with this article. Additionally, consider sharpening your mathematical skills; they will greatly aid in comprehending the examples. Introduction to Optimization Optimization entails the art of enhancing performance. In any given process, we encounter a collection of inputs and outputs. Optimization involves the search for input values that yield the most favorable output results. The concept of favorable can vary depending on the specific problem at hand, but in mathematical contexts, it typically involves the pursuit of maximizing or minimizing one or more objective functions through adjustments to the input parameters. Let's consider y = f(x); if f'(x) = 0 at a point x = x*, then an optimum (maximum or minimum) or an inflection point exists at that point. The classification depends on the first non-zero higher-order derivative, i.e., the smallest n such that f^(n)(x*) ≠ 0. If n is an odd number, x* is an inflection point; otherwise, x* is a local optimum point. Expanding on this idea: if n is even and f^(n)(x*) is positive, x* is a local minimum point; if n is even and f^(n)(x*) is negative, x* is a local maximum point. Principles of Optimization Consider a constrained optimization problem. Based on the characteristics of the constraints, a feasible region is identified. Any point situated within this feasible region is considered a potential candidate for the optimal solution. The points located in the interior of the feasible region are referred to as free points, while those situated on the boundary of the region are categorized as bound points. Hence, an optimal solution can manifest as either a free point or a bound point within the feasible region. Gradient-based (derivative-based) methods have been a conventional means of addressing unconstrained optimization problems; however, it's important to note that they come with several limitations and drawbacks. What Is a Genetic Algorithm? Throughout history, nature has consistently provided an abundant wellspring of inspiration for humanity. Genetic Algorithms (GAs) are search-based algorithms rooted in the principles of natural selection and genetics. GAs constitute a subset of the broader field of computation known as Evolutionary Computation. Genetic Algorithms were originally developed by John Holland, along with his students and colleagues at the University of Michigan, notably including David E. Goldberg. Since their inception, GAs have been applied to a wide array of optimization problems, consistently achieving a high level of success. In Genetic Algorithms, a population of potential solutions to a given problem is established. These solutions then undergo processes of recombination and mutation, mirroring the principles found in natural genetics, resulting in the creation of new offspring. This iterative process spans multiple generations. Each individual or candidate solution is evaluated based on its fitness value, typically determined by its objective function performance. Fitter individuals are accorded a higher probability of reproducing and generating even more capable offspring. This aligns with the Darwinian theory of "survival of the fittest."
In this manner, GAs continually evolve and refine the quality of individuals or solutions across generations until a predefined stopping criterion is met. Genetic Algorithms exhibit a significant degree of randomness in their operations, but they outperform simple random local search methods, as they also incorporate historical information to guide their search for optimal solutions. Why Genetic Algorithm? Genetic Algorithms (GAs) possess the remarkable capability to deliver a "sufficiently good" solution in a timely manner, especially when dealing with large-scale problems where traditional algorithms might struggle to provide a solution. GAs offer a versatile and generic framework for tackling intricate optimization problems. Here are some advantages of using Genetic Algorithms (GAs): Versatility: GAs can be applied to a wide range of optimization problems, making them a versatile choice for various domains, including engineering, finance, biology, and more. Global search: GAs excel at exploring the entire search space, enabling them to find solutions that might be missed by local search algorithms. This makes them suitable for problems with multiple local optima. No need for derivatives: Unlike many optimization techniques, GAs do not require derivatives of the objective function, making them applicable to problems with non-continuous, noisy, or complex fitness landscapes. Parallelism: GAs can be parallelized effectively, allowing for faster convergence on high-performance computing systems. Stochastic nature: The stochastic nature of GAs ensures that they can escape local optima and explore the search space more thoroughly. Adaptability: GAs can adapt and adjust their search strategies over time, which is particularly useful for dynamic or changing optimization problems. Solution diversity: GAs maintain a diverse population of solutions, which can help in finding a wide range of possible solutions and prevent premature convergence. Interpretability: In some cases, GAs can provide insights into the structure of the solution space, helping users better understand the problem. Combinatorial problems: GAs are well-suited for combinatorial optimization problems, such as the traveling salesman problem and job scheduling. Parallel evolution: GAs can be used to evolve multiple solutions simultaneously, which is valuable in multi-objective optimization and other complex scenarios. It's important to note that while GAs offer these advantages, they are not always the best choice for every optimization problem, and their performance can vary depending on the problem's characteristics. Proper problem analysis and algorithm selection are essential for achieving optimal results. Genetic Algorithm Terminology Populations and generations: A population is an array of individuals. For example, if the size of the population is 100 and the number of variables in the fitness function is 3, you represent the population by a 100-by-3 matrix. The same individual can appear more than once in the population. For example, the individual (2, -3, 1) can appear in more than one row of the array. At each iteration, the genetic algorithm performs a series of computations on the current population to produce a new population. Each successive population is called a new generation. Parents and children: To create the next generation, the genetic algorithm selects certain individuals in the current population, called parents, and uses them to create individuals in the next generation, called children. 
Typically, the algorithm is more likely to select parents that have better fitness values. Individuals: An individual is any point to which you can apply the fitness function. The value of the fitness function for an individual is its score. For example, if the fitness function is f(x1, x2, x3) = (2x1 + 1)^2 + (3x2 + 4)^2 + (x3 − 2)^2, then the vector (2, −3, 1), whose length is the number of variables in the problem, is an individual. The score of the individual (2, −3, 1) is f(2, −3, 1) = 51. An individual is sometimes referred to as a genome or chromosome, and the vector entries of an individual as genes. Fitness functions: The fitness function is the function you want to optimize. For standard optimization algorithms, this is known as the objective function. Fitness values and best fitness values: The fitness value of an individual is the value of the fitness function for that individual. The best fitness value for a population is the smallest or largest fitness value of any individual in the population, depending on the optimization problem. Convergence: The point at which the GA reaches a solution that meets the termination criteria. This can be an optimal or near-optimal solution. Search space: The set of all possible solutions to the optimization problem. Diversity: Diversity refers to the average distance between individuals in a population. A population has high diversity if the average distance is large; otherwise, it has low diversity. Diversity is essential to the genetic algorithm because it enables the algorithm to search a larger region of the space. Genotype: The internal representation of a chromosome (e.g., a binary or numerical string). Phenotype: The actual solution represented by a chromosome. It is obtained by decoding the genotype. Crossover rate: The probability that two parents will undergo crossover to produce offspring in a new generation. Mutation rate: The probability that a gene (or a portion of the chromosome) will undergo mutation in a new generation. Fundamental Genetic Algorithm (GA): Pseudocode (a runnable Python sketch of this basic loop appears at the end of this article) Detailed Strategies of a Fundamental Genetic Algorithm (GA) Encoding and Population A chromosome encodes a solution in the search space, usually as a string of 0's and 1's. If l is the string length, the number of different chromosomes (or strings) is 2^l. Population A set of chromosomes in a generation. The population size is usually constant, and common practice is to choose the initial population randomly. Fitness Evaluation A fitness/objective function is associated with each chromosome; it indicates the degree of goodness of the encoded solution. If a minimization problem is to be solved, then fitness = 1 / objective or fitness = −1 × objective; if a maximization problem is to be solved, then fitness = objective. Selection More copies go to good strings, fewer copies to bad strings. In a proportional selection scheme, the number of copies a string receives is directly proportional to its fitness, which mimics the natural selection procedure to some extent. Roulette wheel selection and tournament selection are two frequently used selection procedures. Crossover Exchange of genetic information that takes place between randomly selected parent chromosomes. Single-point crossover and uniform crossover are the most commonly used schemes. A probabilistic operation. Mutation Random alteration in the genetic structure that introduces genetic diversity into the population.
It enables exploration of new search areas. Mutating a binary gene involves simple negation of the bit, while mutation of a real-coded gene can be defined in a variety of ways. It is also a probabilistic operation. Elitism A strategy that involves preserving the best-performing individuals from one generation to the next. The fittest individuals are guaranteed to survive and become part of the next generation without undergoing any genetic operations like crossover or mutation. Elitism ensures that the best solutions found so far are not lost during the evolutionary process and can continue to contribute to the population; it provides steady improvement and accelerates convergence. Termination Criteria The cycle of selection, crossover, and mutation is repeated a number of times until one of these occurs: The average fitness value of the population is more or less constant over several generations. The desired objective function value is attained by at least one string in the population. The number of generations (or iterations) exceeds some threshold (the most commonly used criterion). Variations in Genetic Algorithms (GAs) Differential Evolution (DE): DE is a variant of GAs specifically designed for real-valued optimization problems. It uses vector-based mutation and recombination operators. Estimation of Distribution Algorithms (EDAs): EDAs model and learn the probability distribution of promising solutions in the population and use this distribution to generate new candidate solutions. Self-Adaptive Genetic Algorithms: Allow the algorithm to adapt its genetic operators (mutation rates, crossover types) based on the evolving population's characteristics, leading to efficient convergence. Niching Algorithms: These algorithms focus on finding multiple diverse solutions in a single run, often in multimodal optimization problems where there are multiple peaks or modes in the fitness landscape. Multi-objective Evolutionary Algorithms (MOEAs): MOEAs address problems with multiple conflicting objectives. They aim to find a set of Pareto-optimal solutions representing trade-offs between these objectives. Hybrid Algorithms: Integrate GAs with other optimization techniques, machine learning models, or domain-specific heuristics to enhance performance and robustness. I aimed to provide a concise overview of genetic algorithms and optimization. However, if you have any particular questions or need more detailed information on this extensive subject, please feel free to ask in the comments. I appreciate your time and attention! You may reach me on LinkedIn.
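As promised above, here is a compact, self-contained sketch of the fundamental GA loop described in this article: binary encoding, tournament selection (one of the selection procedures mentioned), single-point crossover, bit-flip mutation, and elitism. The OneMax objective and all parameter values are illustrative assumptions.

Python
import random

GENES, POP_SIZE, GENERATIONS = 20, 50, 100
CROSSOVER_RATE, MUTATION_RATE, ELITES = 0.9, 0.01, 2

def fitness(chrom):
    return sum(chrom)  # OneMax toy objective: maximize the number of 1-bits

def tournament(pop, k=3):
    return max(random.sample(pop, k), key=fitness)  # fitter strings win more often

def crossover(a, b):
    if random.random() < CROSSOVER_RATE:
        p = random.randint(1, GENES - 1)  # single-point crossover
        return a[:p] + b[p:], b[:p] + a[p:]
    return a[:], b[:]

def mutate(chrom):
    # Bit-flip mutation: negate each gene with small probability
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    elite = sorted(population, key=fitness, reverse=True)[:ELITES]  # elitism
    children = [c[:] for c in elite]
    while len(children) < POP_SIZE:
        c1, c2 = crossover(tournament(population), tournament(population))
        children += [mutate(c1), mutate(c2)]
    population = children[:POP_SIZE]

best = max(population, key=fitness)
print(f"best fitness: {fitness(best)} / {GENES}")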
In my previous post, I highlighted the difference between efficiency and effectiveness and how it maps to artificial versus human intelligence. Doing things fast and with minimum waste is the domain of deterministic algorithms. But knowing whether we're building the right thing (effectiveness) is our domain. It's a slippery and subjective challenge, tied up with the confusing reality of trying to make human existence more comfortable with the help of software. Today I want to talk about essential complexity. A fully autonomous AI programmer would need to be told exactly what we want, and why, or it should be sufficiently attuned to our values to fill in the gaps. Sadly, we cannot yet trust AI to reliably connect the dots without human help and corrections. It's not like telling an autonomous vehicle where you want to go. That has a very simple goal – and we're nowhere near a failsafe implementation. Essential complexity is about "debugging the specification," figuring out what we, the people, need and why we need it. Accidental complexity is a consequence of the alternatives we choose to implement these ideas. Frederick Brooks' enduring distinction between essential and accidental complexity maps onto the realms of human versus machine intelligence, much like the effectiveness/efficiency distinction of the previous post. Since fully autonomous software production by businesspersons could only work if they stated exactly and unambiguously what they want, developers smugly conclude that their jobs are safe. I'm not so sure that such perfect specs are a necessary condition. I mean, they aren't now. Who postpones coding until they have complete, unambiguous, and final specifications? Programming means fleshing out the specification in your IDE from a sufficiently clear roadmap, filling in the details as you go along. It's not mere implementation; it's laying the bricks for the first floor while you're still tweaking the drawing for the roof. It seems inefficient, but it turns out we can't imagine our dream house perfectly unless we're halfway done building it, at least when it comes to making software. AI is already very qualified to deal with much of the accidental complexity you encounter on the way. We should use it as much as we can. I know I devoted three articles to the Java OCP 17 exam (link here for Part 1, Part 2, and Part 3), but I believe (and hope) that rote knowledge of arcane details will go the way of the dodo. AI takes care of idiomatic usage; it can enforce clean code and good naming conventions, and even write source documentation. And it will get better and better at it. It can even do full-blown migrations of legacy code to new language and framework versions. I'm all for it. Migrating a Java 4 EJB2 behemoth to Spring Boot 3 microservices by hand is not my idea of fun. If in five years' time the state of the art in code assistance still leaves you unimpressed while writing code, it's probably not because of some accidental complexity the machine can't handle. It's most likely the essential complexity it can't deal with. If your mortgage calculator outputs a 45.4% mortgage interest rate and the co-pilot won't warn you that you probably misplaced a decimal point, it's because it has never bought a house itself and won't notice that the figure is an order of magnitude too steep. Essential complexity can be expressed in any medium; it needn't be computer code.
Once you know exactly how something should work, most coding challenges become easy by comparison, provided you are competent in your language of choice. So, we break complicated domains down into manageable chunks and grow the product, improving and expanding it with each iteration. That doesn't always work. Sometimes the essential complexity cannot be reduced, and you need a stroke of genius to make progress. Take, for example, asymmetric key exchange, a tantalizing problem that tormented the greatest mathematical minds for decades, if not centuries. Alice and Bob can communicate using an uncrackable encryption key, but if they don't know that Eve has intercepted it, everything is in the open. If only we could have a pair of keys, such that you can encrypt a message with key A but can only decrypt it with key B, with no practical way to deduce one key from the other. If you then give out one part of the key to everybody and protect the other part with your life, you have solved the key exchange. It's simple enough to state where you want to arrive, but it's hardly a specification from which to start coding. It's not even a programming task. It's the quest for inventing an algorithm that may not even be possible. In Scrum Poker, you would draw the infinity card. The algorithms that Whitfield Diffie and Martin Hellman ultimately devised fit on the proverbial napkin. Translating them to code would be trivial by comparison. But they could never have arrived at the solution incrementally behind a keyboard. Or read about the fascinating story of cracking the Enigma cipher by the team at Bletchley Park – an even more daunting task, because there was literally a war to win. You cannot make a masterpiece to order, whether in art or in science. If we knew what made a good song, we could replicate the process, if not using software, then at least by some formulaic method. But that doesn't produce classics. Creativity is a hit-and-miss process, and few artists consistently produce works of genius. There's no reason why we should expect great AI advances in that department. But we can expect better tooling to get the creative juices flowing. Songwriters use a rhyming dictionary and thesaurus in search of inspiration. That's not cheating. Fortunately, unless you're working at a university or research institute, enterprise software development and maintenance isn't about solving centuries-old math conundrums. However, we should ponder more deeply what we want and need, instead of learning a cool new framework or getting another AWS certificate. Uncovering the essential complexity is not just the job of business analysts in the organization. I can't wait for next-generation tooling to help us grapple with it, because that would be a genuine copilot instead of an autopilot.
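As an aside, the napkin-sized algorithm really is trivial to translate once it exists. A toy Diffie-Hellman exchange with insecurely small, made-up numbers (illustration only, never for real cryptography):

Python
# Public values: a prime modulus and a generator (toy-sized for readability).
p, g = 23, 5

alice_secret = 6    # private values, never transmitted
bob_secret = 15

A = pow(g, alice_secret, p)  # Alice publishes g^a mod p
B = pow(g, bob_secret, p)    # Bob publishes g^b mod p

# Each side combines the other's public value with its own secret.
alice_shared = pow(B, alice_secret, p)
bob_shared = pow(A, bob_secret, p)

assert alice_shared == bob_shared  # both independently arrive at the same key
print("shared secret:", alice_shared)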
Generative AI refers to a category of artificial intelligence techniques that involve generating new data, such as images, text, audio, and more, based on patterns learned from existing data. Generative models like Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) have demonstrated remarkable capabilities in generating realistic and diverse data for various purposes, including data collection. Leverage Generative AI for Data Collection Data Augmentation Generative models can create new samples that closely resemble your existing data. By incorporating these generated samples into your training data, you can improve your model's performance and resilience, particularly in tasks such as image classification and object detection. Imputation of Missing Data If your datasets have missing values, generative models can fill those gaps with plausible values. This can enhance the quality and comprehensiveness of your data. Synthetic Data Generation Obtaining a diverse dataset might be challenging due to privacy concerns or data scarcity. Generative models can be trained on a small dataset to generate synthetic data that mirrors the actual data distribution. Merging this synthetic data with your real data can effectively expand your dataset. Data Generation for Testing and Validation In situations where you require representative data to test and validate your models or algorithms, generative models can produce synthetic data covering a wide range of scenarios. This aids in ensuring the robustness of your solution. Creative Content Generation Generative models can craft artistic and creative content, including artwork, music, and literature. This is valuable for applications like content creation, where a variety of creative outputs is desired. Data Preprocessing and Transformation Generative models can convert data from one domain to another. For instance, in style transfer, a generative model can change images from one artistic style to another. Anomaly Detection Generative models can learn the typical data distribution and then identify anomalies or outliers that deviate from that distribution. This can be beneficial for detecting fraudulent transactions or abnormal behavior. It's crucial to recognize that while generative AI offers numerous benefits for data collection and enhancement, careful evaluation is necessary when applying these techniques. The quality and suitability of the generated data should be thoroughly assessed before integration into your workflows or models. Additionally, ethical considerations, privacy concerns, and legal implications should be considered, especially when generating synthetic data. Mastering Seamless and Intelligent Data Transfer With Generative AI Achieving seamless and intelligent data transfer using Generative AI entails several steps and considerations. Here's a general outline of the process: Data Preprocessing Prepare your source data by cleaning, preprocessing, and structuring it to ensure it's suitable for input to the Generative AI model. Generative AI Model Selection Depending on your specific data and use case, choose an appropriate generative model like Generative Adversarial Networks (GANs) or Variational auto-encoders (VAEs). Model Training Train the selected generative model using your preprocessed data. During training, the model learns the underlying data patterns and distributions. Data Generation Once the generative model is trained, use it to generate new data samples resembling the original data distribution. 
This generated data can be in various formats, such as images, text, or other types of data. Data Transformation (Optional) If you need to transfer data across different domains or styles, apply transformations using the generative model. For instance, style transfer techniques can transform images from one artistic style to another. Data Integration Combine the generated data with your existing dataset or target application. This might involve merging synthetic data with real data to create a more extensive and diverse dataset. Testing and Validation Thoroughly assess the quality and relevance of the generated data. Ensure it aligns with your requirements and objectives. Data Transfer and Deployment Integrate the generated data into your desired workflows, applications, or systems where intelligent data transfer is essential. Monitoring and Iteration Continuously monitor the generative AI model's performance and the impact of transferred data. Iterate and refine the process as needed to achieve optimal results. Fostering Personalized Creativity Through Generative AI Harnessing the power of generative models to create unique and tailored experiences means using data for creativity and personalization. Here's a guide to achieving this: Understanding generative AI: Familiarize yourself with generative AI techniques, including Generative Adversarial Networks (GANs), Variational Auto-Encoders (VAEs), and other generative models. Understand their capabilities and potential applications. Data collection and preparation: Gather diverse and representative datasets aligned with your creative goals. These datasets may encompass images, text, audio, or other relevant data types. Model selection: Choose a suitable generative model based on your objectives. For example, GANs can be effective for image generation, while language models like GPT-3 could be beneficial for text-related tasks. Training the generative model: Train the chosen generative model using your prepared data. This involves adjusting model parameters, architecture, and hyperparameters to achieve the desired output quality. Creative content generation: Utilize the trained generative model to produce creative content, such as visual artworks, music compositions, written pieces, and more. Personalization: Incorporate user preferences and inputs to customize the generated content. This can involve themes, genres, moods, or prompts provided by users. Feedback loop: Establish a feedback mechanism to refine the generative model based on user preferences and evaluations. This iterative process enhances content quality and personalization. Ethical considerations: Ensure that the generated content adheres to ethical guidelines, avoiding biases, offensive material, or sensitive information. User experience (UX) design: Design intuitive interfaces or platforms for users to interact with and customize the generative AI. User experience plays a pivotal role in enhancing engagement. Testing and validation: Rigorously test and validate the generated content to ensure its quality, relevance, and appeal to users. Cultivating Creative Brilliance Fostering a creative, data-driven culture using Generative AI requires a strategic approach to integrating data-driven decision-making and generative AI technologies into your organization's creative processes. Here's a step-by-step guide to achieving this: Educate and build awareness: Introduce Generative AI and its potential benefits to your creative teams.
Provide training sessions and workshops to help them understand its applications. Align with goals and vision: Clearly articulate how integrating Generative AI aligns with your organization's creative goals and long-term vision. Leadership support: Obtain leadership support by demonstrating how Generative AI can drive creative excellence and contribute to success. Data strategy: Develop a comprehensive data strategy outlining the data needed for creative projects, data collection, preprocessing, and the application of Generative AI. Cross-functional collaboration: Facilitate collaboration between creative teams, data scientists, and technology experts. Identify use cases: Identify use cases where Generative AI can enhance creativity, like content generation and design exploration. Data integration: Incorporate Generative AI into creative workflows, integrating generated content with the work of designers, artists, and writers. Prototype and experiment: Encourage teams to experiment with Generative AI in small-scale projects to showcase its potential impact. Feedback and iteration: Establish feedback loops to gather insights from creative teams using Generative AI. Ethical considerations: Address ethical considerations, such as biases, transparency, and privacy. Skill development: Provide training for creative professionals to enhance their understanding of Generative AI. Showcase success stories: Highlight successful projects where Generative AI enhanced creativity and drove innovation. Iterative implementation: Gradually expand Generative AI integration into various creative projects, learning and refining the approach. Measure impact: Develop metrics to measure Generative AI's impact on creativity, innovation, and user engagement. Continuous learning: Stay updated on Generative AI advancements and adapt strategies as technology evolves. Guiding Ethical and Responsible Data Practices in Generative AI Ethical and responsible generative AI data practices involve principles and guidelines that ensure the ethical use of data when employing generative AI techniques. These practices uphold rights, privacy, and well-being, prevent biases, and promote transparency and accountability. Here are key ethical and responsible generative AI data practices: Informed data collection and use: Collect data with consent and transparency, using it only for intended and lawful purposes. Privacy protection: Anonymize or de-identify personal data and implement robust data security measures. Bias detection and mitigation: Identify and minimize biases in training data to prevent unfair or discriminatory outcomes. Transparency and explainability: Make the generative AI process understandable and communicate limitations and potential risks. User empowerment and control: Give users control over generated content and allow feedback. Data minimization: Collect only the necessary data for generative AI applications. Accountability and governance: Establish accountability and implement policies for the ethical use of generative AI. Validation and testing: Rigorously test and validate generated content before deployment. Continuous monitoring and auditing: Monitor generative AI behavior and conduct regular audits. Community engagement: Engage with stakeholders and experts to gather feedback on ethical implications. Legal compliance: Ensure compliance with data protection laws and industry standards. Education and training: Educate employees and stakeholders about ethical considerations. 
By adhering to these practices, organizations can leverage generative AI while upholding ethical standards. Forging Frontiers of Innovation Pioneering innovation with Generative AI involves using generative models to create imaginative solutions. Here's a roadmap for this journey: Education and exploration: Understand Generative AI concepts and explore existing applications and case studies. Identify opportunities: Find areas in your domain that can benefit from Generative AI. Cross-disciplinary collaboration: Foster collaboration between experts in AI, data science, design, and other fields. Concept generation: Brainstorm and sketch out innovative projects using Generative AI. Prototype development: Build prototypes to test concepts. Data collection and preprocessing: Gather and preprocess relevant datasets. Model development and training: Develop and train Generative AI models. Iterative refinement: Continuously refine models based on feedback. Validation and testing: Test and validate generated content. Implementation and deployment: Integrate Generative AI solutions. Showcase and demonstration: Highlight outcomes of Generative AI projects. Continuous learning and adaptation: Stay updated and adapt strategies based on emerging trends. Ethical considerations: Address biases, transparency, and privacy. Collaborate with the community: Engage with Generative AI communities. By following this roadmap, you can drive innovation with Generative AI and create transformative solutions. Conclusion Generative AI involves creating diverse data using patterns from existing information, with models like GANs and VAEs proving adept at tasks such as data augmentation, missing data imputation, and creative content generation. Adherence to ethical guidelines and careful evaluation are imperative. Effective utilization necessitates preprocessing, model selection, training, integration, and validation for seamless data transfer. Tailoring personalized experiences requires model training, creative content generation, personalization, and ethical considerations. Cultivating a creative culture involves education, alignment, collaboration, and ethical awareness. Ethical data practices involve transparency, bias detection, privacy protection, and accountability. Innovating with Generative AI entails exploration, cross-disciplinary collaboration, prototype development, validation, and continuous learning. Balancing innovation and ethics is crucial for leveraging Generative AI's potential.
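Before leaving this topic, a concrete footnote to the synthetic-data and augmentation use cases described earlier in this article: the sketch below fits a simple generative model (a Gaussian mixture, standing in for a full GAN or VAE) to a tiny tabular dataset and samples new rows from it. The data and parameters are invented for illustration.

Python
import numpy as np
from sklearn.mixture import GaussianMixture

# A small "real" dataset with two numeric features (illustrative values).
real = np.array([[1.0, 2.1], [0.9, 1.9], [1.2, 2.3],
                 [5.0, 7.8], [5.2, 8.1], [4.8, 7.9]])

# Fit a simple generative model of the data distribution.
gm = GaussianMixture(n_components=2, random_state=0).fit(real)

# Sample synthetic rows that mirror the learned distribution,
# then merge them with the real data to expand the training set.
synthetic, _ = gm.sample(4)
augmented = np.vstack([real, synthetic])
print(np.round(synthetic, 2))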
In this sixth article of the series on developing XR applications and experiences using Oracle, we describe and provide the full source for a mixed reality app that takes visual and audio input from a user's surroundings, processes it with various AI services, and returns output such as summaries and results of generative AI. Find the links to the first five articles below: Develop XR With Oracle, Ep 1: Spatial, AI/ML, Kubernetes, and OpenTelemetry Develop XR With Oracle, Ep 2: Property Graphs and Data Visualization Develop XR With Oracle, Ep 3: Computer Vision AI, and ML Develop XR With Oracle, Ep 4: Digital Twins and Observability Develop XR With Oracle, Ep 5: Healthcare, Vision AI, Training/Collaboration, and Messaging As with the previous posts, I will specifically show applications developed with Oracle database and cloud technologies using Magic Leap 2, HoloLens 2, Oculus, iPhone and Android, and PC, written using the Unity platform and OpenXR (for multi-platform support), Apple Swift, and WebXR. Throughout the blog, I will reference the corresponding demo video below: Extended Reality (XR) and AI in Everyday Life With Contextual and Situational Awareness I will refer the reader to the first article in this series (again, the link is above) for an overview of XR. This piece is focused on Mixed Reality/Augmented Reality use cases and AI, though there are certainly an amazing number of examples where AI is used in VR, and we will explore those in future blogs in the series. The combination used to develop these applications is a powerful and fitting one: Unity, Cohere and other Oracle AI services, and, at the heart of the solution, the Oracle database. A Perfect XR AI Fullstack Solution: Unity, Kubernetes, Oracle AI and Cohere, and Oracle Database The application architecture consists of a full stack, with Unity creating the front end; Java Spring Boot and other microservices running in Kubernetes (which can include NVIDIA compute) form the backend, calling out to the various AI services, and the Oracle database stores the various information involved, including vector, spatial, and other data. Examples Here are some of the many examples. Each is triggered by a voice command via the XR headset's voice recognition system. "Explain This..." In reaction to this command, the Unity client-side app takes a picture from the headset wearer's point of view and sends the picture to a server-side Spring Boot app running in Kubernetes (though neither Kubernetes nor Spring Boot is a requirement; this can be achieved with Python, JavaScript, .NET, and other languages, including languages running directly within the database). The server-side app calls the Oracle Vision AI service, which detects and returns the text in the picture. The text can be at any angle, etc., and the service will interpret it correctly. The server-side app then takes this text and passes it to Cohere (or Hugging Face or OpenAI) to get an explanation of the health report in both text and speech, in simple terms, and suggestions on ways to improve results/health. The image, content, and summary are stored in the database, where further analysis can be conducted, from basic and advanced text search to image analysis, Oracle Machine Learning AI, etc.
This use case shows how complex medical information and health test results can be easily and quickly interpreted, and corrective actions taken, by a common person without the need for (or supplementary to) a professional health care worker. "Summarize This..." This example is very similar to the "Explain this" example and simply illustrates the versatility of such command functionality. In this case, the content is summarized, again in both text and speech, in N (e.g., 3) sentences or fewer. This use case shows how mass amounts of information can be quickly consumed. "Select All From Summaries..." In reaction to this command, the Unity client-side app calls the server side to query the Oracle database and return various information, such as the images and texts of the previous commands that were recorded. This use case obviously goes beyond this scenario: it is effectively the ability to query the database for anything, from anywhere, and use all of the complex features the Oracle database has to offer. Queries can be based on text, speech, images, AI voice-to-text-to-SQL, situational and environmental information and sensors, etc., and include advanced analytic queries. "Transcribe [YouTube Video]..." As opposed to "Transcribe Audio," which takes or records an audio clip for transcription, in reaction to this command the app will check whether a YouTube URL is in the user's view and, if so, will use the API to get the transcription of the video. It can additionally be "transcribed and summarized" as described previously. This use case again shows how mass amounts of information can be quickly consumed, taken from a number of different input sources including video and audio, which may be advantageous for a number of reasons or for those with vision or hearing impairment. "Generate Image [Of ...] Stop Generate Image" In reaction to this command, the Unity app starts recording the user's speech and stops when the user says "Stop generate image." The speech is sent to the server-side application, which then transcribes it, removes "stop generate image" from the end of the transcription, and sends the text description to Stable Diffusion (or DALL-E, etc.), which creates an image and passes it (or its URL location) back to the application and headset, which displays the image in front of the user. This is a very simple example, of course, and this functionality is no longer limited to 2D but extends to 3D images, sounds, etc. It demonstrates the ability to dynamically generate and add visual context for quick storyboarding, collaboration, training, AEC (architecture, engineering, and construction), and, of course, gaming. This can also be used with USD (Universal Scene Description) and tech such as NVIDIA's Omniverse platform for extremely compelling collaboration experiences. "Tell a Dystopian Story About This and Give Sentiments..." In reaction to this command, again the Unity client-side app takes a picture from the headset wearer's point of view and sends it to the server-side component, which this time uses Oracle Vision AI to identify all of the objects in the user's view. It then sends these objects to Cohere (or other AI) and asks that a story of a certain style be generated from the contents. This story is then further analyzed using Oracle's AI Sentiment Analysis service, and the story and the sentiment analysis are sent back to the user to view in both text and speech.
This use case again shows not only the potential for creativity and the creation of a story around a mood or an interesting setting of objects, but can also be used for situational analysis in any number of military and police, health and safety, social, and other scenarios. In the third blog of this series (linked in the opening of this article), two other examples were given. "Identify Objects..." In reaction to this command, again the Unity client-side app takes a picture from the headset wearer's point of view and sends it to the server-side component, which this time uses Oracle Vision AI/Object Detection to detect the objects and their relative coordinates. These are sent back to the Unity client-side app, which projects raycasts through these coordinates to draw bounding boxes with identification labels at the spatial locations of the objects in the room, and to also speak the object names from the spatial coordinates where they are located. As far as use cases for this and other commands/functionality mentioned, it is not difficult to imagine the potential to assist with vision impairment, Alzheimer's, identification of unknown and difficult-to-isolate items, and analysis of threats, interests, etc., by having the XR device give contextual audio and visual feedback about one's surroundings. "Start Conversation..." In reaction to this command, again the Unity client-side app takes a picture from the headset wearer's point of view and sends it to the server-side component, which this time uses Oracle Vision AI/Document AI to retrieve the text found (various book titles), pick a title at random, and send the text to a number of conversation models (at that point, GPT-3 was the latest), which then feed back various conversational responses. These responses, or again any information from various models, are given to the user. There are again many situational awareness use cases that such functionality provides, but the ability to strike up a good conversation is powerful enough. The Source Code, OpenXR, and Other Dev Details I will describe key snippets of the programs and components involved in this application in this blog. The full source code for both the Unity frontend and Java backend can be found on GitHub. This repo will continue to grow with examples of other functionality, different languages, frameworks, etc. It is also where code for an upcoming "Develop XR with Oracle" workshop will be located. A basic understanding of Unity and Java (for these particular examples) will be required, along with the particulars of the XR headset you are using. This application has been tested on HoloLens 2 (shown in the pictures and video of this blog) and Magic Leap 2. The Unity application is coded using MRTK 3 (Mixed Reality Toolkit), which conforms to the OpenXR standard. Magic Leap recently joined the MRTK steering committee, which was formed by Microsoft and includes Qualcomm as a member. This wide adoption of OpenXR makes applications written to it more portable; however, Apple is not a member, and as with other technologies where software and varying hardware must work together, even with this standard, code must have adapters for certain native functionality, particularly in the XR space, where approaches and features may differ greatly between products. The application flow itself is fairly simple: 1. Input (picture, video, audio, etc.) is taken from the XR device (this step generally has a device-specific setup, whereas steps 2, 3, and to a large extent 4 are generally portable as-is).
2. Input is sent to the server (for AI, database, etc., processing). 3. The response is received from the server side. 4. The content of the response is given as output (text, picture, video, sound, etc.). Let's look at the "no code" and code involved, in the order listed above. We add an MRTK object to our Unity scene. This object contains profiles with configuration for Experience Settings, Camera, Input, Boundary, Teleport, Spatial Awareness, etc. Under Input, there are various settings for Controllers, Gestures, etc., including Speech. Here, we simply specify keywords such as "Explain this," "Generate image," etc. We then add a Microsoft.MixedReality.Toolkit.Input.SpeechInputHandler component to the scene, which maps the keywords we defined to actions. For example, here we see that the ExplainThis method of an object with the TakePicAndProcessAI script/class will be called when the user says "Explain this." Looking closer at the TakePicAndProcessAI C# source, we see the following for ExplainThis (similar to other commands involving the general process of sending a picture for AI processing):

C#
public void ExplainThis()
{
    endPoint = "visionai/explain";
    TakePicAndProcess();
}

TakePicAndProcess() contains logic and settings to take a picture and (temporarily) save it locally, as well as an OnCapturedPhotoToDisk callback. OnCapturedPhotoToDisk in turn loads the pic/image into a texture for display in front of the user and calls the ProcessAI subroutine. This ProcessAI subroutine has code that is dynamic based on the parameters provided to it, but has the overall purpose of making network calls to the server side and marshaling data to and from it. It also displays the returned AI content in a text field (or 2D or 3D object, as the case may be) for the user to see, executes text-to-speech for the user to hear, etc.

C#
byte[] imageData = image.EncodeToPNG();
WWWForm form = new WWWForm();
form.AddBinaryData("file", imageData, "image.png", "image/png");
if (genopts != null)
{
    form.AddField("genopts", genopts);
    genopts = null;
}
Debug.Log("Making AI etc. calls, providing file: " + filePath + "...");
UnityWebRequest request = UnityWebRequest.Post(backendURL + "/" + endPoint, form);
yield return request.SendWebRequest();
if (request.result == UnityWebRequest.Result.Success)
{
    string jsonResponse = request.downloadHandler.text;
    if (textMesh != null) textMesh.text = jsonResponse;
    textToSpeechLogic.talk(textToSpeechObject.GetComponent<TextToSpeech>(), jsonResponse);
    Debug.Log(jsonResponse);
}

On the server side, we have simple-to-use logic such as the following to process the image sent from the Unity app with Vision AI and extract the text found:

Java
features.add(detectImageFeature);
features.add(textDetectImageFeature);
InlineImageDetails inlineImageDetails =
        InlineImageDetails.builder().data(bytesOfImagePassedInFromUnityApp).build();
AnalyzeImageRequest request =
        AnalyzeImageRequest.builder().analyzeImageDetails(analyzeImageDetails).build();
AnalyzeImageResponse response = aiServiceVisionClient.analyzeImage(request);

And the following to process the text retrieved from Vision AI to get a summary using Cohere (or Hugging Face or OpenAI):

Java
AsyncHttpClient client = new DefaultAsyncHttpClient();
client.prepare("POST", "https://api.cohere.ai/v1/summarize")
        .setHeader("accept", "application/json")
        .setHeader("content-type", "application/json")
        .setHeader("authorization", "[your bearer token here]")
        .setBody("{\"length\":\"medium\",\"format\":\"paragraph\",\"model\":\"summarize-xlarge\"," +
                 "\"extractiveness\":\"low\",\"temperature\":0.3," +
                 "\"text\":\"" + textRetrievedFromVisionAI + "\"}")
        .execute();

As well as the following for Oracle database storage, retrieval, and other logic:

Java
// Using JPA
summaryRepository.saveAndFlush(new Summary(explanationOfResults, bytes));

Again, this server-side logic can be in any language, as all OCI services, Cohere services, and the Oracle Database support them all. Additional Thoughts I have given some ideas and examples of how AI and XR can be used together, facilitated by Oracle. I look forward to putting out more blogs on this topic and other areas of XR with Oracle Cloud and Database soon, including the use of new advanced features in the Spatial component of the Oracle Database for geometric storage (.OBJ mesh, point cloud, etc.), analysis, and processing that works with Unity at runtime in real time, and the ability to run server-side languages colocated within the Oracle database itself. A "Develop XR with Oracle" workshop will also be released in the near future. Please see my other publications for more information on XR and Oracle cloud and database, as well as various topics around modern app dev, including microservices, observability, transaction processing, etc. Finally, please feel free to contact me with any questions or suggestions for new blogs and videos, as I am very open to suggestions. Thanks for reading and watching.
Code scanning for the detection of vulnerabilities, such as the exposure of security-sensitive parameters, is a crucial practice in MuleSoft API development. Code scanning involves the systematic analysis of MuleSoft source code to identify vulnerabilities. These vulnerabilities could range from hardcoded secure parameters like password or accessKey to the exposure of password or accessKey in plain text format in property files. Such vulnerabilities might be exploited by malicious actors to compromise the confidentiality, integrity, or availability of the applications. Lack of Vulnerability Auto-Detection Neither MuleSoft Anypoint Studio nor the Anypoint Platform provides a feature to govern the above-mentioned vulnerabilities. They can be managed by design-time governance, where a manual review of the code is needed. However, there are many tools available that can scan the deployed code or code repository to find such vulnerabilities. You can even write custom code or scripts in any language to perform the same task, though writing custom code adds another layer of complexity and manageability. Using Generative AI To Review the Code for Detecting Vulnerabilities In this article, I am going to present how Generative AI can be leveraged to detect such vulnerabilities. I have used the OpenAI foundation model "gpt-3.5-turbo" to demonstrate the code scan feature to find the aforementioned vulnerabilities. However, we can use any foundation model to implement this use case, and it can be implemented in Python or any other language. The Python code can be used in the following ways: It can be executed manually to scan the code repository. It can be integrated into the CI/CD build pipeline, which can scan and report the vulnerabilities and fail the build if vulnerabilities are present. It can be integrated into any other program, such as a Lambda function, which can run periodically and execute the Python code to scan the code repository and report vulnerabilities. High-Level Architecture There are many ways to execute the Python code. A more appropriate and practical way is to integrate it into the CI/CD build pipeline: The CI/CD build pipeline executes the Python code. The Python code reads the MuleSoft XML files and property files. The Python code sends the MuleSoft code content and prompt to the OpenAI gpt-3.5-turbo model. The OpenAI model returns the hardcoded and unencrypted values. The Python code generates a report of the vulnerabilities found. Implementation Details The MuleSoft API project structure contains two major sections where security-sensitive parameters can be exposed as plain text. The src/main/mule folder contains all the XML files, which contain process flows, connection details, and exception handling. A MuleSoft API project may also have custom Java code; however, in this article, I have not considered the custom Java code used in the MuleSoft API. The src/main/resources folder contains environment property files. These files can be .properties or .yaml files for development, quality, and production. These files contain property key-value pairs, for example, user, password, host, port, accessKey, and secretAccessKey, which should be kept in an encrypted format. Based on the MuleSoft project structure, implementation can be achieved in two steps: MuleSoft XML File Scan Actual code is defined as process flows in MuleSoft Anypoint Studio.
We can write Python code that uses the OpenAI foundation model with a prompt that scans the MuleSoft XML files containing the code implementation to find hardcoded parameter values. For example:

- Global.xml/Config.xml: This file contains all the connector configurations. This is the standard recommended by MuleSoft, although it may vary depending on the standards and guidelines defined in your organization. A generative AI foundation model can use this content to find hardcoded values.
- Other XML files: These files may contain custom code or process flows making API calls, DB calls, or other system calls, and they may have connection credentials hardcoded by mistake. A generative AI foundation model can use this content to find hardcoded values.

I have provided screenshots of a sample MuleSoft API project. This code has three XML files: api.xml, which contains the REST API flow; process.xml, which has a JMS-based asynchronous flow; and global.xml, which has all the connection configurations.

(Screenshots: api.xml, process.xml, global.xml)

For demonstration purposes, I have used the global.xml file. The code snippet has many hardcoded values for demonstration, highlighted in red boxes.

Python Code

The Python code below uses the OpenAI foundation model to scan the above XML file and find the hardcoded values.

```python
import openai, os, glob
from dotenv import load_dotenv

load_dotenv()
APIKEY = os.getenv('API_KEY')
openai.api_key = APIKEY

file_path = "C:/Work/MuleWorkspace/test-api/src/main/mule/global.xml"
try:
    with open(file_path, 'r') as file:
        file_content = file.read()
        print(file_content)
except FileNotFoundError:
    print("File not found.")
except Exception as e:
    print("An error occurred:", e)

message = [
    {"role": "system", "content": "You will be provided with xml as input, and your task is to list the non-hard-coded value and hard-coded value separately. Example: For instance, if you were to find the hardcoded values, the hard-coded value look like this: name=\"value\". if you were to find the non-hardcoded values, the non-hardcoded value look like this: host=\"${host}\""},
    {"role": "user", "content": f"input: {file_content}"}
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=message,
    temperature=0,
    max_tokens=256
)
result = response["choices"][0]["message"]["content"]
print(result)
```

Once this code is executed, we get the following outcome:

(Screenshot: the result from the generative AI model)

Similarly, we can provide api.xml and process.xml to scan for hardcoded values. You can even modify the Python code to read all the XML files iteratively and get the results for each file in sequence, as sketched below.
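The following is a minimal sketch of that iterative scan; it is not from the original article. It puts the otherwise-unused `glob` import from the snippet above to work, walking every XML file in the project and sending each one through the same ChatCompletion call wrapped in a `scan_file` helper. The folder path and output format are illustrative assumptions.

```python
# Hedged sketch (not from the original article): scan every MuleSoft XML file
# in sequence with the same prompt used above. Paths, the scan_file helper,
# and the output format are illustrative assumptions.
import glob
import os

import openai
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.getenv('API_KEY')

SYSTEM_PROMPT = (
    "You will be provided with xml as input, and your task is to list "
    "the non-hard-coded value and hard-coded value separately."
)

def scan_file(path: str) -> str:
    """Send one file's content to gpt-3.5-turbo and return its answer."""
    with open(path, 'r') as f:
        content = f.read()
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"input: {content}"},
        ],
        temperature=0,
        max_tokens=256,
    )
    return response["choices"][0]["message"]["content"]

if __name__ == "__main__":
    for xml_path in glob.glob("C:/Work/MuleWorkspace/test-api/src/main/mule/*.xml"):
        print(f"--- {xml_path} ---")
        print(scan_file(xml_path))
```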
Scanning the Property Files

We can send another prompt to the AI model to find plain-text passwords kept in property files. In the following screenshot of the dev-secure.yaml file, client_secret is an encrypted value, while db.password and jms.password are kept as plain text.

(Screenshot: config file)

Python Code

The Python code below uses the OpenAI foundation model to scan the config file and find the unencrypted values.

```python
import openai, os, glob
from dotenv import load_dotenv

load_dotenv()
APIKEY = os.getenv('API_KEY')
openai.api_key = APIKEY

file_path = "C:/Work/MuleWorkspace/test-api/src/main/resources/config/secure/dev-secure.yaml"
try:
    with open(file_path, 'r') as file:
        file_content = file.read()
except FileNotFoundError:
    print("File not found.")
except Exception as e:
    print("An error occurred:", e)

message = [
    {"role": "system", "content": "You will be provided with xml as input, and your task is to list the encrypted value and unencrypted value separately. Example: For instance, if you were to find the encrypted values, the encrypted value look like this: \"![asdasdfadsf]\". if you were to find the unencrypted values, the unencrypted value look like this: \"sdhfsd\""},
    {"role": "user", "content": f"input: {file_content}"}
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=message,
    temperature=0,
    max_tokens=256
)
result = response["choices"][0]["message"]["content"]
print(result)
```

Once this code is executed, we get the following outcome:

(Screenshot: result from the generative AI model)

Impact of Generative AI on the Development Life Cycle

Generative AI has a significant impact on the development life cycle, and we can leverage it for several related use cases.

Efficient and Comprehensive Analysis

Generative AI models like GPT-3.5 have the ability to comprehend and generate human-like text. When applied to code review, they can analyze code snippets, provide suggestions for improvements, and even identify patterns that might lead to bugs or vulnerabilities. This technology enables a comprehensive examination of code in a relatively short span of time.

Automated Issue Identification

Generative AI can assist in detecting potential issues such as syntax errors, logical flaws, and security vulnerabilities. By automating these aspects of code review, developers can allocate more time to higher-level design decisions and creative problem-solving.

Adherence To Best Practices

Through analysis of coding patterns and context, generative AI can offer insights on adhering to coding standards and best practices.

Learning and Improvement

Generative AI models can "learn" from vast amounts of code examples and industry practices. This knowledge allows them to provide developers with contextually relevant recommendations. As a result, both the developers and the AI system benefit from a continuous learning cycle, refining their understanding of coding conventions and emerging trends.

Conclusion

In conclusion, conducting a code review to find security-sensitive parameters exposed as plain text using OpenAI's technology has proven to be a valuable and efficient process. Leveraging OpenAI for code review not only accelerates the review process but also contributes to producing more robust and maintainable code. However, it is important to note that while AI can greatly assist in the review process, human oversight and expertise remain crucial for making informed decisions and fully understanding the context of the code.
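To close the loop on the CI/CD integration mentioned earlier, here is a hedged sketch, not from the original article, of how the two scans could gate a build: it runs the `scan_file` helper from the iteration sketch above over both folders and exits with a non-zero status when the model's answer appears to report findings, which CI/CD systems interpret as a build failure. The `looks_vulnerable` heuristic, the paths, and the report format are all illustrative assumptions.

```python
# Hedged sketch (not from the original article): fail the CI/CD build when the
# scans report findings. Assumes scan_file() from the iteration sketch above.
import glob
import sys

def looks_vulnerable(model_answer: str) -> bool:
    # Deliberately naive heuristic: the prompts ask the model to label
    # hard-coded or unencrypted values, so we flag answers containing those
    # labels. A real implementation would request structured (JSON) output
    # from the model and check that the reported lists are non-empty.
    lowered = model_answer.lower()
    return "hard-coded" in lowered or "unencrypted" in lowered

findings = []
paths = (glob.glob("src/main/mule/*.xml")
         + glob.glob("src/main/resources/**/*.yaml", recursive=True))
for path in paths:
    answer = scan_file(path)
    if looks_vulnerable(answer):
        findings.append((path, answer))

for path, answer in findings:
    print(f"Potential vulnerability in {path}:\n{answer}\n")

# A non-zero exit code makes the CI/CD pipeline mark the build step as failed.
sys.exit(1 if findings else 0)
```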
If you're anything like me, you've noticed the massive boom in AI technology. It promises to disrupt not just software engineering but every industry. THEY'RE COMING FOR US!!!

Just kidding ;P

I've been bettering my understanding of what these tools are and how they work, and decided to create a tutorial series for web developers to learn how to incorporate AI technology into web apps. In this series, we'll learn how to integrate OpenAI's AI services into an application built with Qwik, a JavaScript framework focused on the concept of resumability (this will be relevant to understand later). Here's what the series outline looks like:

1. Intro and Setup
2. Your First AI Prompt
3. Streaming Responses
4. How Does AI Work
5. Prompt Engineering
6. AI-Generated Images
7. Security and Reliability
8. Deploying

We'll get into the specifics of OpenAI and Qwik where it makes sense, but I will mostly focus on general-purpose knowledge, tooling, and implementations that should apply to whatever framework or toolchain you are using. We'll be working as closely to fundamentals as we can, and I'll point out which parts are unique to this app.

Here's a little sneak preview: I thought it would be cool to build an app that takes two opponents and uses AI to determine who would win in a hypothetical fight. It provides some explanation and the option to create an AI-generated image. Sometimes the results come out a little wonky, but that's what makes it fun.

I hope you're excited to get started because in this first post, we are mostly going to work on... Boilerplate :/

Prerequisites

Before we start building anything, we have to cover a couple of prerequisites. Qwik is a JavaScript framework, so we will have to have Node.js (and NPM) installed. You can download the most recent version, but anything above v16.8 should work. I'll be using version 20.

Next, we'll also need an OpenAI account to have access to their API.

At the end of the series, we will deploy our applications to a VPS (Virtual Private Server). The steps we follow should be the same regardless of what provider you choose. I'll be using Akamai's cloud computing services (formerly Linode).

Setting Up the Qwik App

Assuming we have the prerequisites out of the way, we can open a command-line terminal and run the command npm create qwik@latest. This will run the Qwik CLI that will help us bootstrap our application. It will ask you a series of configuration questions and then generate the project for you. Here's what my answers looked like:

(Screenshot: CLI configuration answers)

If everything works, open up the project and start exploring. Inside the project folder, you'll notice some important files and folders:

- /src: Contains all application business logic
- /src/components: Contains reusable components to build our app with
- /src/routes: Responsible for Qwik's file-based routing; each folder represents a route (which can be a page or an API endpoint). To make a page, drop an index.{jsx|tsx} file in the route's folder.
- /src/root.tsx: This file exports the root component responsible for generating the HTML document root.

Start Development

Qwik uses Vite as a bundler, which is convenient because Vite has a built-in development server. It supports running our application locally and updating the browser when files change. To start the development server, we can open our project in a terminal and execute the command npm run dev. With the dev server running, you can open the browser, head to http://localhost:5173, and you should see a very basic app.
Any time we make changes to our app, we should see those changes reflected almost immediately in the browser.

Add Styling

This project won't focus too much on styling, so this section is totally optional if you want to do your own thing. To keep things simple, I'll use Tailwind. The Qwik CLI makes it easy to add the necessary changes by executing the terminal command npm run qwik add. This will prompt you with several available Qwik plugins to choose from. You can use your arrow keys to move down to the Tailwind plugin and press Enter. Then it will show you the changes it will make to your codebase and ask for confirmation. As long as it looks good, you can hit Enter once again.

For my projects, I also like to have a consistent theme, so I keep a file in my GitHub to copy and paste styles from. Obviously, if you want your own theme, you can ignore this step, but if you want your project to look as amazing as mine, copy the styles from this file on GitHub into the /src/global.css file. You can replace the old styles, but leave the Tailwind directives in place.

Prepare Homepage

The last thing we'll do today to get the project to a good starting point is make some changes to the homepage. This means making changes to /src/routes/index.tsx. By default, this file starts out with some very basic text and an example of modifying the HTML <head> by exporting a head variable. The changes I want to make include:

- Removing the head export
- Removing all text except the <h1> (feel free to add your own page title text)
- Adding some Tailwind classes to center the content and make the <h1> larger
- Wrapping the content with a <main> tag to make it more semantic
- Adding Tailwind classes to the <main> tag to add some padding and center the contents

These are all minor changes that aren't strictly necessary, but I think they will provide a nice starting point for building out our app in the next post. Here's what the file looks like after my changes:

```tsx
import { component$ } from "@builder.io/qwik";

export default component$(() => {
  return (
    <main class="max-w-4xl mx-auto p-4">
      <h1 class="text-6xl">Hi 👋</h1>
    </main>
  );
});
```

And in the browser, it looks like this:

(Screenshot: the styled homepage)

Conclusion

That's all we'll cover today. Again, this post was mostly focused on getting the boilerplate stuff out of the way so that the next post can be dedicated to integrating OpenAI's API into our project. With that in mind, I encourage you to take a moment to think about some AI app ideas that you might want to build. There will be a lot of flexibility for you to put your own spin on things. I'm excited to see what you come up with, and if you would like to explore the code in more detail, I'll post it on my GitHub account.