operator, the multigenic chromosome and the linking function, the basic function set and user-defined functions built on the basic function set, and the choice of fitness function (Table 2). There are three types of fitness function in the classical GEP method; this paper adopts the one based on absolute error:

$$f_i = \sum_{j=1}^{C_t} \left( M - \left| C_{(i,j)} - T_j \right| \right)$$

where $M$ is the selection range, $C_{(i,j)}$ is the value returned by chromosome $i$ for fitness case $j$ (out of $C_t$ fitness cases), and $T_j$ is the target value for fitness case $j$.

No. | Descriptor | Description | X | ΔX | t-test
0 | Intercept | | … | 1.2612E+02 | -1.5463
1 | LUMO | LUMO energy | 5.0431E-01 | … | 4.8720
2 | MRECO | Min resonance energy for the CCO bond | -3.6715E+00 | 6.7200E-01 | -5.4635
3 | KSIND | Kier shape index (order 3) | -2.0681E-01 | 7.7119E-02 | -2.6816
4 | ZX | ZX Shadow/ZX Rectangle | -7.0757E+00 | 2.1621E+00 | -3.2726
5 | MASEOAT | Min atomic state energy for the O atom | 8.4808E-01 | 4.3585E-01 | 1.9458

Table 4 Correlation matrix of the 5 descriptors.

Training set: … = 20.60, and … = 0.23. Test set: … = 21.13, and … = 0.36.

Figure 2 Plot of predicted log(IC50) versus experimental values for the training and test sets by HM.

3.2. Calculation Results of GEP

After the linear model is established, the same five descriptors are used as the GEP variables to build the nonlinear model. To obtain satisfactory results, the parameters affecting GEP are optimized. Automatic Problem Solver (APS), the software package used to run GEP, is easy to control, and the evolved model can be checked against the test set. During evolution, seven functions are selected: subtraction, multiplication, division, power (index), sine, square root, and tangent; the fitness function is the mean squared error (MSE). With the five selected descriptors, the fitting produces the best QSAR model; its predicted values and residuals are listed in Table 1 and shown in Figures 3 and 4.
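The absolute-error fitness, f_i = Σ_j (M − |C(i,j) − T_j|) summed over the C_t fitness cases, and the MSE criterion used in the GEP run can be sketched as two small C++ functions. This is a minimal illustration, not the APS implementation. The function names and the use of `std::vector` inputs are assumptions. Here `predicted[j]` stands for C(i,j) of a single chromosome i.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Absolute-error fitness of one chromosome i:
//   f_i = sum over j of (M - |C(i,j) - T_j|)
// predicted[j] = C(i,j), target[j] = T_j, selectionRange = M.
// Higher is better; a perfect chromosome scores C_t * M.
double absoluteErrorFitness(const std::vector<double>& predicted,
                            const std::vector<double>& target,
                            double selectionRange) {
    assert(predicted.size() == target.size());
    double fitness = 0.0;
    for (std::size_t j = 0; j < target.size(); ++j) {
        fitness += selectionRange - std::fabs(predicted[j] - target[j]);
    }
    return fitness;
}

// Mean squared error over the same fitness cases; lower is better.
double meanSquaredError(const std::vector<double>& predicted,
                        const std::vector<double>& target) {
    assert(predicted.size() == target.size() && !target.empty());
    double sum = 0.0;
    for (std::size_t j = 0; j < target.size(); ++j) {
        const double diff = predicted[j] - target[j];
        sum += diff * diff;
    }
    return sum / static_cast<double>(target.size());
}
```

The two criteria point in opposite directions. The absolute-error fitness is maximized, reaching C_t · M for an exact fit. The MSE is minimized, reaching 0 for an exact fit.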
The nonlinear QSAR model obtained by GEP is as follows:

    #include <cmath>

    // d[0]..d[4] hold LUMO, MRECO, KSIND, ZX, and MASEOAT; the function name is illustrative.
    double gepModel(const double d[5])
    {
        double dblTemp = 0.0;
        dblTemp = std::sin(std::tan(std::tan(d[1]) / std::sin(d[4])));
        dblTemp += std::sin(std::sin(std::tan(d[1]) / d[0] - d[3]));
        dblTemp += d[0];
        dblTemp += std::pow(d[4], std::pow(d[4], d[0]) / d[2]);
        // the sign of the constant 7.653931 is uncertain in the source text
        dblTemp += std::sin(std::sqrt(d[2] - std::tan(std::sin(std::tan(d[2] * 7.653931)))));
        return dblTemp;
    }

where d[0], d[1], d[2], d[3], and d[4] represent LUMO, MRECO, KSIND, ZX, and MASEOAT, respectively. The statistical results of the GEP model are as follows. Training set: … = 0.12. Test set: … = 3.95.

Figure 3 Plot of predicted log(IC50) versus experimental values for the training set by GEP.

Figure 4 Plot of predicted log(IC50) versus experimental values for the test set by GEP.

3.3. Discussion of the Relevant Descriptors in the Model

Interpreting the model descriptors can reveal the structural features that affect the log(IC50) values of these compounds. Of the five selected descriptors, LUMO, MRECO, and MASEOAT are quantum-chemical descriptors, KSIND is a topological descriptor, and ZX is a geometric descriptor. The order of the descriptors in the equation indicates that their contributions to the log(IC50) of the compounds decrease in the order LUMO > MRECO > KSIND > ZX > MASEOAT.

LUMO reflects the electron affinity of the molecule [28], and its coefficient in the model is positive. For a fixed target, the stronger the electrophilicity of the molecule, the greater the log(IC50) value. When the R3 side chain is aliphatic, a longer chain gives a greater LUMO value and stronger inhibition of MMP-2 and MMP-9 enzyme activity by the compound. Aromatic substituents in the side chain are clearly more active than aliphatic ones. This may be because the large conjugated system of the aromatic ring increases the LUMO value and thus strengthens inhibition of gelatinase activity.
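The sign arguments above rely on model coefficients. In the nonlinear GEP expression, however, a descriptor has no single coefficient, and its local effect can change with the values of the other descriptors. One way to inspect this is a central finite difference of the model output with respect to one descriptor. This is an added illustration, not a method used in this study. The sketch assumes the model is wrapped as a C++ callable that takes the five descriptors in the order LUMO, MRECO, KSIND, ZX, MASEOAT. The names `Descriptors` and `localSensitivity` and the default step `h` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Descriptor vector in the model's order: LUMO, MRECO, KSIND, ZX, MASEOAT (hypothetical type name).
using Descriptors = std::array<double, 5>;

// Central finite difference of model f with respect to descriptor i at point d:
//   (f(d + h*e_i) - f(d - h*e_i)) / (2h)
// d is taken by value, so the caller's vector is not modified.
double localSensitivity(const std::function<double(const Descriptors&)>& f,
                        Descriptors d, std::size_t i, double h = 1e-5) {
    assert(i < d.size() && h > 0.0);
    const double x = d[i];
    d[i] = x + h;
    const double up = f(d);
    d[i] = x - h;
    const double down = f(d);
    return (up - down) / (2.0 * h);
}
```

To apply this to the GEP model, pass a lambda that evaluates the five-term expression given above. Keep the evaluation point within the descriptor range of the training compounds. Outside that range, `std::pow` returns NaN for a negative base with a non-integer exponent, and the term tan(d[1])/sin(d[4]) blows up where sin(d[4]) = 0.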
In general, compounds with branched-chain substituents are more active than those with ring substituents. This suggests that the carbonyl group is more reactive in open-chain structures.

MRECO represents the minimum resonance energy of the CCO bond [29]. As the substituents become larger, the MRECO values of the three compound series (A, B, and C) show an overall downward trend. A smaller value means a lower minimum resonance energy of the CCO bond. The molecule is then in a relatively stable state yet highly reactive, and it binds readily to the target. Because the MRECO coefficient in the model is negative, log(IC50) increases gradually as MRECO decreases.

KSIND is the third-order Kier shape index of the molecule [30]. It encodes molecular size, shape, and degree of branching. To some extent it also reflects molecular volume and the dispersion forces between molecules: the larger the molecular volume, the greater the dispersion force. Table 2 shows that KSIND increases with the number of atoms.
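The sign arguments for LUMO, MRECO, and KSIND can be made concrete with the coefficients X from the descriptor table. This assumes those coefficients belong to the linear (HM) model; their signs match the text, positive for LUMO and negative for MRECO. If the other descriptors are held fixed, the change in predicted log(IC50) is X_k · Δ, so the intercept (partly lost in the table) is not needed. The array and function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Coefficients X read from the descriptor table, assumed to belong to the linear (HM) model.
// Order: LUMO, MRECO, KSIND, ZX, MASEOAT. The intercept is omitted because it
// cancels when two predictions that differ in one descriptor are subtracted.
constexpr std::array<double, 5> kLinearCoefficients{0.50431, -3.6715, -0.20681, -7.0757, 0.84808};

// Change in predicted log(IC50) when descriptor k changes by delta, others held fixed.
double deltaLogIC50(std::size_t k, double delta) {
    assert(k < kLinearCoefficients.size());
    return kLinearCoefficients[k] * delta;
}
```

For example, `deltaLogIC50(1, -0.1)` gives about 0.367 (−3.6715 × −0.1). Lowering MRECO by 0.1 therefore raises the predicted log(IC50), in line with the discussion above. The coefficient magnitudes are not directly comparable across descriptors, because the descriptors have different units and ranges.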