Data analysis research methods

  • Univariate analysis is the simplest form of data analysis: the data being analyzed contains only one variable. Because only a single variable is involved, it does not deal with causes or relationships. Its counterpart is multivariate analysis. Other commonly used methods include analysis of variance (ANOVA), principal component analysis (PCA), and correlation analysis.


    The purpose of univariate analysis is to describe the characteristics and patterns of a variable's distribution by sorting, processing, organizing, and displaying the data, and by calculating indicators of its central tendency and dispersion. Different types of variables call for different methods and metrics. A variable can be regarded as a category of data: age is one variable and height is another. Univariate analysis examines each variable on its own; it cannot observe two variables at the same time or the relationship between them.


    Common measures for describing univariate data are the mean, mode, median, range, variance, maximum, minimum, quartiles, and standard deviation. Methods for displaying univariate data include frequency distribution tables, histograms, frequency polygons, and pie charts.
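
    As a minimal sketch of these summary measures, the following uses only Python's standard library. The `ages` sample and the `describe` helper are hypothetical names introduced for illustration, and the variance and standard deviation are the sample (n - 1) versions, which is an assumption about which definition is wanted.

```python
import statistics
from collections import Counter

# Hypothetical sample: ages of ten survey respondents (illustrative data only).
ages = [23, 25, 25, 29, 31, 34, 35, 35, 35, 41]


def describe(values):
    """Return common univariate summary measures for one numeric variable."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "quartiles": (q1, q2, q3),
    }


summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")

# A frequency distribution table: how often each distinct value occurs.
frequency = Counter(ages)
for value in sorted(frequency):
    print(value, frequency[value])
```

    The frequency table is the raw material for a histogram or frequency polygon; plotting it would need a charting library, which is left out here.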


    Multivariate analysis is the analysis of three or more variables. Depending on your goal, there are several ways to perform it, including additive tree analysis, canonical correlation analysis, cluster analysis, multiple correspondence analysis, factor analysis, generalized Procrustes analysis, MANOVA, multidimensional scaling, multivariate regression, partial least squares regression, PARAFAC, and redundancy analysis.


    Analysis of variance (ANOVA) tests whether the means of the normal populations corresponding to the levels of a factor are equal. It does so by decomposing the total sum of squared deviations of the observed data and applying hypothesis-testing theory. Its main concerns are testing the equality of the means of several normal populations and estimating the unknown parameters of the population distribution.
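
    The decomposition can be illustrated with a small one-way example. This is a sketch under stated assumptions: the `groups` data and the `one_way_anova` helper are hypothetical, only one factor is considered, and the code stops at the F statistic without computing a p-value.

```python
# Hypothetical measurements at three levels of one factor (illustrative data only).
groups = {
    "level_a": [4.0, 5.0, 6.0],
    "level_b": [6.0, 7.0, 8.0],
    "level_c": [8.0, 9.0, 10.0],
}


def one_way_anova(samples):
    """Decompose the total sum of squares and return the F statistic."""
    all_values = [x for group in samples for x in group]
    n_total = len(all_values)
    k = len(samples)  # number of factor levels
    grand_mean = sum(all_values) / n_total
    group_means = [sum(group) / len(group) for group in samples]

    # Total variation of every observation around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Variation explained by differences between the level means.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(samples, group_means)
    )
    # Variation left over inside each level.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(samples, group_means)
        for x in group
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "f_stat": f_stat,
    }


result = one_way_anova(list(groups.values()))
print(result)
```

    A large F statistic suggests that the level means are not all equal. Turning it into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom, which a statistics package would normally supply.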


    PCA is a long-established multivariate statistical method. Its main function is to reduce high-dimensional data containing many variables to low-dimensional data with only a few variables while losing as little information as possible. With the rise of machine learning in this century, principal component analysis (PCA), as a major unsupervised learning method, has become increasingly important in both physical and mathematical research. It also appears from time to time in empirical econometric analysis.
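
    To make the idea of dimensionality reduction concrete, here is a minimal sketch that projects two-variable data onto its first principal component. It is not a full PCA implementation: it assumes illustrative data, finds only the leading component, and uses power iteration on the covariance matrix in place of a full eigendecomposition. The helper names are hypothetical.

```python
import math

# Hypothetical two-variable data set (illustrative values only).
data = [
    (2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
    (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9),
]


def covariance_matrix(rows):
    """Center the data and return (sample covariance matrix, centered rows)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    return cov, centered


def first_principal_component(cov, iterations=200):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    dims = len(cov)
    vec = [1.0] * dims
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient: the variance of the data along the unit vector.
    eigenvalue = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)
    )
    return vec, eigenvalue


cov, centered = covariance_matrix(data)
component, variance_along = first_principal_component(cov)

# Reduce each two-dimensional point to one coordinate along the component.
scores = [sum(c[j] * component[j] for j in range(len(component))) for c in centered]

# Share of the total variance retained by the single component.
explained = variance_along / sum(cov[i][i] for i in range(len(cov)))
print(component, explained)
```

    The `explained` ratio quantifies how much information survives the reduction: the closer it is to 1, the less is lost by keeping only one component.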


    Correlation analysis is one of the methods most frequently used in website analysis. By analyzing the relationships between different characteristics or data, it identifies the key influencing and driving factors in business operations and helps predict business development. There are many methods of correlation analysis. Basic methods quickly reveal whether data are positively correlated, negatively correlated, or uncorrelated. Intermediate methods measure the strength of the relationship, such as perfect or partial correlation. Advanced methods turn the relationships between data into models that predict future business development.


    Five correlation analysis methods:


    (1) The first correlation analysis method is to visualize the data, that is, to draw charts. Trends and connections are hard to spot in raw numbers, but they become clear once the data points are plotted as a graph.


    (2) The second correlation analysis method is to calculate the covariance. Covariance measures how two variables vary together. If the two variables tend to change in the same direction, the covariance is positive, indicating a positive correlation. If they tend to change in opposite directions, the covariance is negative, indicating a negative correlation.


    (3) The third correlation analysis method is the correlation coefficient. The correlation coefficient is a statistical indicator of how closely variables are related, and its value ranges from -1 to 1. A value of 1 means the two variables are perfectly positively linearly correlated, -1 means they are perfectly negatively linearly correlated, and 0 means there is no linear correlation. The closer the coefficient is to 0, the weaker the correlation.


    (4) The fourth correlation analysis method is regression analysis. Regression analysis is a statistical method for determining the relationship between two or more variables. Depending on the number of variables, it is divided into simple (univariate) regression and multiple (multivariate) regression: simple regression is used for two variables, one independent and one dependent, and multiple regression is used when there are more than two variables. Two preparations are needed before a regression analysis: first, determine the number of variables; second, determine which variables are independent and which is dependent.


    (5) The last correlation analysis method is information entropy and mutual information. In practice, many factors may affect the final outcome, and they are not necessarily numerical; some are categorical feature values. For example, the user's city, the gender and age distribution of users, and whether a user is visiting the website for the first time are not measured by numbers.
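
    Methods (2) to (5) can be sketched together in a few lines of standard-library Python. All data and function names below are hypothetical and chosen for illustration. The covariance is the sample (n - 1) version, the regression is ordinary least squares with one independent variable, and mutual information is computed with the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y), in bits.

```python
import math
from collections import Counter


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance: positive when xs and ys tend to move together."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Correlation coefficient: covariance rescaled to the range [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def simple_regression(xs, ys):
    """Least-squares line ys ~ intercept + slope * xs (one independent variable)."""
    slope = covariance(xs, ys) / covariance(xs, xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


def entropy(labels):
    """Shannon entropy in bits of a list of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B); zero when the labels are independent."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


# Hypothetical numeric data: x is the independent variable, y the dependent one.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
cov_xy = covariance(x, y)
r_xy = pearson_r(x, y)
slope, intercept = simple_regression(x, y)
print(cov_xy, r_xy, slope, intercept)

# Hypothetical categorical data: features that are not measured by numbers.
first_visit = ["yes", "yes", "no", "no"]
converted = ["no", "no", "yes", "yes"]
city = ["A", "B", "A", "B"]
mi_dependent = mutual_information(first_visit, converted)
mi_independent = mutual_information(first_visit, city)
print(mi_dependent, mi_independent)
```

    In this toy data `converted` is fully determined by `first_visit`, so their mutual information equals the entropy of either one, while `city` is spread evenly across both visit types and shares no information with it. Because entropy works on label counts and not on magnitudes, the same functions apply to any categorical feature.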