Why are the principal components negative after applying the PCA transform to the original data?



  • I have data with 374 rows and 31 columns. The first column is the date; the remaining columns are the stock prices of 30 companies. I need to apply principal component analysis (PCA). For that, I wrote the following code:

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA

    Location1 = r'C:\Users\...\close_prices.csv'
    df = pd.read_csv(Location1)
    X = df.drop('date', axis=1)  # keep only the 30 price columns
    pca = PCA(n_components=10)
    pca.fit(X)
    print(pca.explained_variance_ratio_)
    # the first component explains the largest share of the variance
    # in the features (the prices of the 30 companies)
    # now apply the transform to the original data
    X1 = pca.transform(X)
    X1
    # Out[7]:
    # array([[-50.90240358, -17.63167724,  -7.7360209 , ...,   3.55657041,
    #          -5.82197358,  -1.72604005],
    #        [-52.84690919, -19.14690749,  -7.27254551, ...,   3.43259929,
    #          -5.63318106,  -2.0122316 ],
    #        ...])
    X1.shape
    # (374, 10)
    # I need to take the first component and compute the Pearson correlation
    # coefficient with the Dow Jones Index of shape (374, 1) => so I take (374, 1)
    X11 = X1[:, [0]]
    X11.shape
    # (374, 1)
    

    But I can't compute the coefficient, because the numbers in X1 are negative: when the square roots are taken and the matrices are divided, the result is NaN.

    Why does applying the trained model to X produce a matrix with negative values?



  • What's stopping you from multiplying the scores by minus one? PCA picks out directions in the feature space, and the orientation of the eigenvectors defining those directions plays no special role.
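
    The sign indeterminacy is easy to verify numerically. Below is a minimal sketch, not the asker's actual data: `X` is a synthetic 374 x 30 matrix of random-walk "prices", and `dji` is a hypothetical stand-in for the Dow Jones series. It shows that `np.corrcoef` handles negative values without producing NaN, and that flipping the sign of a component only flips the sign of the correlation, not its magnitude.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(374, 30)).cumsum(axis=0)  # synthetic random-walk "prices"
    dji = X.mean(axis=1)                           # hypothetical stand-in for the index

    pca = PCA(n_components=10)
    scores = pca.fit_transform(X)   # the scores may well be negative
    first = scores[:, 0]            # first principal component, shape (374,)

    # Pearson correlation is perfectly defined for negative inputs ...
    r = np.corrcoef(first, dji)[0, 1]
    # ... and multiplying the component by -1 only flips the sign of r:
    r_flipped = np.corrcoef(-first, dji)[0, 1]
    print(r, r_flipped)             # same magnitude, opposite sign
    assert np.isclose(abs(r), abs(r_flipped))

    So if only the magnitude of the correlation matters, the sign of the first component can be ignored or flipped freely.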



