Pearson Correlation Coefficient calculation in C++

#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
#include <iomanip>
using namespace std;

double sum(vector<double> a)
{
    double s = 0;
    for (int i = 0; i < a.size(); i++)
    {
        s += a[i];
    }
    return s;
}

double mean(vector<double> a)
{
    return sum(a) / a.size();
}

double sqsum(vector<double> a)
{
    double s = 0;
    for (int i = 0; i < a.size(); i++)
    {
        s += pow(a[i], 2);
    }
    return s;
}

double stdev(vector<double> nums)
{
    double N = nums.size();
    return pow(sqsum(nums) / N - pow(sum(nums) / N, 2), 0.5);
}

vector<double> operator-(vector<double> a, double b)
{
    vector<double> retvect;
    for (int i = 0; i < a.size(); i++)
    {
        retvect.push_back(a[i] - b);
    }
    return retvect;
}

vector<double> operator*(vector<double> a, vector<double> b)
{
    vector<double> retvect;
    for (int i = 0; i < a.size(); i++)
    {
        retvect.push_back(a[i] * b[i]);
    }
    return retvect;
}

double pearsoncoeff(vector<double> X, vector<double> Y)
{
    return sum((X - mean(X)) * (Y - mean(Y))) / (X.size() * stdev(X) * stdev(Y));
}

int main()
{
    /* Enter your code here. Read input from STDIN. Print output to STDOUT */
    int N;
    cin >> N;

    vector<double> X(N);
    vector<double> Y(N);

    for (int i = 0; i < X.size(); i++)
    {
        cin >> X[i];
    }
    for (int i = 0; i < Y.size(); i++)
    {
        cin >> Y[i];
    }

    cout << fixed << setprecision(3) << pearsoncoeff(X, Y) << endl;
    return 0;
}
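
For reference, the value that pearsoncoeff() returns can be written as the formula below. This is read off the code rather than stated by the author: N is the common length of X and Y, and the standard deviations are the population form (dividing by N), as computed in stdev().

```latex
r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N \, \sigma_X \, \sigma_Y},
\qquad
\sigma_X = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^2}
```

As a quick check, take X = (1, 2, 3) and Y = (2, 4, 6). The numerator is (-1)(-2) + 0 + (1)(2) = 4, the deviations are sqrt(2/3) and sqrt(8/3), so the denominator is 3 * (4/3) = 4 and r = 1, as expected for perfectly linear data.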

2 Responses

1) Pass the vectors by const reference, i.e. use const vector<double>& instead of vector<double> as the parameter type. That avoids copying the entire vector on every call.
2) push_back() can be expensive because it triggers a reallocation whenever the vector's size would exceed its capacity. You can construct retvect from the input instead, i.e. vector<double> retvect(a). That makes retvect the same size as a and initializes it with the same values, so each element can then be updated in place (e.g. retvect[i] -= b) after a single allocation.
3) Avoid duplicate function calls. Both mean() and stdev() call sum(), so pearsoncoeff ends up computing sum() twice for X and twice for Y.
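
The three suggestions above could be applied roughly as follows. This is only a sketch, not the original author's code: the Stats struct and the stats() helper are hypothetical names introduced here, and the functions assume X and Y have the same non-zero length and non-zero variance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Suggestion 1: take the vector by const reference, so no copy is made.
double sum(const std::vector<double>& a) {
    double s = 0.0;
    for (double v : a) s += v;
    return s;
}

// Hypothetical helper (not in the original post): mean and population
// standard deviation together, so sum() runs once per vector (suggestion 3).
struct Stats {
    double mean;
    double stdev;
};

Stats stats(const std::vector<double>& a) {
    const double n = static_cast<double>(a.size());
    const double mean = sum(a) / n;
    double sq = 0.0;
    for (double v : a) sq += v * v;
    return {mean, std::sqrt(sq / n - mean * mean)};
}

// Suggestion 2: copy-construct the result (one allocation), then update
// the elements in place instead of calling push_back() in a loop.
std::vector<double> operator-(const std::vector<double>& a, double b) {
    std::vector<double> retvect(a);
    for (double& v : retvect) v -= b;
    return retvect;
}

std::vector<double> operator*(const std::vector<double>& a,
                              const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> retvect(a);
    for (std::size_t i = 0; i < retvect.size(); ++i) retvect[i] *= b[i];
    return retvect;
}

// Same formula as the original pearsoncoeff(), with the statistics
// for each vector computed once and reused.
double pearsoncoeff(const std::vector<double>& X, const std::vector<double>& Y) {
    assert(X.size() == Y.size() && !X.empty());
    const Stats sx = stats(X);
    const Stats sy = stats(Y);
    return sum((X - sx.mean) * (Y - sy.mean)) / (X.size() * sx.stdev * sy.stdev);
}
```

The main() from the post can call this pearsoncoeff(X, Y) unchanged. The two overloaded operators still build temporary vectors; a single loop that accumulates (X[i] - sx.mean) * (Y[i] - sy.mean) would avoid those as well, at the cost of moving further from the original structure.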
