1
我有下面的代码,我正在使用python编写一个简单的电影推荐器的一部分,这样我就可以模拟我作为coursera由Andrew NG教授的机器学习课程的一部分获得的结果。在从熊猫数据框转换后修改numpy数组
我想修改我呼吁大熊猫数据框as_matrix()后,即可获取numpy.ndarray和列向量添加到它就像我们可以在MATLAB
Y = [ratings Y]
以下是我的Python代码
dataFile='/filepath/'
userItemRatings = pd.read_csv(dataFile, sep="\t", names=['userId', 'movieId', 'rating','timestamp'])
movieInfoFile = '/filepath/'
movieInfo = pd.read_csv(movieInfoFile, sep="|", names=['movieId','Title','Release Date','Video Release Date','IMDb URL','Unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western'], encoding = "ISO-8859-1")
userMovieMatrix=pd.merge(userItemRatings, movieInfo, left_on='movieId', right_on='movieId')
userMovieSubMatrix = userMovieMatrix[['userId', 'movieId', 'rating','timestamp','Title']]
Y = pd.pivot_table(userMovieSubMatrix, values='rating', index=['movieId'], columns=['userId'])
Y.fillna(0,inplace=True)
movies = Y.shape[0]
users = Y.shape[1] +1
ratings = np.zeros((1682, 1))
ratings[0] = 4
ratings[6] = 3
ratings[11] = 5
ratings[53] = 4
ratings[63] = 5
ratings[65] = 3
ratings[68] = 5
ratings[97] = 2
ratings[182] = 4
ratings[225] = 5
ratings[354] = 5
features = 10
theta = pd.DataFrame(np.random.rand(users,features))# users 943*3
X = pd.DataFrame(np.random.rand(movies,features))# movies 1682 * 3
X = X.as_matrix()
theta = theta.as_matrix()
Y = Y.as_matrix()
"""want to insert a column vector into this Y to get a new Y of dimension
1682*944, but only seeing 1682*943 after the following statement
"""
np.insert(Y, 0, ratings, axis=1)
R = Y.copy()
R[R!=0] = 1
Ymean = np.zeros((movies, 1))
Ynorm = np.zeros((movies, users))
for i in range(movies):
idx = np.where(R[i,:] == 1)[0]
Ymean[i] = Y[i,idx].mean()
Ynorm[i,idx] = Y[i,idx] - Ymean[i]
print(type(Ymean), type(Ynorm), type(Y), Y.shape)
Ynorm[np.isnan(Ynorm)] = 0.
Ymean[np.isnan(Ymean)] = 0.
插入了一个内嵌评论,但我的问题是当我创建一个新的numpy数组并调用insert时,它工作得很好。然而,在调用pivot_table()的pandas数据框上调用as_matrix()后,我得到的numpy数组无效。有其他选择吗?