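Please consider the following tables. I need to join the data from the four tables to calculate several weighted scores. users holds tens of thousands of Twitter users; their tweets are indexed by sp100_id, which is the id of the company (see sp100) that the tweet is talking about. tweets.class holds the sentiment class assigned to each tweet (1 = neutral, 2 = positive, 3 = negative). tweets.rt holds the number of times the tweet has been retweeted. Finally, each user has been given a quality score and a follow score, as follows: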
users
-------------------------
user_id  quality  follow
-------------------------
1        2.50     5.00
2        0.75     1.00

tweets
-----------------------------------------------------
tweet_id  sp100_id  nyse_date   user_id  class  rt
-----------------------------------------------------
1         1         2011-03-12  1        1      0
2         1         2011-03-13  1        2      2
3         1         2011-03-13  1        2      1
4         1         2011-03-13  2        2      0
5         1         2011-03-13  2        3      3
6         2         2011-03-12  2        2      3
7         2         2011-03-12  2        2      0
8         2         2011-03-12  1        3      5
9         2         2011-03-13  2        2      0

daterange
----------------
_date
----------------
2011-03-11
2011-03-12
2011-03-13

sp100
----------------
sp100_id  _name
----------------
1         Alcoa
2         Apple
The desired output lists, for each sp100_id and each _date, the number of positive (class=2) and negative (class=3) tweets, weighted by rt, quality and follow:
sp100_id  nyse_date   pos-rt  pos-quality  pos-follow  neg-rt  neg-quality  neg-follow
---------------------------------------------------------------------------------------
1         2011-03-11  0       0            0           0       0            0
1         2011-03-12  0       0            0           0       0            0
1         2011-03-13  5 (1)   5.75 (2)     11.00 (3)   3 (4)   0.75 (5)     1.00 (6)
2         2011-03-11  0       0            0           0       0            0
2         2011-03-12  3 (7)   5.00 (8)     10.00 (9)   5.00    2.50         2.50
2         2011-03-13  0       0.75         1.00        0       0            0
---------------------------------------------------------------------------------------
(1) On 2011-03-13, 3 positive tweets for sp100_id 1. 1 tweet retweeted 2 times,
1 tweet retweeted 1 time and 1 tweet retweeted 0 times = 2x2+1x1+1x0 = 5
(2) On 2011-03-13, 2 positive tweets made by user 1, who has quality 2.50 and
1 positive tweet made by user 2, who has quality 0.75 = 2x2.50+1x0.75 = 5.75
(3) On 2011-03-13, 2 positive tweets made by user 1, who has follow 5.00 and
1 positive tweet made by user 2, who has follow 1 = 2x5.00+1x1.00 = 11.00
(4) On 2011-03-13, 1 negative tweet made by user 2, retweeted 3 times = 1x3 = 3
(5) On 2011-03-13, 1 negative tweet made by user 2, who has quality 0.75, thus
1x0.75 = 0.75
(6) On 2011-03-13, 1 negative tweet made by user 2, who has follow 1.00 so
1x1.00 = 1.00
(7) 1 positive tweet which has been retweeted 3 times, 1 positive tweet without
any retweets = 1x3+1x0 = 3
(8) 2 positive tweets from user 2 x quality 2.50 = 5.00
(9) 2 positive tweets x follow 5 = 10.00
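I have tried to explain myself as well as I can. Who can help me construct the correct query? As you can see, dates without any tweets (all values zero) also need to be included in the result set. This is what I have so far, but I am having trouble sorting out the rest; the parts in square brackets still need to be replaced by the correct syntax: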
SELECT
    s.sp100_id,
    d._date,
    COALESCE(c.pos-rt,0) AS pos-rt,
    COALESCE(c.pos-quality,0) AS pos-quality,
    COALESCE(c.pos-follow,0) AS pos-follow,
    COALESCE(c.neg-rt,0) AS neg-rt,
    COALESCE(c.neg-quality,0) AS neg-quality,
    COALESCE(c.neg-follow,0) AS neg-follow
FROM sp100 s
CROSS JOIN daterange d
LEFT JOIN (
    SELECT
        sp100_id,
        nyse_date,
        COUNT(CASE class WHEN 2 THEN 1 END) * [rt] AS pos-rt,
        COUNT(CASE class WHEN 2 THEN 1 END) * [quality] AS pos-quality,
        COUNT(CASE class WHEN 2 THEN 1 END) * [follow] AS pos-follow,
        COUNT(CASE class WHEN 3 THEN 1 END) * [rt] AS neg-rt,
        COUNT(CASE class WHEN 3 THEN 1 END) * [quality] AS neg-quality,
        COUNT(CASE class WHEN 3 THEN 1 END) * [follow] AS neg-follow
    FROM tweets
    GROUP BY sp100_id, nyse_date
) c ON s.sp100_id = c.sp100_id AND d._date = c.nyse_date
ORDER BY s.sp100_id, d._date ASC
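Obviously, [rt], [quality] and [follow] need to be replaced, and I am not sure about the COUNT(...) either: right now it first counts the number of tweets, whereas it should take each tweet separately and multiply it by its own number of retweets ('rt').
Can anybody help me out?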
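I am having some trouble understanding your table footnote (1): the first tweet was retweeted twice; why does it contribute 2 * 2 to 'pos-rt' rather than 1 * 2, whereas the other two tweets (retweeted once and zero times) contribute 1 * 1 and 1 * 0 respectively? – eggyal 2012-07-31 17:30:07
In footnote (8), I think the relevant user has 'user_id = 2' and quality = 0.75, so 'pos-rt' should be '1.5'? Similarly, for footnote (9) 'follow = 1.00', so 'pos-follow' should be '2.00'? – eggyal 2012-07-31 17:45:44
You are correct on both counts :-) – Pr0no 2012-07-31 20:09:34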
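One possible way to get the per-tweet weighting asked for in the question is to join tweets to users inside the derived table and replace each COUNT(...) * [placeholder] with a SUM over a CASE expression, so that every tweet adds its own rt, or its author's quality or follow. The sketch below is not a definitive answer: it assumes SQLite (through Python's sqlite3 module) purely so that it can be run as-is, and it uses the hypothetical aliases pos_rt, pos_quality and so on, because hyphenated names such as pos-rt would have to be quoted. The sample rows are copied from the tables in the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, quality REAL, follow REAL);
CREATE TABLE sp100 (sp100_id INTEGER, _name TEXT);
CREATE TABLE daterange (_date TEXT);
CREATE TABLE tweets (tweet_id INTEGER, sp100_id INTEGER, nyse_date TEXT,
                     user_id INTEGER, class INTEGER, rt INTEGER);
INSERT INTO users VALUES (1, 2.50, 5.00), (2, 0.75, 1.00);
INSERT INTO sp100 VALUES (1, 'Alcoa'), (2, 'Apple');
INSERT INTO daterange VALUES ('2011-03-11'), ('2011-03-12'), ('2011-03-13');
INSERT INTO tweets VALUES
  (1, 1, '2011-03-12', 1, 1, 0),
  (2, 1, '2011-03-13', 1, 2, 2),
  (3, 1, '2011-03-13', 1, 2, 1),
  (4, 1, '2011-03-13', 2, 2, 0),
  (5, 1, '2011-03-13', 2, 3, 3),
  (6, 2, '2011-03-12', 2, 2, 3),
  (7, 2, '2011-03-12', 2, 2, 0),
  (8, 2, '2011-03-12', 1, 3, 5),
  (9, 2, '2011-03-13', 2, 2, 0);
""")

# SUM(CASE ...) adds one weight per matching tweet instead of counting
# tweets first. The underscore aliases are hypothetical names.
QUERY = """
SELECT s.sp100_id,
       d._date,
       COALESCE(c.pos_rt, 0),
       COALESCE(c.pos_quality, 0),
       COALESCE(c.pos_follow, 0),
       COALESCE(c.neg_rt, 0),
       COALESCE(c.neg_quality, 0),
       COALESCE(c.neg_follow, 0)
FROM sp100 s
CROSS JOIN daterange d
LEFT JOIN (
    SELECT t.sp100_id,
           t.nyse_date,
           SUM(CASE WHEN t.class = 2 THEN t.rt      ELSE 0 END) AS pos_rt,
           SUM(CASE WHEN t.class = 2 THEN u.quality ELSE 0 END) AS pos_quality,
           SUM(CASE WHEN t.class = 2 THEN u.follow  ELSE 0 END) AS pos_follow,
           SUM(CASE WHEN t.class = 3 THEN t.rt      ELSE 0 END) AS neg_rt,
           SUM(CASE WHEN t.class = 3 THEN u.quality ELSE 0 END) AS neg_quality,
           SUM(CASE WHEN t.class = 3 THEN u.follow  ELSE 0 END) AS neg_follow
    FROM tweets t
    JOIN users u ON u.user_id = t.user_id
    GROUP BY t.sp100_id, t.nyse_date
) c ON c.sp100_id = s.sp100_id AND c.nyse_date = d._date
ORDER BY s.sp100_id, d._date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On this sample data the query returns six rows, one per company and date. The values follow the corrections confirmed in the comments rather than the table as originally posted: sp100_id 1 on 2011-03-13 gets a pos-rt of 3, and sp100_id 2 on 2011-03-12 gets a pos-quality of 1.5 and a pos-follow of 2.0. For that same row the data also yields a neg-follow of 5.0, since the negative tweet was made by user 1.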