Distributing records across the hours of a day using Python

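I am designing a data simulator that generates random records for a given day, based on a limit on the number of records. The limit can range from 100 to 10000.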

limit = 100

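The records should be distributed across the whole day, for example: 15% of the records in hour 0, 20% in hour 1, 5% in hour 2, and so on.

How can I simulate this kind of distribution in Python, and which library can help with designing the logic?

Right now I am able to simulate records like the following: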

t_id  t_amount  gateway  transaction_date
101   30        Master   11/05/2016
102   10        Amex     11/05/2016

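If you look at the transaction date, the records have no timestamp. I would like the records to carry a timestamp as shown below, with all 100 records spread across the whole day. How can I achieve this?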

t_id  t_amount  gateway  transaction_date
101   30        Master   11/05/2016 00:21:42
102   10        Amex     11/05/2016 01:22:42

Source

2016-05-30 Maverick

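Can you show the code you are currently using? Am I right that the two missing pieces are distributing the records across the hours and adding the timestamps?

Here is one way to generate something along the lines you describe. Note that limit can be random, and so can the hourly weights.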

In [78]: df.tail() 
Out[78]: 
        gateway t_amount t_id 
transaction_date 
2016-11-05 03:00:00 Amex  68 195 
2016-11-05 03:00:00 Amex  41 196 
2016-11-05 03:00:00 Master  66 197 
2016-11-05 03:00:00 Amex  59 198 
2016-11-05 03:00:00 Amex  45 199

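The code below pregenerates the hours: given limit and the weights, it computes the expected number of observations for each hour. It then uses NumPy's random module to generate the sample data. See the NumPy documentation for other distributions.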

import numpy as np
import pandas as pd

# total number of observations
limit = 10**2

# share of transactions in each hour, converted to a count per hour
# (rounded so that floating-point error cannot drop an observation)
weights_per_hour = np.round(np.array([.35, .25, .25, .15]) * limit).astype(int)

# generate the time range using pandas datetime functions
time_range = pd.date_range(start='20161105', freq='h', periods=4)  # use 'H' on pandas versions before 2.2

# repeat each hour according to the hourly distribution
time_indx = time_range.repeat(weights_per_hour)

# create a temporary data frame holding the ids and timestamps
# (the original used an undefined N here; limit is the intended length)
dat_dict = {"t_id": [x + 100 for x in range(limit)], "transaction_date": time_indx}
temp_df = pd.DataFrame(dat_dict)

# the choices for the transaction gateway
gateway_choice = np.array(['Master', 'Amex'])

# generate the random data
rnd_df = pd.DataFrame({"t_amount": np.random.randint(low=1, high=100, size=limit),
                       "gateway": np.random.choice(gateway_choice, limit)})

# attach the random data to temp_df and index by timestamp
df = pd.concat([rnd_df, temp_df], axis=1)
df.set_index('transaction_date', inplace=True)

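In the code above, the index is in timestamp format. You may have to play around with how it is printed, but the timestamp is definitely stored. To move it out of the index and back into an ordinary column of the data frame, use df.reset_index(); if the values ever need converting to timestamps again, pd.to_datetime() does that.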
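The code above places every record exactly on the hour, while the question asks for timestamps such as 00:21:42. What follows is only a minimal sketch of one way to get there, not the answerer's method: it draws an hour for each record according to the weights and then adds a uniform random offset in seconds. The names rng, hour_weights and sim_df, the seed, and the four-hour window are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability

limit = 100
# hypothetical hourly shares for a 4-hour window; they must sum to 1
hour_weights = np.array([0.35, 0.25, 0.25, 0.15])

# draw an hour for every record according to the weights
hours = rng.choice(len(hour_weights), size=limit, p=hour_weights)
# draw a uniform offset (in seconds) inside that hour
seconds = rng.integers(0, 3600, size=limit)

# combine day start, hour and offset into sorted timestamps
start = pd.Timestamp("2016-11-05")
stamps = (start + pd.to_timedelta(hours * 3600 + seconds, unit="s")).sort_values()

sim_df = pd.DataFrame({
    "t_id": np.arange(101, 101 + limit),
    "t_amount": rng.integers(1, 100, size=limit),
    "gateway": rng.choice(["Master", "Amex"], size=limit),
    "transaction_date": stamps,
})
print(sim_df.head())
```

Because the hours are sampled, the share of records in each hour only matches the weights on average. If exact counts per hour are required, the repeat-based approach above can be kept and only the random seconds offset added to it.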

Source

2016-05-30 17:51:44

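If you look at the documentation for the random package, which is part of the standard library, you will find that it supports generating numbers with a normal (Gaussian) distribution.

Source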
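As a rough illustration of that suggestion, the sketch below uses random.gauss to cluster timestamps around a peak time of day. The peak hour, the spread, the seed and the helper name gaussian_timestamp are all hypothetical choices, and redrawing values that fall outside the day is just one simple way to keep every record within the date.

```python
import random
from datetime import datetime, timedelta

random.seed(0)  # arbitrary seed, only for repeatability

limit = 100
day_start = datetime(2016, 11, 5)
seconds_per_day = 24 * 60 * 60

def gaussian_timestamp(peak_hour=12.0, spread_hours=3.0):
    """Draw one timestamp clustered around peak_hour (hypothetical parameters)."""
    while True:
        offset = random.gauss(peak_hour, spread_hours) * 3600
        if 0 <= offset < seconds_per_day:  # redraw values that fall outside the day
            return day_start + timedelta(seconds=int(offset))

timestamps = sorted(gaussian_timestamp() for _ in range(limit))
print(timestamps[0], timestamps[-1])
```

A normal distribution gives a single bell-shaped peak. For arbitrary per-hour percentages like those in the question, weighted sampling of the hour is a closer fit.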

2016-05-30 17:08:13 sorin
