b7990885d8b26b9404fd9ce952b0b2f005019594,california_housing/feature_engineering.py,,,#,23

Before Change


housing["income_cat"].hist()

# make a stratified split of the data
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(housing, housing["income_cat"]):
	train_set = housing.loc[train_index]
	test_set = housing.loc[test_index]

for set_ in (train_set, test_set):
	set_.drop("income_cat", axis=1, inplace=True)

gc.collect()

After Change


# passing data in for imputation and one-hot encoding
##################

city_lat_long = pd.read_csv("cal_cities_lat_long.csv")
city_pop_data = pd.read_csv("cal_populations_city.csv")
county_pop_data = pd.read_csv("cal_populations_county.csv")

In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 3

Instances


Project Name: CNuge/kaggle-code
Commit Name: b7990885d8b26b9404fd9ce952b0b2f005019594
Time: 2018-01-12
Author: nugentc@uoguelph.ca
File Name: california_housing/feature_engineering.py
Class Name:
Method Name:


Project Name: biocore/scikit-bio
Commit Name: 95ae997f265381950b8597ab6636574564e7ef01
Time: 2015-10-07
Author: kestrel.gorlick@gmail.com
File Name: skbio/io/format/blast6.py
Class Name:
Method Name: _blast6_to_data_frame


Project Name: oddt/oddt
Commit Name: e626254b74ecb6dc71396c1b35237b53a5e35163
Time: 2017-08-23
Author: maciek@wojcikowski.pl
File Name: oddt/datasets.py
Class Name: pdbbind
Method Name: __init__