b7990885d8b26b9404fd9ce952b0b2f005019594,california_housing/feature_engineering.py,,,#,23

Before Change



# make a stratified split of the data
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(housing, housing["income_cat"]):
	train_set = housing.loc[train_index]
	test_set = housing.loc[test_index]

After Change


##################

city_lat_long = pd.read_csv("cal_cities_lat_long.csv")
city_pop_data = pd.read_csv("cal_populations_city.csv")
county_pop_data = pd.read_csv("cal_populations_county.csv")



# Original version. It had to change because we only want to deal with
# cities for which we have both location and population data.

city_coords = {}
for dat in city_lat_long.iterrows():
    row = dat[1]
    city_coords[row["Name"]] = (float(row["Latitude"]), float(row["Longitude"]))

# how we discovered the need for the change
present = []
absent = []
for city in city_coords.keys():
    if city in city_pop_data["City"].values:
        present.append(city)
    else:
        absent.append(city)
len(present)
len(absent)
absent


city_coords = {}

for dat in city_lat_long.iterrows():
    row = dat[1]
    if row["Name"] not in city_pop_data["City"].values:   
        continue           
    else: 
        city_coords[row["Name"]] = (float(row["Latitude"]), float(row["Longitude"]))
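The filtering loop above could also be expressed without `iterrows()`. The following is a minimal sketch, not the commit author's code: it assumes the same column names (`Name`, `Latitude`, `Longitude`, `City`) and replaces the CSV files with small in-memory frames whose city names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for cal_cities_lat_long.csv and
# cal_populations_city.csv; all values here are invented.
city_lat_long = pd.DataFrame({
    "Name": ["Alpha", "Beta", "Gamma"],
    "Latitude": [34.0, 36.5, 38.2],
    "Longitude": [-118.2, -119.8, -121.4],
})
city_pop_data = pd.DataFrame({"City": ["Alpha", "Gamma"], "pop": [1000, 2500]})

# Boolean mask: True where the city also appears in the population table.
has_pop = city_lat_long["Name"].isin(city_pop_data["City"])
kept = city_lat_long[has_pop]

# Build the name -> (latitude, longitude) mapping from the kept rows only.
city_coords = {
    name: (float(lat), float(lon))
    for name, lat, lon in zip(kept["Name"], kept["Latitude"], kept["Longitude"])
}

# Cities dropped because no population data was found for them.
absent = city_lat_long.loc[~has_pop, "Name"].tolist()
```

The same mask yields both the kept mapping and the `absent` list, so the separate discovery loop and the filtering loop collapse into a single membership check.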

In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 4

Instances

Link

Project Name: CNuge/kaggle-code

Commit Name: b7990885d8b26b9404fd9ce952b0b2f005019594

Time: 2018-01-12

Author: nugentc@uoguelph.ca

File Name: california_housing/feature_engineering.py

Class Name:

Method Name:

Link

Project Name: broadinstitute/gtex-pipeline

Commit Name: 080080a547e9d89adf4393c2a349544443c35962

Time: 2017-08-18

Author: francois@broadinstitute.org

File Name: rnaseq/src/aggregate_rnaseqc_metrics.py

Class Name:

Method Name:

Link

Project Name: QUANTAXIS/QUANTAXIS

Commit Name: 9d5565affe6314056373bf789868e8db714a3da8

Time: 2020-08-04

Author: yutiansut@qq.com

File Name: QUANTAXIS/QAFetch/QATdx.py

Class Name:

Method Name: QA_fetch_get_stock_list