
Let's filter the dataset further so that 45% of the foods are one-worded, 30% are two-worded, and 25% are three-worded.
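Each capped group keeps round(total_num_foods * fraction) entries. Below is a minimal sketch of that arithmetic. It assumes a total of 2796, which is the sum of the three counts printed by the cell below. The notebook's actual total_num_foods is set in an earlier cell that isn't shown here. `target_counts`, `TOTAL_NUM_FOODS` and `TARGET_FRACTIONS` are hypothetical names used only for illustration.

```python
# Hypothetical total; the notebook's real total_num_foods comes from an earlier cell.
TOTAL_NUM_FOODS = 2796

# Target share of the final dataset for each word count.
TARGET_FRACTIONS = {1: 0.45, 2: 0.30, 3: 0.25}


def target_counts(total, fractions):
    """Return how many foods of each word count to keep: round(total * fraction)."""
    return {n_words: round(total * frac) for n_words, frac in fractions.items()}


counts = target_counts(TOTAL_NUM_FOODS, TARGET_FRACTIONS)
print(counts)  # {1: 1258, 2: 839, 3: 699}
```

Python's `round()` sends exact .5 ties to the nearest even integer (`round(2.5) == 2`). The rounded caps are also not guaranteed to add up to the total, so it is worth checking the resulting sizes as the notebook does.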

import pandas as pd

# shuffle the 2-worded and 3-worded foods so the slices below are random samples
two_worded_foods = two_worded_foods.sample(frac=1)
three_worded_foods = three_worded_foods.sample(frac=1)

# combine the foods into one Series
# (Series.append was removed in pandas 2.0, so use pd.concat; .iloc slices by position)
foods = pd.concat([
    one_worded_foods,
    two_worded_foods.iloc[:round(total_num_foods * 0.30)],
    three_worded_foods.iloc[:round(total_num_foods * 0.25)],
])

# print the resulting sizes
for i in range(3):
    print(f"{i+1}-worded food entities:", foods[foods.str.split().apply(len) == i + 1].size)

1-worded food entities: 1258
2-worded food entities: 839
3-worded food entities: 699
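The notebook's printed counts match the target ratios. They sum to 2796, and 1258/2796 ≈ 0.450, 839/2796 ≈ 0.300, and 699/2796 = 0.25.

The same shuffle, slice and concatenate pattern can also be tried in isolation. The sketch below uses hypothetical toy data: the food strings, `total = 20` and the variable names are illustrative assumptions, not the notebook's. It keeps 9 one-word foods whole and caps the other groups at round(20 * 0.30) = 6 and round(20 * 0.25) = 5, so the mix comes out at 45/30/25.

```python
import pandas as pd

# Hypothetical toy data: every string has exactly 1, 2 or 3 words.
one_worded = pd.Series([f"food{i}" for i in range(9)])
two_worded = pd.Series([f"green food{i}" for i in range(10)])
three_worded = pd.Series([f"very green food{i}" for i in range(10)])

total = 20  # hypothetical stand-in for total_num_foods

# Shuffle (seeded for reproducibility), then keep the first round(total * fraction) rows by position.
two_sample = two_worded.sample(frac=1, random_state=0).iloc[:round(total * 0.30)]
three_sample = three_worded.sample(frac=1, random_state=0).iloc[:round(total * 0.25)]

# pd.concat replaces the removed Series.append; ignore_index gives a fresh 0..n-1 index.
foods = pd.concat([one_worded, two_sample, three_sample], ignore_index=True)

# Count entries by number of whitespace-separated words, then convert counts to shares.
word_counts = foods.str.split().str.len().value_counts().sort_index()
shares = word_counts / len(foods)
print(word_counts)
print(shares)
```

Seeding `sample()` with `random_state` makes the draw repeatable. The notebook's cell omits it, so its slices differ between runs even though their sizes do not.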

Copyright © 2021 HostJupyter
