Exciting Matches in the Northamptonshire Senior Cup: Tomorrow's Highlights
The Northamptonshire Senior Cup is a beacon of football passion in England, and tomorrow promises an exhilarating lineup of matches. As a local enthusiast, I am thrilled to share expert betting predictions and insights into these upcoming fixtures. Let’s dive into the action-packed schedule and explore what makes these games so captivating.
Matchday Schedule
Tomorrow's fixtures are set to ignite the football fervor across Northamptonshire. Here’s a quick overview of the key matches:
- 10:00 AM - Kettering Town vs. Corby Town
- 12:30 PM - Northampton Town vs. Rushden & Diamonds
- 3:00 PM - Brackley Town vs. Towcester Town
- 5:30 PM - AFC Wrexham vs. Peterborough United
Each match brings its own unique flavor and intensity, making it a must-watch for any football aficionado.
In-Depth Analysis: Kettering Town vs. Corby Town
This early kick-off sets the tone for the day. Kettering Town, known for their solid defense, will face a formidable Corby Town side, renowned for their attacking prowess. Historically, these two teams have had a closely contested rivalry, making this match particularly intriguing.
Betting Predictions
- Kettering Town to win: Odds at 2.5/1
- Corby Town to win: Odds at 3.0/1
- Draw: Odds at 3.5/1
Analyze the strengths and weaknesses of both teams to make informed betting decisions.
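For readers who like to sanity-check prices, here is a small illustrative Python snippet (purely a sketch; the helper name is made up) that turns a fractional price such as 2.5/1 into its implied win probability, ignoring the bookmaker's margin:

# illustrative sketch: convert fractional odds (e.g. 2.5/1) into an implied probability
def implied_probability(numerator, denominator=1.0):
    # for fractional odds a/b, the implied probability is b / (a + b)
    return denominator / (numerator + denominator)

for outcome, price in [("Kettering Town", 2.5), ("Corby Town", 3.0), ("Draw", 3.5)]:
    print("%s: %.1f%% implied" % (outcome, 100 * implied_probability(price)))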
The Classic Clash: Northampton Town vs. Rushden & Diamonds
This midday clash is one of the most anticipated matches of the day. Northampton Town, with their recent form, are favorites to win, but Rushden & Diamonds have shown resilience in past encounters.
Betting Insights
- Northampton Town to win: Odds at 1.8/1
- Rushden & Diamonds to win: Odds at 4.0/1
- Both teams to score: Odds at 1.9/1
The tactical battle between these two teams will be key in determining the outcome.
Afternoon Thrill: Brackley Town vs. Towcester Town
This match promises an exciting showdown as Brackley Town aims to maintain their unbeaten streak against Towcester Town’s determined squad.
Betting Overview
- Brackley Town to win: Odds at 2.2/1
- Towcester Town to win: Odds at 3.2/1
- Total Goals Over 2.5: Odds at 2.0/1
The attacking flair of both teams suggests a high-scoring affair could be on the cards.
Dusk Drama: AFC Wrexham vs. Peterborough United
Capping off the day is this thrilling encounter between AFC Wrexham and Peterborough United. Both teams are eager to make a statement in this prestigious cup competition.
Betting Forecasts
- AFC Wrexham to win: Odds at 2.7/1
- Peterborough United to win: Odds at 2.8/1
- No Goals in First Half: Odds at 2.1/1
This match is expected to be a tactical chess game with both sides looking to exploit each other’s weaknesses.
Tactical Insights and Team Formations
Analyzing team formations and tactics is crucial for understanding potential match outcomes. Here’s a breakdown of what to expect from each team:
Kettering Town's Defensive Strategy
Kettering Town typically employs a robust defensive formation, often opting for a back five when playing against strong opponents like Corby Town. Their strategy focuses on maintaining a solid defensive line and exploiting counter-attacks.
Corby Town's Offensive Playstyle
In contrast, Corby Town prefers an aggressive attacking formation, usually deploying a front three to press high and dominate possession. Their success hinges on quick transitions and exploiting spaces left by opponents.
This tactical clash promises an intriguing battle between defense and attack.
# kaitainikai/FoodIE: code/data_processing/clean_data.py
# -*- coding: utf-8 -*-
"""
Created on Thu Apr-23-15
@author: zhiyu
"""
import pandas as pd
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer
# load data
train = pd.read_csv('data/train.csv')
test = pd.read_csv('data/test.csv')
# drop missing data
train = train[pd.notnull(train['text'])]
test = test[pd.notnull(test['text'])]
# remove duplicate data
train.drop_duplicates(subset='text', inplace=True)
test.drop_duplicates(subset='text', inplace=True)
# merge train and test data (reset the index so later column-wise joins align correctly)
all_data = pd.concat([train, test], ignore_index=True)
print(all_data.shape)
# feature extraction
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))
def clean_text(text):
    # lower-case, drop stop words, then stem and lemmatize each token
    text = text.lower()
    text = ' '.join([stemmer.stem(word) for word in text.split() if word not in stop_words])
    text = ' '.join([lemmatizer.lemmatize(word) for word in text.split()])
    return text
all_data['text'] = all_data['text'].apply(clean_text)
print(all_data['text'].head())
# label encoding: each recipe has a single cuisine label (test rows have none),
# so wrap it in a one-element list before passing it to MultiLabelBinarizer
mlb = MultiLabelBinarizer()
cuisine_lists = [[c] if pd.notnull(c) else [] for c in all_data['cuisine']]
label_matrix = mlb.fit_transform(cuisine_lists)
label_df = pd.DataFrame(label_matrix, columns=mlb.classes_)
all_data = pd.concat([all_data, label_df], axis=1)
# split train and test data again
train_df = all_data[all_data['id'].isin(train.id)]
test_df = all_data[all_data['id'].isin(test.id)]
# save clean data
train_df.to_csv('data/train_clean.csv', index=False)
test_df.to_csv('data/test_clean.csv', index=False)

# kaitainikai/FoodIE: code/experiment.py
import numpy as np
import pandas as pd
import pickle
from sklearn.metrics import roc_auc_score
def auc(y_true, y_pred):
    # column-wise ROC AUC averaged over all labels
    auc_score_list = []
    for i in range(y_true.shape[1]):
        auc_score_list.append(roc_auc_score(y_true[:, i], y_pred[:, i]))
    return np.mean(auc_score_list)
def get_prediction_results(test_features, test_labels, model_file):
    # load a pickled classifier and score its probability predictions on the test set
    with open(model_file, 'rb') as f:
        clf = pickle.load(f)
    test_predictions = clf.predict_proba(test_features)
    return auc(test_labels, test_predictions), test_predictions
def get_all_results():
    """Evaluate every saved model file on the test set."""
    print('====== results ======')
    results = {}
    for model_type in ['svm', 'rf']:
        for feature_type in ['bow', 'tfidf']:
            for ngram_range in [(1, 1), (1, 2)]:
                for kernel_type in ['linear', 'rbf']:
                    # skip this particular configuration
                    if (model_type == 'svm' and feature_type == 'tfidf'
                            and ngram_range == (1, 2) and kernel_type == 'rbf'):
                        continue
                    file_name = ('model_' + model_type + '_' + feature_type + '_'
                                 + str(ngram_range[0]) + '-' + str(ngram_range[1])
                                 + '_' + kernel_type + '.sav')
                    print(file_name)
                    results[file_name] = get_prediction_results(
                        test_features, test_labels, file_name)
    return results
def get_best_result(results):
    """Return the model file with the highest mean AUC."""
    best_result = 0.
    best_model = None
    for model_file, (auc_value, _) in results.items():
        if auc_value > best_result:
            best_result = auc_value
            best_model = model_file
    return best_model, best_result
if __name__ == '__main__':
    train_features = pd.read_csv('../data/features/train_features.csv')
    test_features = pd.read_csv('../data/features/test_features.csv')
    # the first 100 columns are features; the remaining columns are the binarized labels
    y_train = train_features.iloc[:, 100:].values
    y_test = test_features.iloc[:, 100:].values
    test_features = test_features.iloc[:, :100].values
    train_features = train_features.iloc[:, :100].values
    test_labels = y_test
    results = get_all_results()
    best_model, best_result = get_best_result(results)
    print('best model:', best_model)
    print('best auc score:', best_result)

# kaitainikai/FoodIE: code/data_processing/build_feature.py
# -*- coding: utf-8 -*-
"""
Created on Mon Apr-27-15
@author: zhiyu
"""
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
# load data
train_df = pd.read_csv('data/train_clean.csv')
test_df = pd.read_csv('data/test_clean.csv')
# initialize vectorizer
bow_vectorizer = CountVectorizer(max_features=10000)
tfidf_vectorizer = TfidfVectorizer(max_features=10000)
# fit train data into vectorizer
bow_vectorizer.fit(train_df['text'])
tfidf_vectorizer.fit(train_df['text'])
# transform train and test data into feature vectors using bow vectorizer
train_bow_vector=bow_vectorizer.transform(train_df['text'])
test_bow_vector=bow_vectorizer.transform(test_df['text'])
train_bow_feature=pd.DataFrame(train_bow_vector.todense(),columns=bow_vectorizer.get_feature_names())
test_bow_feature=pd.DataFrame(test_bow_vector.todense(),columns=bow_vectorizer.get_feature_names())
train_bow_feature.to_csv('data/features/train_bow_feature.csv',index=False)
test_bow_feature.to_csv('data/features/test_bow_feature.csv',index=False)
# transform train and test data into feature vectors using tfidf vectorizer
train_tfidf_vector=tfidf_vectorizer.transform(train_df['text'])
test_tfidf_vector=tfidf_vectorizer.transform(test_df['text'])
train_tfidf_feature=pd.DataFrame(train_tfidf_vector.todense(),columns=tfidf_vectorizer.get_feature_names())
test_tfidf_feature=pd.DataFrame(test_tfidf_vector.todense(),columns=tfidf_vectorizer.get_feature_names())
train_tfidf_feature.to_csv('data/features/train_tfidf_feature.csv',index=False)
test_tfidf_feature.to_csv('data/features/test_tfidf_feature.csv', index=False)

# FoodIE
## Data description
The dataset consists of more than one million recipes scraped from websites such as AllRecipes.com.
The main file, train.json, contains the recipe title, ingredient list (with amounts), preparation instructions (where available), and cuisine type.
The file test.json contains only the recipe title and ingredient list (with amounts).
The goal is to predict the cuisine type from the ingredient list alone.
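As a quick orientation, a minimal loading sketch might look like the one below (it assumes the JSON files sit under data/ and are flat lists of records with the fields described above):

```python
import pandas as pd

# minimal sketch: load the raw recipe data and peek at the fields described above
train = pd.read_json('data/train.json')   # title, ingredients, instructions, cuisine
test = pd.read_json('data/test.json')     # title, ingredients

print(train.shape, test.shape)
print(train['cuisine'].value_counts().head())  # most frequent cuisine types
```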
## Requirements
* Python==2.x
* scikit-learn==0.x.x
* pandas==0.x.x
* numpy==x.x.x
* nltk==x.x.x
## Preprocessing
### clean data
#### Preprocessing steps:
* drop missing values;
* drop duplicate values;
* clean text (lower case, stemming, lemmatization);
* encode labels;
* split train and test set again;
#### Usage:
python clean_data.py
### build features
#### Features:
* bag-of-word features;
* tf-idf features;
#### Usage:
python build_feature.py
## Experimentation
### experiment steps:
* use SVM or RF as classifier;
* use bag-of-word or tf-idf features;
* use different ngram ranges (unigram or bigram);
* use different kernels (linear or rbf);
* calculate AUC scores;
### Usage:
python experiment.py
### Results:
SVM performs better than RF.
Bag-of-word features perform better than tf-idf features.
Unigram performs better than bigram.
Linear kernel performs better than rbf kernel.
The best result:
Best model: model_svm_bow_1-1_linear.sav
Best auc score: **0.787793014907**
## Ensemble methods
### experiment steps:
* use SVM or RF as classifier;
* use bag-of-word or tf-idf features;
* use different ngram ranges (unigram or bigram);
* use different kernels (linear or rbf);
* average the predicted probabilities of the models that share the same classifier, feature type and ngram range;
* calculate AUC scores;
### Usage:
python ensemble.py
### Results:
SVM performs better than RF.
Bag-of-word features perform better than tf-idf features.
Unigram performs better than bigram.
Linear kernel performs better than rbf kernel.
The best result:
Best model: model_svm_bow_ensemble.sav
Best auc score: **0.790562828402**
## Future work:
### Use neural network as classifier:
The most common way of using a neural network as a classifier is the multilayer perceptron (MLP), a feedforward network with one or more hidden layers between the input and output layers.
Two common feedforward architectures are:
(1) Fully connected MLP: each node in layer L connects to every node in layer L+1.

(2) Convolutional neural network (CNN): instead of only fully connected layers, it uses convolutional layers that share weights across local regions of the input.

For detailed information about neural network architectures, please refer to [this tutorial](http://www.wildml.com/2015/11/building-neural-networks-using-python-numpy-and-theano/).
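As a rough sketch of this direction, the snippet below trains scikit-learn's MLPClassifier (available from scikit-learn 0.18 onward, so newer than the pinned requirements) on the bag-of-words features produced by build_feature.py; the file paths follow the scripts above.

```python
import pandas as pd
from sklearn.neural_network import MLPClassifier

# sketch only: a small fully connected network on the bag-of-words features
X_train = pd.read_csv('data/features/train_bow_feature.csv').values
y_train = pd.read_csv('data/train_clean.csv')['cuisine'].values  # one cuisine label per recipe

mlp = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=50, random_state=42)
mlp.fit(X_train, y_train)
print(mlp.predict(X_train[:5]))  # sanity check on a few training rows
```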
### Use word embedding instead of bag-of-word or tf-idf features:
Word embeddings are another way of representing words, mapping each word to a point in a real-valued vector space.
One popular method is word2vec, which learns the vectors with a shallow neural network.
For detailed information about word embedding methods, please refer to [this tutorial](http://www.wildml.com/2016/04/word2vec-nlp-tutorial-part-1-introduction-to-word-embedding/).
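A minimal word2vec sketch (using gensim 4.x, which is not in the requirements list above) could look like the following; it averages the word vectors of each recipe's cleaned text into a fixed-length feature vector:

```python
import numpy as np
import pandas as pd
from gensim.models import Word2Vec

# sketch only: learn word vectors from the cleaned recipe text, then average them per recipe
train_df = pd.read_csv('data/train_clean.csv')
sentences = [text.split() for text in train_df['text']]

w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=2, workers=4)

def recipe_vector(tokens, model):
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

features = np.vstack([recipe_vector(tokens, w2v) for tokens in sentences])
print(features.shape)  # one 100-dimensional vector per recipe
```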
# kaitainikai/FoodIE: code/ensemble.py
import numpy as np
import pandas as pd
import pickle
from sklearn.metrics import roc_auc_score
def auc(y_true, y_pred):
    # column-wise ROC AUC averaged over all labels
    auc_score_list = []
    for i in range(y_true.shape[1]):
        auc_score_list.append(roc_auc_score(y_true[:, i], y_pred[:, i]))
    return np.mean(auc_score_list)
def get_prediction_results(test_features, test_labels, model_files):
    # load each pickled classifier, predict probabilities, and average the predictions
    test_predictions = []
    for model_file in model_files:
        with open(model_file, 'rb') as f:
            clf = pickle.load(f)
        test_predictions.append(clf.predict_proba(test_features))
    test_predictions = np.array(test_predictions)
    test_predictions = np.mean(test_predictions, axis=0)
    return auc(test_labels, test_predictions), test_predictions
def get_all_results():
    """Average the predictions of models that differ only in kernel, and score each ensemble."""
    print('====== ensemble results ======')
    results = {}
    model_types = ['svm', 'rf']
    feature_types = ['bow', 'tfidf']
    ngram_ranges = [(1, 1), (1, 2)]
    for model_type in model_types:
        for feature_type in feature_types:
            for ngram_range in ngram_ranges:
                # skip this feature/ngram combination
                if feature_type == 'tfidf' and ngram_range == (1, 2):
                    continue
                model_files = []
                # SVM models were saved per kernel; RF models use a single 'gini' tag
                if model_type == 'svm':
                    kernel_types = ['linear', 'rbf']
                else:
                    kernel_types = ['gini']
                for kernel_type in kernel_types:
                    file_name = ('model_' + model_type + '_' + feature_type + '_'
                                 + str(ngram_range[0]) + '-' + str(ngram_range[1])
                                 + '_' + kernel_type + '.sav')
                    model_files.append(file_name)
                print(model_files)
                # a list is not hashable, so key the results dict on a tuple of file names
                results[tuple(model_files)] = get_prediction_results(
                    test_features, test_labels, model_files)
    return results
def get_best_result(results):