- Natural Language libraries, e.g. spaCy, Natural Language Toolkit (NLTK), CoreNLP, TextBlob, scikit-learn NLP toolkit, fastText, Intel NLP Architect, Gensim
- Other Python-related pages:
- TensorFlow for machine learning model building
- PyTorch authored by Facebook
- Google AutoML automatically builds and deploys state-of-the-art machine learning models
- Ludwig - a Python toolbox from Uber that allows you to train and test deep learning models
- Cython: blending Python and C/C++ ...thus a superset of the Python language.
- AWS Lambda & Python
- Notebooks; Jupyter and R Markdown
- How to build your own AlphaZero AI using Python and Keras
- Automate the Boring Stuff with Python
- Best Python Resources | Full Stack Python
- Learn Python Programming, By Example | Real Python
- Top 20 Python AI and Machine Learning Open Source Projects
- Essential Cheat Sheets for Machine Learning and Deep Learning Engineers
- How to Setup a Python Environment for Machine Learning | George Seif - KDnuggets
- Git - GitHub and GitLab
- Guido Van Rossum; author of Python
- Sphinx is a tool that makes it easy to create intelligent and beautiful documentation | Georg Brandl
- Python programming language: A cheat sheet | James Sanders - TechRepublic explores what it is used for, how it compares to other languages, and building skills resources
- Quantum Development Algorithms & Kits
- Autonomous Drones
- Code completion: works with the top Python editors: Atom, PyCharm, Sublime, Visual Studio (VS) Code and Vim
- Alternative implementations and extensions of Python to address speed & memory usage...
- Explainable / Interpretable AI
- 1 Python Data Science Handbook
- 2 NumPy
- 3 Pandas
- 4 SciPy
- 5 SymPy
- 6 NetworkX
- 7 scikit-learn
- 8 Graphical User Interface (GUI)
- 9 Spreadsheets
- 10 PyMC3
- 11 StatsModels
- 12 OpenCV
- 13 LibROSA
- 14 PyGame
- 15 Parallel
- 16 Numba
- 17 xarray
- 18 IPython Blocks
- 19 Metaflow
- 20 Web Automation with Python - Data Gathering
- 21 Twisted
- 22 Pipelines
- 23 yellowbrick
- 24 MLxtend
- 25 LIME
- 26 SHAP
- 27 Python Stack
- 28 Time Series
- 29 Visualization with Python
Python Data Science Handbook
- Python Data Science Handbook (Jupyter notebook) | Jake VanderPlas - O'Reilly
- NumPy - manipulation of numerical arrays. NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
- Python Numpy Tutorial | Justin Johnson
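A minimal NumPy sketch of the two ideas above: element-wise (vectorized) arithmetic on numerical arrays, and a user-defined structured data type. The field names and values are made up for illustration.

```python
import numpy as np

# Vectorized arithmetic: operations apply element-wise, with no Python loop.
a = np.arange(6).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]
scaled = a * 10 + 1                   # a scalar is broadcast over every element
col_means = a.mean(axis=0)            # mean of each column

# A structured dtype: each element is a record with named, typed fields.
person = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])
people = np.array([("Ada", 36, 55.5), ("Alan", 41, 70.2)], dtype=person)

# Fields can be selected by name, much like columns of a table.
mean_age = people["age"].mean()
print(scaled, col_means, mean_age, sep="\n")
```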
- Python Data Analysis library - data structures and data analysis tools for the Python programming language. Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a Pandas DataFrame. Pandas DataFrames are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data. As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.
- Python for Data Analysis | Wes McKinney
- Modin accelerates Pandas by automatically distributing the computation across all of the system’s available CPU cores
- A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes: data is aligned in rows and columns, each with its own labels.
- Pandas DataFrame: A lightweight Intro | Daksh Deepak - Towards Data Science
- Joining DataFrames in Pandas | Manish Pathak - Data Camp
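The sketch below builds a labeled DataFrame with mixed column types and a missing value, then joins it to a second table with merge(). The tables and column names (orders, customers, customer_id) are hypothetical.

```python
import numpy as np
import pandas as pd

# A DataFrame with labeled rows and columns, mixed dtypes, and a missing value (NaN).
orders = pd.DataFrame(
    {"customer_id": [1, 2, 1, 3], "amount": [20.0, 35.5, np.nan, 12.0]},
    index=["o1", "o2", "o3", "o4"],
)
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ada", "Alan"]})

# Aggregations skip NaN by default.
total_by_customer = orders.groupby("customer_id")["amount"].sum()

# Join on the shared key. how="left" keeps every order and leaves name as NaN
# where no matching customer exists (customer 3). merge() returns a new
# default integer index.
joined = orders.merge(customers, on="customer_id", how="left")
print(joined)
```

"left" is one of several join types; "inner", "outer" and "right" are also accepted by how=.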
- SciPy library - one of the core packages that make up the SciPy stack. It provides many user-friendly and efficient numerical routines such as routines for numerical integration, interpolation, optimization, linear algebra and statistics.
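A few of the SciPy routines mentioned above, each applied to a problem with a known answer so the results are easy to check. The specific functions are examples only and assume a standard SciPy install.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Optimization: (x - 3)**2 + 1 has its minimum at x = 3.
res = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)

# Linear algebra: solve the system 3a + b = 9, a + 2b = 8.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

print(area, res.x, solution)
```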
- SymPy library - a Python library for symbolic mathematics aiming to become a full-featured computer algebra system (CAS)
- mpmath | Fredrik Johansson library for real and complex floating-point arithmetic with arbitrary precision. It can be used as a library, interactively via the Python interpreter, or from within the SymPy or Sage computer algebra systems, which include mpmath as a standard component. CoCalc lets you use mpmath directly in the browser. CoCalc, or "Collaborative Calculation in the Cloud", enables programming online without the need to install any software.
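This minimal sketch sets exact symbolic computation with SymPy beside arbitrary-precision numerics with mpmath, which is installed as a SymPy dependency. The expressions are arbitrary examples.

```python
import mpmath
import sympy as sp

x = sp.symbols("x")
expr = sp.sin(x) * sp.exp(x)

# Symbolic calculus: results are exact expressions, not floating-point numbers.
derivative = sp.diff(expr, x)                          # product rule applied symbolically
integral = sp.integrate(sp.cos(x), (x, 0, sp.pi / 2))  # exactly 1
roots = sp.solve(x**2 - 2, x)                          # exact roots: -sqrt(2), sqrt(2)

# mpmath: arbitrary-precision arithmetic; here pi to 50 significant digits.
mpmath.mp.dps = 50
pi_50 = mpmath.nstr(mpmath.pi, 50)
print(derivative, integral, roots, pi_50, sep="\n")
```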
- NetworkX a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. Video: Connected: A Social Network Analysis Tutorial with NetworkX ...Social Network Analysis (SNA), the study of the relational structure between actors, is used throughout the social and natural sciences to discover insight from connected entities.
- Social Network Analysis (SNA)
- Twitter Network Analysis with NetworkX - PyCon 2015 | Sarah Guido, Celia La - GitHub
- NetworkX Visualization Powered By Bokeh | Björn Meier
- SageMath graph tools survey ...components
- Network Pattern
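The following is a small social network analysis sketch with NetworkX. The five-person friendship graph is hypothetical. Degree centrality, shortest paths and clustering are standard NetworkX functions.

```python
import networkx as nx

# A hypothetical friendship network: nodes are actors, edges are ties.
G = nx.Graph()
G.add_edges_from([
    ("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"),   # a tight triangle
    ("Cat", "Dan"), ("Dan", "Eve"),                    # a chain leading out
])

# Degree centrality: the fraction of other nodes each node is directly tied to.
centrality = nx.degree_centrality(G)
most_central = max(centrality, key=centrality.get)    # Cat is tied to 3 of 4 others

# Shortest path between two actors, and how tightly Ann's friends are connected.
path = nx.shortest_path(G, "Ann", "Eve")
clustering = nx.clustering(G, "Ann")                  # Ann's two friends know each other
print(most_central, path, clustering)
```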
- Scikit-learn library for machine learning in Python, built on NumPy, SciPy, and matplotlib. The toolkit implements a wide variety of algorithms for supervised and unsupervised machine learning tasks, including regression, clustering, manifold learning, principal components, density estimation, and much more. It also provides many useful tools to help build “pipelines” for managing modeling tasks such as data processing/normalization, feature engineering, cross-validation, fitting, and prediction. The recommended install is pip install scikit-learn, but in your code the package is imported as import sklearn.
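Below is a minimal scikit-learn pipeline. It assumes the bundled Iris dataset and uses a logistic-regression classifier chosen only as an example. Note that the import name is sklearn, even though the package is installed as scikit-learn.

```python
# Installed with:  pip install scikit-learn   (imported as sklearn)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline chains normalization and a classifier into one estimator, so the
# scaler is re-fit on each training fold and never sees the held-out fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation: fit on 4/5 of the data, score on the remaining 1/5.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")

# Fit on all the data, then predict the class of the first sample.
model.fit(X, y)
first_pred = model.predict(X[:1])[0]
```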
Graphical User Interface (GUI)
- Tkinter ... comes with Python already. Tkinter is a Python binding to the Tk GUI toolkit. It is the standard Python interface to the Tk GUI toolkit and is Python's de facto standard GUI. Tkinter is included with standard Linux, Microsoft Windows and Mac OS X installs of Python. The name Tkinter comes from Tk interface.
- Kivy ... open source Python library for rapid development of applications that make use of innovative user interfaces, such as multi-touch apps. Kivy is based on OpenGL, supports 2D and 3D drawing, meshes, and shaders, and runs on Linux, Windows, OS X, Android, iOS, and Raspberry Pi. You can run the same code on all supported platforms. Kivy is 100% free to use, under an MIT license (starting from 1.7.2) and LGPL 3 for the previous versions.
- PyQt5 ... Qt is a set of cross-platform C++ libraries that implement high-level APIs for accessing many aspects of modern desktop and mobile systems. These include location and positioning services, multimedia, NFC and Bluetooth connectivity, a Chromium based web browser, as well as traditional UI development. The library implements the Qt application development framework and includes Qt Designer, a drag-and-drop interface builder. PyQt5 is a comprehensive set of Python bindings for Qt v5. It is implemented as more than 35 extension modules and enables Python to be used as an alternative application development language to C++ on all supported platforms, including iOS and Android. PyQt5 is released under the GPL v3 license and under a commercial license that allows for the development of proprietary applications.
- wxPython - a cross-platform GUI toolkit for the Python programming language. It allows Python programmers to create programs with a robust, highly functional graphical user interface, simply and easily. It is implemented as a set of Python extension modules that wrap the GUI components of the popular wxWidgets cross-platform library, which is written in C++. Like Python and wxWidgets, wxPython is Open Source, which means that it is free for anyone to use and the source code is available for anyone to look at and modify. Anyone can also contribute fixes or enhancements to the project. Because wxPython is a cross-platform toolkit, the same program will run on multiple platforms without modification. Currently supported platforms are Microsoft Windows, Mac OS X and macOS, and Linux or other Unix-like systems with GTK2 or GTK3 libraries. In most cases the native widgets are used on each platform to provide a 100% native look and feel for the application.
- PySide2 ...a Python binding of the cross-platform GUI toolkit Qt, developed by The Qt Company as part of the Qt for Python project, which ported PySide to work with Qt 5 instead of Qt 4. It is one of the alternatives to the standard library package Tkinter. Like Qt, PySide2 is free software.
Python & Google Sheets
- gsheets - small wrapper around the Google Sheets API to provide more convenient access to Google Sheets from Python scripts.
- gsheets - self-contained script to dump all worksheets of a Google Spreadsheet to CSV or convert any subsheet to a pandas DataFrame
- gspread ...Google Sheets Python API wrapper
- example Jupyter notebook using gspread to fetch a sheet into a Pandas DataFrame
- df2gspread ...transfer data between Google Sheets and Pandas
- pygsheets ...Google Sheets Python API v4 (v4 port of gspread providing further extensions)
- gspread-pandas ...Interact with Google Sheets through Pandas DataFrames
- pgsheets ...manipulate Google Sheets Using Pandas DataFrames (independent bidirectional transfer library, using the legacy v3 API, Python 3 only)
Python & Excel
What is the best library out there for working with Excel through Python? If it's just a table of data that doesn't need any formatting, you can simply export to CSV. Pandas works great for this; you don't need anything else.
- Excel - Data Analysis - Microsoft
- Python with Excel Visualization
- Building Interactive Python tools with Excel as a front-end
- Reading and writing Excel workbooks
PyXLL is an Excel Add-In that enables developers to extend Excel’s capabilities with Python code. For organizations that want to provide their end users with functionality within Excel, PyXLL makes Python a productive, flexible back-end for Excel worksheets. With PyXLL, your own Python code runs in Excel using any Python distribution you like (e.g. Anaconda, Enthought’s Canopy or any other CPython distribution from 2.3 to 3.7). Because PyXLL runs your own full Python distribution you have access to all 3rd party Python packages such as NumPy, Pandas and SciPy and can call them from Excel.
- openpyxl - a Python library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files
- Working with Excel Spreadsheets | Al Sweigart - Automate the Boring Stuff
- XlsxWriter - A Python module for creating Excel XLSX files.
- Creating Advanced Excel Workbooks with Python | Practical Business Python
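For the plain-table case described earlier in this section, a CSV round trip with Pandas is enough. Here is a minimal sketch with a made-up table, using an in-memory buffer in place of a real file.

```python
import io

import pandas as pd

# A hypothetical table of monthly sales. It needs no formatting, so CSV is enough.
sales = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "units": [120, 95, 143]})

# Write to CSV. The StringIO buffer stands in for a path such as "sales.csv",
# which Excel can open directly.
buffer = io.StringIO()
sales.to_csv(buffer, index=False)

# Read it back, e.g. after the file has been edited in Excel.
buffer.seek(0)
roundtrip = pd.read_csv(buffer)
print(roundtrip)
```

When a real .xlsx file is needed, DataFrame.to_excel and pandas.read_excel work the same way. They require an Excel engine such as openpyxl; XlsxWriter can also be used for writing.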
- PyMC3 - Probabilistic Programming in Python - Bayesian Inference. Fit your model using gradient-based Markov chain Monte Carlo (MCMC) algorithms like NUTS, using ADVI for fast approximate inference — including minibatch-ADVI for scaling to large datasets — or using Gaussian processes to build Bayesian nonparametric models
- Markov Model (Chain, Discrete Time, Continuous Time, Hidden)
- Markov Decision Process (MDP)
- StatsModels - a module for fitting and estimating many different types of statistical models as well as performing hypothesis testing and exploratory data analysis. It features tools for fitting generalized linear models, survival analyses, and multivariate statistics.
- OpenCV - Open Source Computer Vision - for developers who work with images and/or videos and wish to add a variety of classical and state-of-the-art vision algorithms to their toolbox.
- LibROSA - audio and voice processing which can extract various kinds of features from audio segments, such as the rhythm, beats and tempo.
- PyGame - a library for making multimedia applications such as games, built on top of the excellent Simple DirectMedia Layer (SDL) library.
- DASK provides advanced parallelism for analytics, enabling performance at scale for the tools you love - it is developed in coordination with other community projects like NumPy, Pandas, and scikit-learn
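- Joblib provides lightweight pipelining in Python, including transparent disk-caching of function results and easy, simple parallel computing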
- Tornado is a web framework and asynchronous networking library. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.
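This sketch shows Joblib's parallel helper in its simplest form. It assumes joblib is installed; joblib is also a scikit-learn dependency. The square-root workload stands in for any independent, CPU-bound function.

```python
from math import sqrt

from joblib import Parallel, delayed

# delayed(f)(x) records a call without running it. Parallel then runs the batch
# across 2 workers and returns the results in input order.
results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
print(results)
```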
- Numba JIT compiler that translates a subset of Python and NumPy code into fast machine code.
- xarray makes working with labelled multi-dimensional arrays simple and efficient. Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. The package includes a large and growing library of domain-agnostic functions for advanced analytics and visualization with these data structures. Xarray was inspired by and borrows heavily from Pandas, the popular data analysis package focused on labelled tabular data. It is particularly tailored to working with netCDF files, which were the source of xarray’s data model, and integrates tightly with DASK for parallel computing.
- IPython Blocks - a tool for practicing Python in Jupyter notebooks, giving learners a grid of colors to manipulate while practicing for loops, if statements, and other aspects of Python.
- Metaflow, Netflix and AWS open source Python library
Web Automation with Python - Data Gathering
Write a Python crawler to extract information from websites: first identify the patterns in the pages, both in terms of URL patterns and XPath patterns. Once these patterns are figured out, the crawler can automatically extract the needed information and organize it into a usable structure.
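The pattern-based approach can be sketched with only the standard library. Here html.parser collects the links on a page and a regular expression encodes the URL pattern. The page, the /articles/<year>/<slug> pattern and the LinkCollector class are all hypothetical. Dedicated tools such as Beautiful Soup or Scrapy offer more convenient navigation for real sites.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page: article URLs share the pattern /articles/<year>/<slug>.
page = """
<html><body>
  <a href="/articles/2019/python-tips">Tips</a>
  <a href="/about">About</a>
  <a href="/articles/2020/pandas-joins">Joins</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)

# Once the URL pattern is known, a regular expression filters the links worth
# crawling and extracts structured fields from each one.
ARTICLE = re.compile(r"^/articles/(?P<year>\d{4})/(?P<slug>[\w-]+)$")
articles = [m.groupdict() for m in map(ARTICLE.match, collector.links) if m]
print(articles)
```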
Beautiful Soup - bs4
- Beautiful Soup - a project for parsing HTML and XML documents. It creates parse trees that can be used to extract data from web pages.
Because it is an ordinary Python library, you can import its functions and use them in-line, even in your Jupyter notebooks.
- Scrapy - web scraping ... an open source and collaborative framework for extracting the data you need from websites
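Here is a minimal Beautiful Soup sketch. It assumes the bs4 package and a hypothetical HTML table. It builds the parse tree, then navigates it to pull out the text of each cell.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, e.g. downloaded earlier or pasted into a notebook.
html = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>2.50</td></tr>
  <tr><td>Gadget</td><td>4.75</td></tr>
</table>
"""

# Build a parse tree using Python's built-in HTML parser as the backend.
soup = BeautifulSoup(html, "html.parser")

# Navigate the tree: find the table, skip the header row, read each cell.
rows = soup.find("table", id="prices").find_all("tr")[1:]
prices = {
    tds[0].get_text(): float(tds[1].get_text())
    for tds in (row.find_all("td") for row in rows)
}
print(prices)
```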
- Twisted an event-driven networking engine
- AIOps / MLOps - Machine Learning (ML) pipelines for SecDevOps
- Ansible and Python 3 | Red Hat
- Python Client API | Saltstack
Python is one of the most important orchestration and infrastructure automation components of AIOps / MLOps, reducing or almost eliminating the disconnect between developers and system admins. AIOps / MLOps is centered on enabling AI pipelines for continuous integration and continuous deployment (CI/CD) with no downtime.
- ML impossible: Train 1 billion samples in 5 minutes on your laptop using Vaex and Scikit-Learn - Make your laptop feel like a supercomputer. | Jovan Veljanoski - Towards Data Science
- TPOT | Randal Olson - University of Pennsylvania - automatically creates and optimizes full machine learning pipelines using genetic programming. The Tree-based Pipeline Optimization Tool (TPOT) automates the building of ML pipelines by combining a flexible expression tree representation of pipelines with stochastic search algorithms such as genetic programming. TPOT makes use of the Python-based scikit-learn library as its ML menu.
- ELI5 "Explain it like I'm 5" helps to...
- debug machine learning classifiers and explain their predictions.
- scikit-learn - Currently ELI5 allows you to explain weights and predictions of scikit-learn linear classifiers and regressors, print decision trees as text or as SVG, show feature importances and explain predictions of decision trees and tree-based ensembles. ELI5 understands text processing utilities from scikit-learn and can highlight text data accordingly. Pipeline and FeatureUnion are supported. It also allows you to debug scikit-learn pipelines which contain HashingVectorizer, by undoing hashing.
- xgboost - show feature importances and explain predictions of XGBClassifier, XGBRegressor and xgboost.Booster.
- LightGBM ...Microsoft's gradient boosting framework that uses tree based learning algorithms ... LightGBM - show feature importances and explain predictions of LGBMClassifier and LGBMRegressor. A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. It is under the umbrella of the Microsoft Distributed Machine Learning Toolkit (DMTK) project of Microsoft.
- CatBoost - show feature importances of CatBoostClassifier, CatBoostRegressor and catboost.CatBoost. A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU
- lightning - explain weights and predictions of lightning classifiers and regressors. Large-scale linear classification, regression and ranking in Python
- sklearn-crfsuite - ELI5 allows you to check weights of sklearn_crfsuite.CRF models. CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data.
- ELI5 also implements several algorithms for inspecting black-box models (see Inspecting Black-Box Estimators):
- TextExplainer allows you to explain predictions of any text classifier using the LIME algorithm. There are utilities for using LIME with non-text data and arbitrary black-box classifiers as well, but this feature is currently experimental.
- Permutation importance method can be used to compute feature importances for black box estimators.
Explanation and formatting are separated: you can get a text-based explanation to display in the console, an HTML version embeddable in an IPython notebook or web dashboards, a Pandas DataFrame object if you want to process results further, or a JSON version which lets you implement custom rendering and formatting on a client.
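The permutation-importance method can be sketched with scikit-learn's own sklearn.inspection.permutation_importance, which is used here instead of ELI5's wrapper. The synthetic data (one strong, one weak and one irrelevant feature) and the random-forest model are assumptions chosen so that the top-ranked feature is obvious.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: y depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure how much the score
# (R^2 for a regressor) drops. The model itself is treated as a black box.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]   # most important first
print(result.importances_mean, ranking)
```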
This library is essentially an extension of the scikit-learn library and provides some really useful and pretty looking visualisations for machine learning models. The visualiser objects, the core interface, are scikit-learn estimators and so if you are used to working with scikit-learn the workflow should be quite familiar.
This library contains a host of helper functions for machine learning. This covers things like stacking and voting classifiers, model evaluation, feature extraction and engineering and plotting.
- LIME (Local Interpretable Model-agnostic Explanations) explains the prediction of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
- SHAP and LIME Python Libraries: Part 1 – Great Explainers, with Pros and Cons to Both | Joshua Poduska - Domino
- Decrypting your Machine Learning model using LIME | Abhishek Sharma - Towards Data Science
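To make "learning an interpretable model locally" concrete, here is a from-scratch, LIME-style sketch for tabular data. It is not the lime package's API, and it omits LIME's feature selection and discretization. The explain_locally function, the kernel width and the toy data are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge


def explain_locally(predict_proba, x, scale, n_samples=2000, seed=0):
    """Hypothetical LIME-style helper: fit a proximity-weighted linear model around x.

    predict_proba: the black-box model's probability function.
    scale: per-feature standard deviations used to size the perturbations.
    Returns one coefficient per feature: its local effect on P(class 1).
    """
    rng = np.random.default_rng(seed)
    # 1. Sample perturbed copies of x (offsets are in standardized units).
    offsets = rng.normal(size=(n_samples, x.size))
    Z = x + offsets * scale
    # 2. Ask the black box for its predictions on the perturbed samples.
    target = predict_proba(Z)[:, 1]
    # 3. Weight each sample by proximity to x using an exponential kernel.
    kernel_width = 0.75 * np.sqrt(x.size)
    weights = np.exp(-(offsets ** 2).sum(axis=1) / kernel_width ** 2)
    # 4. The interpretable model: a weighted linear fit on the neighborhood.
    surrogate = Ridge(alpha=1.0).fit(offsets, target, sample_weight=weights)
    return surrogate.coef_


# Hypothetical data in which only feature 0 determines the class.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] > 0).astype(int)
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

x0 = np.array([0.1, 0.0, 0.0])          # an instance close to the decision boundary
coefs = explain_locally(black_box.predict_proba, x0, X.std(axis=0))
print(coefs)
```

The surrogate's coefficients are only meant to be trusted near x0. That locality is the trade-off LIME makes in exchange for working with any black-box model.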
- SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. SHAP connects game theory with local explanations, uniting several previous methods and representing the only possible consistent and locally accurate additive feature attribution method based on expectations.
- Shapley Additive Explanations (SHAP)
- Demystifying Black-Box Models with SHAP Value Analysis | Peter Cooman
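For a tiny model, the Shapley values behind SHAP can be computed exactly by enumerating every feature coalition. This sketch assumes that absent features are replaced by a single baseline value, which is one of several possible choices. It is not the shap package's API, and shapley_values and model are hypothetical names.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values for one prediction f(x), by enumerating coalitions.

    Features outside a coalition take their baseline value. The cost is 2**n
    model calls, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Model output with features in `coalition` taken from x, the rest from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                S = set(S)
                total += weight * (value(S | {i}) - value(S))
        phi.append(total)
    return phi


def model(z):
    # Hypothetical model with an interaction between features 0 and 1.
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[1] - 0.5 * z[2]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

The interaction term 3*x0*x1 contributes 6 at this input, and it is split equally between features 0 and 1. The attributions add up to f(x) - f(baseline), which is the local accuracy property. Because the cost grows as 2^n, the shap library relies on sampling approximations and model-specific algorithms instead.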
- openstack - open source software for creating private and public clouds
- Full Stack Python - Book | Matt Makai
- Deploying Flask Web Applications | Matt Makai - available for purchase on Gumroad.com
- Flask a microframework for Python. It is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions.
- Flask boilerplate | Max Halford
- flaskSaaS | Max Halford starting point to build your SaaS in Flask & Python, with Stripe subscription billing
- flask-image-uploader | bboe
- Flask-Login for the user accounts
- Flask-SQLAlchemy interacting with the database
- Flask-WTF and WTForms for the form handling.
- Flask-Mail for sending mails.
- Flask-Bcrypt for hashing user passwords.
- Flask-Admin for building an administration interface.
- Flask-Script for managing the app
- flask-stripe Stripe Checkout & user registration
- TensorFlow Serving
- News Aggregation
- Creating REST API for TensorFlow models | Vitaly Bezgachev - Medium - Becoming Human
Flask is considered more Pythonic than the Django web framework because in common situations the equivalent Flask web application is more explicit. Flask is also easy to get started with as a beginner because there is little boilerplate code for getting a simple app up and running. Flask | Full Stack Python
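A minimal Flask application shows how little boilerplate is needed. The routes and the JSON payload shape are hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/")
def index():
    # Plain-text response for the home page.
    return "Hello, Flask!"


@app.route("/api/square", methods=["POST"])
def square():
    # Read a JSON body such as {"value": 4} and return its square as JSON.
    payload = request.get_json()
    return jsonify(result=payload["value"] ** 2)
```

Saved as app.py, it can be started with the development server via flask --app app run (Flask 2.2 or later). You can exercise the routes without starting a server by calling app.test_client().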
Flask & React
Flask & Docker
Flask, React, & Docker
- Full-stack tutorial: Flask + React + Docker | Riken Mehta - Medium
- Containerizing a Flask + React app with docker-compose | Devops Dummy - Medium
- Developing and Testing Microservices with Docker, Flask, and React | Michael Herman
- Microservices with Docker, Flask, and React | testdriven.io (course)
- Django - a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
- News Aggregation
Django is a widely-used Python web application framework with a "batteries-included" philosophy. The principle behind batteries-included is that the common functionality for building web applications should come with the framework instead of as separate libraries. Django | Full Stack Python
Other Web Frameworks supporting Python
- Wheezy Web
- tsfresh ...python package that automatically calculates a large number of time series characteristics, the so-called features. Further, the package contains methods to evaluate the explaining power and importance of such characteristics for regression or classification tasks.
An open source scientific Python library that implements a novel yet intuitive approach for discovering patterns, anomalies, and other insights from any time series data. STUMPY is a powerful and scalable library that efficiently computes something called the matrix profile, which can be used for a variety of time series data mining tasks such as pattern/motif (approximately repeated subsequences within a longer time series) discovery, anomaly/novelty (discord) discovery, shapelet discovery, semantic segmentation, streaming (on-line) data, fast approximate matrix profiles, and time series chains (temporally ordered sets of subsequence patterns). STUMPY
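To show what a matrix profile is, here is a brute-force NumPy sketch. For each subsequence, it computes the z-normalized Euclidean distance to the nearest neighbor outside an exclusion zone around it. The zone of ceil(m/4) is an assumption, similar to STUMPY's default. The code is quadratic in the series length, so it only illustrates the quantity that STUMPY computes efficiently (e.g. with stumpy.stump). The planted-motif data are synthetic.

```python
import numpy as np


def naive_matrix_profile(ts, m):
    """Brute-force matrix profile of a 1-D series for subsequence length m.

    Returns (profile, index): for each length-m subsequence, the z-normalized
    Euclidean distance to its nearest neighbor outside an exclusion zone, and
    that neighbor's start position. O(n^2 * m), so small series only.
    """
    ts = np.asarray(ts, dtype=float)
    subs = np.lib.stride_tricks.sliding_window_view(ts, m)    # shape (n, m)
    # z-normalize each subsequence so only its shape matters, not offset or scale.
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    n = len(subs)
    excl = int(np.ceil(m / 4))      # exclusion zone against trivial self-matches
    profile = np.empty(n)
    index = np.empty(n, dtype=int)
    for i in range(n):
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        d[max(0, i - excl): i + excl + 1] = np.inf
        index[i] = int(np.argmin(d))
        profile[i] = d[index[i]]
    return profile, index


# Synthetic series: noise with the same hypothetical pattern planted twice.
rng = np.random.default_rng(0)
ts = rng.normal(size=200)
motif = 10 * np.sin(np.linspace(0, 2 * np.pi, 20))
ts[30:50] += motif
ts[130:150] += motif

profile, index = naive_matrix_profile(ts, m=20)
best = int(np.argmin(profile))      # expected at or near one of the planted copies
print(best, index[best], profile[best])
```

The lowest values of the profile mark motifs, and the highest values mark discords (anomalies).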
Visualization with Python
- Python Visualization
- Python Data Visualization 2018: Why So Many Libraries? | James A. Bednar
- PyViz.org meta-initiative for helping users decide on the best open-source Python data visualization tools for their purposes, with links, overviews, comparisons, and examples.
- Pandas .plot() - basic plotting interface uses Matplotlib to render static PNGs in a Jupyter notebook or for exporting from Python, with a command that can be as simple as df.plot() for a DataFrame with two columns.
- Datashader breaks the creation of images into a series of explicit steps that allow computations to be done on intermediate representations. Rasterizing huge datasets quickly as fixed-size images.
- Panel - assembling objects from many different libraries into a layout or app, whether in a Jupyter notebook or in a standalone serveable dashboard
- Param - declaring user-relevant parameters, making it simple to work with widgets inside and outside of a notebook context
- Data Visualization | Full Stack Python
- Matplotlib - generate plots, histograms, power spectra, bar charts, error charts, scatterplots, etc.
- Visualization with Matplotlib | Jake VanderPlas - Python Data Science Handbook
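The sketch below follows the path from Pandas .plot() to Matplotlib. It renders off-screen with the Agg backend and saves a PNG. The loss values are invented for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")            # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical two-column DataFrame; each column becomes one line on the chart.
df = pd.DataFrame(
    {"train": [0.9, 0.6, 0.4, 0.3], "valid": [1.0, 0.8, 0.7, 0.65]},
    index=pd.Index([1, 2, 3, 4], name="epoch"),
)

# Pandas .plot() delegates to Matplotlib and returns a Matplotlib Axes.
ax = df.plot(title="Loss by epoch")
ax.set_ylabel("loss")

# Save as PNG. The BytesIO buffer stands in for a file name such as "loss.png".
png = io.BytesIO()
ax.figure.savefig(png, format="png")
plt.close(ax.figure)
```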
- Driving your graphic via ipyWidgets
- Learn Plotly | SuperDataScience
- Driving your graphic via ipyWidgets
- It’s 2019 — Make Your Data Visualizations Interactive with Plotly | Jeff Hale - Towards Data Science Find the path to make awesome figures quickly with Express and Cufflinks
- Cufflinks | Jorge Santos - a library for easy interactive Pandas charting with Plotly. Cufflinks binds Plotly directly to Pandas DataFrames.
- The Next Level of Data Visualization in Python | Will Koehrsen - Towards Data Science
Cufflinks --> Plotly --> plotly.js --> D3.js
- plotly.js - built on top of d3.js and stack.gl, plotly.js is a high-level, declarative charting library. plotly.js ships with 20 chart types, including 3D charts, statistical graphs, and SVG maps.
Plotly Chart Studio
- Plotly Chart Studio - editor for creating d3.js and WebGL charts. Chart Studio is built on top of Plotly React, Plotly React Editor, the Plotly Image Server, Sheet.js, Handsontable and many other top-quality, open-source projects.
- HoloViews | J. Stevens, P. Rudiger, and J. Bednar Declarative objects for instantly visualizable data, building Bokeh plots from convenient high-level specifications
- pyviz/holoviews | GitHub
- Pygal | Florian Mounier for producing beautiful out-of-the-box charts with very few lines of code. Each chart type is packaged into a method (e.g. pygal.Histogram() makes a histogram, pygal.Box() makes a box plot), and there's a variety of colorful default styles. If you want more control, you can configure almost every element of a plot—including sizing, titles, labels, and rendering. You can output charts as SVGs and add them to a web page with an embed tag or by inserting the code directly into the HTML.
- scikit-image An image processing library featuring many common operations including convolutional mapping, filtering, edge detection, and image segmentation.
- Shapely - a spatial analysis library which extends Python to work as a fully-featured GIS environment comparable to commercial software such as ArcGIS.
- Open Street Map a map of the world, created by people
- Geospatial Data Abstraction Library (GDAL) a translator library for raster and vector geospatial data formats
- Pyresample - re-projecting earth observing satellite data, capable of handling both swath data from polar-orbiting satellites and gridded data from geostationary satellites.
- SatPy | GitHub for earth-observing satellite data processing
- Fiona - handle vector data
- rasterio - handle raster data
- pyproj - transforming spatial reference systems - python interface to PROJ (cartographic projections and coordinate transformations library).
- Folium - creating maps
- GeoPandas - geospatial analysis; extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. Geopandas further depends on fiona for file access and descartes and matplotlib for plotting.
- GeoViews - visualizable geographic data that can be mixed and matched with HoloViews objects