Google Trends: Examine and analyze data on internet search activity and trending news stories around the world.

It contains images from complex scenes around the world, annotated using bounding boxes.

Labelme: A large dataset of annotated images.

ImageNet: The de facto image dataset for new algorithms, organized according to the WordNet hierarchy, in which hundreds and thousands of images depict each node of the hierarchy.

LSUN: Scene understanding with many ancillary tasks (room layout estimation, saliency prediction, etc.).

COIL: Different objects imaged at every angle in a rotation.

Labelled Faces in the Wild: More than 13,000 labeled images of human faces, for use in developing applications that involve facial recognition.

Stanford Dogs Dataset: Contains more than 20,000 images across different dog breed categories.

Contains images across 67 indoor categories.

Multidomain sentiment analysis dataset: A slightly older dataset that features product reviews from Amazon.

IMDB reviews: An older, relatively small dataset for binary sentiment classification, featuring 25,000 movie reviews.

Stanford Sentiment Treebank: Standard sentiment dataset with sentiment annotations.

Sentiment140: A popular dataset of tweets with emoticons pre-removed.

HotpotQA Dataset: Question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts to enable more explainable question answering systems.

Enron Dataset: Email data from the senior management of Enron, organized into folders.

Amazon Reviews: Contains around 35 million reviews from Amazon spanning 18 years. Data include product and user information, ratings, and plaintext reviews.

Google Books Ngrams: A collection of words from Google Books.

Blogger Corpus: A collection of blog posts gathered from Blogger. Each blog contains a minimum number of occurrences of commonly used English words.

Wikipedia Links data: The full text of Wikipedia. You can search by word, phrase or part of a paragraph itself.

Hansards text chunks of Canadian Parliament: Text chunks from the records of the Canadian Parliament.

Jeopardy: A large archive of questions from the quiz show Jeopardy.

Rotten Tomatoes Reviews: A large archive of critic reviews (fresh or rotten).

Yelp Reviews: An open dataset released by Yelp, containing more than 5 million reviews.

Wikipedia is a great source of information. DBpedia aims to extract structured content from the information created in Wikipedia. With DBpedia, you can semantically search and explore the relationships and properties of Wikipedia resources.

This includes links to other related datasets as well. There are labels and abstracts for these entities in multiple languages. DBpedia has benefited several enterprises, such as Apple (via Siri), Google (via Freebase and Google Knowledge Graph), and IBM (via Watson), particularly their respective projects associated with artificial intelligence.

freeCodeCamp is an open source community. It matters because it enables you to learn to code, build pro bono projects for nonprofits and land a job as a developer. To make this happen, the freeCodeCamp community has turned its data into open data. You will find a variety of things in this repository: datasets, analyses of those datasets, and even demos of projects based on the freeCodeCamp data. You can also find links to external projects involving the freeCodeCamp data. It can help you with a wide range of projects and tasks that you may have in mind. Whether it is web analytics, social media analytics, social network analysis, education analysis, data visualization, data-driven web development or bots, the data offered by this community can be extremely useful and effective.

The Yelp dataset is a subset of Yelp's own businesses, reviews and user data, made available for personal, educational and academic use. The Yelp Open Dataset covers more than 5 million reviews, along with businesses and pictures, across 10 metropolitan areas. You can use the data for different purposes.

Since the files are available as JSON, you can use them to teach students about databases, to learn NLP, or as sample production data while you learn how to design mobile apps.

Each file in this dataset is composed of a single object type, with one JSON object per line.

These datasets are updated regularly. Every month, the data is updated to make it more comprehensive, reliable and accurate. You can freely and easily access this data.
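The one-JSON-object-per-line layout described for the Yelp files can be read one record at a time, without loading a whole file into memory. The following is a minimal sketch under stated assumptions: the field names (`review_id`, `business_id`, `stars`, `text`) and the in-memory sample are hypothetical stand-ins rather than the actual Yelp schema, so check them against the real files before relying on them.

```python
import io
import json
from collections import Counter


def iter_json_lines(handle):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample standing in for a review file; the field names are
# illustrative assumptions, not the confirmed Yelp schema.
sample = io.StringIO(
    '{"review_id": "r1", "business_id": "b1", "stars": 5, "text": "Great coffee."}\n'
    '{"review_id": "r2", "business_id": "b1", "stars": 3, "text": "Slow service."}\n'
    '{"review_id": "r3", "business_id": "b2", "stars": 4, "text": "Nice patio."}\n'
)

reviews = list(iter_json_lines(sample))

# Simple aggregates: number of reviews per business and the mean star rating.
reviews_per_business = Counter(r["business_id"] for r in reviews)
average_stars = sum(r["stars"] for r in reviews) / len(reviews)

print(reviews_per_business)
print(average_stars)
```

To try this on a downloaded file, replace the `io.StringIO` sample with `open(path, encoding="utf-8")`, where `path` points to one of the JSON files. For very large files, iterate over the generator directly rather than wrapping it in `list`.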

To do so, you can download the data in CSV format. You can also preview sample data before downloading it.

With this portal, you can explore IATI data.

You can search for information related to development activities, budgets, etc., and explore it country by country. If you click on the headers, you can also sort many of the tables on the platform. You will also find many of the datasets on the platform in machine-readable JSON format.

Kaggle is great because it promotes the use of different dataset publication formats. Better still, it strongly recommends that dataset publishers share their data in an accessible, non-proprietary format. The platform supports open and accessible data formats. This matters not just for access but also for whatever you want to do with the data.

Kaggle Datasets therefore clearly defines the file formats that are recommended when sharing data. The unique thing about Kaggle datasets is that the platform is not just a data repository. Each dataset stands for a community that enables you to discuss data, find public code and techniques, and conceptualize your own projects in Kernels. You can find a variety of resources to start working on your open data project.

Under this initiative, anyone can access any public information about the university in machine-readable formats. You can easily access and reuse it as you need. Open data about scientific artifacts, encoded as linked data, is made available under this project. Linked Data makes it possible to share and use data, ontologies and various metadata standards. It is, in fact, envisaged that it will become the accepted standard for providing metadata, and the data itself, on the Web.

