Questions tagged [large-datasets]
For questions related to large dataset requests
23 questions
2
votes
1
answer
25
views
How to access the data collected during the "Grand débat" in France?
After the success of the "yellow vests" movement in France, a nationwide questionnaire was deployed by the government, both online and in every town hall. It was called the "Grand ...
1
vote
0
answers
12
views
Call Center Transcripts and reference knowledge documents
I am looking for:
call center transcripts datasets
The accompanying knowledge files referred to by the agents for answering the customer questions.
I need this data to build a Performance ...
0
votes
0
answers
24
views
Anyone know a good service or API to populate a vehicle database? (EU/UK + USA)
I’m building a car spotting app and need to populate a database with vehicle makes, models, trims, and years. I’ve found the NHTSA API for US cars, which is great and free. But I’m struggling to find ...
2
votes
2
answers
41
views
Academic Hosting for 10-Year South Atlantic Weather Dataset
I have compiled a 10-year South Atlantic weather dataset, totaling approximately 5 TB, and plan to write a dataset article based on it. I’m looking to host the dataset publicly either for free or at a ...
1
vote
1
answer
72
views
Dataset of early termination of insurance contracts
I need a dataset of early terminations of insurance contracts in order to predict whether a contract will be terminated or not.
It should preferably contain the following features or some of them, ...
2
votes
1
answer
76
views
How to assign a DOI to an existing dataset
We collected a multi-petabyte dataset during our experiment, which is stored externally but is accessible from outside our lab. How can I register a DOI for my dataset and how can I ...
5
votes
1
answer
2k
views
REAL call center conversation recordings/transcripts
I am looking for a dataset containing audio recordings and/or transcripts of real customer service calls. I require recordings/transcripts of both the customer service agent handling the call and the ...
0
votes
1
answer
80
views
Sequoia 2000 benchmark dataset
I'm recreating the DBSCAN cluster algorithm which uses the Sequoia 2000 benchmark dataset. There's an old question here that has a link for the dataset, but I've been doing more research and I can't ...
1
vote
0
answers
37
views
Need real life bivariate datasets with outliers that can be modelled by sum of exponential models
I need real-life bivariate datasets, with authentic sources, with the following characteristics:
Data can be modelled by a non-linear relationship of the form
$y_i = \alpha_0 + \alpha_1 e^{\beta t_i} + \varepsilon_i$,
...
3
votes
1
answer
146
views
List of US Universities And Colleges
I am trying to find a list of all US universities and colleges for my Ph.D. project. I have downloaded the scorecard data; however, it is missing major universities, like Brown, the University of ...
3
votes
0
answers
54
views
Dataset for Swimming with Heartbeat and Motion Data
I am currently working on the final project for my CS degree, which involves analyzing swimming data, specifically focusing on both heartbeat and motion data. I am looking for a dataset that includes these two ...
2
votes
0
answers
155
views
Public database with celebrity names?
Essentially I have a list of customer names (first, last), and we want to know whether or not any of these have celebrity status, so looking to cross-reference my list with an external database.
Again,...
1
vote
0
answers
521
views
Database of Facebook profile photos
Does there already exist a data set containing, for at least 100 individual people, the photos in their Facebook profiles (or their profiles at the time of collection)? I want to use these some ...
0
votes
0
answers
15
views
Looking for a Cellular network environment or Heterogeneous wireless networks environment Dataset
I am looking for a Cellular network environment or Heterogeneous wireless networks environment Dataset. A dataset that includes network parameters as features such as delay, jitter, throughput, cost ...
1
vote
0
answers
19
views
Prediction of dress size by body measurement
Hope you are all doing well. I am building an e-commerce project that needs a recommendation system to recommend a dress size of S (small), M (medium), L (large), XL (extra large), or 2XL on the ...
1
vote
1
answer
434
views
Where can I find a large unprocessed dataset?
I am looking for a large dataset (greater than 100MB), preferably CSV, which is raw and unprocessed, with missing values. It is for a data preprocessing assignment as part of my college course on Big ...
0
votes
0
answers
14
views
What are the available labeled Twitter datasets by topic?
I want to know about recently available datasets for Twitter topics where each tweet is labeled with its topic.
2
votes
0
answers
1k
views
Dataset of Indian Addresses
I am looking for a dataset of Indian addresses.
Possibly in the following format:
Address Line 1,
Address Line 2,
Town/City,
State,
Postal code.
I have tried to find it but there doesn't seem to be one ...
0
votes
0
answers
42
views
Wind Speed and Direction Datasets
I am trying to find historical (e.g., one year) wind speed and wind direction datasets for different suburbs in Sydney/Greater Sydney in CSV format. I tried different platforms for it but could not ...
-1
votes
1
answer
920
views
How can I read the content of a dataset in a pkl file and return 67 sparse matrix of type '<class 'numpy.int64'>' in Python?
I have a dataset in a pkl file, and I wrote the following code to unpickle the file and read the data:
import pickle
# Open in binary mode; the context manager closes the file automatically
with open("data_cdg.pkl", "rb") as f:
    a = pickle.load(f)
print(a)
...
0
votes
1
answer
53
views
Data set with time-series data for visualization exercise
I am trying to find a data set for data visualization purposes. This can be a data set which has only three months of data in it, or a subset of a larger data set. What reputable data sets can you ...
1
vote
0
answers
39
views
GIS Data on US College Dormitories
I am trying to find spatial data on all (or as many as possible) US college dormitories. Specifically, I want to find the location as a polygon or centroid. I have tried a few methods to acquire this ...
4
votes
2
answers
251
views
Open Source huge dataset for Language Modelling
Looking for text data including books, GitHub repositories, webpages, chat logs, and medical, physics, math, computer science, and philosophy papers for language modeling.