Scalable, Efficient Processing and Analysis of Large Audio Datasets – Pawel Cyrta – ADCx Gather 2024
https://www.youtube.com/watch?v=lHME1l9cEPk
#coding #Datasets #programming #softwareengineering
Wikipedia and Kaggle Release Structured Dataset to Aid AI Development, Counter Scraping
#AI #AITraining #Wikipedia #Kaggle #AIData #MachineLearning #OpenData #Wikimedia #Datasets #BigData #DataScience #LLMs #NLP #Google #Alphabet
"...there is no #AI without #energy; at the same time, AI has the potential to transform the energy sector."
This "Energy and AI" #report from the International Energy Agency (#IEA) is based on new global and regional modelling and #datasets, as well as extensive consultation with governments and regulators, the #tech sector, the energy industry and international experts.
Unlock #research insights with the new #OpenAIREGraph #API!
Easily discover #publications, #datasets, & #software across #OpenScience infrastructures.
- Search with precision using linked #metadata
- Find #OpenAccess versions & related datasets
- Trace research back to funders & institutions
Start exploring today: https://graph.openaire.eu/docs/apis/graph-api/
#Reddit #AI #ContentModeration #datasets
'Researchers at Cornell Tech have released a dataset extracted from more than 300,000 public Reddit communities, and a report detailing how Reddit communities are changing their policies to address a surge in AI-generated content.'
https://news.cornell.edu/stories/2025/04/dataset-reveals-how-reddit-communities-are-adapting-ai
"Almost two dozen repositories of research and public health data supported by the National Institutes of Health are marked for “review” under the Trump administration’s direction, and researchers and archivists say the data is at risk of being lost forever if the repositories go down.
“The problem with archiving this data is that we can’t,” Lisa Chinn, Head of Research Data Services at the University of Chicago, told 404 Media. Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher institution and the agency, and those agreements are carefully administered through a disclosure risk review process.
A message appeared at the top of multiple NIH websites last week that says: “This repository is under review for potential modification in compliance with Administration directives.”
Repositories with the message include archives of cancer imagery, Alzheimer’s disease research, sleep studies, HIV databases, and COVID-19 vaccination and mortality data."
https://www.404media.co/nih-archives-repositories-marked-for-review-for-potential-modification/
Massive, Unarchivable #Datasets of #Cancer, #Covid, #HIV and #Alzheimer's Research Could Be Lost Forever
Days before RFK announced 10,000 #HHS staffers would lose their jobs, a message appeared on #NIH research repository sites saying they were "under review." Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher institution and the agency.
https://www.404media.co/nih-archives-repositories-marked-for-review-for-potential-modification/
https://archive.ph/Y8asq
Digital Archivists: Protecting Public Data from Erasure
https://spectrum.ieee.org/digital-archive
https://news.ycombinator.com/item?id=43558182
#ListenBrainz / #MetaBrainz I'm confused. Aren't sponsors the true customer? Why use this?
On one hand #Music: "Listen together", "Ethical forever"
On the other: #DATASETS
"Some of the world’s biggest platforms such as Google and Amazon, use our data"
"We ask commercial supporters to support us in order to help fund the creation and maintenance of these datasets."
"The following organizations make use of the data-sets published by MetaBrainz"
STAT: Gold-standard maternal mortality database in limbo as CDC staff placed on leave. “As part of the sweeping layoffs that rocked the Department of Health and Human Services on Tuesday, the entire staff that oversaw an annual survey to better understand infant and maternal health — and that was considered the gold standard in the field — was placed on administrative leave. The Pregnancy […]
#research #science #BigData #DataAnalysis #datasets
'Two hundred forty-six researchers in the fields of ecology and evolutionary biology — including two from Clemson University — worked in 174 teams to answer two different research questions based on the same unpublished data sets.
They came up with a strikingly variable range of answers, including some that were direct opposites of each other.'
Clemson News: Study: Researchers’ choices could result in different conclusions from the same data. “If you give hundreds of researchers the same data and the same hypotheses to test, they will reach the same conclusions, right? Wrong, according to a recent study published in the journal BMC Biology. Two hundred forty-six researchers in the fields of ecology and evolutionary biology — […]
New Map Of Landscape Beneath Antarctica Unveiled
--
https://phys.org/news/2025-03-landscape-beneath-antarctica-unveiled.html <-- shared technical article
--
https://doi.org/10.1038/s41597-025-04672-y <-- shared paper
--
#GIS #spatial #mapping #Bedmap3 #icebed #surface #thickness #gridded #datasets #Antarctica #raster #model #modeling #landscape #elevation #icesheet #survey #remotesensing #earthobservation #climatechange #warming #climate #melt #melting #seafloor #subglacial #geophysical #survey #topography #geology #bathymetry #topobathy #BritishAntarcticSurvey
@BritishAntarcticSurvey
arXiv: FediverseSharing: A Novel Dataset on Cross-Platform Interaction Dynamics between Threads and Mastodon Users. “In March 2024, Threads joined this federation by introducing its Fediverse Sharing service, which enables interactions such as posts, replies, and likes between Threads and Mastodon users as if on a unified platform. Building on this development, we introduce FediverseSharing, […]
From handling massive #DataSets to streamlining delivery, UC Berkeley #Library is ensuring that #ResearchData is well-managed, accessible, and compliant with licensing agreements through #Dataverse, so resources are discoverable and usable by the entire university community. #RDM #DataManagement https://youtu.be/XVBUna3wzgk?si=c_Ixa-sWVmzs3Ezm
Academic Torrents is one way to find academic #datasets with BitTorrent: https://academictorrents.com/ (I guess their indexing website is US-hosted, but it's not governmental so less likely to vanish this month.) #torrenting #science
This data may vanish under Trump, so we charted it
Some of the most valuable #datasets in human history vanished from #US #government websites; it felt like watching the Library of Alexandria go up in smoke
Many have gone on record describing the #Census Bureau’s #American Community Survey as a wonder of the modern world
Another loss? The #HouseholdPulse survey, an online survey that provided week-by-week data on income losses, economic struggles and precarious mental health
https://www.washingtonpost.com/business/2025/02/14/this-data-may-vanish-under-trump-so-we-charted-it/
https://archive.ph/mB512
"On Friday, numerous essential #datasets were #purged from federal agency websites, including #data from #CDC PLACES (Population Level Analysis and Community Estimates), the Social Vulnerability Index (SVI), and the Climate and Economic Justice Screening Tool (CEJST)—to name just a few. While we don’t know when or if this data will return, we want to assure you that they are still accessible on our platform." https://www.policymap.com/blog/purged-federal-agency-data-available #PolicyMap #PublicHealth #USPol #Project2025 #CivilRights