**Nick Byrd, Ph.D.** @ByrdNick@nerdculture.de · Apr 16

Wow! #QualiService could be a great resource!

It wasn't obvious to me how to find the transcripts for this doctor-patient interaction data from four countries, but if such transcripts are accessible, that's GREAT!

https://www.qualiservice.org/en/qsearch.html?q=diagnosis

A Qualiservice search for "diagnosis" returned a list of qualitative datasets from video recordings of doctor-patient interactions in four countries (China, Germany, Netherlands, and Turkey).

#medicine #openData #cogSci

**Mario Angst** @mario_angst_sci@fediscience.org · Mar 26 *

Want to analyze text from the EU public consultations? EU public consultations are a way in which the EU invites the broader public to publicly comment on upcoming legislation.

I just published a first version of a Python package {eu-consultations} to scrape and extract text from the EU website:
https://github.com/marioangst/eu_consultations

- download consultation data as displayed on the EU's frontend into a validated form
- download associated files (this is the hard part about analysing this data - lots of feedback is in .docx and .pdf files)
- extract text from the files using docling and attach to feedback

You get all data in validated form, optionally stored in huge (sorry for that) JSON files ;).

This package is part of an analysis project on feedback the EU has received via the public consultation process on digital policy, which we plan to present later this year. I thought we might as well open-source some of the tools we use well before then.

#python #textanalysis #policyanalysis

**LMS Solution** @lms_solution@mastodon.social · Mar 15

AI-Powered Document Chat and Summarization Tool
Interact with documents, summarize, and get answers using AI.
https://zurl.co/wAly2
https://zurl.co/znKj5
#AI #DocumentChat #Summarization #ResearchTools #MultidocChat #LightPDF #AItools #DocumentInteraction #TextAnalysis #ResearchSupport #SmartSummarization #AcademicAI #InformationExtraction #ChatWithDocuments #DataInsights

**George Macgregor** @g3om4c@code4lib.social · Mar 7

Useful contribution to discussions in this area, for sure! The results highlight "whether an automated approach that would still require micromanaging and adjusting several variables by the human researcher would, in fact, be more efficient an approach compared to the same tasks performed manually by human labour"

Out of Context! Managing the Limitations of Context Windows in #ChatGPT-4o Text Analyses https://doi.org/10.46298/jdmdh.15090 #DigitalHumanities #TextAnalysis #LLM #ArtificialIntelligence #GLAMR

Episciences: Out of Context! Managing the Limitations of Context Windows in ChatGPT-4o Text Analyses

In recent years, large language model (LLM) applications have surged in popularity, and academia has followed suit. Researchers frequently seek to automate text annotation – often a tedious task – and, to some extent, text analysis. Notably, popular LLMs such as ChatGPT have been studied as both research assistants and analysis tools, revealing several concerns regarding transparency and the nature of AI-generated content. This study assesses ChatGPT’s usability and reliability for text analysis – specifically keyword extraction and topic classification – within an “out-of-the-box” zero-shot or few-shot context, emphasizing how the size of the context window and varied text types influence the resulting analyses. Our findings indicate that text type and the order in which texts are presented both significantly affect ChatGPT’s analysis. At the same time, context-building tends to be less problematic when analyzing similar texts. However, lengthy texts and documents pose serious challenges: once the context window is exceeded, “hallucinated” results often emerge. While some of these issues stem from the core functioning of LLMs, some can be mitigated through transparent research planning.
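One common way to keep long documents inside a context window is to split them into overlapping chunks before sending each chunk to the model. The sketch below illustrates that general idea only; it is not the procedure used in the paper. The function name `chunk_words` and its parameters are hypothetical, and it counts whitespace-separated words as a rough stand-in for model tokens, which real context limits are measured in.

```python
def chunk_words(text: str, max_words: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that context at a
    chunk boundary is not lost entirely. Word counts are only an
    approximation of model tokens (assumption for illustration).
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g", max_words=3, overlap=1):
        print(chunk)
```

A larger overlap preserves more context across boundaries at the cost of processing more text overall.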

**khushnuma** @khushnuma@mastodon.social · Nov 16, 2024

NLP for Data Science: Insights from Text

#NLP #DataScience #TextAnalysis #MachineLearning #DeepLearning #NaturalLanguageProcessing #AI #DataMining #BigData #TextMining #SentimentAnalysis #TopicModeling #WordEmbeddings #DataVisualization

https://pando.life/article/256881

**khushnuma** @khushnuma@mastodon.social · Oct 19, 2024

**khushnuma** @khushnuma@mastodon.social · Oct 10, 2024

Mastering these core NLP techniques is crucial for any data scientist dealing with text data. From tokenization to language modeling, each method serves a unique purpose in processing, analyzing, and extracting valuable insights from textual information.

#NLP #DataScience #Tokenization #LanguageModeling #TextAnalysis #TextMining #MachineLearning

read more: https://blogulr.com/khushnuma7861/topnlptechniqueseverydatascientistshouldknow-120682
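To make the two ends of that range concrete, here is a minimal sketch of tokenization feeding a very simple bigram language model. It is an illustration under stated assumptions, not material from the linked article: the regex-based `tokenize` and the helper names are hypothetical, and a real pipeline would typically use a dedicated NLP library.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (keeps simple contractions)."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def bigram_counts(tokens: list[str]) -> Counter:
    """Count each pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))


def next_word_probability(tokens: list[str], prev: str, nxt: str) -> float:
    """Estimate P(nxt | prev) by relative frequency of bigrams starting with prev."""
    pairs = bigram_counts(tokens)
    total = sum(count for (first, _), count in pairs.items() if first == prev)
    return pairs[(prev, nxt)] / total if total else 0.0


if __name__ == "__main__":
    tokens = tokenize("The cat sat. The cat ran. The dog sat.")
    print(tokens)
    print(next_word_probability(tokens, "the", "cat"))
```

The estimate is an unsmoothed maximum-likelihood value, so unseen word pairs get probability zero.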

**Nick Byrd, Ph.D.** @ByrdNick@nerdculture.de · Sep 25, 2024

Like we found in “Your Health vs. My Liberty” (https://doi.org/10.1016/j.cognition.2021.104649) Yael Rozenblum et al. found that compliance with #publicHealth guidance correlated with indicators of the perceived threat of a viral pandemic.

Also, relying on #misinformation correlated with reliance on simple (vs. complex) #reasoning.

The free paper: https://doi.org/10.1002/tea.21975

Measures of perceived threat (“motivation”) and compliance (“stance”).

How perceived threat (“motivation”) predicted compliance (“stance”).

Categorization of simple and complex reasoning (with some examples).

How reliance on misinformation correlated with complexity of reasoning and education.

#medicine #health #education

**Daniela Schneider** @SchnDa@fedihum.org · Sep 9, 2024

Have you ever wanted to use a #LLM as one step in a workflow?

We integrated #GPT into the open-source analysis platform #useGalaxy, where you can link GPT to several thousand other tools, add more attachments for analysis and make your research reproducible.

https://galaxyproject.org/news/2024-09-02-chat-gpt/

In our example, we uploaded an audio file and used #Whisper to convert it into text, cut out the moderation, and prompted ChatGPT to translate it into German.

#DH #textanalysis #tools
@galaxyfreiburg

galaxyproject.org: Using Large Language Models in complex workflows. Use ChatGPT in your analysis on the Galaxy Server to leverage the Large Language Model in your automated workflows.

**Fabio Giglietto** @fabiogiglietto@aoir.social · Aug 21, 2024

New working paper: "Evaluating Embedding Models for Clustering Italian Political News"

This study compares embedding models for unsupervised clustering of Italian political news shared on Facebook before the 2018 and 2022 elections, aiming to advance NLP methods for political text analysis in non-English languages.

Paper: https://osf.io/preprints/osf/2j9ed

Code & data: https://github.com/fabiogiglietto/Semantic-Clustering-Italian-News

Feedback welcome!

#NLP #PoliticalScience #TextAnalysis

**Jason Robison** @jrrobison1@mastodon.social · Aug 18, 2024

Pycpidr 0.3.0 introduces:
- Dependency-based Idea Density (DEPID)
- DEPID-R
- Custom sentence and token filters for DEPID

github.com/jrrobison1/pycpidr

#Python #Linguistics #psychometrics

**Jason Robison** @jrrobison1@mastodon.social · Aug 15, 2024

Just launched: pycpidr
https://github.com/jrrobison1/pycpidr

Python library to determine the propositional idea density of an English text automatically.

Idea density is a measure of the amount of information conveyed relative to the number of words used. This metric has applications in various fields, including linguistics, cognitive science, and healthcare research.
#Python #Linguistics #psychometrics #NLP #TextAnalysis #OpenSource
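As a rough illustration of the metric, idea density is often approximated as the number of proposition-bearing words (verbs, adjectives, adverbs, prepositions, and conjunctions) divided by the total word count. The sketch below assumes tokens that are already tagged with Penn Treebank part-of-speech tags and applies none of the adjustment rules a full implementation uses; it is not pycpidr's API, and `idea_density` is a hypothetical helper.

```python
# Penn Treebank tag prefixes treated as proposition-bearing (assumption):
# VB* verbs, JJ* adjectives, RB* adverbs, IN prepositions and
# subordinating conjunctions, CC coordinating conjunctions.
PROPOSITION_TAG_PREFIXES = ("VB", "JJ", "RB", "IN", "CC")


def idea_density(tagged_tokens: list[tuple[str, str]]) -> float:
    """Return propositions per word for (word, POS tag) pairs.

    Tokens without any letter or digit (punctuation) are not counted as words.
    """
    words = [(w, tag) for w, tag in tagged_tokens if any(ch.isalnum() for ch in w)]
    if not words:
        return 0.0
    propositions = sum(
        1 for _, tag in words if tag.startswith(PROPOSITION_TAG_PREFIXES)
    )
    return propositions / len(words)


if __name__ == "__main__":
    sentence = [
        ("The", "DT"), ("old", "JJ"), ("dog", "NN"), ("slept", "VBD"),
        ("quietly", "RB"), ("on", "IN"), ("the", "DT"), ("porch", "NN"),
        (".", "."),
    ]
    print(idea_density(sentence))  # 4 propositions / 8 words
```

In practice the tags would come from a part-of-speech tagger rather than being written by hand.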

**Elias Dabbas** @elias@seocommunity.social · Aug 15, 2024

Word co-occurrence matrix/heatmap

How to compute and visualize the correlation between terms that occur together in a list of documents*

*documents: keywords, page titles, product names/descriptions, social media posts, etc.

https://bit.ly/3Z4tiTx

#DataVisualization #textanalysis #DataScience
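A minimal sketch of the underlying computation, assuming whitespace tokenization and counting each pair of terms at most once per document. It produces raw co-occurrence counts rather than a correlation measure, and it is not the method from the linked post; the function names are hypothetical. The resulting square matrix is the kind of data a heatmap would be drawn from.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents: list[str]) -> Counter:
    """Count, for each unordered pair of terms, the documents containing both."""
    counts = Counter()
    for doc in documents:
        # Deduplicate and sort so each pair appears once, in a stable order.
        terms = sorted(set(doc.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def to_matrix(counts: Counter) -> tuple[list[str], list[list[int]]]:
    """Turn pair counts into a symmetric matrix plus its row/column labels."""
    vocab = sorted({term for pair in counts for term in pair})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return vocab, matrix


if __name__ == "__main__":
    docs = ["red running shoes", "blue running shoes", "red shoes"]
    vocab, matrix = to_matrix(cooccurrence_counts(docs))
    print(vocab)
    for row in matrix:
        print(row)
```

The diagonal is left at zero because a term is not paired with itself.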

**Steven P. Sanderson II, MPH** @stevensanderson@mstdn.social · Jul 26, 2024

Hi everyone! I recently tackled a common data task using R: counting the occurrences of a specific phrase in a text file. It's a great way to practice text analysis and get familiar with R's powerful tools.

See the attached.

Happy coding!

#RStats #DataScience #TextAnalysis
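The attachment is not reproduced here. For readers who want to try the same task, the following is a Python analogue rather than the R code from the post: a sketch that counts non-overlapping, whole-word occurrences of a phrase. The function names are hypothetical, and the word-boundary matching assumes the phrase starts and ends with a letter or digit.

```python
import re
from pathlib import Path


def count_phrase(text: str, phrase: str, ignore_case: bool = True) -> int:
    """Count non-overlapping occurrences of phrase as whole words in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps characters such as "." in the phrase literal.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags))


def count_phrase_in_file(path: str, phrase: str, encoding: str = "utf-8") -> int:
    """Read a text file and count occurrences of phrase in its contents."""
    return count_phrase(Path(path).read_text(encoding=encoding), phrase)


if __name__ == "__main__":
    sample = "Text analysis is fun. I like text analysis; textual analysis too."
    print(count_phrase(sample, "text analysis"))
```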

**Harald Klinke** @HxxxKxxx@det.social · Jul 17, 2024 *

The Digital Humanities Team at the University of Vienna and the Ottoman Nature in Travelogues (ONiT) project are hosting a #hackathon focused on analyzing texts, images, and multimodal sources.

Thursday, November 14, 9:00 CET to Friday, November 15, 15:00 CET
https://dh.univie.ac.at/hackathon/
#DigitalHumanities #ComputationalHumanities #TextAnalysis #ImageAnalysis

**Marshall A. Taylor** @mtaylor_soc@sciences.social · Jul 8, 2024 *

It was also a methodologically fun paper, combining digitized archival text, Census & survey data, NLP, and panel models.

Email or DM me for a copy! #sociology #textanalysis #rstats

3/3

**Axel Pichler** @axelpichler@fedihum.org · Jun 4, 2024

Attention Linguistics & Digital Humanities students!
Join @janispagel and me for the »Prompting, Evaluation, Interpretation: An Introduction to LLMs in Text Analysis« course at the upcoming Deep Learning for Language Analysis Summer School in Cologne: http://ml-school.uni-koeln.de
Don't miss out – registration is open until June 16th!
#LLMs #TextAnalysis #NLP #AI #Linguistics #DigitalHumanities #CRETA

**R User Group @Harvard** @RUGatHDSI@fosstodon.org · May 29, 2024

Want to learn more about how to use regular expressions in R?

Come join us to learn how to use regular expressions to parse and clean text data on Thursday, June 6th, 5-6pm Eastern Time!

Find the Zoom registration details on our website:

https://rug-at-hdsi.org/upcoming_events/2024-05-06-regex-sarah-hirsch.html

R User Group at Harvard Data Science Initiative presents "The Magic of Regular Expressions" with Sarah Hirsch, Thursday, June 6th, 5pm Eastern Time. Details and registration are online at https://rug-at-hdsi.org

#rstats #DataScience #regex

**alissonmasoares** @alissonmasoares@fosstodon.org · May 24, 2024

Bias estimation in word embeddings using a Bayesian approach instead of WEAT or MAC. A new paper in Computational Linguistics.

#ComputationalSocialSciences #textanalysis #NLP

@schizanon@mastodon.social · May 6, 2024

How would you go about creating a filter that blocks posts about things that people hate?

I thought I could build a text classifier, but it could be hard to train since I'd need to guess whether or not the author hates the thing they are posting about.

I wouldn't want it to become a filter for all current events news, but I suspect that's what it would become.

#fediverse #mastodon #machineLearning

#textanalysis