TextXD 2017

At the 2017 TextXD event, held at the Berkeley Institute for Data Science (BIDS) at UC Berkeley, we gathered about 60 researchers from several institutions with expertise in text analysis. Although they come from different domains, these researchers share a common bond: they all work with text as a primary source of data. Throughout the event, we learned from one another, strengthened ties across disciplinary boundaries, and began developing collaborations that we hope will have a lasting impact on the community.

Each part of the TextXD event had a different emphasis:

preXD Text Analysis Workshop

Date: November 29, 2017, 1:00 to 6:00 PM
Location: Academic Innovation Studio (Dwinelle 117, Level D), UC Berkeley

This year, the TextXD Conference started with a preXD Workshop on the afternoon of November 29, a quick introduction to text analysis in Python using Jupyter Notebooks. The session was designed to bring people to the text analysis starting line so that everyone would be ready for the make sessions over the following two days. No prior text analysis experience was needed to attend. Those without Python familiarity were invited to work through introductory materials from UC Berkeley’s D-Lab, which run entirely in the browser after clicking the black and red “launch binder” badge.

Text Analysis Across Domains Fall 2017 Conference

Dates: November 30 and December 1, 2017, 10:00 AM to 4:30 PM
Location: 190 Doe Library, UC Berkeley

This semester’s TextXD was the largest to date. With so much going on in the world of natural language processing, TextXD opened to researchers beyond the UC Berkeley campus. The agenda consisted of short morning talks on new tools, methods, software, and data (see the videos in the Agenda below). Speakers came from our own campus as well as UC San Francisco, UC San Diego, UC Santa Barbara, Princeton, and Drexel. Afternoon “make” sessions were also introduced this year so that participants could roll up their sleeves and work together on solutions to shared problems or investigate research questions of shared interest. The text data used in the “make” sessions included newspaper articles, Twitter feeds, emails, congressional hearings, and journal article abstracts.

AGENDA

THURSDAY, NOVEMBER 30 at BIDS (190 Doe Library)
09:30-10:00 — Breakfast
10:00-12:30 — Talks
- 10:00-10:10 — Nick Adams (BIDS): Welcome
- 10:10-10:35 — John Mohr (UCSB): The Frontiers of Social Scientific Text Analysis
- 10:35-10:45 — Cody Hennesy (UCB Library): Text Analysis on 14 Million Digital Library Books
- 10:45-11:15 — Julia Silge (StackOverflow): Text Mining with Tidy Data Principles and Count-based Methods
- 11:15-11:30 — Pramit Choudhary (DataScience): Explainable NLP Algorithms: Understanding Word Relevance in Text Datasets
- 11:30-11:40 — Elena Glassman (BIDS): Wavelets for Text
- 11:40-12:00 — Jamie Murdoch (UCB EECS): Beyond Word Importance: Contextual Decomposition for Interpreting LSTMs
- 12:00-12:05 — Devin Cornell (UCSB): Word Embedding and Semantic Analysis of News Data
- 12:05-12:25 — Make Session Previews
- 12:25-12:30 — Wrap up and send to lunch
12:30-13:30 — Lunch
13:00-13:30 — Lunch Chat Panel — The Frontier of NLP (at Berkeley and Beyond)
13:30-17:00 — Make Session
- 13:30-13:40 — Elena Glassman (UCB): Welcome and Process Intro
- 13:40-15:30 — Make Sessions
- 15:30-15:45 — Tea break
- 15:45-16:40 — Keep making!
- 16:40-17:00 — Review results
17:00-19:00 — Happy Hour at Tap Haus (2516 Durant Ave)

FRIDAY, DECEMBER 1 at BIDS (190 Doe Library)
09:30-10:00 — Breakfast
10:00-12:35 — Talks
- 10:00-10:05 — Alex Paxton (BIDS): Welcome Back!
- 10:05-10:20 — Claudia von Vacano (D-Lab): Scalable Detection of Online Hate Speech
- 10:20-10:50 — Jake Ryland Williams (Drexel): Minimal Semantic Units in Text Analysis
- 10:50-11:05 — Han Zhang (Princeton): Uncovering Authoritarian Rule: Identifying Collective Action with Social Media Data
- 11:05-11:35 — Rex Douglass (UCSD): Georeferencing of Events from Text
- 11:35-11:55 — Nick Adams (BIDS): TextThresher: Qualitative Text Analysis at a Quantitative Scale (originally scheduled: Aditi Muralidharan (Google): “WordSeer for Text Exploration”)
- 11:55-12:05 — Oksana Gologorskaya (UCSF): Text Analysis in Biomedical Applications at UCSF
- 12:05-12:20 — Miriam Petruck (ICSI): The FrameNet Database -- FrameNet: The Tip of the Iceberg
- 12:20-12:35 — Meredith Lee (West Big Data Innovation Hub / UC Berkeley): Collaborating with the Big Data Innovation Hubs
12:35-13:30 — Lunch
13:00-13:30 — Lunch Chat Panel — Humans In the Loop: The Role of Humans in Text Analysis
13:30-17:00 — Make Session
- 13:30-15:00 — Making
- 15:30-15:45 — Tea break
- 15:45-16:30 — Make more!
- 16:30-17:00 — Reports and wrap up: Conference Closing and Remarks from Participants

TextXD 2017 Website