Islands vs. Streams as a learning model

How we learn in this digital age is broken, amazing, and weird at the same time. I see two major paradigms in today’s learning models: islands and streams.

Islands of learning are the traditional classroom approach. There’s a syllabus of reading, tasks to do, and lectures to listen to. It’s a very safe and guarded experience. Major online learning platforms such as Udemy or Coursera are bringing that model online. The other option is to learn through different streams of information. You read a chapter from a book, watch a YouTube video, do a short Skillshare course, and lurk on Instagram.

I’m trying to figure out how we can make these modes of learning more explicit to learners. I’m also noticing that there is prestige attached to being part of an island of learning. There’s just more status in saying that you’re part of an expensive island than in admitting that you’ve learned from many YouTube videos and blog posts.

There’s also a matter of getting good feedback loops in the process of learning. Let’s say you decide to learn about baking sourdough bread.

The islands way is to:

  1. Read a book and try to follow the instructions.
  2. Go to a two-day workshop.
  3. Experiment with baking and talk to your family and a few close friends.
  4. Participate in a Facebook group for your workshop class.

The streams way would be to:

  1. Find a 101 YouTube video from Tasty.
  2. Read articles on the Perfect Loaf website.
  3. Fail and discover “Beginner sourdough bakers” Facebook groups.
  4. Post your pictures, get feedback from others, and with time get better.
  5. Start posting on Instagram and get feedback from a global community of people.
  6. Discover new sub-communities and repeat the process.

You only need to find the first few streams; you’ll organically discover the rest by attracting people who are just a bit better than you. You’ll also start helping people who know a bit less than you. Think of it as a journey with no final destination.

The way I described streams is very community-driven. It requires a lot of vulnerability to consistently share failures instead of only your successes. Even with your best work, you should ask how it could be improved further.

I’m approaching learning from both perspectives, and it’s frustrating in both cases. Islands make it hard to weave in resources from outside. Streams are often just giant blobs of content that are hard to weave into a coherent story and hard to reference later.

Products that would help me on this learning journey

Figure out the smallest and most streamable unit of content for each creative work. I’m thinking in terms of paragraphs and short clips from longer videos. I’d like a way to easily assemble a learning trail of such content, both for my own reference and to share with others.

I’d like to relate these units of content to a larger community: give a bit of context about the author and who the best audience for them is.

Is there an inherent feedback loop that makes sharing such resources better with time? Can we develop assistive tools that will make it easier to suggest links to community-written FAQs or instructional videos?

Overall, I still find the process of collecting and curating learning too high-friction. There’s a lot of value in seeing the journey one person took, and we don’t make it easy for others to follow it. Our prevailing model is still mostly top-down teaching, and collaborative learning is still not a fundamental building block.

Matej Martinc explains Natural Language Processing

In Meaningful work interviews I talk to people about their area of work and expertise to better understand what they do and why it matters to them.

Matej Martinc is a Ph.D. researcher at the “Jožef Stefan” Institute in the Department of Knowledge Technologies, where he invents new approaches to working with and analyzing written text. He explained to me the basics of Natural Language Processing (NLP), why neural networks are amazing, and how one gets started with all of this. In the second half, he shared how he ended up in Computer Science with a Philosophy degree and why working for companies like Google is not something that interests him.

How do people introduce you?

They introduce me as a researcher at the IJS institute. I’m in the last year of my Ph.D. thesis research. I’m mostly working on Natural Language Processing (NLP). NLP is a big field and I’m currently exploring several different areas.

I initially started by automatically profiling text authors by their style of writing – we can detect their age, gender, and psychological properties. I also worked on automatic identification of text readability. We’ve also created a system to detect Alzheimer’s patients based on their writing.

Lately, I’ve been working on automatic keyword extraction and detecting political bias in word usage in media articles. I’m also contributing to research on semantic change – how word usage changes through time.

Below are references to the research Matej mentions throughout this interview. I encourage you to read them, as they’re written in very clear language.

Scalable and Interpretable Semantic Change Detection

[..] We propose a novel scalable method for word usage change detection that offers large gains in processing time and significant memory savings while offering the same interpretability and better performance than unscalable methods. We demonstrate the applicability of the proposed method by analyzing a large corpus of news articles about COVID-19

Zero-Shot Learning for Cross-Lingual News Sentiment Classification

In this paper, we address the task of zero-shot cross-lingual news sentiment classification. Given the annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns the sentiment category not only to Slovene news, but to news in another language without any training data required. [..]

Automatic sentiment and viewpoint analysis of Slovenian news corpus on the topic of LGBTIQ+

We conduct automatic sentiment and viewpoint analysis of the newly created Slovenian news corpus containing articles related to the topic of LGBTIQ+ by employing the state-of-the-art news sentiment classifier and a system for semantic change detection. The focus is on the differences in reporting between quality news media with long tradition and news media with financial and political connections to SDS, a Slovene right-wing political party. The results suggest that political affiliation of the media can affect the sentiment distribution of articles and the framing of specific LGBTIQ+ topics, such as same-sex marriage.

Can you start by explaining some background about NLP (Natural Language Processing)?

As a first step, it’s good to consider how classification techniques such as SVM (support vector machine) classifiers and decision trees work. Very broadly speaking, they operate on a set of manually crafted features extracted from the dataset that you train your model on. Examples of such features would be “number of words in a document” or a “bag of words” model, where you put all the words into “a bag” and a classifier learns which words from this bag appear in different documents. If you have a dataset of documents for which you know the class they belong to (e.g., a class can be the gender of the author who wrote a specific document), you can train your model on this dataset and then use it to classify new documents based on how similar they are to the documents the model was trained on. The limitation of this approach is that these statistical features don’t really take semantic relations between words into account, since they are based on simple frequency-based statistics.
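As a rough illustration of that classic pipeline, here’s a minimal scikit-learn sketch (the toy documents and labels are invented for the example): a CountVectorizer builds the bag-of-words features and a linear SVM learns which words signal which class.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labeled dataset: documents and the class each one belongs to.
docs = [
    "the recipe calls for flour, water, and salt",
    "knead the dough and let it rest overnight",
    "the striker scored twice in the second half",
    "the home team won the championship match",
]
labels = ["baking", "baking", "sports", "sports"]

# Bag-of-words features + linear SVM classifier in one pipeline.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(docs, labels)

print(model.predict(["let the dough rise before baking"]))  # -> ['baking']
```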

About 10 years ago, a different approach using neural networks was invented. Neural networks allow you to work with unstructured datasets because you don’t need to define these features (i.e., classification rules) in advance. You train them by inputting sequences of words, and the network learns on its own how often a given word appears close to another word in a sequence. The information on each word is gathered in a special layer of the neural network, called an embedding layer: essentially a vector representation that encodes how a specific word relates to other words.

What’s interesting is that synonyms have a very similar vector representation. This allows you to extract relations between words. 

An example of that would be trying to answer: “Paris is to France as Berlin is to (what?)”. To solve this, you take the embedding of Berlin, subtract the embedding of Paris, and add the embedding of France, and the resulting embedding is the answer – Germany. This was a big revolution in the field, as it allows us to operationalize relations in language. The second revolution came with transfer learning, a procedure employed, for example, in the BERT neural network, which was trained on BookCorpus (800 million words) and English Wikipedia (2,500 million words).
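You can try this analogy arithmetic yourself with off-the-shelf tools. A small sketch using gensim’s downloadable GloVe vectors – my choice for the example, not the embeddings from Matej’s work:

```python
import gensim.downloader as api

# Download a small set of pre-trained word vectors (~130 MB on first run).
vectors = api.load("glove-wiki-gigaword-100")

# "Paris is to France as Berlin is to ?": Berlin - Paris + France ≈ Germany.
result = vectors.most_similar(positive=["berlin", "france"], negative=["paris"], topn=1)
print(result)  # [('germany', <similarity score>)]
```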

In this procedure, the first thing you do is train a language model. You want the model to predict the next word in a given sequence of words. You can also mask words in a given text and train the neural network to fill the gaps with the correct words. What implicitly happens in such training is that the neural network learns semantic relations between words. So if you do this on a large corpus of texts (like the billions of words behind BERT), you get a model that you can use on a wide variety of general tasks. Because nobody had to label the data for this training, it’s an unsupervised model.
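To get a feel for what such a masked language model learns, you can query a pre-trained BERT directly. A minimal sketch with the Hugging Face transformers library (the example sentence is mine):

```python
from transformers import pipeline

# Ask a pre-trained BERT to fill in a masked word.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Prints the top candidate tokens and their probabilities.
for prediction in fill_mask("The bakery sells fresh [MASK] every morning."):
    print(prediction["token_str"], round(prediction["score"], 3))
```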

Are you working with special pre-trained models?

I’m now mostly working with unsupervised methods similar to the BERT model. What we do is take that kind of model and do additional fine-tuning on a smaller training set, which makes it better suited for the specific research task. This approach allowed us to do all of the research that I’m referencing here.
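With today’s tooling, that fine-tuning step is compact. Here’s a hedged sketch using the Hugging Face Trainer on a public dataset, standing in for the project-specific training sets Matej describes:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a generic pre-trained BERT and add a classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A small slice of a public sentiment dataset stands in for task-specific data.
dataset = load_dataset("imdb", split="train[:1000]").shuffle(seed=42)
dataset = dataset.map(
    lambda batch: tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    ),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()  # the fine-tuned model is now specialized for this task
```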

A different research area that doesn’t require additional training is to employ clustering on the embeddings of these neural networks. You can take a corpus of text from the 1960s and another one from the 2000s, and then compare how the usage of specific words differs between the two collections of texts. That’s essentially how we can study how the semantic meaning of words has changed in our culture.

Modern neural networks can also produce an embedding for each usage of a word, meaning that words with more than one meaning have more than one embedding. This allows you to differentiate between Apple (the company) and apple (the fruit). We used this approach when studying how words connected to COVID changed through time. We generated embeddings for each word appearance in a corpus of news about COVID and clustered these word occurrences into distinct word usages. Two interesting terms that we identified were diamond and strain. For strain, you can see the shift from epidemiological usage (virus strain) to a more economic usage in later months (financial strain).
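The shape of that analysis can be sketched in a few lines: embed every occurrence of a target word with a contextual model, then cluster the occurrence vectors. This is a toy illustration with a generic BERT and hand-picked sentences, not the method from the paper:

```python
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "A new strain of the virus was detected in January.",
    "Scientists sequenced the strain found in the outbreak.",
    "Lockdowns put a heavy strain on small businesses.",
    "Households feel the financial strain of the pandemic.",
]

# Collect one contextual embedding per occurrence of the word "strain".
occurrence_vectors = []
for sentence in sentences:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (num_tokens, 768)
    position = inputs.input_ids[0].tolist().index(
        tokenizer.convert_tokens_to_ids("strain")
    )
    occurrence_vectors.append(hidden[position].numpy())

# Two clusters should roughly separate the viral and the economic usages.
print(KMeans(n_clusters=2, n_init=10).fit_predict(occurrence_vectors))
```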

What we showed with our research is that you can detect changes even across short (monthly) time periods. There’s a limit to how accurately we can identify the difference; it’s often hard even for humans to decide how to label such data. We can usually get close to human performance with our unsupervised methods.

(Both figures are from the paper Scalable and Interpretable Semantic Change Detection.)

Does this work for non-English languages?

You can use the same technology with non-English languages, and we’re successfully using it with Slovenian. In our viewpoint analysis of Slovenian news reporting, we discovered a difference in how the word “deep” is used in different contexts, mostly because of the “deep state” that became a popular topic in certain publications.

For our LGBTIQ+ research, we can show that certain media avoid using the word “marriage” in the context of LGBTIQ+ reporting and replace it with terms like “domestic partnership”. They also don’t discuss LGBTIQ+ relationships in the context of terms such as “family”. We can detect the political leaning of a media outlet based on how it writes about these topics.

We just started with this research on the Slovenian language so we expect that we’ll have much more to show later in the year.

(Figure is from the paper Automatic sentiment and viewpoint analysis of Slovenian news corpus on the topic of LGBTIQ+.)

So far you’ve talked about analysis and understanding of texts. What other research are you doing?

We’re working on models for generating texts as part of the Embeddia project. The output of this research also works with the Slovenian language.

We’re also investigating whether we can transfer embeddings between languages. We have a special version of the BERT neural network that has been trained on Wikipedias in 100+ languages. What we found is that you can take a corpus of texts in English, train the model on it to, for example, detect the gender of the author, and then use that same model to predict the gender of the author of a Slovenian text. This approach is called zero-shot transfer.
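For a flavor of zero-shot cross-lingual transfer in practice, here’s a sketch with an off-the-shelf multilingual model from the Hugging Face hub (my pick for the example, not one of the project’s models): it was fine-tuned mostly on English data, yet classifies Slovenian text.

```python
from transformers import pipeline

# A multilingual XLM-R model fine-tuned for natural language inference;
# it generalizes to languages it never saw labeled data for.
classifier = pipeline(
    "zero-shot-classification", model="joeddav/xlm-roberta-large-xnli"
)

result = classifier(
    "Ta film je bil čudovit in ganljiv.",  # Slovenian: "This film was wonderful and moving."
    candidate_labels=["positive", "negative", "neutral"],
)
print(result["labels"][0])  # expected: "positive"
```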

How approachable is all this research and knowledge? Do I need a Ph.D. to be able to understand and use your research?

It takes students of our graduate school about a year to become productive in this field. The biggest initial hurdle is that you need to learn how to work with neural networks.

The good thing is that we now have very approachable libraries in this field. I’m a big fan of PyTorch, as it’s well integrated with the Python ecosystem. There’s also TensorFlow, which is more popular in industry and less so in research; I found it harder to use and to debug for the type of work we do. With PyTorch, it takes our students about a month or two to understand the basics.
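To show how small those first PyTorch exercises can be, here’s a toy text classifier – every name and size below is invented for the sketch. It averages word embeddings and feeds them through a single linear layer:

```python
import torch
import torch.nn as nn

class BagOfEmbeddings(nn.Module):
    """Average the embeddings of a document's tokens, then classify."""

    def __init__(self, vocab_size=10_000, embed_dim=64, num_classes=2):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)  # mean-pools by default
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, offsets):
        return self.classifier(self.embedding(token_ids, offsets))

model = BagOfEmbeddings()
token_ids = torch.tensor([1, 4, 9, 2, 7])  # two documents, concatenated
offsets = torch.tensor([0, 3])             # where each document starts
print(model(token_ids, offsets).shape)     # torch.Size([2, 2]): one score pair per document
```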

In our context, it’s not just about using existing neural networks and methods. Understanding the science side of our field and learning how to contribute by independently writing and publishing papers usually takes about two years.

How easy is it to use your research in ‘real-world’ applications?

We have some international media companies that are using our research in the area of automatic keyword extraction from text. We’re helping them with additional tweaking of our models.

Overall we try to publish everything that we do under open access licenses with code and datasets publicly available.

What we don’t do is maintain our work as production code. It’s beyond the scope of research, and we don’t have funding for it. It’s also very time-consuming and doesn’t help us with future research. That’s also what I like about scientific research: we get to invent things without needing to maintain and integrate them, and we can shift our focus to the next research question.

So in practice, all of our research is available to you but you’ll need to do the engineering work to integrate it with your product.

Let’s shift a bit to your story and how you got into this research. How did you get here?

I first graduated in philosophy and sociology in 2011, at a time when Slovenia was still recovering from the financial crisis. While I considered a Ph.D. in philosophy, I decided that there are not many jobs for philosophers. That’s why I enrolled in a Computer Science degree, which offered better job prospects.

During my Computer Science studies, I was also working in different IT startups. I quickly realized that you don’t have a lot of freedom in such an environment. Software engineering was too constrained for me in terms of what kind of work I could do.

After I graduated, I took the opportunity to do an Erasmus exchange at a university in Spain. In that academic environment, I found the opposite approach: I received a dataset, a very loose description of a problem, and complete freedom to decide how I was going to approach and solve it.

When I returned to Slovenia, I decided to apply to a few different laboratories at IJS to see if I could continue with academic research. I got a few offers and accepted one from the laboratory where I work today.

I also decided to focus on NLP and language technologies as I’m still interested in doing philosophical and sociological research. Currently, I have the freedom to explore these topics in my research field without too many constraints. I’m also really enjoying all the conferences and travel that comes with it. Due to the fast-changing nature of my field, all the cutting-edge research is presented at conferences, and publishing in journals is just too slow. It takes over a year to publish a paper but there’s groundbreaking research almost monthly.

How do you see research done at FAANG (Facebook, Amazon, Apple, Netflix, Google) companies? We know that they’re investing a large amount of money into this field and have large research teams.

They’re doing a lot of good research. At the same time, they often rely on access to a scale of hardware resources that we don’t have. This can be both a blessing and a curse. At the moment, I don’t see their research being that much better than the findings coming from universities. Universities are also more incentivized to develop new optimization techniques, as they can’t use brute hardware force in their research.

Are you considering working for a FAANG company after your Ph.D.?

Not really. I already have a lot of freedom in my research, and I can get funding to explore the areas that interest me. If I worked at a FAANG company, I would need to start at the bottom of the hierarchy and would also be limited by their research agenda.

I also really like living in Slovenia and don’t want to relocate to another country. At the same time, I’m excited about potential study/research exchanges, as I enjoy collaborating with researchers at foreign institutions.

What are some good resources to follow in your field?

You can follow the current state of the art at:

Papers describing paradigm shifts in the field of NLP:

Unsupervised language model pretraining and transfer learning: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

What I learned from talking with Matej

  • Recognizing what kind of work makes you happy allows you to optimize your job or clients so that you do such work.
  • Natural Language Processing is a very approachable technology and not something that only big companies can use.
  • There are many opportunities to bring research findings into the industry. It does require expertise and connections to both fields.
  • These technologies now also work for the Slovenian language.

Initial thinking on Roam to WordPress Plugin

What I’m trying to do with this plugin is create a digital garden by connecting Roam Research to WordPress.

Most of the existing solutions use static website generators. I’d like to avoid that because I want to have only one CMS on my site and to maintain only one theme. There are also potential power features, such as connecting exported Roam pages to WordPress taxonomies.

My current thinking is as follows. When looking at the overall architecture design, I realised that:

  • Roam’s JSON API is something I don’t understand; it’s undocumented and hard to develop against, so I’ll pass on it for now.
  • Exported MD files are nice, but they’re messy to automate.
  • Uploading one exported JSON file lets me keep all application logic in the WordPress plugin and removes the need for any additional tools (see the sketch below).
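To give a sense of what that export looks like, here’s a minimal Python sketch that walks a Roam JSON export (the filename is a placeholder; the real plugin logic would live in PHP inside WordPress):

```python
import json

def walk_blocks(blocks, depth=0):
    """Recursively print a page's blocks, indented by nesting depth."""
    for block in blocks or []:
        print("  " * depth + block.get("string", ""))
        walk_blocks(block.get("children"), depth + 1)

# Roam's JSON export is a list of pages, each with a title and nested blocks.
with open("roam-export.json") as f:
    pages = json.load(f)

for page in pages:
    print(page["title"])
    walk_blocks(page.get("children"))
```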

I’ve also found some nice prior art. There’s WP Roam Blocks that does something similar within Gutenberg.

I also appreciate David Bieber’s blogging, where he describes how he’s currently using Python to blog with Roam.

Next steps

  1. REST API to allow uploading of Roam’s JSON export
  2. Initial React backend upload dialog
  3. Basic jsonq parsing of uploaded JSON file

Aleš Vaupotič on conversations in IT

In Meaningful work interviews I talk to people about their area of work and expertise to better understand what they do and why it matters to them.

Aleš Vaupotič has a wide-ranging 35+ year career in building software and hardware solutions, and he’s one of the most experienced e-commerce experts in the region. Our conversation covered how to grow as a developer, what it’s like to be part of the sales process, and where he currently sees technological opportunities.

We had a very wide-reaching conversation and what follows is just a partial summary of things we discussed.

How do people introduce you?

I’m most known as an Application Architect. I’m most often in a role where I lead a project from the initial meeting up until the first successful deployment. By that time, I’m usually embedded in a larger team and can call my client a friend.

I’m most proud that I can identify the needs of my clients and have the necessary breadth of experience to lead the implementation. It took me a lot of time to develop this skill.

I explain my job as mostly finding the opportunities in the business process where IT can help. What can we do to make everyday work easier? It’s also a multi-layered system. We need to think about our client, our client’s client, and also possible end-users. We try to figure out how we can arrange processes so that everyone benefits. I think this type of thinking and planning is what makes me an Architect.

What were some of your early projects and what have you learned from them?

In my teens, I designed a mechatronic system that measured how fast we were skiing between two points on my local hill. That’s where I first learned how to define what a project needs, the materials required, and how we’re going to implement it. Of course, this being the ’80s, it required a lot of ingenuity and borrowing of hardware parts from different household electronics.

I later worked with Globtour where we developed the first regional systems for booking, tourism transfers, and billing. There I first learned how to implement business needs and regulatory requirements into such a large-scale project.

Luckily, I had a colleague who had previously worked as a programmer for IBM and was trained by them in the USA. He taught me the basics of project management. For them, an IT project started with planning how you’re going to handle backups, maintenance, and ease of use. With this in mind, we developed a software framework that served us well for many years. We made sure there was a robust login system, a well-defined backup process, a permission system, and centralized logging, to name a few important aspects. Even though it was written in Pascal, it was not much different from today’s modern frameworks like Laravel. It allowed us to deliver at about a third of the budget and time of competing companies.

Why did you recently switch from a small consultancy to working inside a larger agency?

While I always enjoyed working solo or in small teams, I’ve recently noticed that I’ve hit the limit of what I can accomplish alone. It’s just not possible for me to win larger projects because I don’t have the necessary certificates, references, or required team size.

While I can offer a lot of experience and specialized knowledge, that’s not enough if you want to become a larger solutions vendor. To solve this I’ve decided to join a large digital company that can fulfill all these checkboxes and provides me with new challenges to work on.

You mentioned that you’ve participated in a lot of sales meetings. How do you see your role in the sales process and what makes for a good sales pitch in IT?

I found that being good at sales is a very important skill to master. I don’t enjoy sales, but it’s a skill you can learn by reading books and practicing. In this regard, I see it as just like learning JavaScript or any other technical skill.

A large part of sales in IT is being able to explain the solution in very plain language. I find it extremely valuable to invest time in these initial stages, as we explain to the client what we’re going to do. If the client understands the benefit of high-quality localization and accessibility, they are more than happy to both pay for it and accept a longer development timeline. What I often see instead is that people rush this part of the process and don’t take time to learn how their client works and what their real needs are.

Creating a high-quality business overview document creates value for everyone. It helps different people on the client’s side understand what’s going on, and it ensures that the team that will execute the plan has a good understanding too. It’s still a very imprecise problem, as we don’t yet know how to properly define software projects ahead of time.

I noticed that having the capability to quickly create high-fidelity prototypes makes a big difference in how clients understand proposed project functionality.

How are you improving the quality of your communication with your team?

I’m always questioning my assumptions and how I relay information. Especially if the results are not what I expected. There’s always an opportunity to improve as a communicator. It’s always a new challenge when you need to delegate work that you used to do yourself. Many things that were obvious to you now need to be explicitly stated.

One of the things that I’m doing to improve in this area is that I’m blogging more and creating YouTube videos. Today I realize that I won’t improve if I don’t go through the process of creating, publishing, and then learning from feedback. Despite my understanding of all this it’s still scary to get feedback.

What does good code look like?

It’s code that is simple and straightforward on its own, something that I know I’ll still understand months later without having to think about what it does. I’m not a fan of modern JavaScript that you can write in a very terse way but that isn’t easy to reason about. I’d rather write a for loop than a map function; it feels much more natural to me. My personal mission at the moment is that each project I work on has less unnecessary code.

These days I’m also studying code flows and how projects are structured. It doesn’t have to be a language that I work in; I’m just trying to understand and learn from the conceptual ideas. GitHub is a great source for finding such projects.

What would be your advice to more junior colleagues in the industry who are battling all the technology changes and fear of missing out, and are starting to talk about burnout?

What I see happening with some people is that they fall into the trap of “everybody can be a developer”. Sure, for some time, but after that you become tired if you keep doing the same things all the time. You need to find ways to grow and to keep challenging yourself. If people can’t do that in their workplace, they can try to find that enthusiasm in their personal lives.

Writing code and building technological solutions have always excited me. I’m also very proud of my attention to detail and that I always see opportunities for my growth as a developer. That’s why I think I don’t feel many of these challenges.

You’re also supporting the development of young people through First Lego League Adria. Can you reflect on some of the things that you learned in the process?

I’m fascinated by how creative and capable kids aged between 8 and 16 are. They’re also always very successful when competing at the international level and have very solid language and presentation skills.

I’m happy to see that the best local participants are getting great opportunities for further studies and personal development. I hear that US universities and colleges are actively trying to recruit them with good scholarships. At the same time, there’s a lack of trust in such opportunities in our environment, so they mostly go unused. After working with such teams for the last 10 years, I see how much potential they have and what kind of impact they’ll be able to make in their professional work.

What technologies excite you at the moment?

Svelte makes sense to me for front-end development. It just feels natural to write, and I really like the community around it: supportive, full of great ideas, and open to discussion. I’m also contributing to the Routify project, which enables routing based on the file structure.

Incremental Static Regeneration in modern JavaScript frameworks is definitely something I can see a lot of good use cases for.

Tailwind makes it really easy to write CSS styles in a very natural way.

I’m looking at serverless and edge computing as I feel that this is finally the true cloud that allows us to bring websites closer to end customers.

I’m also excited about WASM and that we can push computation to clients’ devices. When we connect this potential with serverless it greatly simplifies a lot of the needed backend infrastructure.

What I learned from talking with Aleš

  • Communication and written expression are the most fundamental part of successful IT projects. If people can’t understand each other, the project will fail.
  • Keep tinkering with technologies and different challenges. Fundamentals are always the same, and I’ll be able to build on previous knowledge.
  • Don’t worry too much about businesses. They come and go. People and relationships around them last much longer.

Google Sheets in Python with gspread

As I build more back-office web interfaces, I notice that users feel most comfortable in an Excel-like interface. That’s why it’s now so common to find data being edited and exchanged in Google Sheets.

This got me wondering: how do I access, manipulate, and write to Sheets from Python? I like the answer that I found – a library called gspread.

Basic usage

A high-level overview of how you use it:

  1. Create a service account in the Google Developer Console.
  2. Share your Google Sheet with that service account’s email address.
  3. Access it from Python with gspread and related libraries (see the sketch below).
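In code, those three steps boil down to something like this – the key filename and sheet title below are placeholders:

```python
import gspread

# Authenticate with the service account key downloaded from the Developer Console.
gc = gspread.service_account(filename="service_account.json")

# Open the spreadsheet that you shared with the service account's email address.
worksheet = gc.open("My back-office sheet").sheet1

print(worksheet.get_all_records())        # rows as dicts, keyed by the header row
worksheet.update_acell("B2", "reviewed")  # write a single cell
```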

Making it even better with Pandas

While it’s great to have low-level access in Python it’s much more convenient if I can manipulate the data inside Pandas DataFrame. That way I don’t have to think about data structures and how to correctly represent data in each cell or row.

To do this I found two libraries. They’re both similar, and you essentially just import a snippet:
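The second library isn’t named above; assuming the pair is gspread-dataframe and gspread-pandas (its usual companion), the imports and the DataFrame round-trip look roughly like this:

```python
import gspread

gc = gspread.service_account(filename="service_account.json")
worksheet = gc.open("My back-office sheet").sheet1

# Option A - gspread-dataframe (assumed): explicit get/set helpers on a worksheet.
from gspread_dataframe import get_as_dataframe, set_with_dataframe

df = get_as_dataframe(worksheet)       # worksheet -> DataFrame
set_with_dataframe(worksheet, df)      # DataFrame -> worksheet

# Option B - gspread-pandas: a Spread object wraps the whole spreadsheet.
from gspread_pandas import Spread

spread = Spread("My back-office sheet")
df = spread.sheet_to_df()              # worksheet -> DataFrame
spread.df_to_sheet(df, index=False)    # DataFrame -> worksheet
```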

For a larger project, I used gspread-pandas (just because I found it first) and it gives you quite a lot of control over how and when you update the data.

Sheets and Data Validation

Sheets allows you to define data validation checks on specific cells. You can validate against a predefined list of items or reference a range of cells. This allowed me to build an elaborate export of data from an API and give users a way to quickly review the data and, where needed, use Sheets to update it.

To manipulate Sheets data validation from Python, you can use the gspread-formatting library.
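Following gspread-formatting’s documented API, a dropdown-style validation rule over a range looks like this (the sheet name and cell range are placeholders):

```python
import gspread
from gspread_formatting import (
    BooleanCondition,
    DataValidationRule,
    set_data_validation_for_cell_range,
)

# Open the sheet as in the basic-usage sketch above.
gc = gspread.service_account(filename="service_account.json")
worksheet = gc.open("My back-office sheet").sheet1

# Restrict a column to a predefined list of values, rendered as a dropdown in Sheets.
rule = DataValidationRule(
    BooleanCondition("ONE_OF_LIST", ["pending", "approved", "rejected"]),
    showCustomUi=True,  # show a dropdown instead of silently rejecting bad input
)
set_data_validation_for_cell_range(worksheet, "C2:C100", rule)
```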

Video and Slides of Talk on this topic

If you’d like to see the slides from my talk on this subject at the Python Ljubljana meetup, I’ve embedded them below.

Was this useful for you?

If this was useful to you, please leave a comment or send me an email. I’ll be happy to write more detailed tutorials and support you.