Sanitize.py

Python
by luis corona
9th January 2020

def shave_marks_latin(txt):
    """Remove all diacritic marks from Latin base characters"""
    norm_txt = unicodedata.normalize('NFD', txt)
    latin_base = False
    keepers = []
    for c in norm_txt:
        if unicodedata.combining(c) and latin_base:
            continue  # ignore diacritic on Latin base char
        keepers.append(c)
        # if it isn't combining char, it's a new base char
        if not unicodedata.combining(c):
            latin_base = c in string.ascii_letters
    shaved = ''.join(keepers)
    return unicodedata.normalize('NFC', shaved)

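Removes combining marks (diacritics) from Latin base characters and leaves marks on non-Latin characters, such as Greek letters, intact. "Latin" here means the ASCII letters in string.ascii_letters. The import statements (unicodedata and string) are omitted because the function is part of the sanitize.py module.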
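Since the snippet leaves out its imports, a self-contained version may be easier to try out. The sketch below assumes the two omitted imports are the standard-library modules unicodedata and string, which is what the names used in the function suggest. The sample strings are illustrative only and do not come from the original module.

```python
import string
import unicodedata


def shave_marks_latin(txt):
    """Remove all diacritic marks from Latin base characters"""
    # NFD splits each accented character into base char + combining marks
    norm_txt = unicodedata.normalize('NFD', txt)
    latin_base = False
    keepers = []
    for c in norm_txt:
        if unicodedata.combining(c) and latin_base:
            continue  # drop a diacritic that follows a Latin base char
        keepers.append(c)
        # a non-combining char starts a new base character
        if not unicodedata.combining(c):
            latin_base = c in string.ascii_letters
    # NFC recomposes any marks that were kept (e.g. on Greek letters)
    return unicodedata.normalize('NFC', ''.join(keepers))


if __name__ == '__main__':
    # Illustrative inputs (assumptions, not taken from sanitize.py)
    print(shave_marks_latin('S\u00e3o Paulo caf\u00e9'))
    print(shave_marks_latin('\u0396\u03ad\u03c6\u03c5\u03c1\u03bf\u03c2, Z\u00e9firo'))
```

Decomposing with NFD first is what lets the loop treat each diacritic as a separate character. Recomposing with NFC at the end restores the marks that were deliberately kept on non-Latin letters.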
