2010-01-03 62 views
6

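Text processing with elisp

Since I converted to the Church of Emacs, I have been trying to do everything from inside it, and I would like to know how to do text processing quickly and efficiently there.

As an example, take this list, which I was editing in org-mode a few minutes ago.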

 
** Diego: b QI 
** bruno-gil: b QI 
** Koma: jo 
** um: rsrs pr0n 
** FelipeAugusto: esp 
** GustavoPupo: pinto tr etc 
** GP: lit gtk 
** Alan: jo mil pc 
** Jost: b hq jo 1997 
** Herbert: b rsrs pr0n 
** Andre: maia mil pseudo 
** Rodrigo: c 
** caue: b rsrs 7arte pseudo 
** kenny: cri gif 
** daniel: gtk mu pr0n rsrs b 
** tony: an 1997 esp 
** Vitor: b jo mimimi 
** raphael: b rpg 7arte 
** Luca: b lit gnu pc prog mmu 7arte 1997 
** LZZ: an qt 
** William: b an jo pc 1997 
** Epic: gtk 
** Aldo: b pseudo pol mil fur 
** GustavoKyon: an gtk 
** CarlosIsaksen : an hq jo 7arte gtk 1997 
** Peter: pseudo pol mil est 1997 gtk lit lang 
** leandro: b jo cb 
** frederico: 7arte lit gtk 
** rol: b an pseudo mimimi 7arte 
** mathias: jo lit 
** henrique: 1997 h gtk qt 
** eumané: an qt 
** walrus: cri de 
** FilipePinheiro: lit pseudo 
** Igor: pseudo b 
** Erick: b jo rpg q 1997 gtk 
** Gabriel: pr0n rsrs qt 
** george: clo mimimi 
** anão: hq jo 1997 rsrs clô b 
** jeff: 7arte gtk 
** davidatenas: an 7arte 1997 esp qt 
** HHahaah: b 
** Eduardo: b 

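It is a list of names, each associated with some tags, and I want to get a list of tags, each associated with its names.

In bash, I would first echo the whole thing pasted inside single quotes, then pipe it to awk, loop over each line, add each of its parts to the right temporary variable, and then fiddle with it until it is what I want.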

 
echo '** Diego: b QI 
** bruno-gil: b QI 
** Koma: jo 
** um: rsrs pr0n 
** FelipeAugusto: esp 
** GustavoPupo: pinto, tr etc 
** GP: lit gtk 
** Alan: jo mil pc 
** Jost: b hq jo 1997 
** Herbert: b rsrs pr0n 
** Andre: maia mil pseudo 
** Rodrigo: c 
** caue: b rsrs 7arte pseudo 
** kenny: cri gif 
** daniel: gtk mu pr0n rsrs b 
** tony: an 1997 esp 
** Vitor: b jo mimimi 
** raphael: b rpg 7arte 
** Luca: b lit gnu pc prog mmu 7arte 1997 
** LZZ: an qt 
** William: b an jo pc 1997 
** Epic: gtk 
** Aldo: b pseudo pol mil fur 
** GustavoKyon: an gtk 
** CarlosIsaksen : an hq jo 7arte gtk 1997 
** Peter: pseudo pol mil est 1997 gtk lit lang 
** leandro: b jo cb 
** frederico: 7arte lit gtk 
** rol: b an pseudo mimimi 7arte 
** mathias: jo lit 
** henrique: 1997 h gtk qt 
** eumané: an qt 
** walrus: cri de 
** FilipePinheiro: lit pseudo 
** Igor: pseudo b 
** Erick: b jo rpg q 1997 gtk 
** Gabriel: pr0n rsrs qt 
** george: clo mimimi 
** anão: hq jo 1997 rsrs clô b 
** jeff: 7arte gtk 
** davidatenas: an 7arte 1997 esp qt 
** HHahaah: b 
** Eduardo: b 
' | awk '{sub(":","");for (i=3;i<=NF;i++) members[$i] = members[$i] " " $2}; END{for (j in members) print j ": " members[j]}' | sort 

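...and TA-DA! The expected output, in under 2 minutes, done in an intuitive and incremental way. Can you show me how to do something like this in elisp, preferably in an Emacs buffer, elegantly and simply?

Thanks!

Answers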

0

This is my second attempt. I wrote a small macro and a few functions to process this kind of data.

 
(defun better-numberp (s) 
    (string-match "^ *[0-9.,]* *$" s)) 

(defmacro awk-like (&rest args) 
    (let ((arg (car (last args))) 
     (calls (mapcar #'(lambda (l) 
          (cond 
          ((numberp (first l)) (cons `(lambda (f) (equal %r ,(first l))) (rest l))) 
          ((stringp (first l)) (cons `(lambda (f) (string-match ,(first l) %)) (rest l))) 
          (t l))) 
         (butlast args)))) 
    `(mapcar #'(lambda (%%) 
       (let ((%r 0)) 
        (mapcar 
        #'(lambda (l) 
         (setq %r (1+ %r)) 
         (let ((% l)) 
          (dolist (tipo ',calls) 
          (progn 
           (setq % (cond 
             ((funcall (first tipo) %) (eval (cadr tipo))) (t %))) 
           (set (intern (format "%%%d" %r)) %))) %)) %%))) 
      (mapcar #'(lambda (y) (split-string y " " t)) 
        (split-string ,arg "\n" t))))) 

(defun hash-to-list (hashtable) 
    "Return a list that represent the hashtable." 
    (let (mylist) 
    (maphash (lambda (kk vv) (setq mylist (cons (list kk vv) mylist))) hashtable) 
    mylist 
    ) 
) 

(defun append-hash (key value hashtable) 
    (let ((current (gethash key hashtable))) 
    (puthash key 
      (cond 
       ((null current) (list value)) 
       ((listp current) (cons value current)) 
       (t current)) 
      hashtable))) 

 
(let ((foohash (make-hash-table :test 'equal))) 
    (awk-like 
    (2 (replace-regexp-in-string ":" "" %)) 
    ((lambda (f) (> %r 2)) (append-hash % %2 foohash)) 
    "** Diego: b QI 
** bruno-gil: b QI 
** Koma: jo 
** um: rsrs pr0n 
** FelipeAugusto: esp 
** GustavoPupo: pinto tr etc 
** GP: lit gtk 
** Alan: jo mil pc 
** Jost: b hq jo 1997 
** Herbert: b rsrs pr0n 
** Andre: maia mil pseudo 
** Rodrigo: c 
** caue: b rsrs 7arte pseudo 
** kenny: cri gif 
** daniel: gtk mu pr0n rsrs b 
** tony: an 1997 esp 
** Vitor: b jo mimimi 
** raphael: b rpg 7arte 
** Luca: b lit gnu pc prog mmu 7arte 1997 
** LZZ: an qt 
** William: b an jo pc 1997 
** Epic: gtk 
** Aldo: b pseudo pol mil fur 
** GustavoKyon: an gtk 
** CarlosIsaksen: an hq jo 7arte gtk 1997 
** Peter: pseudo pol mil est 1997 gtk lit lang 
** leandro: b jo cb 
** frederico: 7arte lit gtk 
** rol: b an pseudo mimimi 7arte 
** mathias: jo lit 
** henrique: 1997 h gtk qt 
** eumané: an qt 
** walrus: cri de 
** FilipePinheiro: lit pseudo 
** Igor: pseudo b 
** Erick: b jo rpg q 1997 gtk 
** Gabriel: pr0n rsrs qt 
** george: clo mimimi 
** anão: hq jo 1997 rsrs clô b 
** jeff: 7arte gtk 
** davidatenas: an 7arte 1997 esp qt 
** HHahaah: b 
** Eduardo: b 
") 
    (hash-to-list foohash)) 
+0

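This may just be a nitpick, but I find that when learning a new language it helps to learn the code indentation that is idiomatic for that language; the indentation and parenthesization the language expects help me stay fluent in it. Also, as someone who uses lisp a lot, I find code laid out this way jarring, like seeing all-uppercase code in a language other than BASIC or FORTRAN. – 2010-01-18 06:55:06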

+0

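Do you mean the macro or the hash-to-list function? If it is the macro, could you show me how to indent it properly? The function was just copied from Xah Lee's page – konr 2010-01-22 08:38:42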

5

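There is a function, shell-command-on-region, that does pretty much what its name says. You can highlight a region, type M-|, enter a shell command, and the region's text is piped to that command. Give it a prefix argument and the region is replaced with the command's output.

As a trivial example, highlight a region and type 'C-u 0 M-| wc' (control-u, zero, meta-pipe, then 'wc'), and the region will be replaced with its line, word, and character counts.

Another thing you can do is figure out how to manipulate one line, make that a keyboard macro, and then run the macro repeatedly. For example, 'C-x ( C-s foo C-g bar C-x )' will search for the word "foo" and then type the word "bar", changing it to "foobar". Then you can do 'C-u 0 C-x e' once, and it will keep running the macro until no more "foo" is found.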
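The same command can also be called from Lisp. The following is only a sketch under assumptions: the function name `my-invert-tags-region` is made up for illustration, `awk` and `sort` are assumed to be available to the shell, and the awk program is the one from the question with its double quotes escaped for the Lisp string.

```elisp
;; Hypothetical helper: pipe the region through the awk script from the
;; question and replace the region with the command's output.
(defun my-invert-tags-region (start end)
  "Replace the text between START and END with its tag-to-names inversion.
Assumes `awk' and `sort' are available to the shell."
  (interactive "r")
  (shell-command-on-region
   start end
   (concat "awk '{sub(\":\",\"\");"
           "for (i=3;i<=NF;i++) members[$i] = members[$i] \" \" $2}; "
           "END{for (j in members) print j \": \" members[j]}' | sort")
   nil   ; OUTPUT-BUFFER: left at its default
   t))   ; REPLACE: put the output in place of the region
```

Interactively, select the list and run M-x my-invert-tags-region; this should behave like typing the command after C-u M-|.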

+1

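Additionally, modern Emacsen have convenient bindings for keyboard macros. <f3> is bound to 'kmacro-start-macro-or-insert-counter' and <f4> is bound to 'kmacro-end-or-call-macro', which saves typing. Ignoring the counter feature (as usual, the full documentation is at "C-h k <f3> RET"), this lets you hit "<f3> C-s foo C-g bar <f4> <f4> ...": the first <f4> ends the macro definition, the second executes it. – ariels 2010-01-03 06:34:49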

3

OK, here is my first attempt in elisp:

  1. I started a buffer in elisp and paredit modes, opened double quotes, and pasted the text
  2. I bound it to a symbol using let
 
(let ((foobar "** Diego: b QI 
** bruno-gil: b QI 
** Koma: jo 
** um: rsrs pr0n 
** FelipeAugusto: esp 
** GustavoPupo: pinto, tr etc 
** GP: lit gtk 
** Alan: jo mil pc 
** Jost: b hq jo 1997 
** Herbert: b rsrs pr0n 
** Andre: maia mil pseudo 
** Rodrigo: c 
** caue: b rsrs 7arte pseudo 
** kenny: cri gif 
** daniel: gtk mu pr0n rsrs b 
** tony: an 1997 esp 
** Vitor: b jo mimimi 
** raphael: b rpg 7arte 
** Luca: b lit gnu pc prog mmu 7arte 1997 
** LZZ: an qt 
** William: b an jo pc 1997 
** Epic: gtk 
** Aldo: b pseudo pol mil fur 
** GustavoKyon: an gtk 
** CarlosIsaksen : an hq jo 7arte gtk 1997 
** Peter: pseudo pol mil est 1997 gtk lit lang 
** leandro: b jo cb 
** frederico: 7arte lit gtk 
** rol: b an pseudo mimimi 7arte 
** mathias: jo lit 
** henrique: 1997 h gtk qt 
** eumané: an qt 
** walrus: cri de 
** FilipePinheiro: lit pseudo 
** Igor: pseudo b 
** Erick: b jo rpg q 1997 gtk 
** Gabriel: pr0n rsrs qt 
** george: clo mimimi 
** anão: hq jo 1997 rsrs clô b 
** jeff: 7arte gtk 
** davidatenas: an 7arte 1997 esp qt 
** HHahaah: b 
** Eduardo: b 
")) 
    foobar) 

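Now I turn foobar into something fancier.

  • First I remove the stars and colons with a regexp and split the text into lines using (split-string)
  • Then I use mapcar to turn each line into a list of words
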
    (mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t)) 
    
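  • Then I create a hash table and bind it to temphash: (temphash (make-hash-table :test 'equal))
  • Then I loop over the nested lists, adding the elements to the hash table. I suppose I should not use mapcar for non-functional programming, but nobody is looking :)
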
    (mapcar #'(lambda (l) 
           (mapcar #'(lambda (m) (puthash m (format "%s %s" (car l) (let ((tempel (gethash m temphash))) 
                      (if tempel tempel ""))) temphash)) (rest l))) 
          (mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t))) 
    
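  • Finally, I extract the elements from the hash table into another set of nested lists, with a handy function stolen from Xah Lee's page,
  • and lastly I pretty-print the result to another buffer with M-x pp-eval-last-sexp

It is a bit convoluted, especially the double mapcar, but it works. Here is the full "code":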

     
    ;; Stolen from Xah Lee's page 
    
    
    (defun hash-to-list (hashtable) 
        "Return a list that represent the hashtable." 
        (let (mylist) 
        (maphash (lambda (kk vv) (setq mylist (cons (list kk vv) mylist))) hashtable) 
        mylist 
    ) 
    ) 
    
    ;; Code 
    
    (let ((foobar "** Diego: b QI 
    ** bruno-gil: b QI 
    ** Koma: jo 
    ** um: rsrs pr0n 
    ** FelipeAugusto: esp 
    ** GustavoPupo: pinto, tr etc 
    ** GP: lit gtk 
    ** Alan: jo mil pc 
    ** Jost: b hq jo 1997 
    ** Herbert: b rsrs pr0n 
    ** Andre: maia mil pseudo 
    ** Rodrigo: c 
    ** caue: b rsrs 7arte pseudo 
    ** kenny: cri gif 
    ** daniel: gtk mu pr0n rsrs b 
    ** tony: an 1997 esp 
    ** Vitor: b jo mimimi 
    ** raphael: b rpg 7arte 
    ** Luca: b lit gnu pc prog mmu 7arte 1997 
    ** LZZ: an qt 
    ** William: b an jo pc 1997 
    ** Epic: gtk 
    ** Aldo: b pseudo pol mil fur 
    ** GustavoKyon: an gtk 
    ** CarlosIsaksen : an hq jo 7arte gtk 1997 
    ** Peter: pseudo pol mil est 1997 gtk lit lang 
    ** leandro: b jo cb 
    ** frederico: 7arte lit gtk 
    ** rol: b an pseudo mimimi 7arte 
    ** mathias: jo lit 
    ** henrique: 1997 h gtk qt 
    ** eumané: an qt 
    ** walrus: cri de 
    ** FilipePinheiro: lit pseudo 
    ** Igor: pseudo b 
    ** Erick: b jo rpg q 1997 gtk 
    ** Gabriel: pr0n rsrs qt 
    ** george: clo mimimi 
    ** anão: hq jo 1997 rsrs clô b 
    ** jeff: 7arte gtk 
    ** davidatenas: an 7arte 1997 esp qt 
    ** HHahaah: b 
    ** Eduardo: b 
    ") 
         (temphash (make-hash-table :test 'equal))) 
        (mapcar #'(lambda (l) 
           (mapcar #'(lambda (m) (puthash m (format "%s %s" (car l) (let ((tempel (gethash m temphash))) 
                      (if tempel tempel ""))) temphash)) (rest l))) 
          (mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t))) 
        (hash-to-list temphash)) 
    

And here is the output:

     
    (("clô" "anão ") 
    ("clo" "george ") 
    ("q" "Erick ") 
    ("de" "walrus ") 
    ("h" "henrique ") 
    ("cb" "leandro ") 
    ("lang" "Peter ") 
    ("est" "Peter ") 
    ("fur" "Aldo ") 
    ("pol" "Peter Aldo ") 
    ("qt" "davidatenas Gabriel eumané henrique LZZ ") 
    ("mmu" "Luca ") 
    ("prog" "Luca ") 
    ("gnu" "Luca ") 
    ("rpg" "Erick raphael ") 
    ("mimimi" "george rol Vitor ") 
    ("an" "davidatenas eumané rol CarlosIsaksen GustavoKyon William LZZ tony ") 
    ("mu" "daniel ") 
    ("gif" "kenny ") 
    ("cri" "walrus kenny ") 
    ("7arte" "davidatenas jeff rol frederico CarlosIsaksen Luca raphael caue ") 
    ("c" "Rodrigo ") 
    ("pseudo" "Igor FilipePinheiro rol Peter Aldo caue Andre ") 
    ("maia" "Andre ") 
    ("1997" "davidatenas anão Erick henrique Peter CarlosIsaksen William Luca tony Jost ") 
    ("hq" "anão CarlosIsaksen Jost ") 
    ("pc" "William Luca Alan ") 
    ("mil" "Peter Aldo Andre Alan ") 
    ("gtk" "jeff Erick henrique frederico Peter CarlosIsaksen GustavoKyon Epic daniel GP ") 
    ("lit" "FilipePinheiro mathias frederico Peter Luca GP ") 
    ("etc" "GustavoPupo ") 
    ("tr" "GustavoPupo ") 
    ("pinto," "GustavoPupo ") 
    ("esp" "davidatenas tony FelipeAugusto ") 
    ("pr0n" "Gabriel daniel Herbert um ") 
    ("rsrs" "anão Gabriel daniel caue Herbert um ") 
    ("jo" "anão Erick mathias leandro CarlosIsaksen William Vitor Jost Alan Koma ") 
    ("QI" "bruno-gil Diego ") 
    ("b" "Eduardo HHahaah anão Erick Igor rol leandro Aldo William Luca raphael Vitor daniel caue Herbert Jost bruno-gil Diego ")) 
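The side-effecting mapcar calls can be swapped for dolist, which is the more usual loop when only the side effect matters. This is a minimal sketch, not the answer's own code: the function name `my-tags-to-names` is hypothetical, and it returns a sorted list of (TAG NAME...) entries instead of the space-joined strings shown in the output.

```elisp
(defun my-tags-to-names (text)
  "Invert TEXT, whose lines look like \"** NAME: TAG TAG...\".
Return a list of (TAG NAME...) entries sorted by tag."
  (let ((table (make-hash-table :test 'equal))
        result)
    (dolist (line (split-string text "\n" t))
      ;; Drop stars and colons, then split the line into words.
      (let* ((words (split-string
                     (replace-regexp-in-string "[:*]" "" line) " " t))
             (name (car words)))
        (dolist (tag (cdr words))
          ;; Prepend NAME to whatever is already stored under TAG.
          (puthash tag (cons name (gethash tag table)) table))))
    ;; Collect the table into a list, restoring the input order of names.
    (maphash (lambda (tag names) (push (cons tag (reverse names)) result))
             table)
    (sort result (lambda (a b) (string< (car a) (car b))))))

(my-tags-to-names "** Diego: b QI\n** bruno-gil: b QI\n** Koma: jo\n")
;; => (("QI" "Diego" "bruno-gil") ("b" "Diego" "bruno-gil") ("jo" "Koma"))
```

Sorting uses string<, so uppercase tags such as "QI" come before lowercase ones.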
    
7

The first thing I would do is take advantage of org-mode's tag support. Instead of

    ** Diego: b QI 
    

you would have

    ** Diego       :b:QI: 
    

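which org-mode recognizes as the tags "b" and "QI".

To convert your current format to the standard org-mode format, you can use the following (assuming your source buffer is called "asdf"):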

    (with-current-buffer "asdf" 
        (beginning-of-buffer) 
        (replace-string " " ":") 
        (beginning-of-buffer) 
        (replace-string "**:" "** ") 
        (beginning-of-buffer) 
        (replace-string "::" " :") 
        (beginning-of-buffer) 
        (replace-string "\n" ":\n") 
        (org-set-tags-command t t)) 
    

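It is not pretty or efficient, but it gets the job done.

After that, you can use the following to produce a buffer in the format you wanted from the shell script: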
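For comparison, here is a regexp-based variant that avoids the interactive-only commands replace-string and beginning-of-buffer. It is a sketch under assumptions: the function name `my-lines-to-org-tags` is invented for illustration, it works on the current buffer, and it does not realign the tags the way org-set-tags-command does.

```elisp
(defun my-lines-to-org-tags ()
  "Rewrite \"** NAME: TAG TAG\" lines in the current buffer as \"** NAME :TAG:TAG:\"."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "^\\(\\*+ [^:\n]+?\\) *: *\\(.*\\)$" nil t)
      (let ((heading (match-string 1))
            ;; `split-string' clobbers the match data, so protect it.
            (tags (save-match-data
                    (split-string (match-string 2) "[ \t]+" t))))
        ;; Lines with nothing after the colon are left untouched.
        (when tags
          (replace-match
           (concat heading " :" (mapconcat #'identity tags ":") ":")
           t t))))))
```

It could be run with (with-current-buffer "asdf" (my-lines-to-org-tags)), in place of the chain of replace-string calls.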

    (let ((results (get-buffer-create "results")) 
         tags) 
        (with-current-buffer "asdf" 
        (beginning-of-buffer) 
        (while (org-on-heading-p) 
         (mapc '(lambda (item) (when item (add-to-list 'tags item))) (org-get-local-tags)) 
         (outline-next-visible-heading 1))) 
        (setq tags (sort tags 'string<)) 
        (with-current-buffer results 
        (erase-buffer) 
        (mapc '(lambda (item) 
          (insert (format "%s: %s\n" 
              item 
              (with-current-buffer "asdf" 
               (org-map-entries '(substring-no-properties (org-get-heading t)) item))))) 
          tags) 
        (beginning-of-buffer) 
        (replace-regexp "[()]" ""))) 
    

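This places the result in a buffer called "results", creating it if it does not already exist. Basically, it collects all the tags in the buffer "asdf", sorts them, then loops over each tag, searches "asdf" for every headline carrying that tag, and inserts it into "results".

With a bit of cleanup this could be made into a function, basically by replacing "asdf" and "results" with arguments. If you need that done, I can do it.

1

The previous alternatives are interesting, but I do not believe they capture the "how would I do this in Emacs, as a recent convert" aspect of the question. I suspect that someone learning Emacs, with an eye to doing the whole job in Emacs Lisp, might start with something like this: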

    (defun create-tags-to-name (buffer-name) 
        "Create a buffer filled with lines containing `** TAG: 
    LIST-OF-NAMES' by transposing the lines between point and the 
    end of the buffer that match the format `** NAME: LIST-OF-TAGS', 
    where the list items are white space separated." 
        ;; "B" prompts for a buffer name that need not exist yet. 
        (interactive "BDestination buffer: ") 
        (let ((buf (get-buffer-create buffer-name)) 
        (tag-to-name-list (list)) 
        name tags element) 
        ;; Clear the destination buffer 
        (with-current-buffer buf 
         (erase-buffer)) 
        ;; Build the list of tag to name associations. 
        ;; Accept any name up to the colon, so non-ASCII names (eumané, anão) are kept. 
        (while (re-search-forward "^\\*\\* \\([^:\n]+\\):\\(.+\\)$" (point-max) t) 
         (setq name (buffer-substring (match-beginning 1) (match-end 1)) 
         tags (split-string (buffer-substring (match-beginning 2) (match-end 2)))) 
         ;; For each tag add the name to the tag's name list 
         (while tags 
        (let ((tag (car tags))) 
         (setq element (assoc tag tag-to-name-list) 
         tags (cdr tags)) 
         (if element 
          (setcdr element (append (list name) (cdr element))) 
         (setq tag-to-name-list (append (list (cons tag (list name))) tag-to-name-list)))))) 
        ;; Dump the associations to the target buffer 
        (with-current-buffer buf 
         (while tag-to-name-list 
        (setq element (car tag-to-name-list) 
          tag-to-name-list (cdr tag-to-name-list)) 
        (insert (concat "** " (car element) ":")) 
        (let ((tag-list (cdr element))) 
         (while tag-list 
         (insert " " (car tag-list)) 
         (setq tag-list (cdr tag-list)))) 
        (insert "\n"))))) 
    
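2

If you know *nix pipes, then you are already familiar with functional programming, because functional programming treats a program as a succession of transformations applied to data by functions. Remember function composition from school math? Basically, g∘f means you first apply f and then immediately apply g: (g∘f)(x) = g(f(x)). A functional program is one big function composition. And a pipe is just a function composition, only written in the opposite direction: (g∘f)(x) in math is the same as x | f | g on the command line.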
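To make the composition point concrete, here is a minimal sketch in plain Emacs Lisp. The helper name `my-compose` is hypothetical (it is not part of Emacs or of any library mentioned here), and it builds the function with backquote so that it works with or without lexical binding.

```elisp
(defun my-compose (g f)
  "Return a function of one argument that applies F first and then G.
G and F should be function symbols or lambda expressions."
  ;; Built with backquote so no closure (and no lexical binding) is needed.
  `(lambda (x) (funcall ',g (funcall ',f x))))

;; (g∘f)(x) = g(f(x)): count the words of a string, like `x | split | count'.
(funcall (my-compose #'length #'split-string) "b rsrs pr0n")
;; => 3
```

Here split-string plays the role of f and length the role of g, so the data flows through them in the same order as it would through a two-stage pipe.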

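    There is a third-party library, dash.el, that provides many functions for list and tree transformations, along with functions and macros that make a functional approach easy. One of them is the threading macro ->>, which mimics a command-line pipe: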

    (->> '(1 2 3) (-map '1+) (-reduce '+)) ; returns 9 
    ;; equivalent to (-reduce '+ (-map '1+ '(1 2 3))) 
    

    So, if we want to process text data by applying actions one after another, our function might look like this:

    (defun key-value-swap (s) 
        (->> s 
         nil ; Split into lines 
         nil ; Remove stars from each line 
         nil ; Split each line 
         nil ; Add 1st element as a value to each element starting from 
          ; 2nd as keys 
         nil ; Return a hash-table 
         )) 
    

    The function that does exactly what you want then looks like this:

    (defun key-value-swap (s) 
        (let ((h (make-hash-table :test 'equal))) 
        (->> s 
         s-lines ; split into lines 
         (--map (s-split "\\(\\s-\\|:\\)" ; split each line 
             (s-chop-prefix "** " it) ; throw away stars 
             t)) 
         (--map (-each (cdr it) ; for every field in the line, except 1st 
            (lambda (k) ; append 1st line to value under key 
            (puthash k (cons (car it) (gethash k h)) h))))) 
        h)) ; return hash-table 
    

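    (puthash k (cons (car it) (gethash k h)) h) looks cryptic, but it just means that under every key in the hash table there is a list, and whenever you find a new value you add it to the front of that list. So if b holds (Diego) and we find that bruno-gil should also go under b, the value under b becomes (bruno-gil Diego).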
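The accumulation step can be tried on its own without dash.el. This is a small sketch; the helper name `my-hash-push` is hypothetical and only wraps the expression discussed here.

```elisp
(defun my-hash-push (key value table)
  "Prepend VALUE to the list stored under KEY in TABLE and return that list."
  ;; `gethash' returns nil for a missing key, so the first call stores (VALUE).
  (puthash key (cons value (gethash key table)) table))

(let ((h (make-hash-table :test 'equal)))
  (my-hash-push "b" "Diego" h)       ; "b" now holds ("Diego")
  (my-hash-push "b" "bruno-gil" h)   ; the new name goes to the front
  (gethash "b" h))
;; => ("bruno-gil" "Diego")
```

Because cons adds at the front, names end up in reverse order of appearance; reversing each list at the end restores the input order if that matters.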