Common grep tips

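Matching a TAB with grep

Grep for a literal tab character directly; on the command line, type it with "ESC TAB" (Ctrl-V TAB also works in bash). Alternatively, let bash's $'...' quoting produce the tab:
grep $'\t' file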

Matching a minus sign with grep

cat file | grep -- -1    # "--" ends option parsing, so -1 is treated as the pattern, not an option

Remove all empty lines

cat file | grep -v "^$" > file2

Show only the lines that begin with a

cat file | grep '^a'

Show the context around "error" in a log

cat file | grep -C5 "error"    # 5 lines of context before and after each match


Common awk tips

Show every other line

cat file | awk '{getline; print $1;}'    # getline skips ahead one line, so this prints field 1 of lines 2, 4, 6, ...

Extract odd or even lines

awk 'NR%2==1' file    # print the odd-numbered lines
awk 'NR%2==0' file    # print the even-numbered lines


Common vim tips

Replace x with a tab in vim

:%s/x/^I/g
P.S. Just press TAB directly (vim displays it as ^I); there is no need for the \t escape sequence.

Replace x with a newline in vim

:%s/x/\r/g


Common sed tips

View selected lines of a file

wc -l a.txt                 # count the lines in a.txt
sed -n '190,196p' a.txt     # print lines 190 to 196
sed -n '190p' a.txt         # print line 190 only

Replace every , in a file with a tab

cat data.csv | sed $'s/,/\t/g'

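Symptom:

The service produced large amounts of garbled (mojibake) data, all of which was written to the database.

Cause:

Although the Maven project's pom.xml was configured for UTF-8 and the built jar was UTF-8 encoded as well, all Chinese text came out garbled at runtime.
It turned out that the LANG-related environment variables on the system were not set to UTF-8.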

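Analysis:

That is only the surface cause. The underlying question is why the encoding the JVM uses when running the jar depends on system environment variables at all. The reason is that the jar was started without the JVM's file.encoding property, so the JVM derived its default encoding from the locale settings (LANG and related variables). Changing the launch command to

java -Dfile.encoding=UTF-8 XXX

solved the problem for good.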

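However, this makes deploying the service depend heavily on the launch script. Could the encoding be set inside the program instead, for example like this?

System.setProperty("file.encoding", "UTF-8");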

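In fact, this setting has no effect: the JVM caches the default encoding at startup, so setting the property later in the program is too late. The encoding could instead be specified explicitly at every place where the program reads or writes data, but that is a huge amount of work, and there is no guarantee that everyone writing code will do it. So in the end we relied on setting the encoding in the launch script, and added a check of the encoding when the service starts.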
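A minimal sketch of such a startup check, together with the explicit-charset alternative, might look like the following. The class name EncodingCheck, the choice to throw IllegalStateException, and the sample bytes are illustrative assumptions, not the original service's code. Note that on JDK 18 and later (JEP 400), UTF-8 is the default charset unless overridden, so this kind of check mainly matters on older JDKs.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCheck {

    /** True if the JVM's cached default charset is UTF-8. */
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    /** Hypothetical startup check: fail fast instead of silently writing mojibake. */
    static void requireUtf8Default() {
        if (!defaultIsUtf8()) {
            throw new IllegalStateException("default charset is " + Charset.defaultCharset()
                    + " (file.encoding=" + System.getProperty("file.encoding")
                    + "); start the JVM with -Dfile.encoding=UTF-8");
        }
    }

    /** Reads the first line with an explicit charset, independent of the JVM default. */
    static String readFirstLine(byte[] data) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        requireUtf8Default();
        // The escapes spell the two-character Chinese word for "Chinese"; using escapes
        // keeps this example independent of the compiler's source encoding too.
        byte[] utf8 = "\u4e2d\u6587\nsecond line".getBytes(StandardCharsets.UTF_8);
        System.out.println(readFirstLine(utf8).equals("\u4e2d\u6587")); // true
    }
}
```

Calling requireUtf8Default() first thing in main makes a misconfigured launch script fail immediately, rather than letting garbled data reach the database.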

References:

As Edward Grech points out, in a special case like this, the environment variable JAVA_TOOL_OPTIONS can be used to specify this property, but it’s normally done like this: java -Dfile.encoding=UTF-8 XXX

Charset.defaultCharset() will reflect changes to the file.encoding property, but most of the code in the core Java libraries that need to determine the default character encoding do not use this mechanism.

Important points to note:

  • The JVM caches the default character encoding once it starts, and the default constructors of InputStreamReader and other core Java classes use that cached value. So calling System.setProperty("file.encoding", "UTF-16") may not have the desired effect.
  • Always work with your own character encoding if you can, that is more accurate and precise way of converting bytes to Strings.

Iterator in detail

Iterator is an interface in Java. To quote the Javadoc in its source:
An iterator over a collection. {@code Iterator} takes the place of {@link Enumeration} in the Java Collections Framework. Iterators differ from enumerations in two ways:

  • Iterators allow the caller to remove elements from the underlying collection during the iteration with well-defined semantics.
  • Method names have been improved.
public interface Iterator<E> {
... ...
}

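Collections do not usually implement Iterator themselves: a class that should be iterable implements Iterable<E>, whose iterator() method returns a new Iterator<E>. The root collection interface Collection extends Iterable: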
public interface Collection<E> extends Iterable<E> {
... ...
}
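To make the Iterable/Iterator split concrete, here is a small sketch of a user-defined Iterable. IntRange is a hypothetical class for illustration; any class that implements Iterable can be used in a for-each loop, which the compiler turns into calls to iterator(), hasNext() and next().

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical half-open integer range [from, to) usable in a for-each loop. */
class IntRange implements Iterable<Integer> {
    private final int from;
    private final int to;

    IntRange(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public Iterator<Integer> iterator() {
        // Each call returns a fresh iterator with its own cursor.
        return new Iterator<Integer>() {
            private int cursor = from;

            @Override
            public boolean hasNext() {
                return cursor < to;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return cursor++;
            }
        };
    }

    static int sum(Iterable<Integer> values) {
        int total = 0;
        for (int v : values) { // desugars to iterator() / hasNext() / next()
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new IntRange(1, 5))); // 1 + 2 + 3 + 4 = 10
    }
}
```

remove() is not overridden here, so the default method inherited from Iterator (Java 8+) throws UnsupportedOperationException.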

As an example, let's look at how Java's ArrayList implements its iterator.
ArrayList's iterator() method:

public Iterator<E> iterator() {
    return new Itr();
}

Next, the implementation of Itr:

/**
 * An optimized version of AbstractList.Itr
 */
private class Itr implements Iterator<E> {
    int cursor;       // index of next element to return
    int lastRet = -1; // index of last element returned; -1 if no such
    int expectedModCount = modCount;

    public boolean hasNext() {
        return cursor != size;
    }

    @SuppressWarnings("unchecked")
    public E next() {
        checkForComodification();
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1;
        return (E) elementData[lastRet = i];
    }

    public void remove() {
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();
        try {
            ArrayList.this.remove(lastRet); // see the remove(int) code below
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public void forEachRemaining(Consumer<? super E> consumer) {
        Objects.requireNonNull(consumer);
        final int size = ArrayList.this.size;
        int i = cursor;
        if (i >= size) {
            return;
        }
        final Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length) {
            throw new ConcurrentModificationException();
        }
        while (i != size && modCount == expectedModCount) {
            consumer.accept((E) elementData[i++]);
        }
        // update once at end of iteration to reduce heap write traffic
        cursor = i;
        lastRet = i - 1;
        checkForComodification();
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}

The remove(int) method that ArrayList.this.remove(lastRet) calls:

public E remove(int index) {
    rangeCheck(index);
    modCount++;
    E oldValue = elementData(index);
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index, numMoved);
    elementData[--size] = null; // clear to let GC do its work
    return oldValue;
}

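So while iterating over a List with an Iterator, do not add or remove elements through the list itself: the list's remove (or add) increments modCount, so the iterator's next call to next() runs checkForComodification(), finds modCount != expectedModCount, and throws ConcurrentModificationException. Removing through the iterator's own remove() is allowed, because it sets expectedModCount = modCount afterwards.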
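The sketch below contrasts removing through the list during a for-each loop, which fails fast, with removing through the iterator's own remove(), which is allowed because Itr.remove() resets expectedModCount. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class ListRemovalDemo {

    /** Removes odd numbers with list.remove inside a for-each loop; true if it failed fast. */
    static boolean removeThroughList(List<Integer> list) {
        try {
            for (Integer x : list) {
                if (x % 2 != 0) {
                    list.remove(x); // remove(Object): modCount++ behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // thrown by checkForComodification() in the following next()
        }
    }

    /** Removes odd numbers through the iterator, which resynchronizes expectedModCount. */
    static void removeThroughIterator(List<Integer> list) {
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 != 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(removeThroughList(a)); // true
        List<Integer> b = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        removeThroughIterator(b);
        System.out.println(b); // [2, 4]
    }
}
```

The fail-fast check is best-effort: removing the second-to-last element through the list, for instance, ends the loop silently because hasNext() then sees cursor == size. On Java 8+, list.removeIf(x -> x % 2 != 0) is a compact alternative to the explicit iterator loop.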

HashMap source code analysis

1. Overview

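  • HashMap is an implementation of the Map interface. It provides all of the Map operations and permits null keys and null values. HashMap is roughly equivalent to Hashtable, except that it permits nulls and is unsynchronized.
  • Iterating over the whole HashMap takes time proportional to its capacity (the number of buckets) plus its size (the number of key-value mappings). So if iteration performance matters, do not set the initial capacity too high or the load factor too low.
  • Two parameters affect HashMap's performance: the initial capacity and the load factor.
  • The capacity is the number of buckets in the hash table; the initial capacity is simply the capacity at the time the table is created.
  • The load factor measures how full the hash table is allowed to get before its capacity is automatically increased, i.e. the maximum allowed ratio size / capacity.
  • When the number of entries exceeds the threshold (loadFactor * capacity), the internal data is rebuilt and the number of buckets is doubled; this process is called rehashing.
  • As a general rule, the default load factor of 0.75 offers a good tradeoff between time and space costs. A higher load factor reduces the space overhead but increases the lookup cost, which shows up in most operations, including get and put.
  • When setting the initial capacity, take the expected number of entries and the load factor into account so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.
  • If many mappings are to be stored in one HashMap, creating it with a sufficiently large capacity up front is much more efficient than letting it rehash every time it grows past the threshold.
  • Note that the implementation is not synchronized. If multiple threads access a HashMap concurrently and at least one of them modifies it structurally, access must be synchronized externally. Adding or removing mappings is a structural modification; merely changing the value associated with an existing key is not.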
  • If there is no object that can be used to synchronize access to the HashMap, use

Map m = Collections.synchronizedMap(new HashMap(…));

to create a synchronized Map.
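The sizing rule above can be turned into a small helper. initialCapacityFor is a hypothetical name (on JDK 19+, HashMap.newHashMap(int) performs a similar calculation for you). The sketch also wraps the map with Collections.synchronizedMap at creation time and, as that method's Javadoc requires, holds the wrapper's lock while iterating.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class HashMapSizing {

    /**
     * Hypothetical helper: a capacity c with c * loadFactor >= expectedSize, so inserting
     * expectedSize entries never pushes size past the resize threshold.
     */
    static int initialCapacityFor(int expectedSize, float loadFactor) {
        return (int) Math.ceil(expectedSize / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 100;
        // 100 / 0.75 = 133.3 -> 134; HashMap rounds this up to a 256-bucket table.
        Map<String, Integer> syncMap =
                Collections.synchronizedMap(new HashMap<>(initialCapacityFor(expected, 0.75f)));
        for (int i = 0; i < expected; i++) {
            syncMap.put("key" + i, i); // each call synchronizes on syncMap
        }
        // Iteration is several calls, so the caller must hold the lock for the whole loop.
        synchronized (syncMap) {
            int total = 0;
            for (int v : syncMap.values()) {
                total += v;
            }
            System.out.println(total); // 0 + 1 + ... + 99 = 4950
        }
    }
}
```

0.75f is HashMap's default load factor, which is also what the single-argument HashMap(int) constructor uses.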

2. HashMap implementation

2.1 Fields

transient Node<K,V>[] table;            // the bucket array that stores the entries
transient Set<Map.Entry<K,V>> entrySet; // Holds cached entrySet()
transient int size;                     // number of key-value mappings stored
int threshold;                          // resize once size exceeds this: threshold = loadFactor * capacity
                                        // (before the table is allocated it holds the initial capacity; 0 means DEFAULT_INITIAL_CAPACITY)
final float loadFactor;                 // the load factor
transient int modCount;                 // This field is used to make iterators on Collection-views of the HashMap fail-fast

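2.2 Implementation

2.2.1 Improvements in Java 8

java.util.HashMap is the JDK's hash table implementation. In JDK 6 it was built from an array of buckets plus linked lists; Java 8 uses an array of buckets plus linked lists or red-black trees, which improves the worst-case performance of a heavily colliding bucket from O(n) to O(log n).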

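2.2.2 How the map is accessed

Taking put as the example, the put operation works as follows:
When there is no collision, the entry is stored at index (tab.length - 1) & hash, where hash is the value returned by hash(key) (see 2.2.3), not key.hashCode() itself.
When there is a collision, an existing node with an equal key simply has its value replaced. Otherwise, if the bucket holds a TreeNode, the entry is added to the red-black tree; if not, it is appended to the end of the bucket's linked list, and if the list then grows beyond TREEIFY_THRESHOLD (8) nodes, it is converted into a red-black tree (provided the table has at least MIN_TREEIFY_CAPACITY = 64 buckets; a smaller table is resized instead).
After the insertion, if size is greater than threshold, the table is resized.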
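The put steps can be modeled with a much smaller chained hash table. This is a simplified sketch, not the JDK code: TinyHashMap is a hypothetical class, it skips red-black trees entirely, and its resize re-links nodes at the head of each new bucket instead of preserving list order as the Java 8 implementation does.

```java
import java.util.Objects;

/** Hypothetical, simplified HashMap: chaining and resize only, no red-black trees. */
class TinyHashMap<K, V> {
    private static final float LOAD_FACTOR = 0.75f;

    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    private Node<K, V>[] table = newTable(16);
    private int threshold = (int) (16 * LOAD_FACTOR); // 12
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int n) {
        return (Node<K, V>[]) new Node[n];
    }

    static int hash(Object key) { // same spreading step as HashMap.hash
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {                                   // no collision: empty bucket
            table[i] = new Node<>(hash, key, value);
        } else {                                           // collision: walk the chain
            while (true) {
                if (p.hash == hash && Objects.equals(p.key, key)) {
                    V old = p.value;                       // same key: replace the value
                    p.value = value;
                    return old;
                }
                if (p.next == null) {
                    p.next = new Node<>(hash, key, value); // append at the tail
                    break;
                }
                p = p.next;
            }
        }
        if (++size > threshold) {                          // resize after the insertion
            resize();
        }
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    private void resize() {
        Node<K, V>[] grown = newTable(table.length * 2);
        for (Node<K, V> head : table) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> next = e.next;
                int j = (grown.length - 1) & e.hash;
                e.next = grown[j];                         // re-link at the head of the new bucket
                grown[j] = e;
                e = next;
            }
        }
        table = grown;
        threshold = (int) (grown.length * LOAD_FACTOR);
    }

    int size() { return size; }
    int capacity() { return table.length; }

    public static void main(String[] args) {
        TinyHashMap<String, Integer> m = new TinyHashMap<>();
        for (int i = 0; i < 13; i++) {
            m.put("k" + i, i);
        }
        System.out.println(m.size() + " entries, " + m.capacity() + " buckets"); // 13 entries, 32 buckets
    }
}
```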

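2.2.3 hash() and the table size

The hash function XORs the high 16 bits of hashCode() into the low 16 bits, because the bucket index (n - 1) & hash only looks at the low bits:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}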
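The effect of the XOR with h >>> 16 is easiest to see with two hash codes that differ only in their high bits. BucketIndex is a hypothetical helper class: spread copies the hash function above, and indexFor is the (n - 1) & hash step used by put and get.

```java
class BucketIndex {

    /** Copy of HashMap.hash(Object): XOR the high 16 bits into the low 16 bits. */
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Bucket index for a table of length n, where n must be a power of two. */
    static int indexFor(int hash, int n) {
        return (n - 1) & hash; // keeps only the low log2(n) bits
    }

    public static void main(String[] args) {
        int a = 0x1_0000; // Integer.hashCode() is the value itself,
        int b = 0x2_0000; // so these two hash codes differ only above bit 15
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));                 // 0 0
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
    }
}
```

Without spreading, both keys collide in bucket 0 of a 16-bucket table; with it, the high bits reach the index and the keys land in different buckets.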

tab.length is always a power of two, so (n - 1) & hash picks the same bucket as Math.floorMod(hash, n) would, using a single AND. tableSizeFor rounds a requested capacity up to the next power of two:

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
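A quick way to check what tableSizeFor returns is to copy it into a standalone class. TableSize is a hypothetical wrapper; MAXIMUM_CAPACITY is set to 1 << 30, the value HashMap uses, and the comments describe why the shifts work.

```java
class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30; // same value as HashMap.MAXIMUM_CAPACITY

    /** Copy of HashMap.tableSizeFor: smallest power of two >= cap (at least 1). */
    static int tableSizeFor(int cap) {
        int n = cap - 1;  // subtract 1 so an exact power of two maps to itself
        n |= n >>> 1;     // copy the highest set bit into every lower position...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;    // ...so n is now of the form 2^k - 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 10, 16, 17, 100}) {
            System.out.println(cap + " -> " + tableSizeFor(cap)); // 1, 1, 16, 16, 32, 128
        }
    }
}
```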

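3. Iterating over a HashMap

When both keys and values are needed, iterating over entrySet() is recommended (K and V stand for the map's key and value types):

Iterator<Map.Entry<K, V>> iter = map.entrySet().iterator();
while (iter.hasNext()) {
    Map.Entry<K, V> entry = iter.next();
    K key = entry.getKey();
    V val = entry.getValue();
}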
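Iterating over entrySet() visits each mapping once, whereas looping over keySet() and calling get(key) performs a second lookup per key. This sketch shows the entrySet() for-each loop and the Map.forEach method added in Java 8; EntryIteration and the join helpers are illustrative names, and a LinkedHashMap is used only so that the output order is predictable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntryIteration {

    /** Joins "key=value" pairs by iterating over entrySet() once. */
    static String join(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return sb.toString();
    }

    /** Same result using Map.forEach (Java 8+). */
    static String joinWithForEach(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        map.forEach((key, value) -> {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(key).append('=').append(value);
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>(); // insertion order, for predictable output
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(join(map));            // a=1,b=2
        System.out.println(joinWithForEach(map)); // a=1,b=2
    }
}
```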

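4. Modifying the map during iteration

4.1 Removing during iteration

Remove elements through the iterator's own remove() method, which removes the current element (the one most recently returned by next()). It throws IllegalStateException if remove() is called without a preceding call to next().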
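A sketch of both behaviors on a HashMap; IteratorRemoveDemo and its methods are hypothetical names used for illustration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class IteratorRemoveDemo {

    /** Removes every entry with a negative value through Iterator.remove(). */
    static void removeNegatives(Map<String, Integer> map) {
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (entry.getValue() < 0) {
                it.remove(); // removes the entry most recently returned by next()
            }
        }
    }

    /** True if remove() without a preceding next() throws IllegalStateException. */
    static boolean removeBeforeNextFails(Map<String, Integer> map) {
        Iterator<String> it = map.keySet().iterator();
        try {
            it.remove(); // no next() yet, so there is no current element
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", -2);
        map.put("c", 3);
        removeNegatives(map);
        System.out.println(map.containsKey("b"));        // false
        System.out.println(removeBeforeNextFails(map)); // true
    }
}
```

On Java 8+, map.values().removeIf(v -> v < 0) achieves the same as removeNegatives, since removeIf on a view removes through that view's iterator.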

4.2 Using keySet

KeySet does not override add(E e), so calling add on a keySet() view throws UnsupportedOperationException.
KeySet's inheritance hierarchy:

class KeySet extends AbstractSet<K> { ... }
abstract class AbstractSet<E> extends AbstractCollection<E> implements Set<E> { ... }
abstract class AbstractCollection<E> implements Collection<E> {
    public boolean add(E e) {
        throw new UnsupportedOperationException();
    }
    ...
}
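The following sketch confirms the behavior; KeySetViewDemo is a hypothetical class. The view still supports removal, which writes through to the backing map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class KeySetViewDemo {

    /** True if add() on the keySet view throws UnsupportedOperationException. */
    static boolean addIsUnsupported(Map<String, Integer> map) {
        Set<String> keys = map.keySet();
        try {
            keys.add("new-key"); // KeySet inherits AbstractCollection.add
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(addIsUnsupported(map)); // true
        map.keySet().remove("a");                  // removing through the view removes the mapping
        System.out.println(map);                   // {b=2}
    }
}
```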

4.3 Do not modify the map while iterating over one of its views directly

for (Instance key : InsMap.keySet()) {
    InsMap.keySet().remove(instance2);
    System.out.println(InsMap.get(key));
}

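After the remove, the next pass through the for loop throws ConcurrentModificationException (as long as there are entries left to visit).
The reason is that the three views KeySet, Values and EntrySet do not allow the map to be modified during iteration. Their iterators are fail-fast: each records modCount when it is created and compares it on every next(), throwing if the map was structurally modified other than through the iterator's own remove(). Their forEach methods perform the same check, for example EntrySet.forEach:

public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
    Node<K,V>[] tab;
    if (action == null)
        throw new NullPointerException();
    if (size > 0 && (tab = table) != null) {
        int mc = modCount;
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next)
                action.accept(e);
        }
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}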
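A self-contained version of the failing loop, with Integer keys standing in for the original Instance objects (an assumption for illustration; MapViewModificationDemo is a hypothetical class). The safe alternative removes through the view itself with removeIf, which uses the view's iterator and therefore keeps modCount and expectedModCount in sync.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class MapViewModificationDemo {

    /** Removes victim through the map while a for-each loop walks keySet(); true if it failed fast. */
    static boolean removeDuringForEach(Map<Integer, String> map, Integer victim) {
        boolean first = true;
        try {
            for (Integer key : map.keySet()) {
                if (first) {
                    map.remove(victim); // structural change behind the keySet iterator's back
                    first = false;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the iterator saw modCount != expectedModCount on its next step
        }
    }

    /** Safe alternative: remove through the view, which uses the view's own iterator. */
    static void removeEvenKeys(Map<Integer, String> map) {
        map.keySet().removeIf(k -> k % 2 == 0);
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 1; i <= 4; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(removeDuringForEach(map, 4)); // true (key 4 was removed before the throw)
        removeEvenKeys(map);
        System.out.println(map);                         // {1=v1, 3=v3}
    }
}
```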

A good engineer has technical mastery. A great engineer has these additional qualities:

  • Clear communication of complex ideas.
    Can the engineer explain themselves to non-technical stakeholders, as well as other engineers? Many technically proficient engineers are not considered great because they can’t communicate their ideas.
  • They love to code.
    Being an engineer is a great, high-paying job, which is why many good developers do it. Great engineers would code even if that wasn’t the case. They keep their skills current, and they have the stamina to power through long hours because they are doing what they love.
  • Desire to simplify instead of making things more complex.
    Hard, complex challenges are often fun for developers. Great engineers want to simplify the problem instead of building something complicated.
  • A strong business and product sense.
    In the development of a feature, developers often need to make product decisions that aren’t covered in the spec. Their ability to make the right call depends on an understanding of why a feature is good for the business, and how products should be built.
  • They focus on the highest impact items.
    Good engineers get distracted. Great engineers spend their time where it matters.
  • Strong social skills.
    An engineer needs to interact effectively with people across the company to be great at their job. If they can only interact with other developers, then they are only good at their job.

Welcome to Hexo! This is your very first post. Check documentation for more info. If you run into any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub.

single asterisks

double asterisks

  • Red
  • Green
  • Blue
  1. Bird
  2. McHale
  3. Parish

Quick Start

Create a new post

$ hexo new "My New Post"

More info: Writing

Run server

$ hexo server

More info: Server

Generate static files

$ hexo generate

More info: Generating

Deploy to remote sites

$ hexo deploy

More info: Deployment